Overview
Questions
How can machine learning models accelerate adsorption energy calculations?
What are the best practices for computing reliable adsorption energies using ML?
Objectives
Understand the structure and purpose of the OC22 dataset
Use the UMA model to predict total energies of adsorbate-slab systems
Compute adsorption energies using thermodynamic cycles and reference states
Analyze trends in oxygen adsorption across late transition metals
Perform convergence tests for slab thickness and relaxation depth
Evaluate model predictions against DFT and literature data
The search for efficient electrocatalysts relies heavily on accurate predictions of how molecules bind to surfaces. The strength of this interaction is often quantified by the adsorption energy, a key descriptor in activity volcano plots and scaling relations. Traditionally, these energies are computed using density functional theory (DFT), a method that is accurate but computationally expensive.
The Open Catalyst Project addresses this bottleneck by releasing large datasets and pretrained machine learning interatomic potentials. The Open Catalyst 2022 (OC22) dataset was introduced to expand training data for oxide electrocatalysts, which are essential for the oxygen evolution reaction (OER). It contains 62,331 DFT relaxations and approximately 9.85 million single-point calculations across diverse oxide materials, adsorbates, and surface coverages.
Unlike OC20, which focused on predicting adsorption energies directly, OC22 emphasizes generalized total energy prediction. This enables applications beyond static binding, including molecular dynamics, phonon calculations, and modeling of reconstructed or charged surfaces.
In this tutorial, we use the Universal Models for Atoms (UMA) model, trained on OC20, OC22, and other FAIR chemistry datasets, to compute adsorption energies of atomic oxygen on Pt(111) and across a range of late transition metals. We follow a workflow analogous to DFT: build a surface, place an adsorbate, relax the system using ML, and compute the energy change relative to reference states.
We also perform convergence studies and compare our results to published DFT data.
Before we begin, we need to install the required packages and gain access to the UMA model, which is hosted on Hugging Face with gated access.
First, install the core packages from the FairChem suite:
! pip install fairchem-core fairchem-data-oc fairchem-applications-cattsunami
Next, request access to the UMA model on its Hugging Face page.
Once access is granted, you can authenticate using the CLI or by setting an environment variable:
# :tags: [skip-execution]
# Option 1: CLI login
! huggingface-cli login
# Option 2: Set token in environment
import os
os.environ["HF_TOKEN"] = "your_hf_token_here"
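Because the model is gated, loading it will generally fail without valid credentials, so it can help to check for a token first. The sketch below uses a hypothetical helper, `hf_token_available`, which is not part of FairChem or `huggingface_hub`. It only inspects the `HF_TOKEN` variable shown above and does not detect credentials stored by `huggingface-cli login`.

```python
import os


def hf_token_available(env=None):
    """Return True if a non-empty, non-placeholder HF_TOKEN is present."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "")
    # Treat the placeholder string from the example above as "not set".
    return bool(token.strip()) and token != "your_hf_token_here"


if not hf_token_available():
    print("HF_TOKEN is not set; log in with the CLI or export a token first.")
```

Passing a plain dictionary instead of `os.environ` makes the check easy to exercise without touching the real environment.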
Once authenticated, load the UMA-S-1 model for the OC20 total energy task:
from __future__ import annotations
from fairchem.core import FAIRChemCalculator, pretrained_mlip
predictor = pretrained_mlip.get_predict_unit("uma-s-1")
calc = FAIRChemCalculator(predictor, task_name="oc20")
This model predicts RPBE-level total energies, allowing us to compute adsorption energies using thermodynamic cycles.
We begin by constructing a Pt(111) surface and placing an oxygen atom in the fcc hollow site. We use the experimental lattice constant by default, though you can substitute a DFT-optimized value if preferred.
from ase.build import add_adsorbate, fcc111
from ase.optimize import BFGS
slab = fcc111("Pt", size=(2, 2, 5), vacuum=20.0)
slab.pbc = True
adslab = slab.copy()
add_adsorbate(adslab, "O", height=1.2, position="fcc")
We now relax both the clean slab and the adsorbed system using the BFGS optimizer:
# Relax the clean slab (attach the calculator via the .calc attribute;
# set_calculator is deprecated in recent ASE releases)
slab.calc = calc
opt = BFGS(slab)
opt.run(fmax=0.05, steps=100)
slab_e = slab.get_potential_energy()
# Relax the slab with the adsorbed O atom
adslab.calc = calc
opt = BFGS(adslab)
opt.run(fmax=0.05, steps=100)
adslab_e = adslab.get_potential_energy()
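Since the optimizer is capped at 100 steps, it may stop before the force criterion is met. The following sketch shows what `fmax=0.05` means in practice: the largest per-atom force magnitude must fall below 0.05 eV/Å. The helper names and the example forces are illustrative assumptions, not part of ASE or FairChem.

```python
import math


def max_atomic_force(forces):
    """Largest per-atom force magnitude from an iterable of (fx, fy, fz)."""
    return max(math.sqrt(fx * fx + fy * fy + fz * fz) for fx, fy, fz in forces)


def is_converged(forces, fmax=0.05):
    """True when every atom's force magnitude is below fmax (eV/Å)."""
    return max_atomic_force(forces) < fmax


# Illustrative forces, not model output.
example_forces = [(0.01, 0.00, -0.02), (0.00, 0.03, 0.03)]
print(is_converged(example_forces))
```

With the objects defined in this tutorial, the rows of `slab.get_forces()` or `adslab.get_forces()` could be passed in place of `example_forces`.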
To compute the adsorption energy, we must define a reference state for oxygen. DFT describes gas-phase O₂ poorly, so rather than computing it directly we use a thermochemical cycle based on water formation:
Formation of water from hydrogen and oxygen (experimental enthalpy change): \(\mathrm{H}_2 + \frac{1}{2}\mathrm{O}_2 \rightarrow \mathrm{H}_2\mathrm{O} \quad \Delta H = -3.03\, \text{eV}\)
Recombination of atomic oxygen into half a molecule of oxygen gas (the negative of half the O₂ dissociation energy): \(\mathrm{O} \rightarrow \frac{1}{2} \mathrm{O}_2 \quad \Delta H = -2.58\, \text{eV}\)
We also use atomic reference energies from the OC20 dataset:
atomic_reference_energies = {
    "H": -3.477,
    "N": -8.083,
    "O": -7.204,
    "C": -7.282,
}
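Written out, the cycle combines as follows. The middle step is the quantity the model supplies, under the assumption (implied by the water-based cycle) that the OC20 oxygen reference \(E_{\mathrm{O}}^{\text{ref}}\) corresponds to \(E_{\mathrm{H_2O}} - E_{\mathrm{H_2}}\). The labels \(\Delta E_1\), \(\Delta E_2\), and \(\Delta E_3\) are introduced here for illustration only.

\[
\begin{aligned}
\mathrm{H}_2 + \tfrac{1}{2}\mathrm{O}_2 &\rightarrow \mathrm{H}_2\mathrm{O} & \Delta E_1 &= -3.03\ \text{eV} \\
\mathrm{H}_2\mathrm{O} + {*} &\rightarrow \mathrm{O}^{*} + \mathrm{H}_2 & \Delta E_2 &= E_{\text{adslab}} - E_{\text{slab}} - E_{\mathrm{O}}^{\text{ref}} \\
\mathrm{O} &\rightarrow \tfrac{1}{2}\mathrm{O}_2 & \Delta E_3 &= -2.58\ \text{eV}
\end{aligned}
\]

Summing the first two steps gives \(\tfrac{1}{2}\mathrm{O}_2 + {*} \rightarrow \mathrm{O}^{*}\) with energy \(\Delta E_1 + \Delta E_2\), the dissociative scale. Adding the third step gives \(\mathrm{O} + {*} \rightarrow \mathrm{O}^{*}\) with energy \(\Delta E_1 + \Delta E_2 + \Delta E_3\), which is referenced to atomic oxygen.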
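The adsorption energy for ½O₂ + * → O* is then:
re1 = -3.03  # H2 + 1/2 O2 -> H2O (experimental)
re3 = -2.58  # O -> 1/2 O2 (experimental); only needed for the atomic-O reference used later
# Step predicted by the model, relative to the OC20 oxygen reference
re2 = adslab_e - slab_e - atomic_reference_energies["O"]
# re1 + re2 references the energy to 1/2 O2 (dissociative scale);
# adding re3 as well would reference it to atomic O instead
adsorption_energy = re2 + re1
print(f"Adsorption energy of O on Pt(111): {adsorption_energy:.3f} eV")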
This gives a value of approximately -1.47 eV.
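Xu and Kitchin reported a PBE-calculated adsorption energy of -4.264 eV for O + * → O* on Pt(111). Converting to the dissociative scale (½O₂ + * → O*) by adding half the O₂ dissociation energy, 2.58 eV, gives -1.684 eV.
Our ML prediction is about 0.21 eV higher. This difference is expected and is primarily due to the exchange-correlation functional: the model predicts RPBE-level energies, whereas the reference value was computed with PBE.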
These systematic shifts are common and can be corrected with a small set of DFT calculations if needed.
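The comparison above is simple arithmetic, and spelling it out makes the change of reference explicit. The constants below are the values quoted in the text; the variable names are ours.

```python
# Values quoted in the text above.
E_ATOMIC_REF_PBE = -4.264     # eV, O + * -> O* (Xu and Kitchin, PBE)
HALF_O2_DISSOCIATION = 2.58   # eV, energy to form O from 1/2 O2
E_UMA = -1.47                 # eV, approximate model result from this tutorial

# Shift the literature value from the atomic-O reference to the 1/2 O2 reference.
e_dissociative_pbe = E_ATOMIC_REF_PBE + HALF_O2_DISSOCIATION

# Positive offset means the model binds oxygen more weakly than the PBE reference.
offset = E_UMA - e_dissociative_pbe

print(f"PBE, dissociative scale: {e_dissociative_pbe:.3f} eV")
print(f"UMA - PBE offset: {offset:.2f} eV")
```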
It is good practice to inspect the relaxed geometries:
import matplotlib.pyplot as plt
from ase.visualize.plot import plot_atoms
fig, axs = plt.subplots(1, 2, figsize=(10, 4))
plot_atoms(slab, axs[0], rotation="0x,0y,0z")
plot_atoms(adslab, axs[1], rotation="0x,0y,0z")
axs[0].set_title("Clean Pt(111)")
axs[1].set_title("O on fcc site")
for ax in axs:
    ax.set_axis_off()
plt.tight_layout()
plt.show()
We now extend our analysis to a set of late transition metals: Cu, Ag, Pd, Pt, Rh, and Ir. We compare oxygen adsorption in fcc and hcp sites at 0.25 ML coverage, using structures and energies from Xu and Kitchin.
Load the reference data:
import json
with open("energies.json") as f:
    edata = json.load(f)
with open("structures.json") as f:
    sdata = json.load(f)
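The lookups used in the loop that follows assume a nested layout of metal, adsorbate, site, and coverage, with each structure record holding `symbols`, `pos`, and `cell`. The sketch below illustrates that layout with a placeholder record (the numbers are made up, not taken from the reference data) and a hypothetical helper, `get_entry`, that fails early if a level or field is missing.

```python
import json

# Placeholder record: values are invented; only the key layout mirrors the loop.
example_structures = json.loads("""
{
  "Pt": {"O": {"fcc": {"0.25": {
      "symbols": ["Pt", "O"],
      "pos": [[0.0, 0.0, 0.0], [1.4, 0.8, 1.2]],
      "cell": [[5.6, 0.0, 0.0], [2.8, 4.8, 0.0], [0.0, 0.0, 30.0]]
  }}}}
}
""")


def get_entry(structures, metal, site, coverage="0.25", adsorbate="O"):
    """Look up one structure record; raises KeyError if anything is missing."""
    entry = structures[metal][adsorbate][site][coverage]
    missing = {"symbols", "pos", "cell"} - set(entry)
    if missing:
        raise KeyError(f"entry lacks fields: {sorted(missing)}")
    return entry


entry = get_entry(example_structures, "Pt", "fcc")
print(len(entry["symbols"]), "atoms")
```

Note that the coverage key is the string `"0.25"`, not a float, because JSON object keys are always strings.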
For each metal and site, we reconstruct the structure, relax it using UMA, and compute the adsorption energy:
from ase import Atoms
data = {"fcc": [], "hcp": []}
refdata = {"fcc": [], "hcp": []}
for metal in ["Cu", "Ag", "Pd", "Pt", "Rh", "Ir"]:
    for site in ["fcc", "hcp"]:
        entry = sdata[metal]["O"][site]["0.25"]
        adslab = Atoms(entry["symbols"], positions=entry["pos"], cell=entry["cell"], pbc=True)
        # Clean slab: keep only atoms of the same element as the first atom (the metal)
        slab = adslab[adslab.arrays["numbers"] == adslab.arrays["numbers"][0]]
        slab.calc = calc
        BFGS(slab).run(fmax=0.05)
        adslab.calc = calc
        BFGS(adslab).run(fmax=0.05)
        # Step predicted by the model, relative to the OC20 oxygen reference
        re2 = (adslab.get_potential_energy()
               - slab.get_potential_energy()
               - atomic_reference_energies["O"])
        # Adding re1 and re3 references the energy to atomic O (O + * -> O*)
        energy = re2 + re1 + re3
        data[site].append(energy)
        refdata[site].append(edata[metal]["O"][site]["0.25"])
Plot the results against DFT values:
plt.figure(figsize=(6, 6))
plt.plot(refdata["fcc"], data["fcc"], "r.", label="fcc", ms=10)
plt.plot(refdata["hcp"], data["hcp"], "b.", label="hcp", ms=10)
plt.plot([-5.5, -3.5], [-5.5, -3.5], "k-", lw=2, label="y = x")
plt.xlabel("DFT (PBE) [eV]")
plt.ylabel("UMA-OC20 Prediction [eV]")
plt.legend()
plt.title("O Adsorption Energy: DFT vs ML")
plt.axis("equal")
plt.grid(True, alpha=0.3)
plt.show()
The model captures the trend well, with a systematic offset attributable to the difference in exchange-correlation functional (RPBE for the model, PBE for the reference data).
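To put a number on "systematic offset", one option is to compute the mean signed error and then the mean absolute error before and after removing that constant shift. This is a minimal sketch with placeholder values, not results from the model; the function names are ours.

```python
def mean_signed_error(pred, ref):
    """Average of (prediction - reference); a constant offset shows up here."""
    return sum(p - r for p, r in zip(pred, ref)) / len(pred)


def mean_absolute_error(pred, ref, shift=0.0):
    """Average |prediction - shift - reference|."""
    return sum(abs(p - shift - r) for p, r in zip(pred, ref)) / len(pred)


# Placeholder values in eV, for illustration only.
ref_example = [-4.0, -4.5, -5.0]
pred_example = [-3.8, -4.2, -4.9]

offset = mean_signed_error(pred_example, ref_example)
mae_raw = mean_absolute_error(pred_example, ref_example)
mae_shifted = mean_absolute_error(pred_example, ref_example, shift=offset)

print(f"offset = {offset:.3f} eV")
print(f"MAE raw = {mae_raw:.3f} eV, after removing offset = {mae_shifted:.3f} eV")
```

With the lists built in this tutorial, `data["fcc"] + data["hcp"]` and `refdata["fcc"] + refdata["hcp"]` could be passed in place of the placeholder lists. A large drop in error after removing the offset would support the interpretation that the discrepancy is mostly a constant shift.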
We test convergence with respect to the number of layers in the Pt(111) slab:
for nlayers in [3, 4, 5, 6, 7, 8]:
    slab = fcc111("Pt", size=(2, 2, nlayers), vacuum=10.0)
    slab.pbc = True
    slab.calc = calc
    BFGS(slab).run(fmax=0.05)
    slab_e = slab.get_potential_energy()
    adslab = slab.copy()
    add_adsorbate(adslab, "O", height=1.2, position="fcc")
    adslab.calc = calc
    BFGS(adslab).run(fmax=0.05)
    adslab_e = adslab.get_potential_energy()
    # Adsorption energy referenced to 1/2 O2 (water-formation correction re1 only)
    energy = adslab_e - slab_e - atomic_reference_energies["O"] + re1
    print(f"nlayers = {nlayers}: {energy:.2f} eV")
The energy converges to within 0.02 eV by 5–6 layers.
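A convergence statement like this can be checked programmatically rather than by eye. The sketch below uses a hypothetical helper, `first_converged`, that returns the smallest slab thickness whose energy stays within a tolerance of all thicker slabs. The energies are placeholders, not output from the loop above.

```python
def first_converged(sizes, energies, tol=0.02):
    """Return the first size whose energy is within tol of every later energy.

    The last size is never returned, since it has nothing to compare against.
    Returns None if no size satisfies the criterion.
    """
    for i, size in enumerate(sizes[:-1]):
        if all(abs(e - energies[i]) <= tol for e in energies[i + 1:]):
            return size
    return None


# Placeholder energies in eV, for illustration only.
layers = [3, 4, 5, 6, 7, 8]
example_energies = [-1.60, -1.52, -1.48, -1.47, -1.47, -1.47]

print(first_converged(layers, example_energies))
```

Comparing against all thicker slabs, rather than only the next one, guards against declaring convergence on an accidental plateau.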
We also test the effect of lateral coverage using larger unit cells:
for size in [1, 2, 3, 4, 5]:
    slab = fcc111("Pt", size=(size, size, 5), vacuum=10.0)
    # Match the periodic settings used in the layer-convergence test
    slab.pbc = True
    slab.calc = calc
    BFGS(slab).run(fmax=0.05)
    slab_e = slab.get_potential_energy()
    adslab = slab.copy()
    add_adsorbate(adslab, "O", height=1.2, position="fcc")
    adslab.calc = calc
    BFGS(adslab).run(fmax=0.05)
    adslab_e = adslab.get_potential_energy()
    # Adsorption energy referenced to 1/2 O2 (water-formation correction re1 only)
    energy = adslab_e - slab_e - atomic_reference_energies["O"] + re1
    print(f"({size}x{size}): {energy:.2f} eV")
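Adsorption energies become less favorable at lower coverage. This runs against the usual expectation that binding strengthens as repulsion between neighboring adsorbates decreases, and it may indicate that the model needs fine-tuning at low coverages.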
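For reference, each cell size in the coverage test corresponds to a specific coverage: one adsorbate on an n × n fcc(111) cell, which has n² surface atoms, gives 1/n² monolayers (ML). The helper name below is ours.

```python
def coverage_ml(n, adsorbates=1):
    """Coverage in ML for `adsorbates` atoms on an n x n fcc(111) surface cell."""
    if n < 1:
        raise ValueError("cell size must be at least 1")
    return adsorbates / (n * n)


for n in [1, 2, 3, 4, 5]:
    print(f"({n}x{n}): {coverage_ml(n):.3f} ML")
```

The 2 × 2 cell gives 0.25 ML, the coverage used for the comparison against Xu and Kitchin.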
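This tutorial demonstrated how to use the UMA model and OC22 dataset to compute adsorption energies efficiently. Key points include the thermochemical cycle used for the oxygen reference, the systematic offset between the model's RPBE-level predictions and PBE literature values, and the convergence checks for slab thickness and coverage.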
The OC22 dataset and FairChem tools provide a powerful foundation for accelerating catalyst discovery.
Tran R, Lan J, Shuaibi M, et al. The Open Catalyst 2022 (OC22) Dataset and Challenges for Oxide Electrocatalysts. arXiv:2206.08917 [cond-mat.mtrl-sci]. 2023.
Xu Z, Kitchin JR. Probing the Coverage Dependence of Site and Adsorbate Configurational Correlations on (111) Surfaces of Late Transition Metals. J. Phys. Chem. C. 2014;118(44):25597–25602.
Hjorth Larsen A, et al. The atomic simulation environment—a Python library for working with atoms. J. Phys.: Condens. Matter. 2017;29(27):273002.
This work uses the OC22 dataset and UMA model from the Open Catalyst Project.
Dataset: OC22 on arXiv
Key Points
Oxide electrocatalysts are critical for the oxygen evolution reaction (OER), but lack sufficient training data for machine learning (ML) models.
Existing datasets like OC20 focus primarily on metallic catalysts and adsorption energies.
OC22 fills this gap by providing a large-scale dataset for oxide materials.
Several graph neural networks (GNNs) were tested, including GemNet-OC, GemNet-dT, SpinConv, and Equiformer.
The OC22 dataset is a major step toward ML-driven discovery of oxide electrocatalysts.
By shifting from adsorption-energy-specific tasks to generalized total energy prediction, it enables broader scientific applications and better model generalization, especially when combined with prior datasets like OC20.