# FAIRChem

FAIRChem is Meta FAIR Chemistry's centralized repository for state-of-the-art machine learning models, datasets, and tools for materials science and quantum chemistry. Built around the Universal Model for Atoms (UMA), an equivariant graph neural network trained on 500M+ DFT calculations, FAIRChem enables fast and accurate atomistic simulations across diverse domains including heterogeneous catalysis, inorganic materials, molecules, polymers, metal-organic frameworks (MOFs), and molecular crystals.

The library provides seamless integration with the Atomic Simulation Environment (ASE) through the `FAIRChemCalculator`, enabling researchers to perform single-point calculations, structure relaxations, molecular dynamics, and advanced property predictions with minimal code changes. FAIRChem supports multi-GPU inference for large-scale simulations, LAMMPS integration for molecular dynamics, fine-tuning of pretrained models on custom datasets, and training models from scratch using a modern Hydra-based configuration system.

## Loading Pretrained Models

Load UMA models from Hugging Face and create a predictor for inference.

```python
from fairchem.core import pretrained_mlip, FAIRChemCalculator

# Available models: "uma-s-1p2" (latest small), "uma-s-1p1", "uma-m-1p1" (medium)
predictor = pretrained_mlip.get_predict_unit("uma-s-1p2", device="cuda")

# Create ASE calculator with task-specific prediction head
# Tasks: "oc20" (catalysis), "oc22" (oxide catalysis), "oc25" (electrocatalysis),
#        "omat" (inorganic materials), "omol" (molecules/polymers),
#        "odac" (MOFs), "omc" (molecular crystals)
calc = FAIRChemCalculator(predictor, task_name="oc20")
```

## Relaxing Adsorbates on Catalytic Surfaces

Perform geometry optimization of molecules adsorbed on metal surfaces.
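This and the following examples each pick one task head at calculator construction. If a single script serves several domains, the choice can be centralized in a small lookup table; a plain-Python sketch using the task names listed above (the dictionary and helper are illustrative, not part of the FAIRChem API):

```python
# Hypothetical helper: map application domains to UMA task names.
# The task names come from the model documentation; the mapping itself
# is an illustration, not a FAIRChem API.
DOMAIN_TO_TASK = {
    "catalysis": "oc20",
    "oxide_catalysis": "oc22",
    "inorganic_materials": "omat",
    "molecules": "omol",
    "mofs": "odac",
    "molecular_crystals": "omc",
}

def task_for(domain: str) -> str:
    """Return the UMA task name for a domain, raising KeyError if unknown."""
    return DOMAIN_TO_TASK[domain]

print(task_for("catalysis"))  # oc20
```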
```python
from ase.build import fcc100, add_adsorbate, molecule
from ase.optimize import LBFGS
from fairchem.core import pretrained_mlip, FAIRChemCalculator

predictor = pretrained_mlip.get_predict_unit("uma-s-1p2", device="cuda")
calc = FAIRChemCalculator(predictor, task_name="oc20")

# Build Cu(100) slab with CO adsorbate
slab = fcc100("Cu", (3, 3, 3), vacuum=8, periodic=True)
adsorbate = molecule("CO")
add_adsorbate(slab, adsorbate, 2.0, "bridge")
slab.calc = calc

# Optimize structure
opt = LBFGS(slab)
opt.run(fmax=0.05, steps=100)

print(f"Final energy: {slab.get_potential_energy():.4f} eV")
print(f"Max force: {max(abs(slab.get_forces().flatten())):.4f} eV/Å")
```

## Relaxing Inorganic Crystals with Cell Optimization

Optimize both atomic positions and unit cell parameters for bulk materials.

```python
from ase.build import bulk
from ase.optimize import FIRE
from ase.filters import FrechetCellFilter
from fairchem.core import pretrained_mlip, FAIRChemCalculator

predictor = pretrained_mlip.get_predict_unit("uma-s-1p2", device="cuda")
calc = FAIRChemCalculator(predictor, task_name="omat")

# Create bulk Fe structure (conventional cubic cell, so cell[0, 0]
# is the lattice constant rather than a primitive-vector component)
atoms = bulk("Fe", cubic=True)
atoms.calc = calc

# Optimize with cell relaxation using FrechetCellFilter
opt = FIRE(FrechetCellFilter(atoms))
opt.run(fmax=0.05, steps=100)

print(f"Optimized lattice constant: {atoms.cell[0, 0]:.4f} Å")
print(f"Final energy: {atoms.get_potential_energy():.4f} eV")
```

## Running Molecular Dynamics Simulations

Perform Langevin dynamics with trajectory recording.
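The MD example below advances in 0.1 fs steps, so converting a target trajectory length into a step count is simple arithmetic; a minimal plain-Python sketch (no FAIRChem dependency):

```python
# Steps needed to cover a target simulation time at a given timestep
timestep_fs = 0.1      # matches the Langevin example below
target_time_ps = 1.0   # desired trajectory length
n_steps = int(target_time_ps * 1000 / timestep_fs)  # 1 ps = 1000 fs
print(n_steps)  # 10000
```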
```python
import numpy as np
from ase import units
from ase.io import Trajectory
from ase.md.langevin import Langevin
from ase.build import molecule
from fairchem.core import pretrained_mlip, FAIRChemCalculator

# Use random seed for reproducible but unique trajectories
seed = np.random.randint(0, np.iinfo(np.int32).max, dtype=int)
predictor = pretrained_mlip.get_predict_unit("uma-s-1p2", device="cuda", seed=seed)
calc = FAIRChemCalculator(predictor, task_name="omol")

atoms = molecule("H2O")
atoms.calc = calc

# Set up Langevin dynamics at 400 K
dyn = Langevin(
    atoms,
    timestep=0.1 * units.fs,
    temperature_K=400,
    friction=0.001 / units.fs,
)

# Record trajectory
trajectory = Trajectory("water_md.traj", "w", atoms)
dyn.attach(trajectory.write, interval=1)
dyn.run(steps=1000)

print(f"MD completed: {dyn.get_number_of_steps()} steps")
```

## Calculating Spin Gaps for Molecules

Compute energy differences between spin states using charge and spin multiplicity.

```python
from ase.build import molecule
from fairchem.core import pretrained_mlip, FAIRChemCalculator

predictor = pretrained_mlip.get_predict_unit("uma-s-1p2", device="cuda")

# Singlet CH2 (closed-shell)
singlet = molecule("CH2_s1A1d")
singlet.info.update({"spin": 1, "charge": 0})
singlet.calc = FAIRChemCalculator(predictor, task_name="omol")

# Triplet CH2 (open-shell)
triplet = molecule("CH2_s3B1d")
triplet.info.update({"spin": 3, "charge": 0})
triplet.calc = FAIRChemCalculator(predictor, task_name="omol")

# Calculate singlet-triplet gap
gap = triplet.get_potential_energy() - singlet.get_potential_energy()
print(f"Singlet-triplet gap: {gap:.4f} eV")
```

## Batch Inference on Multiple Structures

Efficiently predict properties for multiple structures in a single batch.
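In the batched prediction below, per-atom outputs for one structure are recovered with a batch index vector (`batch.batch`), which tags each atom with the index of its parent structure. A toy plain-Python illustration of that bookkeeping (the lists here are stand-ins, not FAIRChem objects):

```python
# Toy version of batched per-atom bookkeeping: batch_index[i] is the
# structure that atom i belongs to (analogous to batch.batch)
batch_index = [0, 0, 1, 2, 2, 2]
forces = ["f0", "f1", "f2", "f3", "f4", "f5"]  # stand-ins for per-atom force rows

def atoms_of(structure_id):
    """Select the per-atom entries belonging to one structure."""
    return [f for b, f in zip(batch_index, forces) if b == structure_id]

print(atoms_of(2))  # ['f3', 'f4', 'f5']
```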
```python
from ase.build import bulk
from fairchem.core import pretrained_mlip
from fairchem.core.datasets.atomic_data import AtomicData, atomicdata_list_to_batch

# Create multiple structures
atoms_list = [
    bulk("Pt"),
    bulk("Cu"),
    bulk("NaCl", crystalstructure="rocksalt", a=5.64),
]

# Convert to AtomicData with task assignment
atomic_data_list = [
    AtomicData.from_ase(atoms, task_name="omat") for atoms in atoms_list
]
batch = atomicdata_list_to_batch(atomic_data_list)

# Run batch prediction
predictor = pretrained_mlip.get_predict_unit("uma-s-1p2", device="cuda")
preds = predictor.predict(batch)

# Access results
for i, atoms in enumerate(atoms_list):
    energy = preds["energy"][i].item()
    forces = preds["forces"][batch.batch == i].cpu().numpy()
    print(f"{atoms.get_chemical_formula()}: E = {energy:.4f} eV, "
          f"max|F| = {abs(forces).max():.4f} eV/Å")
```

## Heterogeneous Batch Inference (Multiple Tasks)

Batch systems with different task types (molecules, materials, surfaces) together.

```python
from ase.build import bulk, molecule, fcc100, add_adsorbate
from fairchem.core import pretrained_mlip
from fairchem.core.datasets.atomic_data import AtomicData, atomicdata_list_to_batch

predictor = pretrained_mlip.get_predict_unit("uma-s-1p2", device="cuda")

# Molecule with charge/spin
h2o = molecule("H2O")
h2o.info.update({"charge": 0, "spin": 1})

# Bulk material
pt = bulk("Pt")

# Catalytic surface
slab = fcc100("Cu", (3, 3, 3), vacuum=8, periodic=True)
add_adsorbate(slab, molecule("CO"), 2.0, "bridge")

# Create batch with different tasks
atomic_data_list = [
    AtomicData.from_ase(h2o, task_name="omol", r_data_keys=["spin", "charge"],
                        molecule_cell_size=12),
    AtomicData.from_ase(pt, task_name="omat"),
    AtomicData.from_ase(slab, task_name="oc20"),
]
batch = atomicdata_list_to_batch(atomic_data_list)
predictions = predictor.predict(batch)

print(f"H2O energy: {predictions['energy'][0].item():.4f} eV")
print(f"Pt energy: {predictions['energy'][1].item():.4f} eV")
print(f"Cu+CO energy: {predictions['energy'][2].item():.4f} eV")
```

## Multi-GPU Inference for Large Systems

Scale to large systems using graph-parallel inference with Ray.

```python
# Requires: pip install fairchem-core[extras]
import time

import numpy as np
from ase import units
from ase.md.langevin import Langevin
from fairchem.core import pretrained_mlip, FAIRChemCalculator
from fairchem.core.datasets.common_structures import get_fcc_crystal_by_num_atoms

seed = np.random.randint(0, np.iinfo(np.int32).max, dtype=int)
predictor = pretrained_mlip.get_predict_unit(
    "uma-s-1p2",
    inference_settings="turbo",  # Optimized for MD/relaxations
    device="cuda",
    workers=8,  # Number of GPUs
    seed=seed,
)
calc = FAIRChemCalculator(predictor, task_name="omat")

# Large 8000-atom system
atoms = get_fcc_crystal_by_num_atoms(8000)
atoms.calc = calc

dyn = Langevin(
    atoms,
    timestep=0.1 * units.fs,
    temperature_K=400,
    friction=0.001 / units.fs,
)

# Warmup and benchmark
dyn.run(steps=10)
start_time = time.time()
dyn.run(steps=100)
qps = 100 / (time.time() - start_time)
print(f"Performance: {qps:.2f} queries per second on 8 GPUs")
```

## Turbo Mode for Fast MD and Relaxations

Use optimized inference settings for single-system trajectories.
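Speed claims are best verified on your own hardware and system sizes. The warmup-then-time pattern from the multi-GPU example can be factored into a small stdlib helper (the function and its name are illustrative, not FAIRChem API):

```python
import time

def steps_per_second(step_fn, n_warmup=10, n_timed=100):
    """Call step_fn a few times to warm up (compilation, caching),
    then report throughput over a timed run."""
    for _ in range(n_warmup):
        step_fn()
    t0 = time.monotonic()
    for _ in range(n_timed):
        step_fn()
    return n_timed / (time.monotonic() - t0)

# Usage with an ASE dynamics object: steps_per_second(lambda: dyn.run(steps=1))
print(f"{steps_per_second(lambda: None):.0f} no-op calls/s")
```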
```python
from fairchem.core import pretrained_mlip, FAIRChemCalculator
from fairchem.core.units.mlip_unit.api.inference import InferenceSettings

# Preset turbo mode (1.5-2x faster, fixed composition required)
predictor = pretrained_mlip.get_predict_unit(
    "uma-s-1p2", device="cuda", inference_settings="turbo"
)

# Or custom settings for advanced use
settings = InferenceSettings(
    tf32=True,                       # Use TensorFloat-32 for speed
    activation_checkpointing=False,  # Disable for small systems (<1000 atoms)
    merge_mole=True,                 # Pre-merge MoLE weights (fixed composition)
    compile=True,                    # Use torch.compile for speed
    external_graph_gen=False,
    internal_graph_gen_version=2,
)
predictor = pretrained_mlip.get_predict_unit(
    "uma-s-1p2", device="cuda", inference_settings=settings
)
calc = FAIRChemCalculator(predictor, task_name="omat")
```

## Enabling Stress and Hessian Predictions

Compute untrained derivatives via autograd.

```python
from ase.build import molecule
from fairchem.core import pretrained_mlip, FAIRChemCalculator
from fairchem.core.units.mlip_unit.api.inference import InferenceSettings

# Enable stress and Hessian for omol task
settings = InferenceSettings(
    predict_untrained_stress={"omol"},
    predict_untrained_hessian={"omol"},
)
predictor = pretrained_mlip.get_predict_unit(
    "uma-s-1p2", device="cuda", inference_settings=settings
)
calc = FAIRChemCalculator(predictor, task_name="omol")

atoms = molecule("H2O")
atoms.calc = calc

energy = atoms.get_potential_energy()
forces = atoms.get_forces()
hessian = atoms.calc.results.get("hessian")  # 3N x 3N matrix

print(f"Energy: {energy:.4f} eV")
print(f"Forces shape: {forces.shape}")
if hessian is not None:
    print(f"Hessian shape: {hessian.shape}")
```

## Computing Formation Energies

Calculate formation energies with elemental references and optional corrections.
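Conceptually, a formation energy is the predicted total energy minus a composition-weighted sum of elemental reference energies. A toy sketch of that arithmetic with made-up numbers (the values below are illustrative only, not the references or OMat24 corrections that `FormationEnergyCalculator` actually loads):

```python
# Toy formation-energy arithmetic: E_f = E(compound) - sum_i n_i * E_ref(element_i)
e_total = -27.2                   # hypothetical total energy of a Na4Cl4 cell (eV)
e_ref = {"Na": -1.3, "Cl": -1.8}  # hypothetical per-atom elemental references (eV)
counts = {"Na": 4, "Cl": 4}       # composition of the cell

e_form = e_total - sum(n * e_ref[el] for el, n in counts.items())
print(f"{e_form:.1f} eV ({e_form / sum(counts.values()):.2f} eV/atom)")
```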
```python
from ase.build import bulk
from fairchem.core import pretrained_mlip, FAIRChemCalculator
from fairchem.core.calculate.ase_calculator import FormationEnergyCalculator

predictor = pretrained_mlip.get_predict_unit("uma-s-1p2", device="cuda")
base_calc = FAIRChemCalculator(predictor, task_name="omat")

# Wrap with formation energy calculator (auto-loads elemental references)
calc = FormationEnergyCalculator(
    base_calc,
    apply_corrections=True,    # Apply MP-style corrections for OMat task
    correction_type="OMat24",  # Use OMat24 corrections
)

atoms = bulk("NaCl", crystalstructure="rocksalt", a=5.64)
atoms.calc = calc

formation_energy = atoms.get_potential_energy()
formation_energy_per_atom = formation_energy / len(atoms)
print(f"Formation energy: {formation_energy:.4f} eV "
      f"({formation_energy_per_atom:.4f} eV/atom)")
```

## Batched Concurrent Simulations with InferenceBatcher

Run many independent relaxations/MD simulations with batched GPU inference.

```python
from functools import partial

import numpy as np
from ase.build import bulk, make_supercell
from ase.filters import FrechetCellFilter
from ase.optimize import LBFGS
from fairchem.core import pretrained_mlip
from fairchem.core.calculate import FAIRChemCalculator, InferenceBatcher

def run_relaxation(atoms, predict_unit):
    """Run structure relaxation and return final energy."""
    calc = FAIRChemCalculator(predict_unit, task_name="omat")
    atoms.calc = calc
    opt = LBFGS(FrechetCellFilter(atoms), logfile=None)
    opt.run(fmax=0.02, steps=100)
    return atoms.get_potential_energy()

# Create batcher with concurrent workers
predict_unit = pretrained_mlip.get_predict_unit("uma-s-1p2", device="cuda")
batcher = InferenceBatcher(
    predict_unit, concurrency_backend_options=dict(max_workers=32)
)

# Create structures to relax
prim_atoms = [bulk("Cu"), bulk("Ag"), bulk("Au"), bulk("Ni")]
atoms_list = [make_supercell(a, 3 * np.eye(3)) for a in prim_atoms]
for atoms in atoms_list:
    atoms.rattle(0.1)

# Run all relaxations in parallel with batched inference
run_fn = partial(run_relaxation, predict_unit=batcher.batch_predict_unit)
energies = list(batcher.executor.map(run_fn, atoms_list))

for atoms, energy in zip(prim_atoms, energies):
    print(f"{atoms.get_chemical_formula()}: E = {energy:.4f} eV")
```

## Creating Fine-tuning Datasets

Generate ASE-LMDB datasets from various input formats for fine-tuning.

```bash
# Clone repo and install
git clone git@github.com:facebookresearch/fairchem.git
pip install -e fairchem/packages/fairchem-core[dev]

# Create fine-tuning dataset from ASE-readable files
python src/fairchem/core/scripts/create_uma_finetune_dataset.py \
    --train-dir /path/to/train/structures/ \
    --val-dir /path/to/val/structures/ \
    --output-dir /path/to/output \
    --uma-task omat \
    --regression-task ef  # Options: e (energy), ef (energy+forces), efs (energy+forces+stress)
```

## Fine-tuning Pretrained Models

Fine-tune UMA models on custom datasets using the fairchem CLI.

```bash
# Run fine-tuning with generated config
fairchem -c /path/to/output/uma_sm_finetune_template.yaml

# Override parameters on command line
fairchem -c /path/to/output/uma_sm_finetune_template.yaml \
    epochs=10 \
    lr=2e-4 \
    batch_size=4 \
    job.run_dir=/path/to/runs \
    +job.timestamp_id=my_finetune_run

# Resume from checkpoint
fairchem -c /path/to/runs/my_finetune_run/checkpoints/final/resume.yaml
```

## Using Fine-tuned Models

Load and use a fine-tuned checkpoint for inference.

```python
from ase.build import bulk
from fairchem.core import FAIRChemCalculator
from fairchem.core.units.mlip_unit import load_predict_unit

# Load fine-tuned checkpoint
predictor = load_predict_unit("/path/to/runs/my_run/checkpoints/final/inference_ckpt.pt")

# Must use same task as fine-tuning
calc = FAIRChemCalculator(predictor, task_name="omat")

atoms = bulk("Fe")
atoms.calc = calc
print(f"Energy: {atoms.get_potential_energy():.4f} eV")
```

## LAMMPS Integration

Run large-scale MD with FAIRChem models via LAMMPS fix external.
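The example input file below hard-codes one `mass` line per atom type. When scripting many systems, those lines can be generated from a composition; a toy plain-Python sketch (atomic masses hard-coded for illustration):

```python
# Toy generator for LAMMPS "mass" lines; atomic masses hard-coded for illustration
masses = {"Al": 26.98, "Cu": 63.55}

def mass_lines(masses):
    """Emit one 'mass <type> <amu>' line per element, types numbered from 1."""
    return [f"mass {i} {m}  # {el}"
            for i, (el, m) in enumerate(sorted(masses.items()), start=1)]

for line in mass_lines(masses):
    print(line)
# mass 1 26.98  # Al
# mass 2 63.55  # Cu
```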
```bash
# Install LAMMPS integration
conda install lammps  # or build from source
pip install fairchem-core[extras] fairchem-lammps

# Run LAMMPS with UMA potential
lmp_fc lmp_in="simulation.in" task_name="omat"

# Multi-GPU parallel inference
lmp_fc lmp_in="simulation.in" task_name="omat" predict_unit='${parallel_predict_unit}'
```

Example LAMMPS input file (simulation.in):

```
# LAMMPS input for UMA - remove pair_style, bond_style etc.
units metal
atom_style atomic
boundary p p p

read_data structure.data

mass 1 26.98  # Al
mass 2 63.55  # Cu

timestep 0.001  # 1 fs
thermo 100

fix 1 all nvt temp 300 300 0.1
run 10000
```

## Training Models from Scratch

Train UMA or custom models using the FAIRChem training framework.

```bash
# Local training with debug dataset
fairchem -c configs/uma/training_release/uma_sm_direct_pretrain.yaml \
    cluster=h100_local \
    dataset=uma_debug

# Multi-node SLURM training (16 nodes)
fairchem -c configs/uma/training_release/uma_sm_conserve_finetune.yaml \
    cluster=h100 \
    job.scheduler.num_nodes=16 \
    run_name="uma_conserve_train"
```

## Using Custom ASE Datasets

Configure training with ASE database or file-based datasets.

```yaml
# config.yaml for ASE database dataset
dataset:
  format: ase_db
  train:
    src: /path/to/train.db
    a2g_args:
      r_energy: True
      r_forces: True
      task_name: omat
    keep_in_memory: True  # Faster for small datasets
  val:
    src: /path/to/val.db
    a2g_args:
      r_energy: True
      r_forces: True
      task_name: omat

# For file-based datasets (CIF, VASP, extxyz, etc.)
dataset:
  format: ase_read
  train:
    src: /path/to/structures/
    pattern: "**/*.cif"  # Recursive glob pattern
    a2g_args:
      r_energy: True
      r_forces: True
      task_name: omat
```

## Summary

FAIRChem provides a comprehensive toolkit for machine learning-accelerated atomistic simulations.
The primary workflow involves loading a pretrained UMA model via `pretrained_mlip.get_predict_unit()`, wrapping it in a `FAIRChemCalculator` with the appropriate task name (oc20 for catalysis, omat for materials, omol for molecules, etc.), and using it with standard ASE interfaces for relaxations, molecular dynamics, and property calculations. For high-throughput screening, the library supports batch inference on multiple structures and concurrent simulations via `InferenceBatcher`, while large-scale simulations benefit from multi-GPU inference with the Ray-based `ParallelMLIPPredictUnit` and LAMMPS integration. For domain-specific research, FAIRChem enables fine-tuning of pretrained models on custom datasets using simple CLI commands and Hydra YAML configurations, supporting distributed training on SLURM clusters. Advanced users can customize inference settings for optimal speed (turbo mode) or enable gradient-based property predictions (stress, Hessian) via `InferenceSettings`. The library integrates seamlessly with the broader Python scientific stack including ASE, pymatgen, and standard ML frameworks, making it suitable for applications ranging from catalyst screening and materials discovery to drug design and polymer simulation.