# FAIRChem

FAIRChem is Meta FAIR Chemistry's centralized repository for state-of-the-art machine learning models, datasets, and tools for materials science and quantum chemistry. Built around the Universal Model for Atoms (UMA), an equivariant graph neural network trained on 500M+ DFT calculations, FAIRChem enables fast and accurate atomistic simulations across diverse domains including heterogeneous catalysis, inorganic materials, molecules, polymers, metal-organic frameworks (MOFs), and molecular crystals.

The library provides seamless integration with the Atomic Simulation Environment (ASE) through the `FAIRChemCalculator`, enabling researchers to perform single-point calculations, structure relaxations, molecular dynamics, and advanced property predictions with minimal code changes. FAIRChem supports multi-GPU inference for large-scale simulations, LAMMPS integration for molecular dynamics, fine-tuning of pretrained models on custom datasets, and training models from scratch using a modern Hydra-based configuration system.

## Loading Pretrained Models

Load UMA models from Hugging Face and create a predictor for inference.

```python
from fairchem.core import pretrained_mlip, FAIRChemCalculator

# Available models: "uma-s-1p2" (latest small), "uma-s-1p1", "uma-m-1p1" (medium)
predictor = pretrained_mlip.get_predict_unit("uma-s-1p2", device="cuda")

# Create ASE calculator with task-specific prediction head
# Tasks: "oc20" (catalysis), "oc22" (oxide catalysis), "oc25" (electrocatalysis),
#        "omat" (inorganic materials), "omol" (molecules/polymers),
#        "odac" (MOFs), "omc" (molecular crystals)
calc = FAIRChemCalculator(predictor, task_name="oc20")
```

## Relaxing Adsorbates on Catalytic Surfaces

Perform geometry optimization of molecules adsorbed on metal surfaces.
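This and the following examples each pick one task head at calculator construction. If a single script serves several domains, the choice can be centralized in a small lookup table; a plain-Python sketch using the task names listed above (the dictionary and helper are illustrative, not part of the FAIRChem API):

```python
# Hypothetical helper: map application domains to UMA task names.
# The task names come from the model documentation; the mapping itself
# is an illustration, not a FAIRChem API.
DOMAIN_TO_TASK = {
    "catalysis": "oc20",
    "oxide_catalysis": "oc22",
    "inorganic_materials": "omat",
    "molecules": "omol",
    "mofs": "odac",
    "molecular_crystals": "omc",
}

def task_for(domain: str) -> str:
    """Return the UMA task name for a domain, raising KeyError if unknown."""
    return DOMAIN_TO_TASK[domain]

print(task_for("catalysis"))  # oc20
```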
```python
from ase.build import fcc100, add_adsorbate, molecule
from ase.optimize import LBFGS
from fairchem.core import pretrained_mlip, FAIRChemCalculator

predictor = pretrained_mlip.get_predict_unit("uma-s-1p2", device="cuda")
calc = FAIRChemCalculator(predictor, task_name="oc20")

# Build Cu(100) slab with CO adsorbate
slab = fcc100("Cu", (3, 3, 3), vacuum=8, periodic=True)
adsorbate = molecule("CO")
add_adsorbate(slab, adsorbate, 2.0, "bridge")
slab.calc = calc

# Optimize structure
opt = LBFGS(slab)
opt.run(fmax=0.05, steps=100)

print(f"Final energy: {slab.get_potential_energy():.4f} eV")
print(f"Max force: {max(abs(slab.get_forces().flatten())):.4f} eV/Å")
```

## Relaxing Inorganic Crystals with Cell Optimization

Optimize both atomic positions and unit cell parameters for bulk materials.

```python
from ase.build import bulk
from ase.optimize import FIRE
from ase.filters import FrechetCellFilter
from fairchem.core import pretrained_mlip, FAIRChemCalculator

predictor = pretrained_mlip.get_predict_unit("uma-s-1p2", device="cuda")
calc = FAIRChemCalculator(predictor, task_name="omat")

# Create bulk Fe structure (conventional cubic cell, so cell[0, 0]
# is the lattice constant rather than a primitive-vector component)
atoms = bulk("Fe", cubic=True)
atoms.calc = calc

# Optimize with cell relaxation using FrechetCellFilter
opt = FIRE(FrechetCellFilter(atoms))
opt.run(fmax=0.05, steps=100)

print(f"Optimized lattice constant: {atoms.cell[0, 0]:.4f} Å")
print(f"Final energy: {atoms.get_potential_energy():.4f} eV")
```

## Running Molecular Dynamics Simulations

Perform Langevin dynamics with trajectory recording.
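The MD example below advances in 0.1 fs steps, so converting a target trajectory length into a step count is simple arithmetic; a minimal plain-Python sketch (no FAIRChem dependency):

```python
# Steps needed to cover a target simulation time at a given timestep
timestep_fs = 0.1      # matches the Langevin example below
target_time_ps = 1.0   # desired trajectory length
n_steps = int(target_time_ps * 1000 / timestep_fs)  # 1 ps = 1000 fs
print(n_steps)  # 10000
```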
```python
import numpy as np
from ase import units
from ase.io import Trajectory
from ase.md.langevin import Langevin
from ase.build import molecule
from fairchem.core import pretrained_mlip, FAIRChemCalculator

# Use random seed for reproducible but unique trajectories
seed = np.random.randint(0, np.iinfo(np.int32).max, dtype=int)
predictor = pretrained_mlip.get_predict_unit("uma-s-1p2", device="cuda", seed=seed)
calc = FAIRChemCalculator(predictor, task_name="omol")

atoms = molecule("H2O")
atoms.calc = calc

# Set up Langevin dynamics at 400 K
dyn = Langevin(
    atoms,
    timestep=0.1 * units.fs,
    temperature_K=400,
    friction=0.001 / units.fs,
)

# Record trajectory
trajectory = Trajectory("water_md.traj", "w", atoms)
dyn.attach(trajectory.write, interval=1)
dyn.run(steps=1000)

print(f"MD completed: {dyn.get_number_of_steps()} steps")
```

## Calculating Spin Gaps for Molecules

Compute energy differences between spin states using charge and spin multiplicity.

```python
from ase.build import molecule
from fairchem.core import pretrained_mlip, FAIRChemCalculator

predictor = pretrained_mlip.get_predict_unit("uma-s-1p2", device="cuda")

# Singlet CH2 (closed-shell)
singlet = molecule("CH2_s1A1d")
singlet.info.update({"spin": 1, "charge": 0})
singlet.calc = FAIRChemCalculator(predictor, task_name="omol")

# Triplet CH2 (open-shell)
triplet = molecule("CH2_s3B1d")
triplet.info.update({"spin": 3, "charge": 0})
triplet.calc = FAIRChemCalculator(predictor, task_name="omol")

# Calculate singlet-triplet gap
gap = triplet.get_potential_energy() - singlet.get_potential_energy()
print(f"Singlet-triplet gap: {gap:.4f} eV")
```

## Batch Inference on Multiple Structures

Efficiently predict properties for multiple structures in a single batch.
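In the batched prediction below, per-atom outputs for one structure are recovered with a batch index vector (`batch.batch`), which tags each atom with the index of its parent structure. A toy plain-Python illustration of that bookkeeping (the lists here are stand-ins, not FAIRChem objects):

```python
# Toy version of batched per-atom bookkeeping: batch_index[i] is the
# structure that atom i belongs to (analogous to batch.batch)
batch_index = [0, 0, 1, 2, 2, 2]
forces = ["f0", "f1", "f2", "f3", "f4", "f5"]  # stand-ins for per-atom force rows

def atoms_of(structure_id):
    """Select the per-atom entries belonging to one structure."""
    return [f for b, f in zip(batch_index, forces) if b == structure_id]

print(atoms_of(2))  # ['f3', 'f4', 'f5']
```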
```python
from ase.build import bulk
from fairchem.core import pretrained_mlip
from fairchem.core.datasets.atomic_data import AtomicData, atomicdata_list_to_batch

# Create multiple structures
atoms_list = [
    bulk("Pt"),
    bulk("Cu"),
    bulk("NaCl", crystalstructure="rocksalt", a=5.64),
]

# Convert to AtomicData with task assignment
atomic_data_list = [
    AtomicData.from_ase(atoms, task_name="omat") for atoms in atoms_list
]
batch = atomicdata_list_to_batch(atomic_data_list)

# Run batch prediction
predictor = pretrained_mlip.get_predict_unit("uma-s-1p2", device="cuda")
preds = predictor.predict(batch)

# Access results
for i, atoms in enumerate(atoms_list):
    energy = preds["energy"][i].item()
    forces = preds["forces"][batch.batch == i].cpu().numpy()
    print(f"{atoms.get_chemical_formula()}: E = {energy:.4f} eV, "
          f"max|F| = {abs(forces).max():.4f} eV/Å")
```

## Heterogeneous Batch Inference (Multiple Tasks)

Batch systems with different task types (molecules, materials, surfaces) together.

```python
from ase.build import bulk, molecule, fcc100, add_adsorbate
from fairchem.core import pretrained_mlip
from fairchem.core.datasets.atomic_data import AtomicData, atomicdata_list_to_batch

predictor = pretrained_mlip.get_predict_unit("uma-s-1p2", device="cuda")

# Molecule with charge/spin
h2o = molecule("H2O")
h2o.info.update({"charge": 0, "spin": 1})

# Bulk material
pt = bulk("Pt")

# Catalytic surface
slab = fcc100("Cu", (3, 3, 3), vacuum=8, periodic=True)
add_adsorbate(slab, molecule("CO"), 2.0, "bridge")

# Create batch with different tasks
atomic_data_list = [
    AtomicData.from_ase(h2o, task_name="omol", r_data_keys=["spin", "charge"],
                        molecule_cell_size=12),
    AtomicData.from_ase(pt, task_name="omat"),
    AtomicData.from_ase(slab, task_name="oc20"),
]
batch = atomicdata_list_to_batch(atomic_data_list)
predictions = predictor.predict(batch)

print(f"H2O energy: {predictions['energy'][0].item():.4f} eV")
print(f"Pt energy: {predictions['energy'][1].item():.4f} eV")
print(f"Cu+CO energy: {predictions['energy'][2].item():.4f} eV")
```

## Multi-GPU Inference for Large Systems

Scale to large systems using graph-parallel inference with Ray.

```python
# Requires: pip install fairchem-core[extras]
import time

import numpy as np
from ase import units
from ase.md.langevin import Langevin
from fairchem.core import pretrained_mlip, FAIRChemCalculator
from fairchem.core.datasets.common_structures import get_fcc_crystal_by_num_atoms

seed = np.random.randint(0, np.iinfo(np.int32).max, dtype=int)
predictor = pretrained_mlip.get_predict_unit(
    "uma-s-1p2",
    inference_settings="turbo",  # Optimized for MD/relaxations
    device="cuda",
    workers=8,  # Number of GPUs
    seed=seed,
)
calc = FAIRChemCalculator(predictor, task_name="omat")

# Large 8000-atom system
atoms = get_fcc_crystal_by_num_atoms(8000)
atoms.calc = calc

dyn = Langevin(
    atoms,
    timestep=0.1 * units.fs,
    temperature_K=400,
    friction=0.001 / units.fs,
)

# Warmup and benchmark
dyn.run(steps=10)
start_time = time.time()
dyn.run(steps=100)
qps = 100 / (time.time() - start_time)
print(f"Performance: {qps:.2f} queries per second on 8 GPUs")
```

## Turbo Mode for Fast MD and Relaxations

Use optimized inference settings for single-system trajectories.
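Speed claims are best verified on your own hardware and system sizes. The warmup-then-time pattern from the multi-GPU example can be factored into a small stdlib helper (the function and its name are illustrative, not FAIRChem API):

```python
import time

def steps_per_second(step_fn, n_warmup=10, n_timed=100):
    """Call step_fn a few times to warm up (compilation, caching),
    then report throughput over a timed run."""
    for _ in range(n_warmup):
        step_fn()
    t0 = time.monotonic()
    for _ in range(n_timed):
        step_fn()
    return n_timed / (time.monotonic() - t0)

# Usage with an ASE dynamics object: steps_per_second(lambda: dyn.run(steps=1))
print(f"{steps_per_second(lambda: None):.0f} no-op calls/s")
```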
```python
from fairchem.core import pretrained_mlip, FAIRChemCalculator
from fairchem.core.units.mlip_unit.api.inference import InferenceSettings

# Preset turbo mode (1.5-2x faster, fixed composition required)
predictor = pretrained_mlip.get_predict_unit(
    "uma-s-1p2", device="cuda", inference_settings="turbo"
)

# Or custom settings for advanced use
settings = InferenceSettings(
    tf32=True,                       # Use TensorFloat-32 for speed
    activation_checkpointing=False,  # Disable for small systems (<1000 atoms)
    merge_mole=True,                 # Pre-merge MoLE weights (fixed composition)
    compile=True,                    # Use torch.compile for speed
    external_graph_gen=False,
    internal_graph_gen_version=2,
)
predictor = pretrained_mlip.get_predict_unit(
    "uma-s-1p2", device="cuda", inference_settings=settings
)
calc = FAIRChemCalculator(predictor, task_name="omat")
```

## Enabling Stress and Hessian Predictions

Compute untrained derivatives via autograd.

```python
from ase.build import molecule
from fairchem.core import pretrained_mlip, FAIRChemCalculator
from fairchem.core.units.mlip_unit.api.inference import InferenceSettings

# Enable stress and Hessian for omol task
settings = InferenceSettings(
    predict_untrained_stress={"omol"},
    predict_untrained_hessian={"omol"},
)
predictor = pretrained_mlip.get_predict_unit(
    "uma-s-1p2", device="cuda", inference_settings=settings
)
calc = FAIRChemCalculator(predictor, task_name="omol")

atoms = molecule("H2O")
atoms.calc = calc

energy = atoms.get_potential_energy()
forces = atoms.get_forces()
hessian = atoms.calc.results.get("hessian")  # 3N x 3N matrix

print(f"Energy: {energy:.4f} eV")
print(f"Forces shape: {forces.shape}")
if hessian is not None:
    print(f"Hessian shape: {hessian.shape}")
```

## Computing Formation Energies

Calculate formation energies with elemental references and optional corrections.
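Conceptually, a formation energy is the predicted total energy minus a composition-weighted sum of elemental reference energies. A toy sketch of that arithmetic with made-up numbers (the values below are illustrative only, not the references or OMat24 corrections that `FormationEnergyCalculator` actually loads):

```python
# Toy formation-energy arithmetic: E_f = E(compound) - sum_i n_i * E_ref(element_i)
e_total = -27.2                   # hypothetical total energy of a Na4Cl4 cell (eV)
e_ref = {"Na": -1.3, "Cl": -1.8}  # hypothetical per-atom elemental references (eV)
counts = {"Na": 4, "Cl": 4}       # composition of the cell

e_form = e_total - sum(n * e_ref[el] for el, n in counts.items())
print(f"{e_form:.1f} eV ({e_form / sum(counts.values()):.2f} eV/atom)")
```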
```python
from ase.build import bulk
from fairchem.core import pretrained_mlip, FAIRChemCalculator
from fairchem.core.calculate.ase_calculator import FormationEnergyCalculator

predictor = pretrained_mlip.get_predict_unit("uma-s-1p2", device="cuda")
base_calc = FAIRChemCalculator(predictor, task_name="omat")

# Wrap with formation energy calculator (auto-loads elemental references)
calc = FormationEnergyCalculator(
    base_calc,
    apply_corrections=True,    # Apply MP-style corrections for OMat task
    correction_type="OMat24",  # Use OMat24 corrections
)

atoms = bulk("NaCl", crystalstructure="rocksalt", a=5.64)
atoms.calc = calc

formation_energy = atoms.get_potential_energy()
formation_energy_per_atom = formation_energy / len(atoms)
print(f"Formation energy: {formation_energy:.4f} eV "
      f"({formation_energy_per_atom:.4f} eV/atom)")
```

## Batched Concurrent Simulations with InferenceBatcher

Run many independent relaxations/MD simulations with batched GPU inference.

```python
from functools import partial

import numpy as np
from ase.build import bulk, make_supercell
from ase.filters import FrechetCellFilter
from ase.optimize import LBFGS
from fairchem.core import pretrained_mlip
from fairchem.core.calculate import FAIRChemCalculator, InferenceBatcher

def run_relaxation(atoms, predict_unit):
    """Run structure relaxation and return final energy."""
    calc = FAIRChemCalculator(predict_unit, task_name="omat")
    atoms.calc = calc
    opt = LBFGS(FrechetCellFilter(atoms), logfile=None)
    opt.run(fmax=0.02, steps=100)
    return atoms.get_potential_energy()

# Create batcher with concurrent workers
predict_unit = pretrained_mlip.get_predict_unit("uma-s-1p2", device="cuda")
batcher = InferenceBatcher(
    predict_unit, concurrency_backend_options=dict(max_workers=32)
)

# Create structures to relax
prim_atoms = [bulk("Cu"), bulk("Ag"), bulk("Au"), bulk("Ni")]
atoms_list = [make_supercell(a, 3 * np.eye(3)) for a in prim_atoms]
for atoms in atoms_list:
    atoms.rattle(0.1)

# Run all relaxations in parallel with batched inference
run_fn = partial(run_relaxation, predict_unit=batcher.batch_predict_unit)
energies = list(batcher.executor.map(run_fn, atoms_list))

for atoms, energy in zip(prim_atoms, energies):
    print(f"{atoms.get_chemical_formula()}: E = {energy:.4f} eV")
```

## Creating Fine-tuning Datasets

Generate ASE-LMDB datasets from various input formats for fine-tuning.

```bash
# Clone repo and install
git clone git@github.com:facebookresearch/fairchem.git
pip install -e fairchem/packages/fairchem-core[dev]

# Create fine-tuning dataset from ASE-readable files
python src/fairchem/core/scripts/create_uma_finetune_dataset.py \
    --train-dir /path/to/train/structures/ \
    --val-dir /path/to/val/structures/ \
    --output-dir /path/to/output \
    --uma-task omat \
    --regression-task ef  # Options: e (energy), ef (energy+forces), efs (energy+forces+stress)
```

## Fine-tuning Pretrained Models

Fine-tune UMA models on custom datasets using the fairchem CLI.

```bash
# Run fine-tuning with generated config
fairchem -c /path/to/output/uma_sm_finetune_template.yaml

# Override parameters on command line
fairchem -c /path/to/output/uma_sm_finetune_template.yaml \
    epochs=10 \
    lr=2e-4 \
    batch_size=4 \
    job.run_dir=/path/to/runs \
    +job.timestamp_id=my_finetune_run

# Resume from checkpoint
fairchem -c /path/to/runs/my_finetune_run/checkpoints/final/resume.yaml
```

## Using Fine-tuned Models

Load and use a fine-tuned checkpoint for inference.

```python
from ase.build import bulk
from fairchem.core import FAIRChemCalculator
from fairchem.core.units.mlip_unit import load_predict_unit

# Load fine-tuned checkpoint
predictor = load_predict_unit("/path/to/runs/my_run/checkpoints/final/inference_ckpt.pt")

# Must use same task as fine-tuning
calc = FAIRChemCalculator(predictor, task_name="omat")

atoms = bulk("Fe")
atoms.calc = calc
print(f"Energy: {atoms.get_potential_energy():.4f} eV")
```

## LAMMPS Integration

Run large-scale MD with FAIRChem models via LAMMPS fix external.
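The example input file below hard-codes one `mass` line per atom type. When scripting many systems, those lines can be generated from a composition; a toy plain-Python sketch (atomic masses hard-coded for illustration):

```python
# Toy generator for LAMMPS "mass" lines; atomic masses hard-coded for illustration
masses = {"Al": 26.98, "Cu": 63.55}

def mass_lines(masses):
    """Emit one 'mass <type> <amu>' line per element, types numbered from 1."""
    return [f"mass {i} {m}  # {el}"
            for i, (el, m) in enumerate(sorted(masses.items()), start=1)]

for line in mass_lines(masses):
    print(line)
# mass 1 26.98  # Al
# mass 2 63.55  # Cu
```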
```bash
# Install LAMMPS integration
conda install lammps  # or build from source
pip install fairchem-core[extras] fairchem-lammps

# Run LAMMPS with UMA potential
lmp_fc lmp_in="simulation.in" task_name="omat"

# Multi-GPU parallel inference
lmp_fc lmp_in="simulation.in" task_name="omat" predict_unit='${parallel_predict_unit}'
```

Example LAMMPS input file (simulation.in):

```
# LAMMPS input for UMA - remove pair_style, bond_style etc.
units metal
atom_style atomic
boundary p p p

read_data structure.data

mass 1 26.98  # Al
mass 2 63.55  # Cu

timestep 0.001  # 1 fs
thermo 100

fix 1 all nvt temp 300 300 0.1
run 10000
```

## Training Models from Scratch

Train UMA or custom models using the FAIRChem training framework.

```bash
# Local training with debug dataset
fairchem -c configs/uma/training_release/uma_sm_direct_pretrain.yaml \
    cluster=h100_local \
    dataset=uma_debug

# Multi-node SLURM training (16 nodes)
fairchem -c configs/uma/training_release/uma_sm_conserve_finetune.yaml \
    cluster=h100 \
    job.scheduler.num_nodes=16 \
    run_name="uma_conserve_train"
```

## Using Custom ASE Datasets

Configure training with ASE database or file-based datasets.

```yaml
# config.yaml for ASE database dataset
dataset:
  format: ase_db
  train:
    src: /path/to/train.db
    a2g_args:
      r_energy: True
      r_forces: True
      task_name: omat
    keep_in_memory: True  # Faster for small datasets
  val:
    src: /path/to/val.db
    a2g_args:
      r_energy: True
      r_forces: True
      task_name: omat

# For file-based datasets (CIF, VASP, extxyz, etc.)
dataset:
  format: ase_read
  train:
    src: /path/to/structures/
    pattern: "**/*.cif"  # Recursive glob pattern
    a2g_args:
      r_energy: True
      r_forces: True
      task_name: omat
```

## Summary

FAIRChem provides a comprehensive toolkit for machine learning-accelerated atomistic simulations.
The primary workflow involves loading a pretrained UMA model via `pretrained_mlip.get_predict_unit()`, wrapping it in a `FAIRChemCalculator` with the appropriate task name (oc20 for catalysis, omat for materials, omol for molecules, etc.), and using it with standard ASE interfaces for relaxations, molecular dynamics, and property calculations. For high-throughput screening, the library supports batch inference on multiple structures and concurrent simulations via `InferenceBatcher`, while large-scale simulations benefit from multi-GPU inference with the Ray-based `ParallelMLIPPredictUnit` and LAMMPS integration. For domain-specific research, FAIRChem enables fine-tuning of pretrained models on custom datasets using simple CLI commands and Hydra YAML configurations, supporting distributed training on SLURM clusters. Advanced users can customize inference settings for optimal speed (turbo mode) or enable gradient-based property predictions (stress, Hessian) via `InferenceSettings`. The library integrates seamlessly with the broader Python scientific stack including ASE, pymatgen, and standard ML frameworks, making it suitable for applications ranging from catalyst screening and materials discovery to drug design and polymer simulation.