### Install Project Dependencies (Shell)

Source: https://github.com/mcleish7/arithmetic/blob/main/README.md

Commands to clone the repository and install the project's Python dependencies. It also lists optional dependencies that might be required on some systems.

```shell
git clone git@github.com:mcleish7/arithmetic.git
cd arithmetic
pip install .

# Optional: these upgrades may be required on some systems
pip install multiprocess -U
pip install dill -U
pip install apache-beam -U
```

--------------------------------

### Training Configuration Flags (Shell)

Source: https://github.com/mcleish7/arithmetic/blob/main/README.md

Examples of command-line flags used to modify training behavior, such as loss reduction, gradient throttling, masking, skip connections, and multi-GPU setup.

```shell
arch.loss_reduction=none
arch.throttle=True
arch.mask_before_equals=True
arch.forward_only_model_with_skip=True
# Multi-GPU: launch with torchrun instead of python and add impl.fullgraph=false
torchrun --nproc_per_node=<NUM GPUS> --standalone pretrain.py impl.fullgraph=false
```

--------------------------------

### Train Arithmetic Models via CLI

Source: https://context7.com/mcleish7/arithmetic/llms.txt

Command-line examples for training models with `pretrain.py`: a single-process run that selects Abacus positional embeddings and sets the learning rate, and a multi-GPU run launched with `torchrun`.

```bash
python pretrain.py \
    name=addition_abacus_run_1 \
    arch=crammed-depthrecurrent \
    data=arithmetic \
    arch.embedding.pos_embedding=abacus \
    train.optim.lr=0.0001

torchrun --nproc_per_node=4 --standalone pretrain.py \
    name=addition_multigpu \
    impl.fullgraph=false
```
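
The overrides above are plain `key=value` pairs, so a launcher script can assemble them from a dictionary. The following is a minimal sketch, not part of the repository: `build_command` is a hypothetical helper, and the keys are the ones used in the example above.

```python
import shlex

def build_command(script, overrides, num_gpus=1):
    """Assemble a launch command from key=value overrides (hypothetical helper)."""
    if num_gpus > 1:
        # Multi-GPU runs go through torchrun and add impl.fullgraph=false,
        # mirroring the second command in the example above.
        launcher = ["torchrun", f"--nproc_per_node={num_gpus}", "--standalone", script]
        overrides = {**overrides, "impl.fullgraph": "false"}
    else:
        launcher = ["python", script]
    return launcher + [f"{key}={value}" for key, value in overrides.items()]

cmd = build_command(
    "pretrain.py",
    {"name": "addition_abacus_run_1", "arch": "crammed-depthrecurrent", "data": "arithmetic"},
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run` directly, which avoids shell quoting issues.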

--------------------------------

### Generate Arithmetic Datasets using create_data_split.py

Source: https://context7.com/mcleish7/arithmetic/llms.txt

Provides command-line examples for generating arithmetic datasets (addition, multiplication, sorting) using the `create_data_split.py` script. It covers options like bucket sampling, reversing operands, and tokenization.

```bash
# Generate an addition dataset using the bucket method (uniform operand length sampling)
python create_data_split.py --bucket --op + --n 20 --m 20 --limit 20000000 \
    --p 0.0 --dir_name my_addition_dataset --reverse_all

# Generate a multiplication dataset
python create_data_split.py --bucket --op x --n 15 --m 15 --limit 20000000 \
    --dir_name my_multiplication_dataset --reverse_all --p 0.0

# Generate a sorting dataset
python create_data_split.py --uniform_distribution_sort_data --tokenize \
    --tokenizer_type sort --test_split_ratio 0.01 --n 10 --m 10 \
    --limit 20000000 --dir_name my_sorting_dataset --reverse_all

# Tokenize a generated dataset
python create_data_split.py --tokenize --dir_name my_addition_dataset \
    --tokenizer_type pad --test_split_ratio 0.01
```

--------------------------------

### Programmatically Generate and Tokenize Datasets in Python

Source: https://context7.com/mcleish7/arithmetic/llms.txt

Shows how to programmatically generate arithmetic datasets and tokenize them using the `create_data_split.py` module. It includes examples for generating addition datasets and tokenizing them with padding.

```python
# Example of using the module programmatically
from create_data_split import bucket_method_main, tokenize_main

# Minimal stand-in for the flags object that bucket_method_main expects
class Flags:
    index_hints = False

# Generate 1 million addition samples with operands up to 20 digits
dataset, folder, filepath = bucket_method_main(
    n=20,                    # Max digits in first operand
    m=20,                    # Max digits in second operand
    operation='+',           # Operation: +, -, or x
    limit=1000000,           # Number of samples
    dir_name='addition_20x20',
    p=0.0,                   # Probability of random spacing
    no_carry_addition=False,
    reverse_answer=False,
    reverse_all=True,        # Reverse all numbers for better learning
    keep_0_for_len_1=False,
    Flags=Flags()
)
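# `dataset` holds the generated samples; with reverse_all=True every number is
# written least significant digit first, e.g. 123+456=579 appears as '321+654=975'
# (illustrative; the exact sample format depends on the script).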

# Tokenize the generated dataset
tokenize_main(
    dir_name='addition_20x20',
    tokenizer_type='pad',
    test_split_ratio=0.05
)
```
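
To make the `reverse_all` option concrete, the sketch below reverses every run of digits in a sample so that the least significant digit comes first. It is a plain-Python illustration, not the code in `create_data_split.py`: `reverse_numbers` is a hypothetical name, and the exact sample format written by the script is an assumption.

```python
import re

def reverse_numbers(sample: str) -> str:
    """Reverse each maximal run of digits, leaving operators and '=' in place."""
    return re.sub(r"\d+", lambda match: match.group(0)[::-1], sample)

plain = "123+456=579"
reversed_sample = reverse_numbers(plain)
print(reversed_sample)  # 321+654=975

# Applying the same transformation again restores the original string.
assert reverse_numbers(reversed_sample) == plain
```

Writing digits in this order lets a left-to-right model emit the ones digit first, which matches the order in which carries are computed by hand.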

--------------------------------

### Set Cramming Base Directory (Shell)

Source: https://github.com/mcleish7/arithmetic/blob/main/README.md

Instructions to set up the base directory for cramming data, models, and logs. This involves creating a directory and exporting its path to the .bashrc file.

```shell
cd arithmetic
mkdir cramming-data
# Replace MY_BASE_DIR with the path to the cramming-data directory created above
echo 'export cramming_base_dir=MY_BASE_DIR' >> ~/.bashrc
source ~/.bashrc
```
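
A quick way to confirm that the variable is visible to Python processes is to read it back from the environment. This is a minimal sketch: `resolve_base_dir` is a hypothetical helper, and the `cramming-data` fallback is an assumption made for illustration, not a default defined by the library.

```python
import os
from pathlib import Path

def resolve_base_dir(default="cramming-data"):
    """Return the directory named by cramming_base_dir, or a fallback (hypothetical helper)."""
    return Path(os.environ.get("cramming_base_dir", default)).expanduser()

base_dir = resolve_base_dir()
print(f"Base directory: {base_dir} (exists: {base_dir.exists()})")
```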

--------------------------------

### Checkpointing Configuration (Shell)

Source: https://github.com/mcleish7/arithmetic/blob/main/README.md

Flags that enable periodic checkpointing for single-GPU training, setting the save interval in minutes and the name used for the intermediate model.

```shell
impl.save_every_n_minutes=60 impl.save_intermediate_model_name='last'
```

--------------------------------

### Construct Arithmetic Transformer Model with Python

Source: https://context7.com/mcleish7/arithmetic/llms.txt

Demonstrates how to initialize a transformer architecture using the cramming library, configure specific arithmetic-focused parameters like hidden sizes and positional embeddings, and perform a forward pass.

```python
import cramming
from cramming.data.tokenizer_preparation import get_tokenizer
import torch

tokenizer = get_tokenizer("pad")
cfg_arch = cramming.get_model_config(arch="crammed-depthrecurrent")

cfg_arch.hidden_size = 1024
cfg_arch.embedding.pos_embedding = "abacus"
cfg_arch.attention.rotary_embedding = "fire"

model = cramming.construct_model(cfg_arch, tokenizer)
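# Fall back to the CPU when no GPU is available
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device).train()

input_ids = torch.tensor([[5, 6, 7, 14, 8, 9, 10, 17]], device=device)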
outputs = model(input_ids)
print(f"Loss: {outputs['loss'].item():.4f}")
```

--------------------------------

### Load and Prepare Arithmetic Datasets

Source: https://context7.com/mcleish7/arithmetic/llms.txt

Utility functions to initialize data configurations and prepare PyTorch dataloaders for training or evaluation on arithmetic datasets.

```python
import cramming
from cramming.data import load_pretraining_corpus, prepare_dataloaders

cfg_data = cramming.get_config(overrides=[
    "data=arithmetic",
    "data.sources.arithmetic.tokenized_dataset_path=arithmetic_data/+_bucket_n_20_m_20/hf_tokenized_dataset",
    "data.sources.arithmetic.tokenizer_type=pad"
])

cfg_impl = cramming.get_backend_config()

tokenized_dataset, tokenizer = load_pretraining_corpus(
    cfg_data.data,
    cfg_impl,
    data_dir="./cramming-data/data"
)

dataloaders = prepare_dataloaders(
    tokenized_dataset,
    tokenizer,
    cfg_train=cfg_data.train,
    cfg_impl=cfg_impl
)
```

--------------------------------

### Load Model and Perform Inference

Source: https://context7.com/mcleish7/arithmetic/llms.txt

Demonstrates how to load a trained model checkpoint using the cramming library and perform inference on an arithmetic string. The process includes tokenization, model construction, and decoding the generated output.

```python
import torch
import cramming
from cramming.data.tokenizer_preparation import get_tokenizer
from safetensors.torch import load_file

device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = get_tokenizer("pad")
checkpoint_folder = "outputs/addition_abacus_run_1/checkpoints"
tokenizer, cfg_arch, model_file = cramming.utils.find_pretrained_checkpoint(
    checkpoint="FINAL",
    local_checkpoint_folder=checkpoint_folder,
    arch_modifications=None
)

model = cramming.construct_model(cfg_arch, tokenizer)
model = cramming.backend.load_model_checkpoint(model, model_file)
model.to(device)
model.eval()

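problem = "321+654="
# Reverse each operand separately (least significant digit first) and keep '+' and '='
# in place; this assumes the model was trained on data generated with reverse_all.
reversed_problem = "+".join(part[::-1] for part in problem.rstrip("=").split("+")) + "="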
tokenized = tokenizer(reversed_problem)["input_ids"]
input_ids = torch.tensor([tokenized], device=device)

with torch.no_grad():
    predicted_ids = model._generate(
        input_ids,
        token_limit=10,
        temperature=1.0,
        steps_at_generation_time=cfg_arch.maximal_recurrence,
        greedy=True,
        quick=True
    )

output_tokens = predicted_ids[0].tolist()
answer = tokenizer.decode(output_tokens)
print(f"Input: {problem}")
print(f"Output: {answer[::-1]}")
```

--------------------------------

### Execute Arithmetic Evaluations via CLI

Source: https://context7.com/mcleish7/arithmetic/llms.txt

Run the evaluation scripts from the command line, passing task parameters, tokenizer types, and evaluation settings as overrides. The examples cover `arithmetic_eval_quicker.py` and `sort_eval.py`.

```bash
python arithmetic_eval_quicker.py \
    name=bitwise_or_run_1 \
    base_dir=$cramming_base_dir \
    data=arithmetic \
    max_rec=1 \
    token_limit=105 \
    pos_arth=True \
    remove_padding=False \
    data.sources.arithmetic.tokenizer_type="pad"

python sort_eval.py \
    name=sorting_run_1 \
    base_dir=$cramming_base_dir \
    data=arithmetic \
    max_rec=1 \
    sort_reverse=True \
    data.sources.arithmetic.tokenizer_type='sort' \
    max_size_given=31 \
    start_ind_1_given=1 \
    start_ind_2_given=1

python arithmetic_eval_quicker.py \
    name=addition_abacus_run_1 \
    big_eval_step_1=True \
    extended_eval=True \
    reverse_inputs=True \
    checkerboard=even \
    remove_padding=True
```

--------------------------------

### Positional Embeddings Configuration (Shell)

Source: https://github.com/mcleish7/arithmetic/blob/main/README.md

Demonstrates how to configure different positional embedding strategies (Absolute: Learned, Abacus; Relative: NoPE, FIRE, FIRE randomised, RoPE) using command-line flags.

```shell
# Absolute: Learned
arch.embedding.pos_embedding=learned
# Absolute: Abacus
arch.embedding.pos_embedding=abacus arch.embedding.max_abacus_len=100
# Relative: NoPE
arch.embedding.pos_embedding=None
# Relative: FIRE
arch.attention.type="self-attention" arch.attention.rotary_embedding="fire"
# Relative: FIRE randomised
arch.embedding.pos_embedding=None arch.attention.type="self-attention" arch.attention.rotary_embedding="fire" arch.attention.max_length=128
# Relative: RoPE
arch.attention.type="self-attention" arch.attention.rotary_embedding=true
```
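
When launching runs from a script, it can help to keep these flag groups in one lookup table. The sketch below copies the override strings from the flags above and matches them to strategies in the order the description lists them; the dictionary keys and the `overrides_for` helper are hypothetical names chosen for this example.

```python
# Override strings taken from the flags above. Shell quotes are dropped because the
# shell strips them before the program sees its arguments. Keys are labels for this sketch.
POS_EMBEDDING_OVERRIDES = {
    "learned": ["arch.embedding.pos_embedding=learned"],
    "abacus": [
        "arch.embedding.pos_embedding=abacus",
        "arch.embedding.max_abacus_len=100",
    ],
    "nope": ["arch.embedding.pos_embedding=None"],
    "fire": [
        "arch.attention.type=self-attention",
        "arch.attention.rotary_embedding=fire",
    ],
    "fire_randomised": [
        "arch.embedding.pos_embedding=None",
        "arch.attention.type=self-attention",
        "arch.attention.rotary_embedding=fire",
        "arch.attention.max_length=128",
    ],
    "rope": [
        "arch.attention.type=self-attention",
        "arch.attention.rotary_embedding=true",
    ],
}

def overrides_for(strategy):
    """Look up the override list for a strategy label (hypothetical helper)."""
    try:
        return list(POS_EMBEDDING_OVERRIDES[strategy])
    except KeyError:
        known = ", ".join(sorted(POS_EMBEDDING_OVERRIDES))
        raise ValueError(f"unknown strategy {strategy!r}; expected one of: {known}") from None

print(" ".join(overrides_for("abacus")))
```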

--------------------------------

### Initialize and Use Abacus Embeddings in Python

Source: https://context7.com/mcleish7/arithmetic/llms.txt

Demonstrates how to initialize Abacus embeddings, which are a novel positional encoding technique for transformers. It shows the forward pass for generating embeddings and how to switch between training and evaluation modes.

```python
import torch
from abacus import Abacus

# Initialize Abacus embeddings with digit tokens from your tokenizer
# digit_tokens = tokenizer.convert_tokens_to_ids(['0','1','2','3','4','5','6','7','8','9'])
digit_tokens = [4, 5, 6, 7, 8, 9, 10, 11, 12, 13]  # Example token IDs for digits 0-9

abacus = Abacus(
    digit_tokens=digit_tokens,
    embedding_dim=1024,
    max_seq_length=1024,
    max_k=99  # Maximum random shift during training
)

# Forward pass with input token IDs
# For input "123+456=579", digit positions are tracked independently
input_ids = torch.tensor([[5, 6, 7, 14, 8, 9, 10, 17, 9, 11, 13]])  # Example tokenized arithmetic
embeddings = abacus(input_ids)
# Shape: [batch_size, seq_length, embedding_dim]
print(f"Embedding shape: {embeddings.shape}")
# Output: Embedding shape: torch.Size([1, 11, 1024])

# During training, random offset k is applied for robustness
abacus.train()
train_embeddings = abacus(input_ids)

# During evaluation, no random offset is applied
abacus.eval()
eval_embeddings = abacus(input_ids)
```
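
The idea that digit positions are tracked independently for each number can be illustrated without PyTorch. The sketch below assigns each digit its index within its own number and gives every other token position 0, with an optional shared offset standing in for the random shift used during training. It is a simplified illustration: `abacus_positions` is a hypothetical function, and how the real `Abacus` module handles non-digit tokens and applies the offset is an assumption.

```python
import random

def abacus_positions(token_ids, digit_tokens, offset=0):
    """Position index per token: digits count up within each number, other tokens get 0."""
    digits = set(digit_tokens)
    positions, run_length = [], 0
    for token in token_ids:
        if token in digits:
            run_length += 1
            positions.append(run_length + offset)
        else:
            # A non-digit token ends the current number and resets the counter.
            run_length = 0
            positions.append(0)
    return positions

digit_tokens = [4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
input_ids = [5, 6, 7, 14, 8, 9, 10, 17, 9, 11, 13]  # "123+456=579"

print(abacus_positions(input_ids, digit_tokens))
# [1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3]

# Training-style call: one random shift k shared by every digit in the sequence.
k = random.randint(0, 99)
print(abacus_positions(input_ids, digit_tokens, offset=k))
```

Because the counter restarts at every number, digits in the same place within their number share a position index regardless of where the number sits in the sequence.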

--------------------------------

### Configure Positional Embeddings

Source: https://context7.com/mcleish7/arithmetic/llms.txt

Shows how to modify the architecture configuration to support various positional embedding strategies, including Abacus, RoPE, and FIRE, which are critical for arithmetic generalization.

```python
# Learned Positional Embeddings
cfg_arch.embedding.pos_embedding = "learned"

# Abacus Embeddings
cfg_arch.embedding.pos_embedding = "abacus"
cfg_arch.embedding.max_abacus_len = 100

# RoPE (Rotary Position Embeddings)
cfg_arch.embedding.pos_embedding = None
cfg_arch.attention.type = "self-attention"
cfg_arch.attention.rotary_embedding = True

# Combined: Abacus + FIRE
cfg_arch.embedding.pos_embedding = "abacus"
cfg_arch.attention.type = "self-attention"
cfg_arch.attention.rotary_embedding = "fire"
cfg_arch.attention.max_length = 128
```

--------------------------------

### Evaluate Arithmetic Models with Grid Analysis

Source: https://context7.com/mcleish7/arithmetic/llms.txt

Executes systematic evaluation of trained models across different operand lengths using arithmetic_eval_quicker.py. Supports grid-based testing, parallelized job splitting, and specific arithmetic operations like multiplication.

```bash
python arithmetic_eval_quicker.py \
    name=addition_abacus_run_1 \
    data=arithmetic \
    max_rec=1 \
    token_limit=105 \
    greedy=True

python arithmetic_eval_quicker.py \
    name=multiplication_run_1 \
    mul=True \
    token_limit=30
```
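
Grid-based testing with job splitting can be pictured as enumerating every pair of operand lengths and dividing the pairs among workers. The sketch below shows that idea in plain Python. Both functions are hypothetical and do not reflect how `arithmetic_eval_quicker.py` partitions its work; the script's own options for this are not shown in the examples above.

```python
from itertools import product

def grid_cells(max_len):
    """All (first operand length, second operand length) pairs from 1 to max_len."""
    return list(product(range(1, max_len + 1), repeat=2))

def split_round_robin(cells, num_jobs):
    """Assign cells to jobs in turn so each job gets a similar share."""
    return [cells[job::num_jobs] for job in range(num_jobs)]

cells = grid_cells(max_len=4)
jobs = split_round_robin(cells, num_jobs=3)
for job_id, job_cells in enumerate(jobs):
    print(f"job {job_id}: {len(job_cells)} cells, first {job_cells[0]}")
```

Round-robin assignment mixes short and long operand pairs within each job, which tends to balance run time better than giving each job a contiguous block of the grid.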

--------------------------------

### Iterate and Decode Training Data

Source: https://context7.com/mcleish7/arithmetic/llms.txt

This snippet demonstrates how to iterate through a PyTorch DataLoader to inspect training batches. It extracts input IDs and uses a tokenizer to decode samples, providing a quick way to verify data formatting and model inputs.

```python
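# `dataloaders` and `tokenizer` are assumed to come from prepare_dataloaders
# and load_pretraining_corpus, as in the dataset-loading example.
for batch_idx, batch in enumerate(dataloaders["train"]):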
    input_ids = batch["input_ids"]
    print(f"Batch {batch_idx}: shape {input_ids.shape}")

    # Decode sample
    sample = tokenizer.decode(input_ids[0].tolist())
    print(f"Sample: {sample}")

    if batch_idx >= 2:
        break
```

--------------------------------

### BibTeX Citation

Source: https://github.com/mcleish7/arithmetic/blob/main/README.md

The BibTeX entry for citing the research paper 'Transformers Can Do Arithmetic with the Right Embeddings'.

```bibtex
@article{mcleish2024transformers,
    title={Transformers Can Do Arithmetic with the Right Embeddings},
    author={Sean McLeish and Arpit Bansal and Alex Stein and Neel Jain and John Kirchenbauer and Brian R. Bartoldson and Bhavya Kailkhura and Abhinav Bhatele and Jonas Geiping and Avi Schwarzschild and Tom Goldstein},
    journal={arXiv preprint arXiv:2405.17399},
    year={2024}
}
```
