### Install Project Dependencies (Shell)

Source: https://github.com/mcleish7/arithmetic/blob/main/README.md

Commands to clone the repository and install the project's Python dependencies. The last three `pip install` commands are optional and may only be required on some systems.

```shell
git clone git@github.com:mcleish7/arithmetic.git
cd arithmetic
pip install .

# Optional: may be required on some systems
pip install multiprocess -U
pip install dill -U
pip install apache-beam -U
```

--------------------------------

### Training Configuration Flags (Shell)

Source: https://github.com/mcleish7/arithmetic/blob/main/README.md

Examples of command-line flags used to modify training behavior, such as loss reduction, gradient throttling, masking, skip connections, and multi-GPU setup.

```shell
# Loss reduction
arch.loss_reduction=none

# Gradient throttling
arch.throttle=True

# Masking before the equals sign
arch.mask_before_equals=True

# Skip connections
arch.forward_only_model_with_skip=True

# Multi-GPU: launch with torchrun instead of python and add impl.fullgraph=false
# (<NUM_GPUS> is a placeholder for the number of GPUs)
torchrun --nproc_per_node=<NUM_GPUS> --standalone
impl.fullgraph=false
```

--------------------------------

### Train Arithmetic Models via CLI

Source: https://context7.com/mcleish7/arithmetic/llms.txt

Provides command-line examples for training models with `pretrain.py`. Covers a standard single-process run that selects a positional embedding strategy, and multi-GPU execution with `torchrun`.

```bash
# Standard training run with Abacus embeddings
python pretrain.py \
    name=addition_abacus_run_1 \
    arch=crammed-depthrecurrent \
    data=arithmetic \
    arch.embedding.pos_embedding=abacus \
    train.optim.lr=0.0001

# Multi-GPU training with torchrun
torchrun --nproc_per_node=4 --standalone pretrain.py \
    name=addition_multigpu \
    impl.fullgraph=false
```

--------------------------------

### Generate Arithmetic Datasets using create_data_split.py

Source: https://context7.com/mcleish7/arithmetic/llms.txt

Provides command-line examples for generating arithmetic datasets (addition, multiplication, sorting) using the `create_data_split.py` script. It covers options like bucket sampling, reversing operands, and tokenization.
```bash
# Generate addition dataset using bucket method (uniform operand length sampling)
python create_data_split.py --bucket --op + --n 20 --m 20 --limit 20000000 \
    --p 0.0 --dir_name my_addition_dataset --reverse_all

# Generate multiplication dataset
python create_data_split.py --bucket --op x --n 15 --m 15 --limit 20000000 \
    --dir_name my_multiplication_dataset --reverse_all --p 0.0

# Generate sorting dataset
python create_data_split.py --uniform_distribution_sort_data --tokenize \
    --tokenizer_type sort --test_split_ratio 0.01 --n 10 --m 10 \
    --limit 20000000 --dir_name my_sorting_dataset --reverse_all

# Tokenize generated dataset
python create_data_split.py --tokenize --dir_name my_addition_dataset \
    --tokenizer_type pad --test_split_ratio 0.01
```

--------------------------------

### Programmatically Generate and Tokenize Datasets in Python

Source: https://context7.com/mcleish7/arithmetic/llms.txt

Shows how to generate arithmetic datasets and tokenize them from Python by importing functions from the `create_data_split.py` module. The example generates an addition dataset and tokenizes it with the `pad` tokenizer.

```python
# Example of using the module programmatically
from create_data_split import bucket_method_main, tokenize_main


# Generate 1 million addition samples with operands up to 20 digits
class Flags:
    index_hints = False


dataset, folder, filepath = bucket_method_main(
    n=20,                    # Max digits in first operand
    m=20,                    # Max digits in second operand
    operation='+',           # Operation: +, -, or x
    limit=1000000,           # Number of samples
    dir_name='addition_20x20',
    p=0.0,                   # Probability of random spacing
    no_carry_addition=False,
    reverse_answer=False,
    reverse_all=True,        # Reverse all numbers for better learning
    keep_0_for_len_1=False,
    Flags=Flags()
)
# Illustrative sample format (exact strings depend on the reversal flags):
# ['4321+8765=68031', '12+987=999', ...]

# Tokenize the generated dataset
tokenize_main(
    dir_name='addition_20x20',
    tokenizer_type='pad',
    test_split_ratio=0.05
)
```

--------------------------------

### Set Cramming Base Directory (Shell)

Source: https://github.com/mcleish7/arithmetic/blob/main/README.md

Instructions to set up the base directory for cramming data, models, and logs. This involves creating a directory and exporting its path as `cramming_base_dir` in `~/.bashrc`.

```shell
cd arithmetic
mkdir cramming-data
# Replace MY_BASE_DIR with the path to your base directory
echo 'export cramming_base_dir=MY_BASE_DIR' >> ~/.bashrc
source ~/.bashrc
```

--------------------------------

### Checkpointing Configuration (Shell)

Source: https://github.com/mcleish7/arithmetic/blob/main/README.md

Flags to enable and configure checkpointing for single-GPU training, specifying the save interval and the name for intermediate models.

```shell
impl.save_every_n_minutes=60
impl.save_intermediate_model_name='last'
```

--------------------------------

### Construct Arithmetic Transformer Model with Python

Source: https://context7.com/mcleish7/arithmetic/llms.txt

Demonstrates how to initialize a transformer architecture using the cramming library, configure arithmetic-focused parameters such as the hidden size and positional embeddings, and perform a forward pass.
```python
import cramming
from cramming.data.tokenizer_preparation import get_tokenizer
import torch

tokenizer = get_tokenizer("pad")
cfg_arch = cramming.get_model_config(arch="crammed-depthrecurrent")
cfg_arch.hidden_size = 1024
cfg_arch.embedding.pos_embedding = "abacus"
cfg_arch.attention.rotary_embedding = "fire"

model = cramming.construct_model(cfg_arch, tokenizer)
model.to("cuda").train()

input_ids = torch.tensor([[5, 6, 7, 14, 8, 9, 10, 17]], device="cuda")
outputs = model(input_ids)
print(f"Loss: {outputs['loss'].item():.4f}")
```

--------------------------------

### Load and Prepare Arithmetic Datasets

Source: https://context7.com/mcleish7/arithmetic/llms.txt

Utility functions to initialize data configurations and prepare PyTorch dataloaders for training or evaluation on arithmetic datasets.

```python
import cramming
from cramming.data import load_pretraining_corpus, prepare_dataloaders

cfg_data = cramming.get_config(overrides=[
    "data=arithmetic",
    "data.sources.arithmetic.tokenized_dataset_path=arithmetic_data/+_bucket_n_20_m_20/hf_tokenized_dataset",
    "data.sources.arithmetic.tokenizer_type=pad"
])
cfg_impl = cramming.get_backend_config()

tokenized_dataset, tokenizer = load_pretraining_corpus(
    cfg_data.data,
    cfg_impl,
    data_dir="./cramming-data/data"
)

dataloaders = prepare_dataloaders(
    tokenized_dataset,
    tokenizer,
    cfg_train=cfg_data.train,
    cfg_impl=cfg_impl
)
```

--------------------------------

### Load Model and Perform Inference

Source: https://context7.com/mcleish7/arithmetic/llms.txt

Demonstrates how to load a trained model checkpoint using the cramming library and perform inference on an arithmetic string. The process includes tokenization, model construction, and decoding the generated output.
```python
import torch
import cramming
from cramming.data.tokenizer_preparation import get_tokenizer
from safetensors.torch import load_file

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = get_tokenizer("pad")

# Locate the final checkpoint of a finished training run
checkpoint_folder = "outputs/addition_abacus_run_1/checkpoints"
tokenizer, cfg_arch, model_file = cramming.utils.find_pretrained_checkpoint(
    checkpoint="FINAL",
    local_checkpoint_folder=checkpoint_folder,
    arch_modifications=None
)

# Rebuild the model and load the saved weights
model = cramming.construct_model(cfg_arch, tokenizer)
model = cramming.backend.load_model_checkpoint(model, model_file)
model.to(device)
model.eval()

# Reverse the problem string before tokenizing
problem = "321+654="
reversed_problem = problem[::-1]
tokenized = tokenizer(reversed_problem)["input_ids"]
input_ids = torch.tensor([tokenized], device=device)

with torch.no_grad():
    predicted_ids = model._generate(
        input_ids,
        token_limit=10,
        temperature=1.0,
        steps_at_generation_time=cfg_arch.maximal_recurrence,
        greedy=True,
        quick=True
    )

# Decode the generated tokens and undo the reversal
output_tokens = predicted_ids[0].tolist()
answer = tokenizer.decode(output_tokens)
print(f"Input: {problem}")
print(f"Output: {answer[::-1]}")
```

--------------------------------

### Execute Arithmetic Evaluations via CLI

Source: https://context7.com/mcleish7/arithmetic/llms.txt

Run the arithmetic and sorting evaluation scripts using command-line arguments to specify task parameters, tokenizer types, and evaluation settings. These commands use the cramming framework to execute specific evaluation runs.
```bash
# Arithmetic evaluation
python arithmetic_eval_quicker.py \
    name=bitwise_or_run_1 \
    base_dir=$cramming_base_dir \
    data=arithmetic \
    max_rec=1 \
    token_limit=105 \
    pos_arth=True \
    remove_padding=False \
    data.sources.arithmetic.tokenizer_type="pad"

# Sorting evaluation
python sort_eval.py \
    name=sorting_run_1 \
    base_dir=$cramming_base_dir \
    data=arithmetic \
    max_rec=1 \
    sort_reverse=True \
    data.sources.arithmetic.tokenizer_type='sort' \
    max_size_given=31 \
    start_ind_1_given=1 \
    start_ind_2_given=1

# Extended evaluation with reversed inputs
python arithmetic_eval_quicker.py \
    name=addition_abacus_run_1 \
    big_eval_step_1=True \
    extended_eval=True \
    reverse_inputs=True \
    checkerboard=even \
    remove_padding=True
```

--------------------------------

### Positional Embeddings Configuration (Shell)

Source: https://github.com/mcleish7/arithmetic/blob/main/README.md

Demonstrates how to configure different positional embedding strategies (Absolute: Learned, Abacus; Relative: NoPE, FIRE, FIRE randomised, RoPE) using command-line flags.

```shell
# Absolute: Learned
arch.embedding.pos_embedding=learned

# Absolute: Abacus
arch.embedding.pos_embedding=abacus
arch.embedding.max_abacus_len=100

# Relative: NoPE
arch.embedding.pos_embedding=None

# Relative: FIRE
arch.attention.type="self-attention" arch.attention.rotary_embedding="fire"

# Relative: FIRE randomised
arch.embedding.pos_embedding=None arch.attention.type="self-attention" arch.attention.rotary_embedding="fire" arch.attention.max_length=128

# Relative: RoPE
arch.attention.type="self-attention" arch.attention.rotary_embedding=true
```

--------------------------------

### Initialize and Use Abacus Embeddings in Python

Source: https://context7.com/mcleish7/arithmetic/llms.txt

Demonstrates how to initialize Abacus embeddings, a positional encoding technique for transformers. It shows the forward pass for generating embeddings and how to switch between training and evaluation modes.
```python
import torch
from abacus import Abacus

# Initialize Abacus embeddings with digit tokens from your tokenizer
# digit_tokens = tokenizer.convert_tokens_to_ids(['0','1','2','3','4','5','6','7','8','9'])
digit_tokens = [4, 5, 6, 7, 8, 9, 10, 11, 12, 13]  # Example token IDs for digits 0-9

abacus = Abacus(
    digit_tokens=digit_tokens,
    embedding_dim=1024,
    max_seq_length=1024,
    max_k=99  # Maximum random shift during training
)

# Forward pass with input token IDs
# For input "123+456=579", digit positions are tracked independently
input_ids = torch.tensor([[5, 6, 7, 14, 8, 9, 10, 17, 9, 11, 13]])  # Example tokenized arithmetic
embeddings = abacus(input_ids)  # Shape: [batch_size, seq_length, embedding_dim]
print(f"Embedding shape: {embeddings.shape}")
# Output: Embedding shape: torch.Size([1, 11, 1024])

# During training, random offset k is applied for robustness
abacus.train()
train_embeddings = abacus(input_ids)

# During evaluation, no random offset is applied
abacus.eval()
eval_embeddings = abacus(input_ids)
```

--------------------------------

### Configure Positional Embeddings

Source: https://context7.com/mcleish7/arithmetic/llms.txt

Shows how to modify the architecture configuration to support various positional embedding strategies, including Abacus, RoPE, and FIRE, which are critical for arithmetic generalization.
```python
# Assumes cfg_arch was created with
# cramming.get_model_config(arch="crammed-depthrecurrent")

# Learned Positional Embeddings
cfg_arch.embedding.pos_embedding = "learned"

# Abacus Embeddings
cfg_arch.embedding.pos_embedding = "abacus"
cfg_arch.embedding.max_abacus_len = 100

# RoPE (Rotary Position Embeddings)
cfg_arch.embedding.pos_embedding = None
cfg_arch.attention.type = "self-attention"
cfg_arch.attention.rotary_embedding = True

# Combined: Abacus + FIRE
cfg_arch.embedding.pos_embedding = "abacus"
cfg_arch.attention.type = "self-attention"
cfg_arch.attention.rotary_embedding = "fire"
cfg_arch.attention.max_length = 128
```

--------------------------------

### Evaluate Arithmetic Models with Grid Analysis

Source: https://context7.com/mcleish7/arithmetic/llms.txt

Executes systematic evaluation of trained models across different operand lengths using `arithmetic_eval_quicker.py`. Supports grid-based testing, parallelized job splitting, and specific arithmetic operations like multiplication.

```bash
# Addition evaluation with greedy decoding
python arithmetic_eval_quicker.py \
    name=addition_abacus_run_1 \
    data=arithmetic \
    max_rec=1 \
    token_limit=105 \
    greedy=True

# Multiplication evaluation
python arithmetic_eval_quicker.py \
    name=multiplication_run_1 \
    mul=True \
    token_limit=30
```

--------------------------------

### Iterate and Decode Training Data

Source: https://context7.com/mcleish7/arithmetic/llms.txt

This snippet demonstrates how to iterate through a PyTorch DataLoader to inspect training batches. It extracts input IDs and uses a tokenizer to decode samples, providing a quick way to verify data formatting and model inputs.
```python
# Assumes `dataloaders` and `tokenizer` from the dataset-loading snippet
for batch_idx, batch in enumerate(dataloaders["train"]):
    input_ids = batch["input_ids"]
    print(f"Batch {batch_idx}: shape {input_ids.shape}")

    # Decode sample
    sample = tokenizer.decode(input_ids[0].tolist())
    print(f"Sample: {sample}")

    if batch_idx >= 2:
        break
```

--------------------------------

### BibTeX Citation

Source: https://github.com/mcleish7/arithmetic/blob/main/README.md

The BibTeX entry for citing the research paper 'Transformers Can Do Arithmetic with the Right Embeddings'.

```bibtex
@article{mcleish2024transformers,
  title={Transformers Can Do Arithmetic with the Right Embeddings},
  author={Sean McLeish and Arpit Bansal and Alex Stein and Neel Jain and John Kirchenbauer and Brian R. Bartoldson and Bhavya Kailkhura and Abhinav Bhatele and Jonas Geiping and Avi Schwarzschild and Tom Goldstein},
  journal={arXiv preprint arXiv:2405.17399},
  year={2024}
}
```
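
--------------------------------

### Sketch: Per-Number Digit Positions Behind Abacus Embeddings (Python)

The Abacus snippet above notes that digit positions are tracked independently for each number and that a random offset is applied during training only. The following is a minimal plain-Python sketch of that indexing idea, not the repository's `Abacus` module. The function name `abacus_position_ids`, the use of index 0 for non-digit tokens, and the single per-sequence offset `k` are assumptions made for illustration.

```python
import random


def abacus_position_ids(input_ids, digit_tokens, max_k=0, training=False, rng=None):
    """Hypothetical helper: digits count up within each number, other tokens get 0."""
    rng = rng or random.Random()
    # Assumed offset rule: one random shift k per sequence, used only in training
    k = rng.randint(0, max_k) if training and max_k > 0 else 0
    digits = set(digit_tokens)
    positions = []
    run = 0  # index of the current digit inside the current number
    for tok in input_ids:
        if tok in digits:
            run += 1
            positions.append(run + k)
        else:
            run = 0  # a non-digit token ends the current number
            positions.append(0)
    return positions


# Same example token IDs as the Abacus snippet: "123+456=579"
digit_tokens = list(range(4, 14))
input_ids = [5, 6, 7, 14, 8, 9, 10, 17, 9, 11, 13]
print(abacus_position_ids(input_ids, digit_tokens))
# [1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3]
```

Each of the three numbers restarts its count at 1, so digits of the same significance share an index regardless of where the number sits in the sequence. Passing `training=True` with a nonzero `max_k` shifts every digit index by the same random `k`, which is one plausible reading of the "random offset" described above.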