### Quickstart MoA Setup and Execution

Source: https://github.com/togethercomputer/moa/blob/main/README.md

Provides commands to install the Together Python library, set the API key, and run the basic MoA example script (`moa.py`). This allows users to quickly get a simple Mixture-of-Agents application running.

```Bash
pip install together
export TOGETHER_API_KEY=<TOGETHER_API_KEY>
python moa.py
```

--------------------------------

### Install alpaca-eval Python Package

Source: https://github.com/togethercomputer/moa/blob/main/alpaca_eval/README.md

Instructions for installing the `alpaca-eval` library using pip, covering both stable releases from PyPI and nightly versions directly from the GitHub repository.

```bash
pip install alpaca-eval
```

```bash
pip install git+https://github.com/tatsu-lab/alpaca_eval
```

--------------------------------

### Install AlpacaEval with All LLM Provider Dependencies

Source: https://github.com/togethercomputer/moa/blob/main/alpaca_eval/README.md

This command installs the core AlpacaEval package along with all optional dependencies required to integrate with various large language model providers, including OpenAI, Anthropic, Cohere, and HuggingFace, ensuring full functionality for `fn_completions`.

```bash
pip install "alpaca_eval[all]"
```

--------------------------------

### Install FastChat Dependencies and Environment

Source: https://github.com/togethercomputer/moa/blob/main/FastChat/docs/commands/webserver.md

This snippet outlines the steps to prepare a Linux environment for FastChat. It includes updating system packages, installing essential tools like tmux and htop, setting up Anaconda, creating and activating a dedicated Python environment, cloning the FastChat repository, and installing its Python dependencies.

```bash
sudo apt update
sudo apt install tmux htop

wget https://repo.anaconda.com/archive/Anaconda3-2022.10-Linux-x86_64.sh
bash Anaconda3-2022.10-Linux-x86_64.sh

conda create -n fastchat python=3.9
conda activate fastchat

git clone https://github.com/lm-sys/FastChat.git
cd FastChat
pip3 install -e .
```

--------------------------------

### Interactive CLI Chatbot Demo Setup

Source: https://github.com/togethercomputer/moa/blob/main/README.md

Instructions to set up and launch an interactive multi-turn chatbot demo using MoA. This involves exporting the API key, installing necessary dependencies, and running the main script to start the conversational interface.

```Bash
export TOGETHER_API_KEY={your_key}
pip install -r requirements.txt
python bot.py
```

--------------------------------

### Install AlpacaEval from Source for Development

Source: https://github.com/togethercomputer/moa/blob/main/alpaca_eval/README.md

This command installs the AlpacaEval package in editable mode directly from the source directory. It is typically used by developers who have forked the repository and wish to make local modifications and test them without reinstalling.

```bash
pip install -e .
```

--------------------------------

### Launch FastChat Controller, Workers, and Web Server

Source: https://github.com/togethercomputer/moa/blob/main/FastChat/docs/commands/webserver.md

This section provides commands to initiate various FastChat components. It covers starting the main controller, registering a Hugging Face API worker, setting up environment variables for OpenAI, Anthropic, and GCP API keys, and finally launching the Gradio web server with multiple model integrations and logging configurations.

```bash
cd fastchat_logs/controller
python3 -m fastchat.serve.controller --host 0.0.0.0 --port 21001
python3 -m fastchat.serve.register_worker --controller http://localhost:21001 --worker-name https://
python3 -m fastchat.serve.test_message --model vicuna-13b --controller http://localhost:21001

cd fastchat_logs/server0

python3 -m fastchat.serve.huggingface_api_worker --model-info-file ~/elo_results/register_hf_api_models.json

export OPENAI_API_KEY=
export ANTHROPIC_API_KEY=
export GCP_PROJECT_ID=

python3 -m fastchat.serve.gradio_web_server_multi --controller http://localhost:21001 --concurrency 50 --add-chatgpt --add-claude --add-palm --elo ~/elo_results/elo_results.pkl --leaderboard-table-file ~/elo_results/leaderboard_table.csv --register ~/elo_results/register_oai_models.json --show-terms

python3 backup_logs.py
```

--------------------------------

### Install Python Package Build and Distribution Dependencies

Source: https://github.com/togethercomputer/moa/blob/main/FastChat/docs/commands/pypi.md

Installs the required Python packages for building and distributing a project. This includes 'twine' for interacting with PyPI, 'build' for creating distribution archives, and ensures 'pip' is up-to-date.

```bash
python3 -m pip install twine
python3 -m pip install --upgrade pip
pip3 install build
```

--------------------------------

### Example: Create a New Leaderboard with Alpaca Eval

Source: https://github.com/togethercomputer/moa/blob/main/alpaca_eval/README.md

This example demonstrates how to use the `alpaca_eval make_leaderboard` command to generate a new leaderboard. It specifies the path where the leaderboard will be saved and the path to the model outputs to be included in the evaluation. This command is useful for consolidating evaluation results into a single leaderboard.

```bash
alpaca_eval make_leaderboard \
  --leaderboard_path <path_to_save_leaderboard> \
  --all_model_outputs <model_outputs_path>
```

--------------------------------

### Example FastChat CLI Command for Vicuna Model

Source: https://github.com/togethercomputer/moa/blob/main/FastChat/docs/model_support.md

Provides a concrete example of running the FastChat CLI with a specific pre-trained model, 'lmsys/vicuna-7b-v1.5'. This command can be used to learn the code logic and verify basic functionality.

```bash
python3 -m fastchat.serve.cli --model lmsys/vicuna-7b-v1.5
```

--------------------------------

### BibTeX Entry for AlpacaEval General Benchmark

Source: https://github.com/togethercomputer/moa/blob/main/alpaca_eval/README.md

This BibTeX entry provides citation details for the AlpacaEval project, covering its code, results, and general benchmark. It includes authors, title, year, publisher, journal, and a URL for the GitHub repository.

```BibTeX
@misc{alpaca_eval,
  author = {Xuechen Li and Tianyi Zhang and Yann Dubois and Rohan Taori and Ishaan Gulrajani and Carlos Guestrin and Percy Liang and Tatsunori B. Hashimoto},
  title = {AlpacaEval: An Automatic Evaluator of Instruction-following Models},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/tatsu-lab/alpaca_eval}}
}
```

--------------------------------

### Example AlpacaEval Annotation Output with Chain of Thought

Source: https://github.com/togethercomputer/moa/blob/main/alpaca_eval/README.md

This JSON object illustrates a complete annotation record from AlpacaEval, showcasing the `preference` score, the raw outputs (`output_1`, `output_2`), and the detailed `raw_completion` which includes `concise_explanation` (chain of thought) and `ordered_models`. It also demonstrates how `referenced_models` maps temporary model names to actual outputs for clarity.

```json
{
  "instruction": "How did US states get their names?",
  "output_1": "The names of U.S. states are derived from a variety of languages and sources, including indigenous American languages, European languages (primarily English, Spanish, and French), as well as other historical influences. Here is a brief overview of how some states got their names:\n\n1. **Indigenous American Languages**: Many state names come from Native American words or names of tribes, often from the languages of the indigenous peoples who lived in or near those areas.\n   - **Alabama**: Possibly derived from the Choctaw language, meaning \"thicket clearers.\"\n   - **Connecticut**: From a Mohegan-Pequot word meaning \"long tidal river.\"\n   - **Massachusetts**: [...]",
  "generator_1": "gpt4_1106_preview",
  "dataset": "helpful_base",
  "output_2": "The names of the 50 U.S. states come from a variety of sources, including Native American languages, European languages, and historical figures. Here's a brief overview of how some states got their names:\n\n1. Native American origins: Many states have names derived from Native American languages. For example, Alabama comes from the Choctaw word \"Albah amo,\" meaning \"plant gatherers\" or \"herb gatherers.\" Similarly, the name Mississippi comes from the Ojibwe word \"Misi-ziibi,\" meaning \"great river.\"\n\n2. European languages: [...].",
  "generator_2": "gpt4",
  "annotator": "alpaca_eval_cot_gpt4_turbo_fn",
  "preference": 1.0,
  "raw_completion": {
    "concise_explanation": "Model M provided a more detailed and structured response, including bold headings for each category and a wider range of examples. It also included additional categories such as 'Other European Languages' and 'Combination of Languages and Influences', which added depth to the explanation. Model m's response was accurate but less comprehensive and lacked the clear structure found in Model M's output.",
    "ordered_models": [
      {
        "model": "M",
        "rank": 1
      },
      {
        "model": "m",
        "rank": 2
      }
    ]
  },
  "referenced_models": {
    "M": "output_1",
    "m": "output_2"
  }
}
```
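To read such a record programmatically, the placeholder names ranked in `raw_completion` can be mapped back through `referenced_models`. The following is a minimal sketch, assuming records have the shape shown above; the helper name `top_ranked_output` and the trimmed record are illustrative and not part of AlpacaEval.

```python
import json

# Trimmed copy of the record above; the long text fields are omitted for brevity.
record_json = """
{
  "generator_1": "gpt4_1106_preview",
  "generator_2": "gpt4",
  "preference": 1.0,
  "raw_completion": {
    "ordered_models": [
      {"model": "M", "rank": 1},
      {"model": "m", "rank": 2}
    ]
  },
  "referenced_models": {"M": "output_1", "m": "output_2"}
}
"""


def top_ranked_output(record: dict) -> str:
    """Return the output key ("output_1" or "output_2") that the annotator ranked first."""
    ranked = sorted(record["raw_completion"]["ordered_models"], key=lambda m: m["rank"])
    best_placeholder = ranked[0]["model"]
    return record["referenced_models"][best_placeholder]


record = json.loads(record_json)
winner_key = top_ranked_output(record)                    # "output_1"
winner_generator = record["generator_" + winner_key[-1]]  # "gpt4_1106_preview"
print(winner_key, winner_generator)
```

In this record the placeholder `M` is ranked first and refers to `output_1`, so the lookup resolves to the `gpt4_1106_preview` generator.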

--------------------------------

### Prepare Evaluation Dataset with HuggingFace Datasets

Source: https://github.com/togethercomputer/moa/blob/main/alpaca_eval/README.md

This Python snippet demonstrates how to load an evaluation dataset, specifically "tatsu-lab/alpaca_eval", and populate the 'output' field for each example. It serves as a placeholder for integrating a model's generation logic, typically used when preparing data for evaluation.

```python
import datasets

eval_set = datasets.load_dataset("tatsu-lab/alpaca_eval", "alpaca_eval")["eval"]
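outputs = []
for example in eval_set:
    # `generate` is a placeholder for your model's generation function
    example["output"] = generate(example["instruction"])
    # Mutating `example` does not update `eval_set`, so collect the rows
    outputs.append(example)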
```
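Once outputs are generated, they are typically written to a JSON file that the evaluation commands can read. The sketch below is one way to do that, under stated assumptions: `generate` is a hypothetical stand-in for a real model call, two in-memory rows replace the loaded dataset so the example runs without downloads, and `my_model` is an illustrative generator name.

```python
import json
import tempfile
from pathlib import Path


def generate(instruction: str) -> str:
    """Hypothetical stand-in for a real model call."""
    return f"(model answer to: {instruction})"


def build_outputs(examples, generator_name):
    """Return one dict per example with `instruction`, `output`, and `generator`."""
    rows = []
    for example in examples:
        row = dict(example)  # copy so the source rows are left untouched
        row["output"] = generate(row["instruction"])
        row["generator"] = generator_name
        rows.append(row)
    return rows


# Two in-memory rows stand in for the loaded evaluation set.
examples = [
    {"instruction": "How did US states get their names?"},
    {"instruction": "Name three primary colors."},
]
rows = build_outputs(examples, generator_name="my_model")

# Write to a throwaway directory; use your own path in practice.
out_path = Path(tempfile.mkdtemp()) / "outputs.json"
out_path.write_text(json.dumps(rows, indent=2), encoding="utf-8")
print(f"wrote {len(rows)} rows to {out_path}")
```

Collecting rows into a new list avoids relying on in-place mutation of dataset rows, which may not persist when iterating over a dataset object.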

--------------------------------

### AlpacaEval Leaderboard Generation Command Arguments

Source: https://github.com/togethercomputer/moa/blob/main/alpaca_eval/README.md

Describes the command-line arguments used for generating or updating the AlpacaEval leaderboard. These arguments specify paths for model outputs, reference data, and annotator configuration files.

```APIDOC
Command: alpaca_eval make_leaderboard
Arguments:
  --leaderboard_path:
    Description: Path to save the leaderboard to. The leaderboard is saved as a CSV file; if the file already exists, new entries are appended.
  --all_model_outputs:
    Description: The json path to the outputs of all models to add to the leaderboard (as a single file or by globbing multiple files). Each dictionary should contain the keys (`instruction` and `output`) that are formatted in the prompts and a column `generator` with the name of the current model.
    Example: https://huggingface.co/datasets/tatsu-lab/alpaca_eval/blob/main/alpaca_eval_all_outputs.json
  --reference_outputs:
    Description: The path to the outputs of the reference model. Each dictionary should contain the keys (`instruction` and `output`) that are formatted in the prompts. By default, the reference outputs are the `text-davinci-003` outputs on the AlpacaEval set.
  --annotators_config:
    Description: The path to the annotator's config file. Defaults to `alpaca_eval_gpt4`.
```
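The key requirements listed for `--all_model_outputs` can be checked before spending API calls on annotation. This is a minimal sketch, assuming the default prompt keys named above; `find_invalid_rows` is a hypothetical helper, not an AlpacaEval function.

```python
REQUIRED_KEYS = ("instruction", "output", "generator")


def find_invalid_rows(rows, required=REQUIRED_KEYS):
    """Return (index, missing_keys) for every row that lacks a required key."""
    problems = []
    for index, row in enumerate(rows):
        missing = [key for key in required if key not in row]
        if missing:
            problems.append((index, missing))
    return problems


rows = [
    {"instruction": "Say hi.", "output": "Hi!", "generator": "model_a"},
    {"instruction": "Say bye.", "output": "Bye!"},  # no `generator`
]
print(find_invalid_rows(rows))
```

An empty result means every row carries the three keys; reference outputs can be checked the same way with `required=("instruction", "output")`.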

--------------------------------

### Install and Configure AlpacaEval 2.0

Source: https://github.com/togethercomputer/moa/blob/main/alpaca_eval/README.md

AlpacaEval 2.0 is the default version when installing `alpaca_eval==0.5`. This snippet provides the installation command and shows how to set an environment variable to revert to older configurations if needed, allowing flexibility in version usage.

```bash
pip install alpaca_eval==0.5
```

```bash
export IS_ALPACA_EVAL_2=False
```

--------------------------------

### Install FastChat with pip

Source: https://github.com/togethercomputer/moa/blob/main/FastChat/README.md

This command installs FastChat and its essential components, including the model worker and web UI, directly using pip. It's the quickest way to get started with FastChat.

```bash
pip3 install "fschat[model_worker,webui]"
```

--------------------------------

### BibTeX Entry for AlpacaFarm Simulation Framework

Source: https://github.com/togethercomputer/moa/blob/main/alpaca_eval/README.md

This BibTeX entry provides citation details for the AlpacaFarm project, a simulation framework designed for methods that learn from human feedback. It includes authors, title, year, and arXiv preprint information with its primary classification.

```BibTeX
@misc{dubois2023alpacafarm,
  title={AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback}, 
  author={Yann Dubois and Xuechen Li and Rohan Taori and Tianyi Zhang and Ishaan Gulrajani and Jimmy Ba and Carlos Guestrin and Percy Liang and Tatsunori B. Hashimoto},
  year={2023},
  eprint={2305.14387},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}
```

--------------------------------

### alpaca_eval Command Line Interface (CLI) Reference

Source: https://github.com/togethercomputer/moa/blob/main/alpaca_eval/README.md

Reference for the `alpaca_eval` CLI: its four commands and the parameters that configure model evaluation and output handling, with the expected input or output for each parameter.

```APIDOC
alpaca_eval CLI:
  SYNOPSIS: alpaca_eval COMMAND

  COMMANDS:
    evaluate:
      Description: Evaluate a model based on its outputs. This is the default entrypoint if no command is specified.
    evaluate_from_model:
      Description: Evaluate a model from HuggingFace or an API provider. This is a wrapper around `evaluate` which includes generating from a desired model.
    make_leaderboard:
      Description: Precompute and save an entire leaderboard for a given dataset / evaluator / set of models generations.
    analyze_evaluators:
      Description: Analyze an evaluator and populate the evaluators' leaderboard (agreement with human, speed, price,...).

  IMPORTANT PARAMETERS:
    model_outputs:
      Type: string (path to JSON file)
      Description: A path to a json file for the outputs of the model to add to the leaderboard. Each dictionary should contain the keys `instruction` and `output`.
    annotators_config:
      Type: string (annotator name)
      Description: The annotator to use. Recommended: `weighted_alpaca_eval_gpt4_turbo` (default for AlpacaEval 2.0), which has a high agreement rate with human annotation data, large context size, and is pretty cheap. For a comparison of all annotators, see the evaluators section.
    reference_outputs:
      Type: string (path to JSON file)
      Description: The outputs of the reference model. Same format as `model_outputs`. Default: `gpt4_turbo` for AlpacaEval 2.0.
    output_path:
      Type: string (path)
      Description: Path for saving annotations and leaderboard.
```
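When the CLI is driven from a script, the parameters above map onto command-line flags. The sketch below only assembles the argument list; `build_evaluate_command` is a hypothetical helper, and the `results/my_model` directory is an illustrative value.

```python
import shlex


def build_evaluate_command(model_outputs, annotators_config=None,
                           reference_outputs=None, output_path=None):
    """Assemble an `alpaca_eval evaluate` argument list from the parameters above."""
    args = ["alpaca_eval", "evaluate", "--model_outputs", model_outputs]
    optional = {
        "--annotators_config": annotators_config,
        "--reference_outputs": reference_outputs,
        "--output_path": output_path,
    }
    for flag, value in optional.items():
        if value is not None:  # omitted flags fall back to the tool's defaults
            args += [flag, value]
    return args


cmd = build_evaluate_command(
    "example/outputs.json",
    annotators_config="weighted_alpaca_eval_gpt4_turbo",
    output_path="results/my_model",  # hypothetical directory
)
print(shlex.join(cmd))
# To execute it: subprocess.run(cmd, check=True)
```

Passing a list to `subprocess.run` rather than a single string avoids shell-quoting problems with paths that contain spaces.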

--------------------------------

### Install ExllamaV2 Library

Source: https://github.com/togethercomputer/moa/blob/main/FastChat/docs/exllama_v2.md

Commands to clone and install the ExllamaV2 library, which provides optimized kernels for faster GPTQ inference when integrated with FastChat.

```bash
git clone https://github.com/turboderp/exllamav2
cd exllamav2
pip install -e .
```

--------------------------------

### Download LangChain Example Text File

Source: https://github.com/togethercomputer/moa/blob/main/FastChat/docs/langchain_integration.md

Downloads a sample text file ('state_of_the_union.txt') from the LangChain GitHub repository, which serves as input for the question-answering example.

```bash
wget https://raw.githubusercontent.com/hwchase17/langchain/v0.0.200/docs/modules/state_of_the_union.txt
```

--------------------------------

### Configuration Parameters for SinglePairwiseAnnotator

Source: https://github.com/togethercomputer/moa/blob/main/alpaca_eval/README.md

This section details the key parameters used in the `SinglePairwiseAnnotator` configuration. It covers settings for prompt templates, functions for parsing completions, arguments for these functions, and options for output randomization and batch processing.

```APIDOC
Parameters
----------
prompt_template : path
    A prompt that will be given to `fn_prompter` or path to the prompts. Path is relative to
    `evaluators_configs/`

fn_completion_parser : callable or str
    Function in `completion_parsers.py` to use for parsing the completions into preferences. For each completion,
    the number of preferences should equal `batch_size`; if it does not, all the preferences in that batch are set
    to NaN.

completion_parser_kwargs : dict
    Kwargs for fn_completion_parser.

fn_completions : callable or str
    Function in `decoders.py` to use for decoding the output.

completions_kwargs : dict
    kwargs for fn_completions. E.g. model_name, max_tokens, temperature, top_p, top_k, stop_seq.

is_randomize_output_order : bool
    Whether to randomize output_1, output_2 when formatting.

batch_size : int
    Number of examples that will be added in a single prompt.
```
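Two of the behaviours described above, the NaN fallback for `batch_size` mismatches and output-order randomization, can be illustrated in a few lines. This is a sketch of the described rules only; the function names are hypothetical, and the actual AlpacaEval implementation may differ.

```python
import math
import random


def reconcile_preferences(parsed, batch_size):
    """Keep parsed preferences only if there is exactly one per example in the batch."""
    if len(parsed) == batch_size:
        return list(parsed)
    return [math.nan] * batch_size  # count mismatch: invalidate the whole batch


def maybe_swap(output_1, output_2, rng):
    """Randomly swap the two outputs; the flag records whether a swap happened."""
    if rng.random() < 0.5:
        return output_2, output_1, True
    return output_1, output_2, False


print(reconcile_preferences([1, 2], batch_size=2))  # counts match, kept as-is
print(reconcile_preferences([1], batch_size=2))     # mismatch, both become NaN
first, second, swapped = maybe_swap("answer A", "answer B", random.Random(0))
print(first, second, swapped)
```

Recording whether a swap happened lets the caller map the annotator's preference back to the original `output_1` and `output_2` ordering.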

--------------------------------

### Run alpaca_eval for Model Output Evaluation

Source: https://github.com/togethercomputer/moa/blob/main/alpaca_eval/README.md

Demonstrates how to run `alpaca_eval` from the command line: set the OpenAI API key, then pass the path to the model outputs. The command annotates the outputs, prints a leaderboard to the console, and saves both the leaderboard and the annotations to the same directory as the `model_outputs` file.

```bash
export OPENAI_API_KEY=<your_api_key> # for more complex configs, e.g. using Azure or switching clients see client_configs/README.md 
alpaca_eval --model_outputs 'example/outputs.json'
```

--------------------------------

### Git Commands for Adding a New Model Contribution

Source: https://github.com/togethercomputer/moa/blob/main/alpaca_eval/README.md

These Git commands are used to stage and commit the necessary files for a new model contribution to the AlpacaEval repository. This includes the model configuration, generated outputs, and updated leaderboard entries.

```sh
git add src/alpaca_eval/models_configs/<model_name>
git add src/alpaca_eval/leaderboards/ 
git add -f results/<model_name>/model_outputs.json
git add -f results/<model_name>/*/annotations.json
git commit -m "Add <model_name> to AlpacaEval"
git push
```

--------------------------------

### Evaluate Model with AlpacaEval CLI

Source: https://github.com/togethercomputer/moa/blob/main/alpaca_eval/README.md

This command evaluates a model identified by its configuration name using the AlpacaEval command-line interface. It first runs inference with the model to produce outputs on the evaluation set, then evaluates those outputs.

```sh
alpaca_eval evaluate_from_model \
  --model_configs 'falcon-7b-instruct'
```

--------------------------------

### Prepare Environment for MoA Evaluation Benchmarks

Source: https://github.com/togethercomputer/moa/blob/main/README.md

Commands to install all necessary Python packages and set up API keys for Together AI and OpenAI. These steps are crucial for preparing the environment to run various evaluation benchmarks like AlpacaEval, MT-Bench, and FLASK.

```Bash
pip install -r requirements.txt
cd alpaca_eval
pip install -e .
cd ../FastChat
pip install -e ".[model_worker,llm_judge]"
cd ..
export TOGETHER_API_KEY=<TOGETHER_API_KEY>
export OPENAI_API_KEY=<OPENAI_API_KEY>
```
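Because the benchmarks depend on both keys, a quick check before launching a long run can save time. The following is a minimal sketch; `missing_api_keys` is a hypothetical helper, and it only tests that the variables are non-empty, not that the keys are valid.

```python
import os

REQUIRED_KEYS = ("TOGETHER_API_KEY", "OPENAI_API_KEY")


def missing_api_keys(env=None, required=REQUIRED_KEYS):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


missing = missing_api_keys()
if missing:
    print("Set these variables before running the benchmarks:", ", ".join(missing))
else:
    print("Both API keys are set.")
```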

--------------------------------

### Recompute Length-Controlled Win Rates with AlpacaEval

Source: https://github.com/togethercomputer/moa/blob/main/alpaca_eval/README.md

This command allows users to recompute length-controlled win rates on previously annotated models. It involves installing or updating the `alpaca_eval` package and then executing the evaluation script with specific parameters to enable metric recomputation.

```bash
pip install -U alpaca_eval
alpaca_eval --model_outputs … --is_recompute_metrics_only True
```

--------------------------------

### Check FastChat Server Launch Time

Source: https://github.com/togethercomputer/moa/blob/main/FastChat/docs/commands/webserver.md

This command helps verify the successful launch of the Gradio web server by searching its logs for the 'Running on local URL' message. It iterates through multiple server logs and extracts the last occurrence of the message.

```bash
for i in $(seq 0 11); do grep "Running on local URL" fastchat_logs/server$i/gradio_web_server.log | tail -n 1; done
```
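The same check can be scripted when the result needs to be consumed by other tooling. This sketch mirrors the shell loop under the same directory layout (`fastchat_logs/server<i>/gradio_web_server.log`); the function names are hypothetical, and the demo writes to a throwaway directory so it runs anywhere.

```python
import tempfile
from pathlib import Path

MARKER = "Running on local URL"


def last_launch_line(log_path, marker=MARKER):
    """Return the last line in `log_path` containing `marker`, or None."""
    path = Path(log_path)
    if not path.is_file():
        return None
    last = None
    with path.open(encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if marker in line:
                last = line.rstrip("\n")
    return last


def report(root="fastchat_logs", server_count=12):
    """Map each server index (0..server_count-1) to its last launch line."""
    return {
        i: last_launch_line(Path(root) / f"server{i}" / "gradio_web_server.log")
        for i in range(server_count)
    }


# Demo: server0 has a log with two launches, server1 has no log at all.
demo_root = Path(tempfile.mkdtemp())
(demo_root / "server0").mkdir()
(demo_root / "server0" / "gradio_web_server.log").write_text(
    "startup\nRunning on local URL: first\nrestart\nRunning on local URL: second\n",
    encoding="utf-8",
)
demo = report(demo_root, server_count=2)
print(demo)
```

A value of `None` marks a server whose log is missing or never reached the launch message.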

--------------------------------

### Launch Chatbot Arena Vision Components and Gradio Server

Source: https://github.com/togethercomputer/moa/blob/main/FastChat/docs/arena.md

These shell commands start the components required for the Chatbot Arena vision environment: the central controller, an SGLang model worker for vision-language models, and the Gradio web server with API endpoint registration, vision arena mode, remote storage, and random question sampling. The last command prepares the VQA example data.

```bash
python3 -m fastchat.serve.controller
```

```bash
python3 -m fastchat.serve.sglang_worker --model-path <model-path> --tokenizer-path <tokenizer-path>
```

```bash
python3 -m fastchat.serve.gradio_web_server_multi --share --register-api-endpoint-file api_endpoints.json --vision-arena --use-remote-storage --random-questions metadata_sampled.json
```

```bash
python fastchat/serve/vision/create_vqa_examples_dir.py
```

--------------------------------

### Launch FastChat OpenAI-Compatible API Server Components

Source: https://github.com/togethercomputer/moa/blob/main/FastChat/docs/openai_api.md

Steps to start the FastChat controller, model worker, and the OpenAI-compatible API server locally. This sets up the necessary backend services for the API.

```bash
python3 -m fastchat.serve.controller
```

```bash
python3 -m fastchat.serve.model_worker --model-path lmsys/vicuna-7b-v1.5
```

```bash
python3 -m fastchat.serve.openai_api_server --host localhost --port 8000
```
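With the three services running, a client can talk to the server using OpenAI-style requests. The sketch below builds such a request with the standard library and does not send it. It assumes the server exposes the OpenAI-style `/v1/chat/completions` route on the host and port above and that the worker registers the model as `vicuna-7b-v1.5`; `build_chat_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_chat_request(prompt, model="vicuna-7b-v1.5",
                       base_url="http://localhost:8000/v1"):
    """Build (but do not send) a chat-completions POST request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_chat_request("Hello! What is your name?")
print(request.full_url)
# With the services running, send it with:
#   with urllib.request.urlopen(request) as response:
#       reply = json.load(response)
#       print(reply["choices"][0]["message"]["content"])
```

Keeping request construction separate from sending makes the payload easy to inspect before any server is up.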

--------------------------------

### AlpacaEval Command-Line Interface Reference

Source: https://github.com/togethercomputer/moa/blob/main/alpaca_eval/README.md

This section details the command-line arguments available for the `alpaca_eval` tool, used for evaluating models. It covers positional arguments like `MODEL_CONFIGS` and various flags for specifying reference models, evaluation datasets, annotators, output paths, and generation limits.

```APIDOC
POSITIONAL ARGUMENTS
    MODEL_CONFIGS
        Type: Union
        Description: A dictionary or path (relative to `models_configs`) to a yaml file containing the configuration of the model to decode from. If a directory, we search for 'configs.yaml' in it. The keys in the first dictionary should be the generator's name, and the value should be a dictionary of the generator's configuration.

FLAGS
    -r, --reference_model_configs=REFERENCE_MODEL_CONFIGS
        Type: Optional[Union]
        Default: None
        Description: Same as in `model_configs` but for the reference model. If None, we use the default Davinci003 outputs.
    -e, --evaluation_dataset=EVALUATION_DATASET
        Type: Union
        Default: <func...>
        Description: Path to the evaluation dataset or a function that returns a dataframe. If None, we use the default evaluation.
    -a, --annotators_config=ANNOTATORS_CONFIG
        Type: Union
        Default: 'alpaca_eval_gpt4_turbo_fn'
        Description: Path to the annotators configuration or a dictionary. If None, we use the default annotators configuration.
    -o, --output_path=OUTPUT_PATH
        Type: Union
        Default: 'auto'
        Description: Path to save the generations, annotations and leaderboard. If auto saves at `results/<model_name>`.
    -m, --max_instances=MAX_INSTANCES
        Type: Optional[int]
        Default: None
        Description: Maximum number of instances to generate and evaluate. If None, we evaluate all instances.
    --is_strip_output=IS_STRIP_OUTPUT
        Type: bool
        Default: True
        Description: Whether to strip trailing and leading whitespaces from the outputs.
    --is_load_outputs=IS_LOAD_OUTPUTS
        Type: bool
        Default: True
        Description: Whether to try to load outputs from the output path. If True and outputs exist we only generate outputs for instructions that don't have outputs yet.
    -c, --chunksize=CHUNKSIZE
        Type: int
        Default: 64
        Description: Number of instances to generate before saving. If None, we save after all generations.
    Additional flags are accepted.
        Description: Other kwargs to `evaluate`.
```
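The interaction between `--is_load_outputs` and `--chunksize` amounts to resumable, checkpointed generation: instructions that already have outputs are skipped, and results are saved after every chunk. The sketch below illustrates that idea only; it is not AlpacaEval's implementation, and `generate_with_resume`, `generate_fn`, and `save_fn` are hypothetical names.

```python
def generate_with_resume(instructions, existing, generate_fn, save_fn, chunksize=64):
    """Generate outputs only for unseen instructions, saving every `chunksize` items."""
    outputs = dict(existing)  # instruction -> output loaded from a previous run
    pending = [text for text in instructions if text not in outputs]
    for start in range(0, len(pending), chunksize):
        for text in pending[start:start + chunksize]:
            outputs[text] = generate_fn(text)
        save_fn(outputs)  # checkpoint after each chunk
    return outputs


saved_sizes = []
result = generate_with_resume(
    instructions=["a", "b", "c", "d", "e"],
    existing={"a": "A"},      # pretend "a" was generated in an earlier run
    generate_fn=str.upper,    # hypothetical stand-in for a model call
    save_fn=lambda outputs: saved_sizes.append(len(outputs)),
    chunksize=2,
)
print(result)
print(saved_sizes)
```

Here four instructions are pending, so two checkpoints are written, holding three and then five outputs. A smaller chunk size loses less work on interruption at the cost of more frequent writes.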

--------------------------------

### Alpaca Eval make_leaderboard Command Reference

Source: https://github.com/togethercomputer/moa/blob/main/alpaca_eval/README.md

Detailed documentation for the `alpaca_eval make_leaderboard` command, which is used to precompute and save an entire leaderboard. This section outlines the command's synopsis, description, and all available flags, including their types, default values, and purposes. It covers parameters like `leaderboard_path`, `annotators_config`, `all_model_outputs`, and `reference_outputs`.

```APIDOC
NAME
    alpaca_eval make_leaderboard - Precompute and save an entire leaderboard for a given dataset / evaluator / set of models generations.

SYNOPSIS
    alpaca_eval make_leaderboard <flags>

DESCRIPTION
    Precompute and save an entire leaderboard for a given dataset / evaluator / set of models generations.

FLAGS
    --leaderboard_path=LEADERBOARD_PATH
        Type: Optional[Union]
        Default: None
        The path to save the leaderboard to. The leaderboard will be saved as a csv file; if it already exists, it will append.
    --annotators_config=ANNOTATORS_CONFIG
        Type: Union
        Default: 'alpaca_eval_gpt4_turbo_fn'
        The path to (or list of dicts of) the annotator's config file.
    --all_model_outputs=ALL_MODEL_OUTPUTS
        Type: Union
        Default: <fu...
        The outputs of all models to add to the leaderboard. Accepts data (list of dictionary, pd.dataframe, datasets.Dataset) or a path to read those (json, csv, tsv potentially with globbing) or a function to generate those. If the path contains a globbing pattern, we will read all files matching the pattern and concatenate them. Each dictionary (or row of dataframe) should contain the keys that are formatted in the prompts. E.g. by default `instruction` and `output` with optional `input`. It should also contain a column `generator` with the name of the current model.
    -r, --reference_outputs=REFERENCE_OUTPUTS
        Type: Union
        Default: <func...
        The outputs of the reference model. Same format as `all_model_outputs` but without needing `generator`. By default, the reference outputs are the `text-davinci-003` outputs on the AlpacaEval set.
    -f, --fn_add_to_leaderboard=FN_ADD_TO_LEADERBOARD
        Type: Callable
        Default: 'evaluate'
        The function to use to add a model to the leaderboard. If a string, it should be the name of a function in `main.py`. The function should take the arguments: `model_outputs`, `annotators_config`, `name`, `precomputed_leaderboard`, `is_return_instead_of_print`, `reference_outputs`.
    --leaderboard_mode=LEADERBOARD_MODE
        Type: str
        Default: 'verified'
        The mode of the leaderboard to save all new entries with.
    -i, --is_return_instead_of_print=IS_RETURN_INSTEAD_OF_PRINT
        Type: bool
        Default: False
        Whether to return the metrics instead of printing the results.
    Additional flags are accepted.
        Additional arguments to pass to `fn_add_to_leaderboard`.
```
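The globbing behaviour described for `--all_model_outputs` (read every matching file and concatenate the rows) can be reproduced to preview what a pattern will pick up. This is a sketch for JSON files only, with a hypothetical helper name and throwaway demo files; it is not the library's loader.

```python
import glob
import json
import tempfile
from pathlib import Path


def load_all_model_outputs(pattern):
    """Read every JSON file matching `pattern` and concatenate their rows."""
    rows = []
    for path in sorted(glob.glob(pattern)):
        with open(path, encoding="utf-8") as handle:
            rows.extend(json.load(handle))
    return rows


# Demo: two throwaway files, one per hypothetical model.
demo_dir = Path(tempfile.mkdtemp())
for name in ("model_a", "model_b"):
    row = {"instruction": "Say hi.", "output": f"Hi from {name}", "generator": name}
    (demo_dir / f"{name}.json").write_text(json.dumps([row]), encoding="utf-8")

all_rows = load_all_model_outputs(str(demo_dir / "*.json"))
print([row["generator"] for row in all_rows])
```

Sorting the matched paths keeps the concatenation order stable across runs, which makes the resulting leaderboard easier to compare.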

--------------------------------

### Execute Advanced Multi-layer MoA Example

Source: https://github.com/togethercomputer/moa/blob/main/README.md

Command to run the `advanced-moa.py` script, demonstrating how to implement MoA with three or more layers for more refined responses, showcasing the scalability of the MoA architecture.

```Bash
python advanced-moa.py
```

--------------------------------

### Install GPTQ-for-LLaMa and Quantization Dependencies

Source: https://github.com/togethercomputer/moa/blob/main/FastChat/docs/gptq.md

This script sets up the necessary environment for GPTQ 4-bit inference. It clones the GPTQ-for-LLaMa repository, switches to the recommended branch (fastest-inference-4bit for Linux, old-cuda for Windows), and installs the `quant-cuda` package along with `texttable` within the FastChat virtual environment.

```bash
git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa.git repositories/GPTQ-for-LLaMa
cd repositories/GPTQ-for-LLaMa
git switch fastest-inference-4bit
python3 setup_cuda.py install
pip3 install texttable
```

--------------------------------

### Execute FLASK Benchmark for MoA

Source: https://github.com/togethercomputer/moa/blob/main/README.md

Command to run a minimal example of the FLASK evaluation, which helps in assessing the MoA model's capabilities on specific tasks and evaluating its factual knowledge and reasoning.

```Bash
bash run_eval_flask.sh
```

--------------------------------

### Install Nginx on Linux Distributions

Source: https://github.com/togethercomputer/moa/blob/main/FastChat/fastchat/serve/gateway/README.md

Commands to install the Nginx web server on common Debian-based (e.g., Ubuntu) and Red Hat-based (e.g., CentOS, Fedora) Linux distributions.

```bash
sudo apt update
sudo apt install nginx
```

```bash
sudo yum install epel-release
sudo yum install nginx
```

--------------------------------

### Generate AlpacaEval Leaderboard CSV

Source: https://github.com/togethercomputer/moa/blob/main/alpaca_eval/README.md

This command generates a leaderboard CSV file for AlpacaEval. It requires specifying the output path for the leaderboard, the JSON file containing all model outputs, and a reference outputs file for comparison.

```bash
alpaca_eval make_leaderboard \
  --leaderboard_path <src/alpaca_eval/leaderboards/data_AlpacaEval/your_leaderboard_name.csv> \
  --all_model_outputs alpaca_eval_all_outputs.json \
  --reference_outputs <path_to_json_file>
```

--------------------------------

### Launch FastChat Model Worker with xFasterTransformer Integration

Source: https://github.com/togethercomputer/moa/blob/main/FastChat/docs/xFasterTransformer.md

These commands start the FastChat model worker with xFasterTransformer enabled for efficient model serving. The examples cover a standard deployment, NUMA-pinned execution on a single node, and an MPI-based setup spanning two NUMA nodes. The model worker loads the specified model and uses the chosen data type for inference.

```bash
# Load model with default configuration (max sequence length 4096, no GPU split setting).
python3 -m fastchat.serve.model_worker \
    --model-path /path/to/models \
    --enable-xft \
    --xft-dtype bf16_fp16
```

```bash
# Run inference on NUMA node 0 with data type bf16_fp16 (the first token uses bfloat16; the remaining tokens use float16).
numactl -N 0  --localalloc python3 -m fastchat.serve.model_worker \
    --model-path /path/to/models \
    --enable-xft \
    --xft-dtype bf16_fp16
```

```bash
# Run inference on NUMA nodes 0 and 1 with data type bf16_fp16 (the first token uses bfloat16; the remaining tokens use float16).
OMP_NUM_THREADS=$CORE_NUM_PER_SOCKET LD_PRELOAD=libiomp5.so mpirun \
-n 1 numactl -N 0  --localalloc  python -m fastchat.serve.model_worker \
    --model-path /path/to/models \
    --enable-xft \
    --xft-dtype bf16_fp16 : \
-n 1 numactl -N 1  --localalloc  python -m fastchat.serve.model_worker \
    --model-path /path/to/models \
    --enable-xft \
    --xft-dtype bf16_fp16
```

--------------------------------

### Launch FastChat LightLLM Worker

Source: https://github.com/togethercomputer/moa/blob/main/FastChat/docs/lightllm_integration.md

Commands to launch the FastChat model worker using LightLLM for optimized inference. The first example sets the model path, the tokenizer mode, and `--max_total_token_num`, the argument that controls continuous batching. The second example adds `--mode` to enable quantized weights and a quantized KV cache for inference.

```bash
python3 -m fastchat.serve.lightllm_worker --model-path lmsys/vicuna-7b-v1.5 --tokenizer_mode "auto" --max_total_token_num 154000
```

```bash
python3 -m fastchat.serve.lightllm_worker --model-path lmsys/vicuna-7b-v1.5 --tokenizer_mode "auto" --max_total_token_num 154000 --mode triton_int8weight triton_int8kv
```

--------------------------------

### Initialize FastChat Controller on node-01

Source: https://github.com/togethercomputer/moa/blob/main/FastChat/docs/commands/local_cluster.md

Starts the FastChat controller service, which acts as the central orchestrator for managing distributed worker nodes and model serving. This command should be executed on the designated controller node.

```bash
python3 -m fastchat.serve.controller --host 0.0.0.0 --port 10002
```

--------------------------------

### Analyze Evaluators with AlpacaEval CLI

Source: https://github.com/togethercomputer/moa/blob/main/alpaca_eval/README.md

This command-line tool allows users to analyze their custom evaluators and integrate them into the AlpacaEval leaderboard. It supports detailed bias and variance estimation through multi-seed evaluation, with an option for a quicker single-seed run.

```bash
alpaca_eval analyze_evaluators --annotators_config '<path_to_config.yaml>'
```

--------------------------------

### Example Prompt Template for Vicuna v0

Source: https://github.com/togethercomputer/moa/blob/main/FastChat/docs/vicuna_weights_version.md

Illustrates the conversation format for Vicuna model version 0. This version uses `### Human:` and `### Assistant:` as explicit turn separators within the prompt.

```text
A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions.

### Human: Hello!
### Assistant: Hello!
### Human: How are you?
### Assistant: I am good.
```
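
The format above can also be assembled programmatically. The following is a minimal Python sketch, not FastChat's own conversation-template code: the helper name `build_vicuna_v0_prompt` is hypothetical, and the exact whitespace FastChat emits between turns may differ from this illustration.

```python
# Hypothetical helper that reproduces the Vicuna v0 layout shown above.
SYSTEM_PROMPT = (
    "A chat between a curious human and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the human's questions."
)

VALID_ROLES = ("Human", "Assistant")


def build_vicuna_v0_prompt(turns, system=SYSTEM_PROMPT):
    """Join (role, message) pairs using '### Human:' / '### Assistant:' separators."""
    lines = [system, ""]  # system prompt, then a blank line, as in the example
    for role, message in turns:
        if role not in VALID_ROLES:
            raise ValueError(f"unknown role: {role!r}")
        lines.append(f"### {role}: {message}")
    return "\n".join(lines)


prompt = build_vicuna_v0_prompt(
    [
        ("Human", "Hello!"),
        ("Assistant", "Hello!"),
        ("Human", "How are you?"),
        ("Assistant", "I am good."),
    ]
)
print(prompt)
```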

--------------------------------

### Generate Leaderboard for New Evaluator

Source: https://github.com/togethercomputer/moa/blob/main/alpaca_eval/README.md

This command creates a new leaderboard CSV file for a custom annotator (evaluator) within the AlpacaEval project. It uses a specified annotator configuration and a dataset of all model outputs to populate the leaderboard.

```bash
alpaca_eval make_leaderboard \
  --leaderboard_path src/alpaca_eval/leaderboards/data_AlpacaEval/<evaluator>_leaderboard.csv \
  --all_model_outputs alpaca_eval_all_outputs.json \
  --annotators_config <evaluator_config>
```

--------------------------------

### Run AlpacaEval with Pre-decoded Outputs

Source: https://github.com/togethercomputer/moa/blob/main/alpaca_eval/README.md

This Bash command shows how to execute `alpaca_eval` when model outputs are already available in a file. It specifies the path to the model outputs and the annotator configuration to use for evaluation, streamlining the evaluation process for pre-generated results.

```bash
alpaca_eval --model_outputs 'example/outputs.json' \
  --annotators_config 'alpaca_eval_gpt4_turbo_fn'
```
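
A file such as `example/outputs.json` can be produced with a few lines of Python. The sketch below assumes the record keys named in the `evaluate` flag descriptions (`instruction`, `output`, and an optional `generator`); the helper names, the sample instructions, and the model name `my_model` are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def make_records(instructions, outputs, generator):
    """Pair each instruction with its model output, one dictionary per example."""
    if len(instructions) != len(outputs):
        raise ValueError("instructions and outputs must have the same length")
    return [
        {"instruction": ins, "output": out, "generator": generator}
        for ins, out in zip(instructions, outputs)
    ]


def write_outputs(records, path):
    """Write the records as a JSON list, creating parent directories if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(records, indent=2), encoding="utf-8")
    return path


records = make_records(
    ["Name three primary colors.", "What is 2 + 2?"],  # illustrative instructions
    ["Red, blue, and yellow.", "4"],                    # illustrative model outputs
    generator="my_model",                               # hypothetical model name
)

# Written to a temporary directory here; pass 'example/outputs.json' instead
# to match the path used in the command above.
out_file = write_outputs(records, Path(tempfile.mkdtemp()) / "outputs.json")
print(out_file.read_text(encoding="utf-8"))
```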

--------------------------------

### Install Python Dependencies for Vicuna Training

Source: https://github.com/togethercomputer/moa/blob/main/FastChat/README.md

This command installs the necessary Python packages required for the Vicuna training environment. The '-e ".[train]"' flag installs the package in editable mode and includes additional dependencies specified for the 'train' extra.

```bash
pip3 install -e ".[train]"
```

--------------------------------

### CLI: Evaluate Model Outputs with alpaca_eval

Source: https://github.com/togethercomputer/moa/blob/main/alpaca_eval/README.md

Provides command-line arguments and options for the `alpaca_eval evaluate` command, used to assess model performance based on generated outputs against a reference. It details parameters for model outputs, reference data, annotator configuration, naming, output paths, precomputed leaderboards, and various evaluation metrics.

```APIDOC
NAME
    alpaca_eval evaluate - Evaluate a model based on its outputs. This is the default entrypoint if no command is specified.

SYNOPSIS
    alpaca_eval evaluate <flags>

DESCRIPTION
    Evaluate a model based on its outputs. This is the default entrypoint if no command is specified.

FLAGS
    --model_outputs=MODEL_OUTPUTS
        Type: Optional[Union]
        Default: None
        The outputs of the model to add to the leaderboard. Accepts data (list of dictionary, pd.dataframe, datasets.Dataset) or a path to read those (json, csv, tsv) or a function to generate those. Each dictionary (or row of dataframe) should contain the keys that are formatted in the prompts. E.g. by default `instruction` and `output` with optional `input`. If None, we just print the leaderboard.
    -r, --reference_outputs=REFERENCE_OUTPUTS
        Type: Union
        Default: <func...
        The outputs of the reference model. Same format as `model_outputs`. If None, the reference outputs are a specific set of Davinci 003 outputs on the AlpacaEval set.
    --annotators_config=ANNOTATORS_CONFIG
        Type: Union
        Default: 'alpaca_eval_gpt4_turbo_fn'
        The path to (or list of dict of) the annotator's config file. For details see the docstring of `PairwiseAnnotator`.
    -n, --name=NAME
        Type: Optional[Optional]
        Default: None
        The name of the model to add to the leaderboard. If None, we check whether `generator` is in `model_outputs`; if not, we use "Current model".
    -o, --output_path=OUTPUT_PATH
        Type: Union
        Default: 'auto'
        Path to the directory where the new leaderboard and the annotations should be stored. If None we don't save. If `auto` we use `model_outputs` if it is a path, and otherwise use the directory from which we call the script.
    -p, --precomputed_leaderboard=PRECOMPUTED_LEADERBOARD
        Type: Union
        Default: 'auto'
        The precomputed leaderboard or a path to it (json, csv, or tsv). The leaderboard should contain at least the column `win_rate`. If `auto` we will try to use the corresponding leaderboard for the reference outputs (only if in CORRESPONDING_OUTPUTS_LEADERBOARDS). If `None` we won't add other models from the leaderboard.
    --is_overwrite_leaderboard=IS_OVERWRITE_LEADERBOARD
        Type: bool
        Default: False
        Whether to overwrite the leaderboard if the model is already in it.
    -l, --leaderboard_mode_to_print=LEADERBOARD_MODE_TO_PRINT
        Type: Optional
        Default: 'minimal'
        The mode of the leaderboard to use. Only used if the precomputed leaderboard has a column `mode`, in which case it will filter the leaderboard by this mode. If None keeps all.
    -c, --current_leaderboard_mode=CURRENT_LEADERBOARD_MODE
        Type: str
        Default: 'community'
        The mode of the leaderboard for the current method.
    --is_return_instead_of_print=IS_RETURN_INSTEAD_OF_PRINT
        Type: bool
        Default: False
        Whether to return the metrics instead of printing the results.
    -f, --fn_metric=FN_METRIC
        Type: Union
        Default: 'pairwise_to_winrate'
        The function or function name in `metrics.py` that will be used to convert preference to metrics. The function should take a sequence of preferences (0 for draw, 1 for base win, 2 when the model to compare wins) and return a dictionary of metrics and the key by which to sort the leaderboard.
    -s, --sort_by=SORT_BY
        Type: str
        Default: 'win_rate'
        The key by which to sort the leaderboard.
    --is_cache_leaderboard=IS_CACHE_LEADERBOARD
        Type: Optional[Optional]
        Default: None
        Whether to save the result leaderboard to `precomputed_leaderboard`. If None we save only if max_instances not None. A preferred way of adding models to the leaderboard is to set `precomputed_leaderboard` to the previously saved leaderboard at `<output_path>/leaderboard.csv`.
    --max_instances=MAX_INSTANCES
        Type: Optional[Optional]
        Default: None
        The maximum number of instances to annotate. Useful for testing.
    --annotation_kwargs=ANNOTATION_KWARGS
        Type: Optional[Optional]
        Default: None
        Additional arguments to pass to `PairwiseAnnotator.annotate_head2head`.
    -A, --Annotator=ANNOTATOR
        Default: <class 'alpaca_eval.annotators.pairwise_evaluator.PairwiseAn...
        The annotator class to use.
    Additional flags are accepted.
        Additional arguments to pass to `PairwiseAnnotator`.
```
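
The `--fn_metric` description above implies a function of roughly the following shape. This is a sketch under stated assumptions, not the library's `pairwise_to_winrate`: the name `simple_winrate` is hypothetical, and counting a draw as half a win, reporting the rate as a percentage, and the extra count keys are all illustrative choices. Only the preference encoding (0 for a draw, 1 when the base wins, 2 when the compared model wins) and the `win_rate` sort key come from the flag descriptions.

```python
from typing import Dict, Sequence


def simple_winrate(preferences: Sequence[int]) -> Dict[str, float]:
    """Turn pairwise preferences into summary metrics.

    Encoding (from the flag description): 0 = draw, 1 = base wins,
    2 = the model being compared wins.
    """
    n_total = len(preferences)
    if n_total == 0:
        raise ValueError("preferences must not be empty")

    n_wins = sum(1 for p in preferences if p == 2)
    n_wins_base = sum(1 for p in preferences if p == 1)
    n_draws = sum(1 for p in preferences if p == 0)

    # Assumption: a draw counts as half a win for the compared model.
    win_rate = 100.0 * (n_wins + 0.5 * n_draws) / n_total

    return {
        "win_rate": win_rate,  # matches the default `sort_by` key
        "n_wins": n_wins,
        "n_wins_base": n_wins_base,
        "n_draws": n_draws,
        "n_total": n_total,
    }


metrics = simple_winrate([2, 2, 1, 0])
print(metrics["win_rate"])  # 62.5
```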

--------------------------------

### Evaluate Language Models with AlpacaEval

Source: https://github.com/togethercomputer/moa/blob/main/alpaca_eval/README.md

This command generates outputs from a language model and then evaluates them with AlpacaEval. It takes a model configuration and an annotator configuration (here `alpaca_eval_gpt4_turbo_fn`, a GPT-4 Turbo annotator). The `model_configs` value names a directory that defines the model's prompt, provider, and decoding parameters.

```bash
alpaca_eval evaluate_from_model \
  --model_configs 'oasst_pythia_12b' \
  --annotators_config 'alpaca_eval_gpt4_turbo_fn'
```

--------------------------------

### Serve Vicuna-7b-v1.5 with FastChat CLI

Source: https://github.com/togethercomputer/moa/blob/main/FastChat/docs/model_support.md

This command starts the FastChat command-line interface with the `lmsys/vicuna-7b-v1.5` model, so you can chat with Vicuna directly from the terminal.

```bash
python3 -m fastchat.serve.cli --model-path lmsys/vicuna-7b-v1.5
```

--------------------------------

### BibTeX Entry for Length-Controlled AlpacaEval

Source: https://github.com/togethercomputer/moa/blob/main/alpaca_eval/README.md

This BibTeX entry provides citation details for the 'Length-Controlled AlpacaEval' paper, which introduces a simple method to debias automatic evaluators. It includes authors, title, journal (arXiv preprint), and publication year.

```BibTeX
@article{dubois2024length,
  title={Length-Controlled AlpacaEval: A Simple Way to Debias Automatic Evaluators},
  author={Dubois, Yann and Galambosi, Bal{\'a}zs and Liang, Percy and Hashimoto, Tatsunori B},
  journal={arXiv preprint arXiv:2404.04475},
  year={2024}
}
```

--------------------------------

### Install Tunnelmole on Linux/macOS

Source: https://github.com/togethercomputer/moa/blob/main/FastChat/docs/third_party_ui.md

This command downloads and executes the Tunnelmole installation script. It's designed for Linux and macOS environments. For Windows, a separate executable download is required.

```Bash
curl -O https://install.tunnelmole.com/9Wtxu/install && sudo bash install
```

--------------------------------

### CLI: Evaluate Model from HuggingFace/API with alpaca_eval

Source: https://github.com/togethercomputer/moa/blob/main/alpaca_eval/README.md

Details the command-line arguments for `alpaca_eval evaluate_from_model`, a wrapper around `evaluate` that includes generating outputs from a specified model (HuggingFace or API). This command simplifies the evaluation workflow by integrating model generation directly.

```APIDOC
NAME
    alpaca_eval evaluate_from_model - Evaluate a model from HuggingFace or an API provider. This is a wrapper around `evaluate` which includes generating from a desired model.

SYNOPSIS
    alpaca_eval evaluate_from_model MODEL_CONFIGS <flags>
```

--------------------------------

### Run FastChat CLI with Custom Local Model

Source: https://github.com/togethercomputer/moa/blob/main/FastChat/docs/model_support.md

Demonstrates the basic command to run the FastChat command-line interface, specifying a placeholder for your local model's path. This is the primary command to test new model integrations.

```bash
python3 -m fastchat.serve.cli --model-path [YOUR_MODEL_PATH]
```

--------------------------------

### Install AWQ and FastChat for 4-bit LLM Inference

Source: https://github.com/togethercomputer/moa/blob/main/FastChat/docs/awq.md

This sequence of bash commands sets up a dedicated Conda environment for FastChat and AWQ, installs FastChat, clones the AWQ repository, installs the AWQ Python package, and compiles the necessary CUDA kernels for optimized performance.

```bash
conda create -n fastchat-awq python=3.10 -y
conda activate fastchat-awq
# cd /path/to/FastChat
pip install --upgrade pip    # enable PEP 660 support
pip install -e .             # install fastchat

git clone https://github.com/mit-han-lab/llm-awq repositories/llm-awq
cd repositories/llm-awq
pip install -e .             # install awq package

cd awq/kernels
python setup.py install      # install awq CUDA kernels
```

--------------------------------

### Launch FastChat Multi-Tab Gradio Web Server

Source: https://github.com/togethercomputer/moa/blob/main/FastChat/README.md

Starts an advanced Gradio web server that includes multiple tabs, such as the Chatbot Arena interface, providing a richer user experience beyond a single chat window.

```bash
python3 -m fastchat.serve.gradio_web_server_multi
```

--------------------------------

### Clone FastChat Repository from Source

Source: https://github.com/togethercomputer/moa/blob/main/FastChat/README.md

This sequence of commands clones the FastChat GitHub repository to your local machine and then changes the current directory into the newly cloned FastChat folder. This is the first step for installing FastChat from source.

```bash
git clone https://github.com/lm-sys/FastChat.git
cd FastChat
```

--------------------------------

### Download Quantized GPTQ Model

Source: https://github.com/togethercomputer/moa/blob/main/FastChat/docs/exllama_v2.md

Instructions to install Git LFS and then clone a specific quantized GPTQ model (vicuna-7B-1.1-GPTQ-4bit-128g) from Hugging Face, which is required for the FastChat model worker.

```bash
git lfs install
git clone https://huggingface.co/TheBloke/vicuna-7B-1.1-GPTQ-4bit-128g models/vicuna-7B-1.1-GPTQ-4bit-128g
```

--------------------------------

### Execute MT-Bench Benchmark for MoA

Source: https://github.com/togethercomputer/moa/blob/main/README.md

Runs a minimal example of the MT-Bench evaluation, which assesses the conversational ability of MoA models in a multi-turn setting.

```Bash
bash run_eval_mt_bench.sh
```

--------------------------------

### Upload Python Package to PyPI

Source: https://github.com/togethercomputer/moa/blob/main/FastChat/docs/commands/pypi.md

Executes a shell script responsible for uploading the compiled Python package to the Python Package Index (PyPI). This step typically follows the package build process and handles authentication and submission.

```bash
bash scripts/upload_pypi.sh
```

--------------------------------

### Install vLLM Python Library

Source: https://github.com/togethercomputer/moa/blob/main/FastChat/docs/vllm_integration.md

Installs the vLLM library using pip, a prerequisite for leveraging vLLM's optimized worker capabilities within FastChat.

```bash
pip install vllm
```

--------------------------------

### Simplest OpenAI Client Configuration (YAML)

Source: https://github.com/togethercomputer/moa/blob/main/alpaca_eval/client_configs/README.md

This YAML snippet shows the most basic configuration for an OpenAI client. It defines a single client entry under the 'default' key, specifying an API key and an organization ID. This setup is suitable when no client switching is required.

```yaml
default:
    - api_key: "<your OpenAI API key here>"
      organization: "<your organization ID>"
```
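
To make the structure explicit, the sketch below writes the parsed form of the YAML above as a Python literal: the `default` key maps to a list, and each list item is one client's settings. This is an illustration only, not how AlpacaEval loads the file; the helper `get_clients` and its validation rules are hypothetical.

```python
# Parsed equivalent of the YAML above, written as a literal to avoid a YAML dependency.
client_configs = {
    "default": [
        {
            "api_key": "<your OpenAI API key here>",
            "organization": "<your organization ID>",
        },
    ]
}


def get_clients(configs, name="default"):
    """Return the list of client settings stored under `name` (hypothetical helper)."""
    clients = configs.get(name)
    if not isinstance(clients, list) or not clients:
        raise ValueError(f"no client entries found under {name!r}")
    for client in clients:
        if "api_key" not in client:
            raise ValueError("each client entry needs an 'api_key'")
    return clients


clients = get_clients(client_configs)
print(len(clients))               # one client, so there is nothing to switch between
print(sorted(clients[0].keys()))  # ['api_key', 'organization']
```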