### Quickstart MoA Setup and Execution Source: https://github.com/togethercomputer/moa/blob/main/README.md Provides commands to install the Together Python library, set the API key, and run the basic MoA example script (`moa.py`). This allows users to quickly get a simple Mixture-of-Agents application running. ```Bash pip install together export TOGETHER_API_KEY= python moa.py ``` -------------------------------- ### Install alpaca-eval Python Package Source: https://github.com/togethercomputer/moa/blob/main/alpaca_eval/README.md Instructions for installing the `alpaca-eval` library using pip, covering both stable releases from PyPI and nightly versions directly from the GitHub repository. ```bash pip install alpaca-eval ``` ```bash pip install git+https://github.com/tatsu-lab/alpaca_eval ``` -------------------------------- ### Install AlpacaEval with All LLM Provider Dependencies Source: https://github.com/togethercomputer/moa/blob/main/alpaca_eval/README.md This command installs the core AlpacaEval package along with all optional dependencies required to integrate with various large language model providers, including OpenAI, Anthropic, Cohere, and HuggingFace, ensuring full functionality for `fn_completions`. ```bash pip install alpaca_eval[all] ``` -------------------------------- ### Install FastChat Dependencies and Environment Source: https://github.com/togethercomputer/moa/blob/main/FastChat/docs/commands/webserver.md This snippet outlines the steps to prepare a Linux environment for FastChat. It includes updating system packages, installing essential tools like tmux and htop, setting up Anaconda, creating and activating a dedicated Python environment, cloning the FastChat repository, and installing its Python dependencies. ```bash sudo apt update sudo apt install tmux htop wget https://repo.anaconda.com/archive/Anaconda3-2022.10-Linux-x86_64.sh bash Anaconda3-2022.10-Linux-x86_64.sh conda create -n fastchat python=3.9 conda activate fastchat git clone https://github.com/lm-sys/FastChat.git cd FastChat pip3 install -e . ``` -------------------------------- ### Interactive CLI Chatbot Demo Setup Source: https://github.com/togethercomputer/moa/blob/main/README.md Instructions to set up and launch an interactive multi-turn chatbot demo using MoA. This involves exporting the API key, installing necessary dependencies, and running the main script to start the conversational interface. ```Bash export TOGETHER_API_KEY={your_key} pip install -r requirements.txt python bot.py ``` -------------------------------- ### Install AlpacaEval from Source for Development Source: https://github.com/togethercomputer/moa/blob/main/alpaca_eval/README.md This command installs the AlpacaEval package in editable mode directly from the source directory. It is typically used by developers who have forked the repository and wish to make local modifications and test them without reinstalling. ```bash pip install -e . ``` -------------------------------- ### Launch FastChat Controller, Workers, and Web Server Source: https://github.com/togethercomputer/moa/blob/main/FastChat/docs/commands/webserver.md This section provides commands to initiate various FastChat components. It covers starting the main controller, registering a Hugging Face API worker, setting up environment variables for OpenAI, Anthropic, and GCP API keys, and finally launching the Gradio web server with multiple model integrations and logging configurations. ```bash cd fastchat_logs/controller python3 -m fastchat.serve.controller --host 0.0.0.0 --port 21001 python3 -m fastchat.serve.register_worker --controller http://localhost:21001 --worker-name https:// python3 -m fastchat.serve.test_message --model vicuna-13b --controller http://localhost:21001 cd fastchat_logs/server0 python3 -m fastchat.serve.huggingface_api_worker --model-info-file ~/elo_results/register_hf_api_models.json export OPENAI_API_KEY= export ANTHROPIC_API_KEY= export GCP_PROJECT_ID= python3 -m fastchat.serve.gradio_web_server_multi --controller http://localhost:21001 --concurrency 50 --add-chatgpt --add-claude --add-palm --elo ~/elo_results/elo_results.pkl --leaderboard-table-file ~/elo_results/leaderboard_table.csv --register ~/elo_results/register_oai_models.json --show-terms python3 backup_logs.py ``` -------------------------------- ### Install Python Package Build and Distribution Dependencies Source: https://github.com/togethercomputer/moa/blob/main/FastChat/docs/commands/pypi.md Installs the required Python packages for building and distributing a project. This includes 'twine' for interacting with PyPI, 'build' for creating distribution archives, and ensures 'pip' is up-to-date. ```bash python3 -m pip install twine python3 -m pip install --upgrade pip pip3 install build ``` -------------------------------- ### Example: Create a New Leaderboard with Alpaca Eval Source: https://github.com/togethercomputer/moa/blob/main/alpaca_eval/README.md This example demonstrates how to use the `alpaca_eval make_leaderboard` command to generate a new leaderboard. It specifies the path where the leaderboard will be saved and the path to the model outputs to be included in the evaluation. This command is useful for consolidating evaluation results into a single leaderboard. ```bash alpaca_eval make_leaderboard \ --leaderboard_path \ --all_model_outputs ``` -------------------------------- ### Example FastChat CLI Command for Vicuna Model Source: https://github.com/togethercomputer/moa/blob/main/FastChat/docs/model_support.md Provides a concrete example of running the FastChat CLI with a specific pre-trained model, 'lmsys/vicuna-7b-v1.5'. This command can be used to learn the code logic and verify basic functionality. ```bash python3 -m fastchat.serve.cli --model lmsys/vicuna-7b-v1.5 ``` -------------------------------- ### BibTeX Entry for AlpacaEval General Benchmark Source: https://github.com/togethercomputer/moa/blob/main/alpaca_eval/README.md This BibTeX entry provides citation details for the AlpacaEval project, covering its code, results, and general benchmark. It includes authors, title, year, publisher, journal, and a URL for the GitHub repository. ```BibTeX @misc{alpaca_eval, author = {Xuechen Li and Tianyi Zhang and Yann Dubois and Rohan Taori and Ishaan Gulrajani and Carlos Guestrin and Percy Liang and Tatsunori B. Hashimoto }, title = {AlpacaEval: An Automatic Evaluator of Instruction-following Models}, year = {2023}, publisher = {GitHub}, journal = {GitHub repository}, howpublished = {\url{https://github.com/tatsu-lab/alpaca_eval}} } ``` -------------------------------- ### Example AlpacaEval Annotation Output with Chain of Thought Source: https://github.com/togethercomputer/moa/blob/main/alpaca_eval/README.md This JSON object illustrates a complete annotation record from AlpacaEval, showcasing the `preference` score, the raw outputs (`output_1`, `output_2`), and the detailed `raw_completion` which includes `concise_explanation` (chain of thought) and `ordered_models`. It also demonstrates how `referenced_models` maps temporary model names to actual outputs for clarity. ```json { "instruction": "How did US states get their names?", "output_1": "The names of U.S. states are derived from a variety of languages and sources, including indigenous American languages, European languages (primarily English, Spanish, and French), as well as other historical influences. Here is a brief overview of how some states got their names:\n\n1. **Indigenous American Languages**: Many state names come from Native American words or names of tribes, often from the languages of the indigenous peoples who lived in or near those areas.\n - **Alabama**: Possibly derived from the Choctaw language, meaning \"thicket clearers.\"\n - **Connecticut**: From a Mohegan-Pequot word meaning \"long tidal river.\"\n - **Massachusetts**: [...]", "generator_1": "gpt4_1106_preview", "dataset": "helpful_base", "output_2": "The names of the 50 U.S. states come from a variety of sources, including Native American languages, European languages, and historical figures. Here's a brief overview of how some states got their names:\n\n1. Native American origins: Many states have names derived from Native American languages. For example, Alabama comes from the Choctaw word \"Albah amo,\" meaning \"plant gatherers\" or \"herb gatherers.\" Similarly, the name Mississippi comes from the Ojibwe word \"Misi-ziibi,\" meaning \"great river.\"\n\n2. European languages: [...].", "generator_2": "gpt4", "annotator": "alpaca_eval_cot_gpt4_turbo_fn", "preference": 1.0, "raw_completion": { "concise_explanation": "Model M provided a more detailed and structured response, including bold headings for each category and a wider range of examples. It also included additional categories such as 'Other European Languages' and 'Combination of Languages and Influences', which added depth to the explanation. Model m's response was accurate but less comprehensive and lacked the clear structure found in Model M's output.", "ordered_models": [ { "model": "M", "rank": 1 }, { "model": "m", "rank": 2 } ] }, "referenced_models": { "M": "output_1", "m": "output_2" } } ``` -------------------------------- ### Prepare Evaluation Dataset with HuggingFace Datasets Source: https://github.com/togethercomputer/moa/blob/main/alpaca_eval/README.md This Python snippet demonstrates how to load an evaluation dataset, specifically "tatsu-lab/alpaca_eval", and populate the 'output' field for each example. It serves as a placeholder for integrating a model's generation logic, typically used when preparing data for evaluation. ```python import datasets eval_set = datasets.load_dataset("tatsu-lab/alpaca_eval", "alpaca_eval")["eval"] for example in eval_set: # generate here is a placeholder for your models generations example["output"] = generate(example["instruction"]) ``` -------------------------------- ### AlpacaEval Leaderboard Generation Command Arguments Source: https://github.com/togethercomputer/moa/blob/main/alpaca_eval/README.md Describes the command-line arguments used for generating or updating the AlpacaEval leaderboard. These arguments specify paths for model outputs, reference data, and annotator configuration files. ```APIDOC Command: [Implied, e.g., alpaca_eval generate_leaderboard] Arguments: --leaderboard_path: Description: Path to save the leaderboard to. The leaderboard will be saved as a csv file, if it already exists it will append. --all_model_outputs: Description: The json path to the outputs of all models to add to the leaderboard (as a single file or by globbing multiple files). Each dictionary should contain the keys (`instruction` and `output`) that are formatted in the prompts and a column `generator` with the name of the current model. Example: https://huggingface.co/datasets/tatsu-lab/alpaca_eval/blob/main/alpaca_eval_all_outputs.json --reference_outputs: Description: The path to the outputs of the reference model. Each dictionary should contain the keys (`instruction` and `output`) that are formatted in the prompts. By default, the reference outputs are the 003 outputs on AlpacaEval set. --annotators_config: Description: The path to the annotator's config file. Defaults to `alpaca_eval_gpt4`. ``` -------------------------------- ### Install and Configure AlpacaEval 2.0 Source: https://github.com/togethercomputer/moa/blob/main/alpaca_eval/README.md AlpacaEval 2.0 is the default version when installing `alpaca_eval==0.5`. This snippet provides the installation command and shows how to set an environment variable to revert to older configurations if needed, allowing flexibility in version usage. ```bash pip install alpaca_eval==0.5 ``` ```bash export IS_ALPACA_EVAL_2=False ``` -------------------------------- ### Install FastChat with pip Source: https://github.com/togethercomputer/moa/blob/main/FastChat/README.md This command installs FastChat and its essential components, including the model worker and web UI, directly using pip. It's the quickest way to get started with FastChat. ```bash pip3 install "fschat[model_worker,webui]" ``` -------------------------------- ### BibTeX Entry for AlpacaFarm Simulation Framework Source: https://github.com/togethercomputer/moa/blob/main/alpaca_eval/README.md This BibTeX entry provides citation details for the AlpacaFarm project, a simulation framework designed for methods that learn from human feedback. It includes authors, title, year, and arXiv preprint information with its primary classification. ```BibTeX @misc{dubois2023alpacafarm, title={AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback}, author={Yann Dubois and Xuechen Li and Rohan Taori and Tianyi Zhang and Ishaan Gulrajani and Jimmy Ba and Carlos Guestrin and Percy Liang and Tatsunori B. Hashimoto}, year={2023}, eprint={2305.14387}, archivePrefix={arXiv}, primaryClass={cs.LG} } ``` -------------------------------- ### alpaca-eval Command Line Interface (CLI) Reference Source: https://github.com/togethercomputer/moa/blob/main/alpaca_eval/README.md Comprehensive documentation for the `alpaca-eval` CLI tool, detailing its main commands and important parameters for configuring model evaluation and output handling. It outlines the purpose of each command and the expected input/output for key parameters. ```APIDOC alpaca_eval CLI: SYNOPSIS: alpaca_eval COMMAND COMMANDS: evaluate: Description: Evaluate a model based on its outputs. This is the default entrypoint if no command is specified. evaluate_from_model: Description: Evaluate a model from HuggingFace or an API provider. This is a wrapper around `evaluate` which includes generating from a desired model. make_leaderboard: Description: Precompute and save an entire leaderboard for a given dataset / evaluator / set of models generations. analyze_evaluators: Description: Analyze an evaluator and populates the evaluators leaderboard (agreement with human, speed, price,...). IMPORTANT PARAMETERS: model_outputs: Type: string (path to JSON file) Description: A path to a json file for the outputs of the model to add to the leaderboard. Each dictionary should contain the keys `instruction` and `output`. annotators_config: Type: string (annotator name) Description: The annotator to use. Recommended: `weighted_alpaca_eval_gpt4_turbo` (default for AlpacaEval 2.0), which has a high agreement rate with human annotation data, large context size, and is pretty cheap. For a comparison of all annotators, see the evaluators section. reference_outputs: Type: string (path to JSON file) Description: The outputs of the reference model. Same format as `model_outputs`. Default: `gpt4_turbo` for AlpacaEval 2.0. output_path: Type: string (path) Description: Path for saving annotations and leaderboard. ``` -------------------------------- ### Install ExllamaV2 Library Source: https://github.com/togethercomputer/moa/blob/main/FastChat/docs/exllama_v2.md Commands to clone and install the ExllamaV2 library, which provides optimized kernels for faster GPTQ inference when integrated with Fastchat. ```bash git clone https://github.com/turboderp/exllamav2 cd exllamav2 pip install -e . ``` -------------------------------- ### Download LangChain Example Text File Source: https://github.com/togethercomputer/moa/blob/main/FastChat/docs/langchain_integration.md Downloads a sample text file ('state_of_the_union.txt') from the LangChain GitHub repository, which serves as input for the question-answering example. ```bash wget https://raw.githubusercontent.com/hwchase17/langchain/v0.0.200/docs/modules/state_of_the_union.txt ``` -------------------------------- ### Configuration Parameters for SinglePairwiseAnnotator Source: https://github.com/togethercomputer/moa/blob/main/alpaca_eval/README.md This section details the key parameters used in the `SinglePairwiseAnnotator` configuration. It covers settings for prompt templates, functions for parsing completions, arguments for these functions, and options for output randomization and batch processing. ```APIDOC Parameters ---------- prompt_template : path A prompt that will be given to `fn_prompter` or path to the prompts. Path is relative to `evaluators_configs/` fn_completion_parser : callable or str Function in `completion_parsers.py` to use for parsing the completions into preferences. For each completion, the number of preferences should be equal to the batch_size if not we set all the preferences in that batch to NaN. completion_parser_kwargs : dict Kwargs for fn_completion_parser. fn_completions : callable or str Function in `decoders.py` to use for decoding the output. completions_kwargs : dict kwargs for fn_completions. E.g. model_name, max_tokens, temperature, top_p, top_k, stop_seq. is_randomize_output_order : bool Whether to randomize output_1, output_2 when formatting. batch_size : int Number of examples that will be added in a single prompt. ``` -------------------------------- ### Run alpaca-eval for Model Output Evaluation Source: https://github.com/togethercomputer/moa/blob/main/alpaca_eval/README.md Demonstrates how to execute `alpaca-eval` from the command line, including setting the OpenAI API key and specifying model outputs. This command processes outputs, prints a leaderboard to the console, and saves both the leaderboard and annotations to the specified directory. ```bash export OPENAI_API_KEY= # for more complex configs, e.g. using Azure or switching clients see client_configs/README.md alpaca_eval --model_outputs 'example/outputs.json' ``` -------------------------------- ### Git Commands for Adding a New Model Contribution Source: https://github.com/togethercomputer/moa/blob/main/alpaca_eval/README.md These Git commands are used to stage and commit the necessary files for a new model contribution to the AlpacaEval repository. This includes the model configuration, generated outputs, and updated leaderboard entries. ```sh git add src/alpaca_eval/models_configs/ git add src/alpaca_eval/leaderboards/ git add -f results//model_outputs.json git add -f results//*/annotations.json git commit -m "Add to AlpacaEval" git push ``` -------------------------------- ### Evaluate Model with AlpacaEval CLI Source: https://github.com/togethercomputer/moa/blob/main/alpaca_eval/README.md This command initiates the evaluation process for a specified model configuration using the AlpacaEval command-line interface. It runs inference on the model to produce outputs on the evaluation set. ```sh alpaca_eval evaluate_from_model \ --model_configs 'falcon-7b-instruct' ``` -------------------------------- ### Prepare Environment for MoA Evaluation Benchmarks Source: https://github.com/togethercomputer/moa/blob/main/README.md Commands to install all necessary Python packages and set up API keys for Together AI and OpenAI. These steps are crucial for preparing the environment to run various evaluation benchmarks like AlpacaEval, MT-Bench, and FLASK. ```Bash pip install -r requirements.txt cd alpaca_eval pip install -e . cd FastChat pip install -e ".[model_worker,llm_judge]" cd .. export TOGETHER_API_KEY= export OPENAI_API_KEY= ``` -------------------------------- ### Recompute Length-Controlled Win Rates with AlpacaEval Source: https://github.com/togethercomputer/moa/blob/main/alpaca_eval/README.md This command allows users to recompute length-controlled win rates on previously annotated models. It involves installing or updating the `alpaca_eval` package and then executing the evaluation script with specific parameters to enable metric recomputation. ```bash pip install -U alpaca_eval alpaca_eval --model_outputs … --is_recompute_metrics_only True ``` -------------------------------- ### Check FastChat Server Launch Time Source: https://github.com/togethercomputer/moa/blob/main/FastChat/docs/commands/webserver.md This command helps verify the successful launch of the Gradio web server by searching its logs for the 'Running on local URL' message. It iterates through multiple server logs and extracts the last occurrence of the message. ```bash for i in $(seq 0 11); do cat fastchat_logs/server$i/gradio_web_server.log | grep "Running on local URL" | tail -n 1; done ``` -------------------------------- ### Launch Chatbot Arena Vision Components and Gradio Server Source: https://github.com/togethercomputer/moa/blob/main/FastChat/docs/arena.md This set of Python commands initiates the core components required for the Chatbot Arena vision environment. It covers starting the central controller, launching SGLang model workers for vision-language models, and running the Gradio web server with advanced features like API endpoint registration, vision arena mode, remote storage, and random question sampling. It also includes a command to prepare VQA example data. ```Python python3 -m fastchat.serve.controller ``` ```Python python3 -m fastchat.serve.sglang_worker --model-path --tokenizer-path ``` ```Python python3 -m fastchat.serve.gradio_web_server_multi --share --register-api-endpoint-file api_endpoints.json --vision-arena --use-remote-storage --random-questions metadata_sampled.json ``` ```Python python fastchat/serve/vision/create_vqa_examples_dir.py ``` -------------------------------- ### Launch FastChat OpenAI-Compatible API Server Components Source: https://github.com/togethercomputer/moa/blob/main/FastChat/docs/openai_api.md Steps to start the FastChat controller, model worker, and the OpenAI-compatible API server locally. This sets up the necessary backend services for the API. ```bash python3 -m fastchat.serve.controller ``` ```bash python3 -m fastchat.serve.model_worker --model-path lmsys/vicuna-7b-v1.5 ``` ```bash python3 -m fastchat.serve.openai_api_server --host localhost --port 8000 ``` -------------------------------- ### AlpacaEval Command-Line Interface Reference Source: https://github.com/togethercomputer/moa/blob/main/alpaca_eval/README.md This section details the command-line arguments available for the `alpaca_eval` tool, used for evaluating models. It covers positional arguments like `MODEL_CONFIGS` and various flags for specifying reference models, evaluation datasets, annotators, output paths, and generation limits. ```APIDOC POSITIONAL ARGUMENTS MODEL_CONFIGS Type: Union Description: A dictionary or path (relative to `models_configs`) to a yaml file containing the configuration of the model to decode from. If a directory, we search for 'configs.yaml' in it. The keys in the first dictionary should be the generator's name, and the value should be a dictionary of the generator's configuration. FLAGS -r, --reference_model_configs=REFERENCE_MODEL_CONFIGS Type: Optional[Union] Default: None Description: Same as in `model_configs` but for the reference model. If None, we use the default Davinci003 outputs. -e, --evaluation_dataset=EVALUATION_DATASET Type: Union Default: Description: Path to the evaluation dataset or a function that returns a dataframe. If None, we use the default evaluation. -a, --annotators_config=ANNOTATORS_CONFIG Type: Union Default: 'alpaca_eval_gpt4_turbo_fn' Description: Path to the annotators configuration or a dictionary. If None, we use the default annotators configuration. -o, --output_path=OUTPUT_PATH Type: Union Default: 'auto' Description: Path to save the generations, annotations and leaderboard. If auto saves at `results/`. -m, --max_instances=MAX_INSTANCES Type: Optional[int] Default: None Description: Maximum number of instances to generate and evaluate. If None, we evaluate all instances. --is_strip_output=IS_STRIP_OUTPUT Type: bool Default: True Description: Whether to strip trailing and leading whitespaces from the outputs. --is_load_outputs=IS_LOAD_OUTPUTS Type: bool Default: True Description: Whether to try to load outputs from the output path. If True and outputs exist we only generate outputs for instructions that don't have outputs yet. -c, --chunksize=CHUNKSIZE Type: int Default: 64 Description: Number of instances to generate before saving. If None, we save after all generations. Additional flags are accepted. Description: Other kwargs to `evaluate`. ``` -------------------------------- ### Alpaca Eval make_leaderboard Command Reference Source: https://github.com/togethercomputer/moa/blob/main/alpaca_eval/README.md Detailed documentation for the `alpaca_eval make_leaderboard` command, which is used to precompute and save an entire leaderboard. This section outlines the command's synopsis, description, and all available flags, including their types, default values, and purposes. It covers parameters like `leaderboard_path`, `annotators_config`, `all_model_outputs`, and `reference_outputs`. ```APIDOC NAME alpaca_eval make_leaderboard - Precompute and save an entire leaderboard for a given dataset / evaluator / set of models generations. SYNOPSIS alpaca_eval make_leaderboard DESCRIPTION Precompute and save an entire leaderboard for a given dataset / evaluator / set of models generations. FLAGS --leaderboard_path=LEADERBOARD_PATH Type: Optional[Union] Default: None The path to save the leaderboard to. The leaderboard will be saved as a csv file, if it already exists it will --annotators_config=ANNOTATORS_CONFIG Type: Union Default: 'alpaca_eval_gpt4_turbo_fn' The path the (or list of dict of) the annotator's config file. --all_model_outputs=ALL_MODEL_OUTPUTS Type: Union Default: \ --all_model_outputs alpaca_eval_all_outputs.json \ --reference_outputs ``` -------------------------------- ### Launch Fastchat Model Worker with xFasterTransformer Integration Source: https://github.com/togethercomputer/moa/blob/main/FastChat/docs/xFasterTransformer.md These commands illustrate the process of starting the Fastchat model worker with xFasterTransformer enabled for efficient model serving. The examples include standard deployment, NUMA-optimized execution for multi-socket systems, and MPI-based distributed setups. The model worker loads the specified model and applies the chosen data type for inference operations. ```bash # Load model with default configuration (max sequence length 4096, no GPU split setting). python3 -m fastchat.serve.model_worker \ --model-path /path/to/models \ --enable-xft \ --xft-dtype bf16_fp16 ``` ```bash #run inference on numanode 0 and with data type bf16_fp16 (first token uses bfloat16, and rest tokens use float16) numactl -N 0 --localalloc python3 -m fastchat.serve.model_worker \ --model-path /path/to/models \ --enable-xft \ --xft-dtype bf16_fp16 ``` ```bash #run inference on numanode 0 and 1 and with data type bf16_fp16 (first token uses bfloat16, and rest tokens use float16) OMP_NUM_THREADS=$CORE_NUM_PER_SOCKET LD_PRELOAD=libiomp5.so mpirun \ -n 1 numactl -N 0 --localalloc python -m fastchat.serve.model_worker \ --model-path /path/to/models \ --enable-xft \ --xft-dtype bf16_fp16 : \ -n 1 numactl -N 1 --localalloc python -m fastchat.serve.model_worker \ --model-path /path/to/models \ --enable-xft \ --xft-dtype bf16_fp16 ``` -------------------------------- ### Launch FastChat LightLLM Worker Source: https://github.com/togethercomputer/moa/blob/main/FastChat/docs/lightllm_integration.md Commands to launch the FastChat model worker using LightLLM for optimized inference. These examples demonstrate how to specify the model path, tokenizer mode, and the crucial `--max_total_token_num` argument for continuous batching. An additional example shows how to enable quantized weight and KV cache for inference. ```Python python3 -m fastchat.serve.lightllm_worker --model-path lmsys/vicuna-7b-v1.5 --tokenizer_mode "auto" --max_total_token_num 154000 ``` ```Python python3 -m fastchat.serve.lightllm_worker --model-path lmsys/vicuna-7b-v1.5 --tokenizer_mode "auto" --max_total_token_num 154000 --mode triton_int8weight triton_int8kv ``` -------------------------------- ### Initialize FastChat Controller on node-01 Source: https://github.com/togethercomputer/moa/blob/main/FastChat/docs/commands/local_cluster.md Starts the FastChat controller service, which acts as the central orchestrator for managing distributed worker nodes and model serving. This command should be executed on the designated controller node. ```bash python3 -m fastchat.serve.controller --host 0.0.0.0 --port 10002 ``` -------------------------------- ### Analyze Evaluators with AlpacaEval CLI Source: https://github.com/togethercomputer/moa/blob/main/alpaca_eval/README.md This command-line tool allows users to analyze their custom evaluators and integrate them into the AlpacaEval leaderboard. It supports detailed bias and variance estimation through multi-seed evaluation, with an option for a quicker single-seed run. ```bash alpaca_eval analyze_evaluators --annotators_config '' ``` -------------------------------- ### Example Prompt Template for Vicuna v0 Source: https://github.com/togethercomputer/moa/blob/main/FastChat/docs/vicuna_weights_version.md Illustrates the conversation format for Vicuna model version 0. This version uses `### Human:` and `### Assistant:` as explicit turn separators within the prompt. ```text A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions. ### Human: Hello! ### Assistant: Hello! ### Human: How are you? ### Assistant: I am good. ``` -------------------------------- ### Generate Leaderboard for New Evaluator Source: https://github.com/togethercomputer/moa/blob/main/alpaca_eval/README.md This command creates a new leaderboard CSV file for a custom annotator (evaluator) within the AlpacaEval project. It uses a specified annotator configuration and a dataset of all model outputs to populate the leaderboard. ```bash alpaca_eval make_leaderboard \ --leaderboard_path src/alpaca_eval/leaderboards/data_AlpacaEval/_leaderboard.csv \ --all_model_outputs alpaca_eval_all_outputs.json \ --annotators_config ``` -------------------------------- ### Run AlpacaEval with Pre-decoded Outputs Source: https://github.com/togethercomputer/moa/blob/main/alpaca_eval/README.md This Bash command shows how to execute `alpaca_eval` when model outputs are already available in a file. It specifies the path to the model outputs and the annotator configuration to use for evaluation, streamlining the evaluation process for pre-generated results. ```bash alpaca_eval --model_outputs 'example/outputs.json' \ --annotators_config 'alpaca_eval_gpt4_turbo_fn' ``` -------------------------------- ### Install Python Dependencies for Vicuna Training Source: https://github.com/togethercomputer/moa/blob/main/FastChat/README.md This command installs the necessary Python packages required for the Vicuna training environment. The '-e ".[train]"' flag installs the package in editable mode and includes additional dependencies specified for the 'train' extra. ```bash pip3 install -e ".[train]" ``` -------------------------------- ### CLI: Evaluate Model Outputs with alpaca_eval Source: https://github.com/togethercomputer/moa/blob/main/alpaca_eval/README.md Provides command-line arguments and options for the `alpaca_eval evaluate` command, used to assess model performance based on generated outputs against a reference. It details parameters for model outputs, reference data, annotator configuration, naming, output paths, precomputed leaderboards, and various evaluation metrics. ```APIDOC NAME alpaca_eval evaluate - Evaluate a model based on its outputs. This is the default entrypoint if no command is specified. SYNOPSIS alpaca_eval evaluate DESCRIPTION Evaluate a model based on its outputs. This is the default entrypoint if no command is specified. FLAGS --model_outputs=MODEL_OUTPUTS Type: Optional[Union] Default: None The outputs of the model to add to the leaderboard. Accepts data (list of dictionary, pd.dataframe, datasets.Dataset) or a path to read those (json, csv, tsv) or a function to generate those. Each dictionary (or row of dataframe) should contain the keys that are formatted in the prompts. E.g. by default `instruction` and `output` with optional `input`. If None, we just print the leaderboard. -r, --reference_outputs=REFERENCE_OUTPUTS Type: Union Default: /leaderboard.csv`. --max_instances=MAX_INSTANCES Type: Optional[Optional] Default: None The maximum number of instances to annotate. Useful for testing. --annotation_kwargs=ANNOTATION_KWARGS Type: Optional[Optional] Default: None Additional arguments to pass to `PairwiseAnnotator.annotate_head2head`. -A, --Annotator=ANNOTATOR Default: ``` -------------------------------- ### Run FastChat CLI with Custom Local Model Source: https://github.com/togethercomputer/moa/blob/main/FastChat/docs/model_support.md Demonstrates the basic command to run the FastChat command-line interface, specifying a placeholder for your local model's path. This is the primary command to test new model integrations. ```bash python3 -m fastchat.serve.cli --model [YOUR_MODEL_PATH] ``` -------------------------------- ### Install AWQ and FastChat for 4-bit LLM Inference Source: https://github.com/togethercomputer/moa/blob/main/FastChat/docs/awq.md This sequence of bash commands sets up a dedicated Conda environment for FastChat and AWQ, installs FastChat, clones the AWQ repository, installs the AWQ Python package, and compiles the necessary CUDA kernels for optimized performance. ```bash conda create -n fastchat-awq python=3.10 -y conda activate fastchat-awq # cd /path/to/FastChat pip install --upgrade pip # enable PEP 660 support pip install -e . # install fastchat git clone https://github.com/mit-han-lab/llm-awq repositories/llm-awq cd repositories/llm-awq pip install -e . # install awq package cd awq/kernels python setup.py install # install awq CUDA kernels ``` -------------------------------- ### Launch FastChat Multi-Tab Gradio Web Server Source: https://github.com/togethercomputer/moa/blob/main/FastChat/README.md Starts an advanced Gradio web server that includes multiple tabs, such as the Chatbot Arena interface, providing a richer user experience beyond a single chat window. ```bash python3 -m fastchat.serve.gradio_web_server_multi ``` -------------------------------- ### Clone FastChat Repository from Source Source: https://github.com/togethercomputer/moa/blob/main/FastChat/README.md This sequence of commands clones the FastChat GitHub repository to your local machine and then changes the current directory into the newly cloned FastChat folder. This is the first step for installing FastChat from source. ```bash git clone https://github.com/lm-sys/FastChat.git cd FastChat ``` -------------------------------- ### Download Quantized GPTQ Model Source: https://github.com/togethercomputer/moa/blob/main/FastChat/docs/exllama_v2.md Instructions to install Git LFS and then clone a specific quantized GPTQ model (vicuna-7B-1.1-GPTQ-4bit-128g) from Hugging Face, which is required for the Fastchat model worker. ```bash git lfs install git clone https://huggingface.co/TheBloke/vicuna-7B-1.1-GPTQ-4bit-128g models/vicuna-7B-1.1-GPTQ-4bit-128g ``` -------------------------------- ### Execute MT-Bench Benchmark for MoA Source: https://github.com/togethercomputer/moa/blob/main/README.md Command to run a minimal example of the MT-Bench evaluation, used for assessing the performance of MoA models in a multi-turn conversation setting and evaluating their conversational abilities. ```Bash bash run_eval_mt_bench.sh ``` -------------------------------- ### Upload Python Package to PyPI Source: https://github.com/togethercomputer/moa/blob/main/FastChat/docs/commands/pypi.md Executes a shell script responsible for uploading the compiled Python package to the Python Package Index (PyPI). This step typically follows the package build process and handles authentication and submission. ```bash bash scripts/upload_pypi.sh ``` -------------------------------- ### Install vLLM Python Library Source: https://github.com/togethercomputer/moa/blob/main/FastChat/docs/vllm_integration.md Installs the vLLM library using pip, a prerequisite for leveraging vLLM's optimized worker capabilities within FastChat. ```bash pip install vllm ``` -------------------------------- ### Simplest OpenAI Client Configuration (YAML) Source: https://github.com/togethercomputer/moa/blob/main/alpaca_eval/client_configs/README.md This YAML snippet shows the most basic configuration for an OpenAI client. It defines a single client entry under the 'default' key, specifying an API key and an organization ID. This setup is suitable when no client switching is required. ```yaml default: - api_key: "" organization: "" ```