### Start OpenAI/Ollama Compatible API Server

Source: https://github.com/foldl/chatllm.cpp/blob/master/docs/binding.md

Example commands to start the API server with different model configurations for chatting and code completion.

```sh
python openai_api.py ---chat path/to/deepseekcoder-1.3b.bin ---fim /path/to/deepseekcoder-1.3b-base.bin
```

```sh
python openai_api.py ---chat path/to/chat/model --top_k 2 ---fim /path/to/fim/model --temp 0.8
```

```sh
python openai_api.py ---chat :qwen2.5
```

--------------------------------

### Run ChatLLM with Qwen2 Model (Nim)

Source: https://github.com/foldl/chatllm.cpp/blob/master/docs/quick_start.md

Runs ChatLLM in interactive mode (`-i`) using the Nim executable (`main_nim`) with the Qwen2 0.5B model. The model is given as a model ID (`:qwen2:0.5b`) rather than a file path.

```sh
main_nim -i -m :qwen2:0.5b
```

--------------------------------

### ChatLLM Model Downloader Script

Source: https://github.com/foldl/chatllm.cpp/blob/master/docs/quick_start.md

This Python script downloads quantized language models for ChatLLM. Run `python model_downloader.py` to list the available models. The script is also invoked automatically when the `-m` option receives a model ID starting with a colon (e.g., `:qwen2:0.5b`).

```sh
python model_downloader.py
```
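
The distinction between a model ID and a file path can be sketched in a few lines of Python. This is only an illustration of the colon convention described above, not the downloader's actual logic: the function name `parse_model_arg` is hypothetical, and reading the text after a second colon as a variant (such as a size) is an assumption drawn from the `:qwen2:0.5b` example.

```python
def parse_model_arg(arg):
    """Classify a `-m` value as a file path or a colon-prefixed model ID.

    Hypothetical helper: the (kind, name, variant) layout is an assumption,
    not the format used internally by model_downloader.py.
    """
    if not arg.startswith(":"):
        return ("path", arg, None)
    # Drop the leading colon, then split once on the next colon (if any).
    name, _, variant = arg[1:].partition(":")
    return ("model_id", name, variant or None)


print(parse_model_arg(":qwen2:0.5b"))
print(parse_model_arg("path/to/model"))
```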

--------------------------------

### Python Quick Sort Function Example

Source: https://github.com/foldl/chatllm.cpp/blob/master/docs/rag.md

The beginning of a Python quick sort implementation, shown as an example of code generated by the AI. The snippet is truncated after the base case.

```python
def quicksort(arr):
    if len(arr) <= 1:
        return arr
    else:
        ... 
```
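
Since the snippet above stops after the base case, here is one common way a complete quick sort can look. It is a sketch for reference only: the elided part of the model's answer is unknown, and the middle-element pivot and list-comprehension partitioning are choices made here, not taken from the source.

```python
def quicksort(arr):
    """Return a new sorted list (not in-place)."""
    if len(arr) <= 1:
        return arr
    pivot = arr[len(arr) // 2]
    # Partition into three groups relative to the pivot value.
    left = [x for x in arr if x < pivot]
    middle = [x for x in arr if x == pivot]
    right = [x for x in arr if x > pivot]
    return quicksort(left) + middle + quicksort(right)


print(quicksort([3, 6, 8, 10, 1, 2, 1]))  # [1, 1, 2, 3, 6, 8, 10]
```

This variant trades memory for readability: each level of recursion allocates new lists rather than swapping elements inside the input.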

--------------------------------

### Fuyu Model Multimodal Prompt Example

Source: https://github.com/foldl/chatllm.cpp/blob/master/docs/multimodal.md

An example of how to run the Fuyu model with a multimodal prompt. It specifies the model path, the prompt containing an embedded image, and other necessary arguments like number of GPU layers and maximum output length.

```shell
main -m /path/to/fuyu-8b.bin  -p "{{image:path/to/bus.png}}Generate a coco-style caption." --multimedia_file_tags {{ }} -ngl all --max_length 4000
```

--------------------------------

### Run Streamlit Web Demo

Source: https://github.com/foldl/chatllm.cpp/blob/master/docs/binding.md

Command to start the Streamlit-powered web demo for chatllm, including model path and interactive mode.

```sh
streamlit run chatllm_st.py -- -i -m path/to/model
```

--------------------------------

### Install JAX for Model Conversion

Source: https://github.com/foldl/chatllm.cpp/blob/master/docs/grok.md

Installs the JAX library with CPU support, which is required for converting the Grok-1 model.

```sh
pip install jax[cpu]
```

--------------------------------

### Build Nim Binding

Source: https://github.com/foldl/chatllm.cpp/blob/master/docs/binding.md

Command to build the Nim binding example, enabling release and SSL modes.

```sh
nim c -d:Release -d:ssl main.nim
```

--------------------------------

### Install Python Dependencies

Source: https://github.com/foldl/chatllm.cpp/blob/master/README_zh.md

Installs the necessary Python dependencies for the model conversion script.

```sh
pip install -r requirements.txt
```

--------------------------------

### Compile and Run C Binding on Linux

Source: https://github.com/foldl/chatllm.cpp/blob/master/docs/binding.md

Commands to compile the C binding example on Linux. `LD_LIBRARY_PATH` is extended with the current directory so that the resulting executable can find `libchatllm.so` at run time.

```sh
export LD_LIBRARY_PATH=.:$LD_LIBRARY_PATH
gcc main.c libchatllm.so
```

--------------------------------

### Compile and Run C Binding on Windows

Source: https://github.com/foldl/chatllm.cpp/blob/master/docs/binding.md

Command to compile the C binding example on Windows using MSVC.

```sh
cl main.c libchatllm.lib
```

--------------------------------

### Start RPC Server

Source: https://github.com/foldl/chatllm.cpp/blob/master/docs/rpc.md

Starts an RPC server for distributed inference. The SPEC can define the host, port, and device ID for the server. Use `--log_level 2` for more detailed logs.

```sh
main --serve_rpc SPEC --log_level 2
```

Examples of SPEC:

* `8080`: start a server on `0.0.0.0:8080` with device #0.
* `127.0.0.1:9000`: start a server on `127.0.0.1:9000` with device #0.
* `8080@1`: start a server on `0.0.0.0:8080` with device #1.
* `127.0.0.1:9000@1`: start a server on `127.0.0.1:9000` with device #1.
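
The SPEC examples above suggest the shape `[host:]port[@device]`, with `0.0.0.0` and device #0 as defaults. The following Python sketch parses that shape. It is inferred from the four examples only; the names `RpcSpec` and `parse_rpc_spec` are hypothetical, and the real parser in ChatLLM.cpp may accept other forms or validate differently.

```python
from typing import NamedTuple


class RpcSpec(NamedTuple):
    host: str
    port: int
    device: int


def parse_rpc_spec(spec: str) -> RpcSpec:
    """Parse `[host:]port[@device]` (format inferred from the examples)."""
    # Split off the optional "@device" suffix first.
    endpoint, _, device = spec.partition("@")
    # Then split the optional "host:" prefix from the port.
    host, sep, port = endpoint.rpartition(":")
    if not sep:
        host = "0.0.0.0"  # default bind address seen in the examples
    return RpcSpec(host, int(port), int(device) if device else 0)


for spec in ("8080", "127.0.0.1:9000", "8080@1", "127.0.0.1:9000@1"):
    print(spec, "->", parse_rpc_spec(spec))
```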

--------------------------------

### Using Qwen-QAnything with Qwen-QAnything-7B

Source: https://github.com/foldl/chatllm.cpp/blob/master/docs/rag.md

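This snippet runs Qwen-QAnything-7B as a RAG chat model, specifying the embedding model, the reranker model, the vector store, and a custom RAG template with `{context}` and `{question}` placeholders. The template (in Chinese) presents the reference information and the user's question or instruction, then tells the model to answer based on the most relevant references, stay faithful to the original text, be concise without losing information, not fabricate anything, and reply in the language of the question.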

```bash
./bin/main -i --temp 0 -m path/to/qwen-qany-7b.bin --embedding_model /path/to/bce_em.bin --reranker_model path/to/bce_reranker.bin --vector_store /path/to/fruits.dat.vsdb --rag_template "参考信息：\n{context}\n---\n我的问题或指令：\n{question}\n---\n请根据上述参考信息回答我的问题或回复我的指令。前面的参考信息可能有用，也可能没用，你需要从我给出的参考信息中选出与我的问题最相关的那些，来为你的回答提供依据。回答一定要忠于原文，简洁但不丢信息，不要胡乱编造。我的问题或指令是什么语种，你就用什么语种回复."
```

--------------------------------

### Chat with RAG using MiniCPM

Source: https://github.com/foldl/chatllm.cpp/blob/master/docs/rag.md

Command to start an interactive chat session with ChatLLM.cpp using the MiniCPM model. It requires specifying paths for the LLM, embedding model, reranker model, and the vector store.

```bash
./bin/main -i -m /path/to/minicpm_dpo_f16.bin --embedding_model /path/to/bce_em.bin --reranker_model /path/to/bce_reranker.bin --vector_store /path/to/fruits.dat.vsdb
```

--------------------------------

### CPU Backend Initialization and Variant Examples

Source: https://github.com/foldl/chatllm.cpp/blob/master/ggml/src/CMakeLists.txt

Initializes the CPU backend and defines various CPU variants for x86 architectures, including optimizations for different instruction sets like SSE4.2, AVX, AVX2, and AVX512. Includes error checking for conflicting options.

```cmake
ggml_add_backend(CPU)

if (GGML_CPU_ALL_VARIANTS)
    if (NOT GGML_BACKEND_DL)
        message(FATAL_ERROR "GGML_CPU_ALL_VARIANTS requires GGML_BACKEND_DL")
    elseif (GGML_CPU_ARM_ARCH)
        message(FATAL_ERROR "Cannot use both GGML_CPU_ARM_ARCH and GGML_CPU_ALL_VARIANTS")
    endif()
    if (GGML_SYSTEM_ARCH STREQUAL "x86")
        ggml_add_cpu_backend_variant(x64)
        ggml_add_cpu_backend_variant(sse42        SSE42)
        ggml_add_cpu_backend_variant(sandybridge  SSE42 AVX)
        ggml_add_cpu_backend_variant(haswell      SSE42 AVX F16C AVX2 BMI2 FMA)
        ggml_add_cpu_backend_variant(skylakex     SSE42 AVX F16C AVX2 BMI2 FMA AVX512)
        ggml_add_cpu_backend_variant(icelake      SSE42 AVX F16C AVX2 BMI2 FMA AVX512 AVX512_VBMI AVX512_VNNI)
        ggml_add_cpu_backend_variant(alderlake    SSE42 AVX F16C AVX2 BMI2 FMA AVX_VNNI)
        if (NOT MSVC)

```

--------------------------------

### MUSA Toolkit Integration and Architecture Setup

Source: https://github.com/foldl/chatllm.cpp/blob/master/ggml/src/ggml-musa/CMakeLists.txt

Finds the MUSAToolkit and sets default MUSA architectures if not defined. It then proceeds to collect MUSA-specific header and source files.

```cmake
find_package(MUSAToolkit)

if (MUSAToolkit_FOUND)
    message(STATUS "MUSA Toolkit found")

    if (NOT DEFINED MUSA_ARCHITECTURES)
        set(MUSA_ARCHITECTURES "21;22;31")
    endif()
    message(STATUS "Using MUSA architectures: ${MUSA_ARCHITECTURES}")

    file(GLOB   GGML_HEADERS_MUSA "../ggml-cuda/*.cuh")
    list(APPEND GGML_HEADERS_MUSA "../../include/ggml-cuda.h")
    list(APPEND GGML_HEADERS_MUSA "../ggml-musa/mudnn.cuh")

    file(GLOB   GGML_SOURCES_MUSA "../ggml-cuda/*.cu")
    file(GLOB   SRCS "../ggml-cuda/template-instances/fattn-mma*.cu")
    list(APPEND GGML_SOURCES_MUSA ${SRCS})
    file(GLOB   SRCS "../ggml-cuda/template-instances/mmq*.cu")
    list(APPEND GGML_SOURCES_MUSA ${SRCS})
```

--------------------------------

### Run Python ChatLLM CLI

Source: https://github.com/foldl/chatllm.cpp/blob/master/docs/binding.md

Example command to run the Python chatllm script from the command line, specifying the model path.

```sh
python3 chatllm.py -i -m path/to/model
```

```sh
python chatllm.py -i -m path/to/model
```

--------------------------------

### Role Play with RAG and Custom Template

Source: https://github.com/foldl/chatllm.cpp/blob/master/docs/rag.md

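Starts a role-playing session using RAG with a character-specific model and a detailed template. The template (in Chinese) asks the model to play the character "三三" (Sansan) in a dialogue with the user: it should always stay in character, imitate the speaking style shown in the retrieved sample dialogues (`{context}`), follow the persona given in the template, and return a single turn of reply to the user's message (`{question}`). `--rag_context_sep` sets the separator string used between retrieved passages.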

```bash
winbuild\bin\Release\main -i --temp 0 -m /path/to/index-ch.bin --embedding_model path/to/bce_em.bin --reranker_model path/to/bce_reranker.bin --vector_store path/to/sansan.dat.vsdb --rag_context_sep "--------------------" --rag_template "请你扮演“三三”与用户“user”进行对话。请注意：\n1.请永远记住你正在扮演三三。\n2.下文给出了一些三三与其他人物的对话，请参考给定对话中三三的语言风格，用一致性的语气与user进行对话。\n3.如果给出了三三的人设，请保证三三的对话语气符合三三的人设。\n\n以下是一些三三的对话：\n{context}\n\n以下是三三的人设：\n姓名：三三性别：女年龄：十四岁身高：146cm职业：B站的站娘。平时负责网站服务器的维护，也喜欢鼓捣各种网站程序。性格：三三是个机娘，个性沉默寡言，情感冷静、少起伏，略带攻属性。因为姐姐的冒失，妹妹经常腹黑地吐槽姐姐，但是心里还是十分喜欢姐姐的。有着惊人的知识量与记忆力。兴趣爱好：一是平时没事喜欢啃插座；二是虽说是个机娘，但是睡觉的时候不抱着东西，就无法入睡。人物关系：有一个叫“二二”的姐姐\n\n基于以上材料，请你扮演三三与user对话。结果只用返回一轮三三的回复。user：{question}\n三三:"
```
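
How a RAG template with `{context}` and `{question}` placeholders turns into a prompt can be sketched in Python. This is an assumption-laden illustration, not ChatLLM.cpp's implementation: `build_rag_prompt` is a hypothetical name, and whether the separator is surrounded by newlines in the real tool is not stated in the source.

```python
import re


def build_rag_prompt(template, passages, question, context_sep="\n"):
    """Fill `{context}` and `{question}` in a RAG template (illustrative only)."""
    values = {
        "context": context_sep.join(passages),  # retrieved passages, joined
        "question": question,
    }
    # Single pass, so placeholder-like text inside a passage is not re-expanded.
    return re.sub(r"\{(context|question)\}", lambda m: values[m.group(1)], template)


prompt = build_rag_prompt(
    "Reference:\n{context}\n---\nQuestion: {question}",
    ["Apples are red.", "Bananas are yellow."],
    "What colour are apples?",
    context_sep="\n--------------------\n",
)
print(prompt)
```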

--------------------------------

### Retrieving Only Mode with Qwen-QAnything

Source: https://github.com/foldl/chatllm.cpp/blob/master/docs/rag.md

This example runs Qwen-QAnything in retrieving-only mode: no LLM is given via `-m`, only the embedding model, the reranker model, and the vector store. `+rag_dump` outputs the retrieved and re-ranked documents.

```bash
./bin/main -i --embedding_model /path/to/bce_em.bin --reranker_model /path/to/bce_reranker.bin --vector_store /path/to/fruits.dat.vsdb +rag_dump
```

--------------------------------

### ggml Vulkan Backend Library Setup

Source: https://github.com/foldl/chatllm.cpp/blob/master/ggml/src/ggml-vulkan/CMakeLists.txt

Adds the ggml Vulkan backend library using CMake. This involves specifying the source files and header files for the library and linking it against the Vulkan library.

```cmake
if (Vulkan_FOUND)
    message(STATUS "Vulkan found")

    ggml_add_backend_library(ggml-vulkan
                             ggml-vulkan.cpp
                             ../../include/ggml-vulkan.h
                            )

    target_link_libraries(ggml-vulkan PRIVATE Vulkan::Vulkan)
    target_include_directories(ggml-vulkan PRIVATE ${CMAKE_CURRENT_BINARY_DIR})

    # ... other configurations ...
endif()
```

--------------------------------

### CMake Project Setup and Compiler Flags

Source: https://github.com/foldl/chatllm.cpp/blob/master/CMakeLists.txt

Configures the CMake build for ChatLLM.cpp: the minimum CMake version (3.12), the project name and version, the output directories for libraries (`lib`) and executables (`bin`), and the C++20 standard. Under MSVC it adds `/utf-8`, `/bigobj`, and `/D_CRT_SECURE_NO_WARNINGS` and suppresses warnings C4996 and C4722 for both C and C++, plus `/MP` (multi-process compilation) for C++. Other compilers get `-g -Wall` (debug symbols and common warnings). The build type defaults to `Release` when none is set.

```cmake
cmake_minimum_required(VERSION 3.12)
project(ChatLLM.cpp VERSION 0.0.1 LANGUAGES CXX)

set(CMAKE_ARCHIVE_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR}/lib CACHE STRING "")
set(CMAKE_LIBRARY_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR}/lib CACHE STRING "")
set(CMAKE_RUNTIME_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR}/bin CACHE STRING "")

set(CMAKE_CXX_STANDARD 20)

if (MSVC)
    add_compile_options("$<$<COMPILE_LANGUAGE:C>:/utf-8>")
    add_compile_options("$<$<COMPILE_LANGUAGE:C>:/bigobj>")
    add_compile_options("$<$<COMPILE_LANGUAGE:C>:/D_CRT_SECURE_NO_WARNINGS>")
    add_compile_options("$<$<COMPILE_LANGUAGE:C>:/wd4996>")
    add_compile_options("$<$<COMPILE_LANGUAGE:C>:/wd4722>")
    add_compile_options("$<$<COMPILE_LANGUAGE:CXX>:/utf-8>")
    add_compile_options("$<$<COMPILE_LANGUAGE:CXX>:/bigobj>")
    add_compile_options("$<$<COMPILE_LANGUAGE:CXX>:/D_CRT_SECURE_NO_WARNINGS>")
    add_compile_options("$<$<COMPILE_LANGUAGE:CXX>:/wd4996>")
    add_compile_options("$<$<COMPILE_LANGUAGE:CXX>:/wd4722>")
    add_compile_options("$<$<COMPILE_LANGUAGE:CXX>:/MP>")
endif ()

if (NOT MSVC)
    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -g -Wall")
endif ()

if (NOT CMAKE_BUILD_TYPE)
    set(CMAKE_BUILD_TYPE Release)
endif ()
```

--------------------------------

### Generation Steering with AI Prefix

Source: https://github.com/foldl/chatllm.cpp/blob/master/docs/fun.md

Uses the `--ai_prefix` argument to steer the LLM's generation, for example to encourage step-by-step, chain-of-thought (CoT) reasoning.

```sh
--ai_prefix "let's breakdown the problem and think step by step:\n"
```

--------------------------------

### EXAONE Models

Source: https://github.com/foldl/chatllm.cpp/blob/master/docs/models.md

Documentation for EXAONE v3.5 models, including Instruct-2.4B, Instruct-7.8B, and Instruct-32B. Provides links to their Hugging Face repositories.

```cpp
class ExaoneForCausalLM;
// Models: EXAONE v3.5 (Instruct-2.4B, Instruct-7.8B, Instruct-32B)
// Links:
// - Instruct-2.4B: https://huggingface.co/LGAI-EXAONE/EXAONE-3.5-2.4B-Instruct
// - Instruct-7.8B: https://huggingface.co/LGAI-EXAONE/EXAONE-3.5-7.8B-Instruct
// - Instruct-32B: https://huggingface.co/LGAI-EXAONE/EXAONE-3.5-32B-Instruct
```

--------------------------------

### CANN Environment Setup and SOC Detection

Source: https://github.com/foldl/chatllm.cpp/blob/master/ggml/src/ggml-cann/CMakeLists.txt

This snippet shows how the build system checks for the CANN installation directory and automatically detects the SOC type and version using `npu-smi`. It handles cases where detection fails and sets up compile options based on the detected SOC.

```cmake
if ("cann${CANN_INSTALL_DIR}" STREQUAL "cann" AND DEFINED ENV{ASCEND_TOOLKIT_HOME})
    set(CANN_INSTALL_DIR $ENV{ASCEND_TOOLKIT_HOME})
    message(STATUS "CANN: updated CANN_INSTALL_DIR from ASCEND_TOOLKIT_HOME=$ENV{ASCEND_TOOLKIT_HOME}")
endif()

# Auto-detech Soc type and Soc version, if detect failed, will abort build
set(SOC_VERSION "")
function(detect_ascend_soc_type SOC_VERSION)
    execute_process(
        COMMAND bash -c "npu-smi info|awk -F' ' 'NF > 0 && NR==7 {print $3}'"
        OUTPUT_VARIABLE npu_info
        RESULT_VARIABLE npu_result
        OUTPUT_STRIP_TRAILING_WHITESPACE
    )
    if("${npu_info}" STREQUAL "" OR ${npu_result})
        message(FATAL_ERROR "Auto-detech ascend soc type failed, please specify manually or check ascend device working normally.")
    endif()
    set(${SOC_VERSION} "Ascend${npu_info}" PARENT_SCOPE)
endfunction()

if(NOT SOC_TYPE)
    detect_ascend_soc_type(SOC_VERSION)
    set(SOC_TYPE "${SOC_VERSION}")
    message(STATUS "CANN: SOC_VERSION auto-detected is:${SOC_VERSION}")
endif()

string(TOLOWER ${SOC_TYPE} SOC_VERSION) # SOC_VERSION need lower

# Construct Soc specify compile option: ASCEND_#Soc_Major_SN. Such as ASCEND_910B, ASCEND_310P.
string(REGEX MATCH "[0-9]+[a-zA-Z]" SOC_TYPE_MAJOR_SN "${SOC_VERSION}")
set(SOC_TYPE_COMPILE_OPTION "ASCEND_${SOC_TYPE_MAJOR_SN}")
string(TOUPPER ${SOC_TYPE_COMPILE_OPTION} SOC_TYPE_COMPILE_OPTION)
message(STATUS "CANN: SOC_VERSION =  ${SOC_VERSION}")
```

```bash
npu-smi info|awk -F' ' 'NF > 0 && NR==7 {print $3}'
```
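
The string handling at the end of the CMake snippet (lower-casing, regex match, upper-casing) can be traced with a small Python equivalent. This is a sketch mirroring only those three `string(...)` calls; the function name `soc_compile_option` and the sample inputs are illustrative and not part of the build system.

```python
import re


def soc_compile_option(soc_type: str) -> str:
    """Mirror the CMake steps that derive the ASCEND_<major SN> compile option."""
    soc_version = soc_type.lower()                      # string(TOLOWER ...)
    match = re.search(r"[0-9]+[a-zA-Z]", soc_version)   # string(REGEX MATCH ...)
    major_sn = match.group(0) if match else ""          # empty when nothing matches
    return f"ASCEND_{major_sn}".upper()                 # string(TOUPPER ...)


print(soc_compile_option("Ascend910B3"))  # ASCEND_910B
print(soc_compile_option("Ascend310P3"))  # ASCEND_310P
```

The regex keeps the digits plus the first following letter, which is why any trailing revision suffix is dropped from the option name.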

--------------------------------

### ggml Library Installation

Source: https://github.com/foldl/chatllm.cpp/blob/master/ggml/CMakeLists.txt

Configures the installation of the ggml library, including public headers, the base library, and pkgconfig files. It handles conditional installation based on GGML_STANDALONE.

```cmake
set(GGML_PUBLIC_HEADERS
    include/ggml.h
    include/ggml-cpu.h
    include/ggml-alloc.h
    include/ggml-backend.h
    include/ggml-blas.h
    include/ggml-cann.h
    include/ggml-cpp.h
    include/ggml-cuda.h
    include/ggml-opt.h
    include/ggml-metal.h
    include/ggml-rpc.h
    include/ggml-sycl.h
    include/ggml-vulkan.h
    include/ggml-webgpu.h
    include/gguf.h)

set_target_properties(ggml PROPERTIES PUBLIC_HEADER "${GGML_PUBLIC_HEADERS}")
install(TARGETS ggml LIBRARY PUBLIC_HEADER)
install(TARGETS ggml-base LIBRARY)

if (GGML_STANDALONE)
    configure_file(${CMAKE_CURRENT_SOURCE_DIR}/ggml.pc.in
        ${CMAKE_CURRENT_BINARY_DIR}/ggml.pc
        @ONLY)

    install(FILES ${CMAKE_CURRENT_BINARY_DIR}/ggml.pc
        DESTINATION share/pkgconfig)
endif()
```

--------------------------------

### BlueLM Models

Source: https://github.com/foldl/chatllm.cpp/blob/master/docs/models.md

Documentation for BlueLM models, including Chat-7B and Chat-7B 32K versions. Provides links to their Hugging Face repositories.

```cpp
class BlueLMForCausalLM;
// Models: BlueLM (Chat-7B, Chat-7B 32K)
// Links:
// - Chat-7B: https://huggingface.co/vivo-ai/BlueLM-7B-Chat
// - Chat-7B 32K: https://huggingface.co/vivo-ai/BlueLM-7B-Chat-32K
```

--------------------------------

### Installing Metal Files

Source: https://github.com/foldl/chatllm.cpp/blob/master/ggml/src/ggml-metal/CMakeLists.txt

This snippet handles the installation of Metal-related files. It installs the source Metal file (`ggml-metal.metal`) and the compiled `.metallib` file to the appropriate binary directory when the library is not embedded. This ensures that the Metal shaders are available at runtime.

```cmake
if (NOT GGML_METAL_EMBED_LIBRARY)
    install(
        FILES src/ggml-metal/ggml-metal.metal
        PERMISSIONS
            OWNER_READ
            OWNER_WRITE
            GROUP_READ
            WORLD_READ
        DESTINATION ${CMAKE_INSTALL_BINDIR})

        install(
            FILES ${CMAKE_RUNTIME_OUTPUT_DIRECTORY}/default.metallib
            DESTINATION ${CMAKE_INSTALL_BINDIR}
        )
endif()
```

--------------------------------

### Gemma Model Conversion Notes

Source: https://github.com/foldl/chatllm.cpp/blob/master/docs/models.md

Specific instructions for converting Gemma models, including which tokenizer files to download and how to enable Pan and Scan functionality.

Note: Only download `tokenizer.model` and DO NOT download `tokenizer.json` when converting. Use `--set do-pan-and-scan 1` to enable _Pan and Scan_.

--------------------------------

### CMake Package Configuration and Installation

Source: https://github.com/foldl/chatllm.cpp/blob/master/ggml/CMakeLists.txt

Configures the CMake package for GGML, sets installation paths for headers, libraries, and binaries, and writes version files. It also defines compile-time versions and commits.

```cmake
set(GGML_INSTALL_VERSION 0.0.${GGML_BUILD_NUMBER})
set(GGML_INCLUDE_INSTALL_DIR ${CMAKE_INSTALL_INCLUDEDIR} CACHE PATH "Location of header  files")
set(GGML_LIB_INSTALL_DIR     ${CMAKE_INSTALL_LIBDIR}     CACHE PATH "Location of library files")
set(GGML_BIN_INSTALL_DIR     ${CMAKE_INSTALL_BINDIR}     CACHE PATH "Location of binary  files")

configure_package_config_file(
        ${CMAKE_CURRENT_SOURCE_DIR}/cmake/ggml-config.cmake.in
        ${CMAKE_CURRENT_BINARY_DIR}/ggml-config.cmake
    INSTALL_DESTINATION ${CMAKE_INSTALL_LIBDIR}/cmake/ggml
    PATH_VARS GGML_INCLUDE_INSTALL_DIR
              GGML_LIB_INSTALL_DIR
              GGML_BIN_INSTALL_DIR)

write_basic_package_version_file(
        ${CMAKE_CURRENT_BINARY_DIR}/ggml-version.cmake
    VERSION ${GGML_INSTALL_VERSION}
    COMPATIBILITY SameMajorVersion)

target_compile_definitions(ggml-base PRIVATE
    GGML_VERSION="${GGML_INSTALL_VERSION}"
    GGML_COMMIT="${GGML_BUILD_COMMIT}"
)
message(STATUS "ggml version: ${GGML_INSTALL_VERSION}")
message(STATUS "ggml commit:  ${GGML_BUILD_COMMIT}")

install(FILES ${CMAKE_CURRENT_BINARY_DIR}/ggml-config.cmake
              ${CMAKE_CURRENT_BINARY_DIR}/ggml-version.cmake
        DESTINATION ${CMAKE_INSTALL_LIBDIR}/cmake/ggml)
```

--------------------------------

### Initialize Vector Store

Source: https://github.com/foldl/chatllm.cpp/blob/master/docs/rag.md

Command to initialize the vector store using ChatLLM.cpp. It requires specifying the path to the quantized embedding model and the raw data file.

```bash
./bin/main --embedding_model ../quantized/bce_em.bin --init_vs /path/to/fruits.dat
```

--------------------------------

### DeepSeek Models

Source: https://github.com/foldl/chatllm.cpp/blob/master/docs/models.md

Documentation for DeepSeek models, including v1 Chat-16B, v2 Chat and Lite-Chat, Coder v2 Instruct and Lite-Instruct, Moonlight Instruct-16B, and GigaChat Instruct-20B. Mentions optimization modes and provides links.

```cpp
class DeepseekForCausalLM;
class DeepseekV2ForCausalLM;
class DeepseekV3ForCausalLM;
// Models: DeepSeek (v1 Chat-16B, v2 Chat, v2 Lite-Chat, Coder v2 Instruct, Coder v2 Lite-Instruct, Moonlight Instruct-16B, GigaChat Instruct-20B)
// Optimization Modes: speed (default), memory (see BaseMLAttention)
// Links:
// - v1 Chat-16B: https://huggingface.co/deepseek-ai/deepseek-moe-16b-chat/tree/eefd8ac7e8dc90e095129fe1a537d5e236b2e57c
// - v2 Chat: https://huggingface.co/deepseek-ai/DeepSeek-V2-Chat (not tested)
// - v2 Lite-Chat: https://huggingface.co/deepseek-ai/DeepSeek-V2-Lite-Chat
// - Coder v2 Instruct: https://huggingface.co/deepseek-ai/DeepSeek-Coder-V2-Instruct (not tested)
// - Coder v2 Lite-Instruct: https://huggingface.co/deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct
// - Moonlight Instruct-16B: https://huggingface.co/moonshotai/Moonlight-16B-A3B-Instruct (-a Moonlight)
// - GigaChat Instruct-20B: https://huggingface.co/ai-sage/GigaChat-20B-A3B-instruct/tree/5105af38a6a174b06a2bc25719c5ad5ce680a207 (-a GigaChat)
```

--------------------------------

### ROCm Path Detection and Configuration

Source: https://github.com/foldl/chatllm.cpp/blob/master/ggml/src/ggml-hip/CMakeLists.txt

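Resolves the ROCm installation path: the `ROCM_PATH` environment variable is used if it points to an existing path; otherwise `/opt/rocm` is used if it exists, with `/usr` as the last resort. Both the resolved path and its `lib64/cmake` subdirectory are appended to `CMAKE_PREFIX_PATH` for subsequent package lookups.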

```cmake
if (NOT EXISTS $ENV{ROCM_PATH})
    if (NOT EXISTS /opt/rocm)
        set(ROCM_PATH /usr)
    else()
        set(ROCM_PATH /opt/rocm)
    endif()
else()
    set(ROCM_PATH $ENV{ROCM_PATH})
endif()

list(APPEND CMAKE_PREFIX_PATH  ${ROCM_PATH})
list(APPEND CMAKE_PREFIX_PATH "${ROCM_PATH}/lib64/cmake")
```
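
The fallback order in the CMake snippet above can be restated as a short Python function, which makes the precedence easy to check in isolation. This is a sketch of the same decision logic only; `resolve_rocm_path` is a hypothetical name, and the injectable `env` and `exists` parameters exist purely so the logic can be exercised without touching the real filesystem.

```python
import os


def resolve_rocm_path(env=os.environ, exists=os.path.exists):
    """Pick the ROCm root in the same order as the CMake logic above."""
    candidate = env.get("ROCM_PATH", "")
    if candidate and exists(candidate):
        return candidate  # the environment variable wins when the path exists
    # Otherwise prefer /opt/rocm, falling back to /usr without further checks.
    return "/opt/rocm" if exists("/opt/rocm") else "/usr"


# Simulated machine where only /opt/rocm exists and ROCM_PATH is unset.
print(resolve_rocm_path(env={}, exists=lambda p: p == "/opt/rocm"))  # /opt/rocm
```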

--------------------------------

### ggml Build Options

Source: https://github.com/foldl/chatllm.cpp/blob/master/ggml/CMakeLists.txt

Configures build options for ggml, including SYCL, OpenCL, Vulkan, tests, and examples. These options control which features and backends are enabled during the build process.

```cmake
option(GGML_SYCL "ggml: use SYCL" OFF)
set   (GGML_SYCL_DEVICE_ARCH "" CACHE STRING "ggml: sycl device architecture")

option(GGML_OPENCL "ggml: use OpenCL" OFF)
option(GGML_OPENCL_PROFILING "ggml: use OpenCL profiling (increases overhead)" OFF)
option(GGML_OPENCL_EMBED_KERNELS "ggml: embed kernels" ON)
option(GGML_OPENCL_USE_ADRENO_KERNELS "ggml: use optimized kernels for Adreno" ON)
set   (GGML_OPENCL_TARGET_VERSION "300" CACHE STRING "gmml: OpenCL API version to target")

set   (GGML_VULKAN_SHADERS_GEN_TOOLCHAIN "" CACHE FILEPATH "ggml: toolchain file for vulkan-shaders-gen")

option(GGML_BUILD_TESTS "ggml: build tests" ${GGML_STANDALONE})
option(GGML_BUILD_EXAMPLES "ggml: build examples" ${GGML_STANDALONE})
```

--------------------------------

### MiniCPM Model Integration

Source: https://github.com/foldl/chatllm.cpp/blob/master/docs/models.md

Configuration options for various MiniCPM models, including different versions and sizes. Notes on recommended temperature settings for specific models.

```python
# MiniCPM-2B models
model_name = "openbmb/MiniCPM-2B-dpo-fp16"
# or
model_name = "openbmb/MiniCPM-2B-sft-bf16"
# or
model_name = "openbmb/MiniCPM-1B-sft-bf16"

# MiniCPM-2B-128k model
model_name = "openbmb/MiniCPM-2B-128k"
args = "--temp 0" # Recommended

# MiniCPM-MoE-8x2B model
model_name = "openbmb/MiniCPM-MoE-8x2B"

# MiniCPM3 models
model_name = "openbmb/MiniCPM3-4B"

# MiniCPM4 models
model_name = "openbmb/BitCPM4-0.5B"
# or
model_name = "openbmb/MiniCPM4-8B"
# or
model_name = "openbmb/MiniCPM4-Survey"
# or
model_name = "openbmb/MiniCPM4-MCP"
```

--------------------------------

### Run ChatLLM.cpp Inference

Source: https://github.com/foldl/chatllm.cpp/blob/master/README.md

Executes the ChatLLM.cpp inference engine with a specified quantized model. Supports various command-line arguments for configuration.

```sh
./build/bin/main -m llama2.bin --seed 100
```

--------------------------------

### Explore ChatLLM Options

Source: https://github.com/foldl/chatllm.cpp/blob/master/README.md

Displays all available command-line options of the ChatLLM executable.

```sh
./build/bin/main -h
```

--------------------------------

### MUSA Path Configuration

Source: https://github.com/foldl/chatllm.cpp/blob/master/ggml/src/ggml-musa/CMakeLists.txt

Sets the CMake variable `MUSA_PATH`. The `MUSA_PATH` environment variable takes priority if it points to an existing path; otherwise `/opt/musa` is used if it exists, and `/usr/local/musa` is the final fallback (chosen without an existence check).

```cmake
if (NOT EXISTS $ENV{MUSA_PATH})
    if (NOT EXISTS /opt/musa)
        set(MUSA_PATH /usr/local/musa)
    else()
        set(MUSA_PATH /opt/musa)
    endif()
else()
    set(MUSA_PATH $ENV{MUSA_PATH})
endif()
```

--------------------------------

### Aquila Models

Source: https://github.com/foldl/chatllm.cpp/blob/master/docs/models.md

Documentation for the Aquila models, including Chat2-7B, Chat2-34B, Chat2-7B-16K, and Chat2-34B-16K. Provides links to their Hugging Face repositories.

```cpp
class AquilaForCausalLM;
// Models: Aquila (Chat2-7B, Chat2-34B, Chat2-7B-16K, Chat2-34B-16K)
// Links:
// - Chat2-7B: https://huggingface.co/BAAI/AquilaChat2-7B/tree/9905960de19ea9e573c0dc3fbdf54d4ddcc610d3
// - Chat2-34B: https://huggingface.co/BAAI/AquilaChat2-34B/commit/5c7990b198c94b63dfbfa022462b9cf672dbcfa0
// - Chat2-7B-16K: https://huggingface.co/BAAI/AquilaChat2-7B-16K/commit/fb46d48479d05086ccf6952f19018322fcbb54cd
// - Chat2-34B-16K: https://huggingface.co/BAAI/AquilaChat2-34B-16K/tree/9f19774f3e7afad2fc3d51fe308eac5a2d88c8b1
```

--------------------------------

### LlaMA3-Groq Weather Query

Source: https://github.com/foldl/chatllm.cpp/blob/master/docs/tool_calling.md

Command to run the LlaMA3-Groq tool-calling script (`tool_groq.py`), used in the source document to query the weather in different cities.

```shell
python tool_groq.py --temp 0 -m /path/to/llama3-groq-tool-8b.bin
```

--------------------------------

### Compiler and Module Path Setup

Source: https://github.com/foldl/chatllm.cpp/blob/master/ggml/src/ggml-musa/CMakeLists.txt

Sets the C and C++ compilers to the `clang` and `clang++` binaries under `${MUSA_PATH}/bin` and turns off compiler-specific language extensions for both C and C++. It also appends the MUSA CMake modules directory to `CMAKE_MODULE_PATH`.

```cmake
set(CMAKE_C_COMPILER "${MUSA_PATH}/bin/clang")
set(CMAKE_C_EXTENSIONS OFF)
set(CMAKE_CXX_COMPILER "${MUSA_PATH}/bin/clang++")
set(CMAKE_CXX_EXTENSIONS OFF)

list(APPEND CMAKE_MODULE_PATH "${MUSA_PATH}/cmake")
```

--------------------------------

### Compile ChatLLM.cpp with Make

Source: https://github.com/foldl/chatllm.cpp/blob/master/README_zh.md

Compiles the ChatLLM.cpp project using the make utility. Requires w64devkit on Windows.

```sh
# On Windows, ensure w64devkit is installed and run from its environment
make

# Executable will be at ./obj/main
```

--------------------------------

### Inter-operation with BGE-ReRanker-M3 and BCE-Embedding

Source: https://github.com/foldl/chatllm.cpp/blob/master/docs/rag.md

This snippet uses a reranker model from one developer (BGE-ReRanker-M3) together with an embedding model from another (BCE-Embedding), alongside the MiniCPM chat model and a vector store.

```bash
./bin/main -i -m path/to/minicpm_dpo_f16.bin --embedding_model /path/to/bce_em.bin --reranker_model /path/to/bge_reranker.bin --vector_store /path/to/fruits.dat.vsdb
```

--------------------------------

### Kimi VL Model Configuration

Source: https://github.com/foldl/chatllm.cpp/blob/master/docs/models.md

Additional configuration options for Kimi VL models, including video frame limits, native resolution usage, and frames per second (FPS).

Additional options (use `--set X Y` to change values):

* `video_max_frames`: default 20.
* `native_resolution`: whether to use native resolution; default `false` (this seems sensitive to quantization, so it defaults to `false`).
* `fps`: default 1.0.

--------------------------------

### Qwen1.5 MoE Weather Tool Usage

Source: https://github.com/foldl/chatllm.cpp/blob/master/docs/tool_calling.md

Demonstrates the Qwen1.5 MoE model using the 'get_weather' tool to provide weather information for Beijing and Jinan, and to compare their temperatures.

```shell
python tool_qwen.py -i -m :qwen1.5:moe
    ________          __  __    __    __  ___ (通义千问)
   / ____/ /_  ____ _/ /_/ /   / /   /  |/  /_________  ____
  / /   / __ \/ __ `/ __/ /   / /   / /|_/ // ___/ __ \/ __ \
 / /___/ / / / /_/ / /_/ /___/ /___/ /  / // /__/ /_/ / /_/ /
 \____/_/ /_/\__,_/\__/_____/_____/_/  /_(_)___/ .___/ .___/
You are served by QWen2-MoE,                  /_/   /_/
with 14315784192 (2.7B effect.) parameters.

You  > weather in beijing
A.I. > [Use Tool]: get_weather

 The current weather in Beijing is sunny and the temperature is 33 degrees Celsius.
You  > how about jinan?
A.I. > [Use Tool]: get_weather

 The current weather in Jinan is partly cloudy and the temperature is 36 degrees Celsius.
You  > which city is hotter?
A.I. > [Use Tool]: get_weather

 The temperature in Beijing is currently 33 degrees Celsius, while in Jinan it is 36 degrees Celsius. So, Jinan is hotter.
```