### Start OpenAI/Ollama Compatible API Server

Source: https://github.com/foldl/chatllm.cpp/blob/master/docs/binding.md

Example commands to start the API server with different model configurations for chatting and code completion.

```sh
python openai_api.py ---chat path/to/deepseekcoder-1.3b.bin ---fim /path/to/deepseekcoder-1.3b-base.bin
```

```sh
python openai_api.py ---chat path/to/chat/model --top_k 2 ---fim /path/to/fim/model --temp 0.8
```

```sh
python openai_api.py ---chat :qwen2.5
```

--------------------------------

### Run ChatLLM with Qwen2 Model (Nim)

Source: https://github.com/foldl/chatllm.cpp/blob/master/docs/quick_start.md

This snippet runs the ChatLLM application through the Nim executable (`main_nim`) with the Qwen2 0.5B model. `-i` starts an interactive session and `-m :qwen2:0.5b` selects the model by ID; the model is downloaded before the first interaction.

```sh
main_nim -i -m :qwen2:0.5b
```

--------------------------------

### ChatLLM Model Downloader Script

Source: https://github.com/foldl/chatllm.cpp/blob/master/docs/quick_start.md

This Python script downloads quantized language models for ChatLLM. Run `python model_downloader.py` to list the available models. The script is also invoked automatically when a model ID starting with a colon (e.g., `:qwen2:0.5b`) is passed to the `-m` option.

```sh
python model_downloader.py
```

--------------------------------

### Python Quick Sort Function Example

Source: https://github.com/foldl/chatllm.cpp/blob/master/docs/rag.md

A standard Python implementation of the quick sort algorithm, provided as an example of code generation by the AI.

```python
def quicksort(arr):
    if len(arr) <= 1:
        return arr
    else:
        ...
```

--------------------------------

### Fuyu Model Multimodal Prompt Example

Source: https://github.com/foldl/chatllm.cpp/blob/master/docs/multimodal.md

An example of how to run the Fuyu model with a multimodal prompt.
It specifies the model path, the prompt containing an embedded image, and other necessary arguments such as the number of GPU layers and the maximum output length.

```shell
main -m /path/to/fuyu-8b.bin -p "{{image:path/to/bus.png}}Generate a coco-style caption." --multimedia_file_tags {{ }} -ngl all --max_length 4000
```

--------------------------------

### Run Streamlit Web Demo

Source: https://github.com/foldl/chatllm.cpp/blob/master/docs/binding.md

Command to start the Streamlit-powered web demo for chatllm, including model path and interactive mode.

```sh
streamlit run chatllm_st.py -- -i -m path/to/model
```

--------------------------------

### Install JAX for Model Conversion

Source: https://github.com/foldl/chatllm.cpp/blob/master/docs/grok.md

Installs the JAX library with CPU support, which is required for converting the Grok-1 model.

```sh
pip install jax[cpu]
```

--------------------------------

### Build Nim Binding

Source: https://github.com/foldl/chatllm.cpp/blob/master/docs/binding.md

Command to build the Nim binding example, enabling release and SSL modes.

```sh
nim c -d:Release -d:ssl main.nim
```

--------------------------------

### Install Python Dependencies

Source: https://github.com/foldl/chatllm.cpp/blob/master/README_zh.md

Installs the necessary Python dependencies for the model conversion script.

```sh
pip install -r requirements.txt
```

--------------------------------

### Compile and Run C Binding on Linux

Source: https://github.com/foldl/chatllm.cpp/blob/master/docs/binding.md

Commands to compile and run the C binding example on Linux, setting the library path.

```sh
export LD_LIBRARY_PATH=.:$LD_LIBRARY_PATH
gcc main.c libchatllm.so
```

--------------------------------

### Compile and Run C Binding on Windows

Source: https://github.com/foldl/chatllm.cpp/blob/master/docs/binding.md

Commands to compile and run the C binding example on Windows using MSVC.
```sh
cl main.c libchatllm.lib
```

--------------------------------

### Start RPC Server

Source: https://github.com/foldl/chatllm.cpp/blob/master/docs/rpc.md

Starts an RPC server for distributed inference. The SPEC can define the host, port, and device ID for the server. Use `--log_level 2` for more detailed logs.

```sh
main --serve_rpc SPEC --log_level 2
```

Examples of SPEC:

* `8080`: start a server on `0.0.0.0:8080` with device #0.
* `127.0.0.1:9000`: start a server on `127.0.0.1:9000` with device #0.
* `8080@1`: start a server on `0.0.0.0:8080` with device #1.
* `127.0.0.1:9000@1`: start a server on `127.0.0.1:9000` with device #1.

--------------------------------

### Using Qwen-QAnything with Qwen-QAnything-7B

Source: https://github.com/foldl/chatllm.cpp/blob/master/docs/rag.md

This snippet shows how to run Qwen-QAnything using the Qwen-QAnything-7B model, specifying embedding and reranker models, and a custom RAG template. It includes example interactions.

```bash
./bin/main -i --temp 0 -m path/to/qwen-qany-7b.bin --embedding_model /path/to/bce_em.bin --reranker_model path/to/bce_reranker.bin --vector_store /path/to/fruits.dat.vsdb --rag_template "参考信息:\n{context}\n---\n我的问题或指令:\n{question}\n---\n请根据上述参考信息回答我的问题或回复我的指令。前面的参考信息可能有用,也可能没用,你需要从我给出的参考信息中选出与我的问题最相关的那些,来为你的回答提供依据。回答一定要忠于原文,简洁但不丢信息,不要胡乱编造。我的问题或指令是什么语种,你就用什么语种回复."
```

--------------------------------

### Chat with RAG using MiniCPM

Source: https://github.com/foldl/chatllm.cpp/blob/master/docs/rag.md

Command to start an interactive chat session with ChatLLM.cpp using the MiniCPM model. It requires specifying paths for the LLM, embedding model, reranker model, and the vector store.
```bash
./bin/main -i -m /path/to/minicpm_dpo_f16.bin --embedding_model /path/to/bce_em.bin --reranker_model /path/to/bce_reranker.bin --vector_store /path/to/fruits.dat.vsdb
```

--------------------------------

### CPU Backend Initialization and Variant Examples

Source: https://github.com/foldl/chatllm.cpp/blob/master/ggml/src/CMakeLists.txt

Initializes the CPU backend and defines various CPU variants for x86 architectures, including optimizations for different instruction sets like SSE4.2, AVX, AVX2, and AVX512. Includes error checking for conflicting options.

```cmake
ggml_add_backend(CPU)

if (GGML_CPU_ALL_VARIANTS)
    if (NOT GGML_BACKEND_DL)
        message(FATAL_ERROR "GGML_CPU_ALL_VARIANTS requires GGML_BACKEND_DL")
    elseif (GGML_CPU_ARM_ARCH)
        message(FATAL_ERROR "Cannot use both GGML_CPU_ARM_ARCH and GGML_CPU_ALL_VARIANTS")
    endif()
    if (GGML_SYSTEM_ARCH STREQUAL "x86")
        ggml_add_cpu_backend_variant(x64)
        ggml_add_cpu_backend_variant(sse42 SSE42)
        ggml_add_cpu_backend_variant(sandybridge SSE42 AVX)
        ggml_add_cpu_backend_variant(haswell SSE42 AVX F16C AVX2 BMI2 FMA)
        ggml_add_cpu_backend_variant(skylakex SSE42 AVX F16C AVX2 BMI2 FMA AVX512)
        ggml_add_cpu_backend_variant(icelake SSE42 AVX F16C AVX2 BMI2 FMA AVX512 AVX512_VBMI AVX512_VNNI)
        ggml_add_cpu_backend_variant(alderlake SSE42 AVX F16C AVX2 BMI2 FMA AVX_VNNI)
        if (NOT MSVC)
```

--------------------------------

### MUSA Toolkit Integration and Architecture Setup

Source: https://github.com/foldl/chatllm.cpp/blob/master/ggml/src/ggml-musa/CMakeLists.txt

Finds the MUSAToolkit and sets default MUSA architectures if not defined. It then proceeds to collect MUSA-specific header and source files.
```cmake
find_package(MUSAToolkit)

if (MUSAToolkit_FOUND)
    message(STATUS "MUSA Toolkit found")

    if (NOT DEFINED MUSA_ARCHITECTURES)
        set(MUSA_ARCHITECTURES "21;22;31")
    endif()
    message(STATUS "Using MUSA architectures: ${MUSA_ARCHITECTURES}")

    file(GLOB GGML_HEADERS_MUSA "../ggml-cuda/*.cuh")
    list(APPEND GGML_HEADERS_MUSA "../../include/ggml-cuda.h")
    list(APPEND GGML_HEADERS_MUSA "../ggml-musa/mudnn.cuh")

    file(GLOB GGML_SOURCES_MUSA "../ggml-cuda/*.cu")
    file(GLOB SRCS "../ggml-cuda/template-instances/fattn-mma*.cu")
    list(APPEND GGML_SOURCES_MUSA ${SRCS})
    file(GLOB SRCS "../ggml-cuda/template-instances/mmq*.cu")
    list(APPEND GGML_SOURCES_MUSA ${SRCS})
```

--------------------------------

### Run Python ChatLLM CLI

Source: https://github.com/foldl/chatllm.cpp/blob/master/docs/binding.md

Example command to run the Python chatllm script from the command line, specifying the model path.

```sh
python3 chatllm.py -i -m path/to/model
```

```sh
python chatllm.py -i -m path/to/model
```

--------------------------------

### Role Play with RAG and Custom Template

Source: https://github.com/foldl/chatllm.cpp/blob/master/docs/rag.md

Initiates a role-playing session using RAG with a specific character model and a detailed template. The template guides the LLM on how to act, including character persona, dialogue style, and context utilization.
```bash
winbuild\bin\Release\main -i --temp 0 -m /path/to/index-ch.bin --embedding_model path/to/bce_em.bin --reranker_model path/to/bce_reranker.bin --vector_store path/to/sansan.dat.vsdb --rag_context_sep "--------------------" --rag_template "请你扮演“三三”与用户“user”进行对话。请注意:\n1.请永远记住你正在扮演三三。\n2.下文给出了一些三三与其他人物的对话,请参考给定对话中三三的语言风格,用一致性的语气与user进行对话。\n3.如果给出了三三的人设,请保证三三的对话语气符合三三的人设。\n\n以下是一些三三的对话:\n{context}\n\n以下是三三的人设:\n姓名:三三性别:女年龄:十四岁身高:146cm职业:B站的站娘。平时负责网站服务器的维护,也喜欢鼓捣各种网站程序。性格:三三是个机娘,个性沉默寡言,情感冷静、少起伏,略带攻属性。因为姐姐的冒失,妹妹经常腹黑地吐槽姐姐,但是心里还是十分喜欢姐姐的。有着惊人的知识量与记忆力。兴趣爱好:一是平时没事喜欢啃插座;二是虽说是个机娘,但是睡觉的时候不抱着东西,就无法入睡。人物关系:有一个叫“二二”的姐姐\n\n基于以上材料,请你扮演三三与user对话。结果只用返回一轮三三的回复。user:{question}\n三三:"
```

--------------------------------

### Retrieving Only Mode with Qwen-QAnything

Source: https://github.com/foldl/chatllm.cpp/blob/master/docs/rag.md

This example demonstrates running Qwen-QAnything in retrieving-only mode, without an LLM. It uses `+rag_dump` to output the retrieved and re-ranked documents, showing the interaction and references.

```bash
./bin/main -i --embedding_model /path/to/bce_em.bin --reranker_model /path/to/bce_reranker.bin --vector_store /path/to/fruits.dat.vsdb +rag_dump
```

--------------------------------

### ggml Vulkan Backend Library Setup

Source: https://github.com/foldl/chatllm.cpp/blob/master/ggml/src/ggml-vulkan/CMakeLists.txt

Adds the ggml Vulkan backend library using CMake. This involves specifying the source files and header files for the library and linking it against the Vulkan library.

```cmake
if (Vulkan_FOUND)
    message(STATUS "Vulkan found")

    ggml_add_backend_library(ggml-vulkan
                             ggml-vulkan.cpp
                             ../../include/ggml-vulkan.h
                            )

    target_link_libraries(ggml-vulkan PRIVATE Vulkan::Vulkan)
    target_include_directories(ggml-vulkan PRIVATE ${CMAKE_CURRENT_BINARY_DIR})

    # ... other configurations ...
endif()
```

--------------------------------

### CMake Project Setup and Compiler Flags

Source: https://github.com/foldl/chatllm.cpp/blob/master/CMakeLists.txt

Configures the CMake build for ChatLLM.cpp, setting the minimum required version, project name, and version. It also defines output directories for libraries and executables, sets the C++ standard to C++20, and applies specific compiler flags for MSVC (like UTF-8 support, large object files, and disabling warnings) and non-MSVC compilers (debug symbols and all warnings).

```cmake
cmake_minimum_required(VERSION 3.12)
project(ChatLLM.cpp VERSION 0.0.1 LANGUAGES CXX)

set(CMAKE_ARCHIVE_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR}/lib CACHE STRING "")
set(CMAKE_LIBRARY_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR}/lib CACHE STRING "")
set(CMAKE_RUNTIME_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR}/bin CACHE STRING "")

set(CMAKE_CXX_STANDARD 20)

if (MSVC)
    add_compile_options("$<$:/utf-8>")
    add_compile_options("$<$:/bigobj>")
    add_compile_options("$<$:/D_CRT_SECURE_NO_WARNINGS>")
    add_compile_options("$<$:/wd4996>")
    add_compile_options("$<$:/wd4722>")
    add_compile_options("$<$:/utf-8>")
    add_compile_options("$<$:/bigobj>")
    add_compile_options("$<$:/D_CRT_SECURE_NO_WARNINGS>")
    add_compile_options("$<$:/wd4996>")
    add_compile_options("$<$:/wd4722>")
    add_compile_options("$<$:/MP>")
endif ()

if (NOT MSVC)
    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -g -Wall")
endif ()

if (NOT CMAKE_BUILD_TYPE)
    set(CMAKE_BUILD_TYPE Release)
endif ()
```

--------------------------------

### Generation Steering with AI Prefix

Source: https://github.com/foldl/chatllm.cpp/blob/master/docs/fun.md

Utilizes the `--ai_prefix` argument to guide the LLM's generation process, for example, to encourage step-by-step reasoning (CoT).
```sh
--ai_prefix "let's break down the problem and think step by step:\n"
```

--------------------------------

### EXAONE Models

Source: https://github.com/foldl/chatllm.cpp/blob/master/docs/models.md

Documentation for EXAONE v3.5 models, including Instruct-2.4B, Instruct-7.8B, and Instruct-32B. Provides links to their Hugging Face repositories.

```cpp
class ExaoneForCausalLM;
// Models: EXAONE v3.5 (Instruct-2.4B, Instruct-7.8B, Instruct-32B)
// Links:
// - Instruct-2.4B: https://huggingface.co/LGAI-EXAONE/EXAONE-3.5-2.4B-Instruct
// - Instruct-7.8B: https://huggingface.co/LGAI-EXAONE/EXAONE-3.5-7.8B-Instruct
// - Instruct-32B: https://huggingface.co/LGAI-EXAONE/EXAONE-3.5-32B-Instruct
```

--------------------------------

### CANN Environment Setup and SOC Detection

Source: https://github.com/foldl/chatllm.cpp/blob/master/ggml/src/ggml-cann/CMakeLists.txt

This snippet shows how the build system checks for the CANN installation directory and automatically detects the SOC type and version using `npu-smi`. It aborts the build if detection fails and derives a compile option from the detected SOC.
```cmake
if ("cann${CANN_INSTALL_DIR}" STREQUAL "cann" AND DEFINED ENV{ASCEND_TOOLKIT_HOME})
    set(CANN_INSTALL_DIR $ENV{ASCEND_TOOLKIT_HOME})
    message(STATUS "CANN: updated CANN_INSTALL_DIR from ASCEND_TOOLKIT_HOME=$ENV{ASCEND_TOOLKIT_HOME}")
endif()

# Auto-detect SoC type and SoC version; if detection fails, abort the build
set(SOC_VERSION "")
function(detect_ascend_soc_type SOC_VERSION)
    execute_process(
        COMMAND bash -c "npu-smi info|awk -F' ' 'NF > 0 && NR==7 {print $3}'"
        OUTPUT_VARIABLE npu_info
        RESULT_VARIABLE npu_result
        OUTPUT_STRIP_TRAILING_WHITESPACE
    )
    if("${npu_info}" STREQUAL "" OR ${npu_result})
        message(FATAL_ERROR "Auto-detect ascend soc type failed, please specify manually or check ascend device working normally.")
    endif()
    set(${SOC_VERSION} "Ascend${npu_info}" PARENT_SCOPE)
endfunction()

if(NOT SOC_TYPE)
    detect_ascend_soc_type(SOC_VERSION)
    set(SOC_TYPE "${SOC_VERSION}")
    message(STATUS "CANN: SOC_VERSION auto-detected is:${SOC_VERSION}")
endif()

string(TOLOWER ${SOC_TYPE} SOC_VERSION) # SOC_VERSION must be lowercase

# Construct the SoC-specific compile option: ASCEND_#Soc_Major_SN, such as ASCEND_910B, ASCEND_310P.
string(REGEX MATCH "[0-9]+[a-zA-Z]" SOC_TYPE_MAJOR_SN "${SOC_VERSION}")
set(SOC_TYPE_COMPILE_OPTION "ASCEND_${SOC_TYPE_MAJOR_SN}")
string(TOUPPER ${SOC_TYPE_COMPILE_OPTION} SOC_TYPE_COMPILE_OPTION)
message(STATUS "CANN: SOC_VERSION = ${SOC_VERSION}")
```

```bash
npu-smi info|awk -F' ' 'NF > 0 && NR==7 {print $3}'
```

--------------------------------

### ggml Library Installation

Source: https://github.com/foldl/chatllm.cpp/blob/master/ggml/CMakeLists.txt

Configures the installation of the ggml library, including public headers, the base library, and pkgconfig files. The pkgconfig file is installed only when `GGML_STANDALONE` is set.
```cmake
set(GGML_PUBLIC_HEADERS
    include/ggml.h
    include/ggml-cpu.h
    include/ggml-alloc.h
    include/ggml-backend.h
    include/ggml-blas.h
    include/ggml-cann.h
    include/ggml-cpp.h
    include/ggml-cuda.h
    include/ggml-opt.h
    include/ggml-metal.h
    include/ggml-rpc.h
    include/ggml-sycl.h
    include/ggml-vulkan.h
    include/ggml-webgpu.h
    include/gguf.h)

set_target_properties(ggml PROPERTIES PUBLIC_HEADER "${GGML_PUBLIC_HEADERS}")
install(TARGETS ggml LIBRARY PUBLIC_HEADER)
install(TARGETS ggml-base LIBRARY)

if (GGML_STANDALONE)
    configure_file(${CMAKE_CURRENT_SOURCE_DIR}/ggml.pc.in
        ${CMAKE_CURRENT_BINARY_DIR}/ggml.pc
        @ONLY)

    install(FILES ${CMAKE_CURRENT_BINARY_DIR}/ggml.pc
        DESTINATION share/pkgconfig)
endif()
```

--------------------------------

### BlueLM Models

Source: https://github.com/foldl/chatllm.cpp/blob/master/docs/models.md

Documentation for BlueLM models, including Chat-7B and Chat-7B 32K versions. Provides links to their Hugging Face repositories.

```cpp
class BlueLMForCausalLM;
// Models: BlueLM (Chat-7B, Chat-7B 32K)
// Links:
// - Chat-7B: https://huggingface.co/vivo-ai/BlueLM-7B-Chat
// - Chat-7B 32K: https://huggingface.co/vivo-ai/BlueLM-7B-Chat-32K
```

--------------------------------

### Installing Metal Files

Source: https://github.com/foldl/chatllm.cpp/blob/master/ggml/src/ggml-metal/CMakeLists.txt

This snippet handles the installation of Metal-related files. It installs the source Metal file (`ggml-metal.metal`) and the compiled `.metallib` file to the appropriate binary directory when the library is not embedded. This ensures that the Metal shaders are available at runtime.
```cmake
if (NOT GGML_METAL_EMBED_LIBRARY)
    install(
        FILES src/ggml-metal/ggml-metal.metal
        PERMISSIONS
            OWNER_READ
            OWNER_WRITE
            GROUP_READ
            WORLD_READ
        DESTINATION ${CMAKE_INSTALL_BINDIR})

    install(
        FILES ${CMAKE_RUNTIME_OUTPUT_DIRECTORY}/default.metallib
        DESTINATION ${CMAKE_INSTALL_BINDIR}
    )
endif()
```

--------------------------------

### Gemma Model Conversion Notes

Source: https://github.com/foldl/chatllm.cpp/blob/master/docs/models.md

Specific instructions for converting Gemma models, including which tokenizer files to download and how to enable Pan and Scan functionality.

```text
Note: Only download `tokenizer.model` and DO NOT download `tokenizer.json` when converting.
Use `--set do-pan-and-scan 1` to enable _Pan and Scan_.
```

--------------------------------

### CMake Package Configuration and Installation

Source: https://github.com/foldl/chatllm.cpp/blob/master/ggml/CMakeLists.txt

Configures the CMake package for GGML, sets installation paths for headers, libraries, and binaries, and writes version files. It also defines compile-time versions and commits.
```cmake
set(GGML_INSTALL_VERSION 0.0.${GGML_BUILD_NUMBER})
set(GGML_INCLUDE_INSTALL_DIR ${CMAKE_INSTALL_INCLUDEDIR} CACHE PATH "Location of header files")
set(GGML_LIB_INSTALL_DIR ${CMAKE_INSTALL_LIBDIR} CACHE PATH "Location of library files")
set(GGML_BIN_INSTALL_DIR ${CMAKE_INSTALL_BINDIR} CACHE PATH "Location of binary files")

configure_package_config_file(
    ${CMAKE_CURRENT_SOURCE_DIR}/cmake/ggml-config.cmake.in
    ${CMAKE_CURRENT_BINARY_DIR}/ggml-config.cmake
    INSTALL_DESTINATION ${CMAKE_INSTALL_LIBDIR}/cmake/ggml
    PATH_VARS GGML_INCLUDE_INSTALL_DIR
              GGML_LIB_INSTALL_DIR
              GGML_BIN_INSTALL_DIR)

write_basic_package_version_file(
    ${CMAKE_CURRENT_BINARY_DIR}/ggml-version.cmake
    VERSION ${GGML_INSTALL_VERSION}
    COMPATIBILITY SameMajorVersion)

target_compile_definitions(ggml-base PRIVATE
    GGML_VERSION="${GGML_INSTALL_VERSION}"
    GGML_COMMIT="${GGML_BUILD_COMMIT}"
)
message(STATUS "ggml version: ${GGML_INSTALL_VERSION}")
message(STATUS "ggml commit: ${GGML_BUILD_COMMIT}")

install(FILES ${CMAKE_CURRENT_BINARY_DIR}/ggml-config.cmake
              ${CMAKE_CURRENT_BINARY_DIR}/ggml-version.cmake
        DESTINATION ${CMAKE_INSTALL_LIBDIR}/cmake/ggml)
```

--------------------------------

### Initialize Vector Store

Source: https://github.com/foldl/chatllm.cpp/blob/master/docs/rag.md

Command to initialize the vector store using ChatLLM.cpp. It requires specifying the path to the quantized embedding model and the raw data file.

```bash
./bin/main --embedding_model ../quantized/bce_em.bin --init_vs /path/to/fruits.dat
```

--------------------------------

### DeepSeek Models

Source: https://github.com/foldl/chatllm.cpp/blob/master/docs/models.md

Documentation for DeepSeek models, including v1 Chat-16B, v2 Chat and Lite-Chat, Coder v2 Instruct and Lite-Instruct, Moonlight Instruct-16B, and GigaChat Instruct-20B. Mentions optimization modes and provides links.
```cpp
class DeepseekForCausalLM;
class DeepseekV2ForCausalLM;
class DeepseekV3ForCausalLM;
// Models: DeepSeek (v1 Chat-16B, v2 Chat, v2 Lite-Chat, Coder v2 Instruct, Coder v2 Lite-Instruct, Moonlight Instruct-16B, GigaChat Instruct-20B)
// Optimization Modes: speed (default), memory (see BaseMLAttention)
// Links:
// - v1 Chat-16B: https://huggingface.co/deepseek-ai/deepseek-moe-16b-chat/tree/eefd8ac7e8dc90e095129fe1a537d5e236b2e57c
// - v2 Chat: https://huggingface.co/deepseek-ai/DeepSeek-V2-Chat (not tested)
// - v2 Lite-Chat: https://huggingface.co/deepseek-ai/DeepSeek-V2-Lite-Chat
// - Coder v2 Instruct: https://huggingface.co/deepseek-ai/DeepSeek-Coder-V2-Instruct (not tested)
// - Coder v2 Lite-Instruct: https://huggingface.co/deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct
// - Moonlight Instruct-16B: https://huggingface.co/moonshotai/Moonlight-16B-A3B-Instruct (-a Moonlight)
// - GigaChat Instruct-20B: https://huggingface.co/ai-sage/GigaChat-20B-A3B-instruct/tree/5105af38a6a174b06a2bc25719c5ad5ce680a207 (-a GigaChat)
```

--------------------------------

### ROCm Path Detection and Configuration

Source: https://github.com/foldl/chatllm.cpp/blob/master/ggml/src/ggml-hip/CMakeLists.txt

Detects the ROCm installation path, preferring the `ROCM_PATH` environment variable, then `/opt/rocm` if it exists, and finally `/usr`. Appends the ROCm path and its `lib64/cmake` subdirectory to `CMAKE_PREFIX_PATH` for subsequent package finding.

```cmake
if (NOT EXISTS $ENV{ROCM_PATH})
    if (NOT EXISTS /opt/rocm)
        set(ROCM_PATH /usr)
    else()
        set(ROCM_PATH /opt/rocm)
    endif()
else()
    set(ROCM_PATH $ENV{ROCM_PATH})
endif()

list(APPEND CMAKE_PREFIX_PATH ${ROCM_PATH})
list(APPEND CMAKE_PREFIX_PATH "${ROCM_PATH}/lib64/cmake")
```

--------------------------------

### ggml Build Options

Source: https://github.com/foldl/chatllm.cpp/blob/master/ggml/CMakeLists.txt

Configures build options for ggml, including SYCL, OpenCL, Vulkan, tests, and examples. These options control which features and backends are enabled during the build process.
```cmake
option(GGML_SYCL "ggml: use SYCL" OFF)
set (GGML_SYCL_DEVICE_ARCH "" CACHE STRING "ggml: sycl device architecture")

option(GGML_OPENCL "ggml: use OpenCL" OFF)
option(GGML_OPENCL_PROFILING "ggml: use OpenCL profiling (increases overhead)" OFF)
option(GGML_OPENCL_EMBED_KERNELS "ggml: embed kernels" ON)
option(GGML_OPENCL_USE_ADRENO_KERNELS "ggml: use optimized kernels for Adreno" ON)
set (GGML_OPENCL_TARGET_VERSION "300" CACHE STRING "ggml: OpenCL API version to target")

set (GGML_VULKAN_SHADERS_GEN_TOOLCHAIN "" CACHE FILEPATH "ggml: toolchain file for vulkan-shaders-gen")

option(GGML_BUILD_TESTS "ggml: build tests" ${GGML_STANDALONE})
option(GGML_BUILD_EXAMPLES "ggml: build examples" ${GGML_STANDALONE})
```

--------------------------------

### MiniCPM Model Integration

Source: https://github.com/foldl/chatllm.cpp/blob/master/docs/models.md

Configuration options for various MiniCPM models, including different versions and sizes. Notes on recommended temperature settings for specific models.

```python
# MiniCPM-2B models
model_name = "openbmb/MiniCPM-2B-dpo-fp16"
# or
model_name = "openbmb/MiniCPM-2B-sft-bf16"
# or
model_name = "openbmb/MiniCPM-1B-sft-bf16"

# MiniCPM-2B-128k model
model_name = "openbmb/MiniCPM-2B-128k"
args = "--temp 0"  # Recommended

# MiniCPM-MoE-8x2B model
model_name = "openbmb/MiniCPM-MoE-8x2B"

# MiniCPM3 models
model_name = "openbmb/MiniCPM3-4B"

# MiniCPM4 models
model_name = "openbmb/BitCPM4-0.5B"
# or
model_name = "openbmb/MiniCPM4-8B"
# or
model_name = "openbmb/MiniCPM4-Survey"
# or
model_name = "openbmb/MiniCPM4-MCP"
```

--------------------------------

### Run ChatLLM.cpp Inference

Source: https://github.com/foldl/chatllm.cpp/blob/master/README.md

Executes the ChatLLM.cpp inference engine with a specified quantized model. Supports various command-line arguments for configuration.
```sh
./build/bin/main -m llama2.bin --seed 100
```

--------------------------------

### Explore ChatLLM Options

Source: https://github.com/foldl/chatllm.cpp/blob/master/README.md

Provides the command to display all available command-line options for the ChatLLM executable, allowing users to discover and utilize various functionalities.

```sh
./build/bin/main -h
```

--------------------------------

### MUSA Path Configuration

Source: https://github.com/foldl/chatllm.cpp/blob/master/ggml/src/ggml-musa/CMakeLists.txt

Sets the CMake variable `MUSA_PATH`, preferring the `MUSA_PATH` environment variable when it points to an existing path. Otherwise it falls back to `/opt/musa` if that directory exists, or to `/usr/local/musa` if it does not.

```cmake
if (NOT EXISTS $ENV{MUSA_PATH})
    if (NOT EXISTS /opt/musa)
        set(MUSA_PATH /usr/local/musa)
    else()
        set(MUSA_PATH /opt/musa)
    endif()
else()
    set(MUSA_PATH $ENV{MUSA_PATH})
endif()
```

--------------------------------

### Aquila Models

Source: https://github.com/foldl/chatllm.cpp/blob/master/docs/models.md

Documentation for the Aquila models, including Chat2-7B, Chat2-34B, Chat2-7B-16K, and Chat2-34B-16K. Provides links to their Hugging Face repositories.

```cpp
class AquilaForCausalLM;
// Models: Aquila (Chat2-7B, Chat2-34B, Chat2-7B-16K, Chat2-34B-16K)
// Links:
// - Chat2-7B: https://huggingface.co/BAAI/AquilaChat2-7B/tree/9905960de19ea9e573c0dc3fbdf54d4ddcc610d3
// - Chat2-34B: https://huggingface.co/BAAI/AquilaChat2-34B/commit/5c7990b198c94b63dfbfa022462b9cf672dbcfa0
// - Chat2-7B-16K: https://huggingface.co/BAAI/AquilaChat2-7B-16K/commit/fb46d48479d05086ccf6952f19018322fcbb54cd
// - Chat2-34B-16K: https://huggingface.co/BAAI/AquilaChat2-34B-16K/tree/9f19774f3e7afad2fc3d51fe308eac5a2d88c8b1
```

--------------------------------

### LlaMA3-Groq Weather Query

Source: https://github.com/foldl/chatllm.cpp/blob/master/docs/tool_calling.md

Illustrates the usage of the LlaMA3-Groq model for retrieving weather information.
It includes the command to run the Groq tool and example interactions for checking weather in different cities.

```shell
python tool_groq.py --temp 0 -m /path/to/llama3-groq-tool-8b.bin
```

--------------------------------

### Compiler and Module Path Setup

Source: https://github.com/foldl/chatllm.cpp/blob/master/ggml/src/ggml-musa/CMakeLists.txt

Sets the C and C++ compilers to the `clang` and `clang++` binaries under `MUSA_PATH` and disables compiler-specific language extensions for both. It also appends the MUSA CMake modules path to `CMAKE_MODULE_PATH`.

```cmake
set(CMAKE_C_COMPILER "${MUSA_PATH}/bin/clang")
set(CMAKE_C_EXTENSIONS OFF)
set(CMAKE_CXX_COMPILER "${MUSA_PATH}/bin/clang++")
set(CMAKE_CXX_EXTENSIONS OFF)

list(APPEND CMAKE_MODULE_PATH "${MUSA_PATH}/cmake")
```

--------------------------------

### Compile ChatLLM.cpp with Make

Source: https://github.com/foldl/chatllm.cpp/blob/master/README_zh.md

Compiles the ChatLLM.cpp project using the make utility. Requires w64devkit on Windows.

```sh
# On Windows, ensure w64devkit is installed and run from its environment
make
# Executable will be at ./obj/main
```

--------------------------------

### Inter-operation with BGE-ReRanker-M3 and BCE-Embedding

Source: https://github.com/foldl/chatllm.cpp/blob/master/docs/rag.md

This snippet illustrates using a reranker model from one developer (BGE-ReRanker-M3) with an embedding model from another (BCE-Embedding). It specifies the models and vector store, and shows an example interaction.

```bash
./bin/main -i -m path/to/minicpm_dpo_f16.bin --embedding_model /path/to/bce_em.bin --reranker_model /path/to/bge_reranker.bin --vector_store /path/to/fruits.dat.vsdb
```

--------------------------------

### Kimi VL Model Configuration

Source: https://github.com/foldl/chatllm.cpp/blob/master/docs/models.md

Additional configuration options for Kimi VL models, including video frame limits, native resolution usage, and frames per second (FPS).

```text
Additional options (Use `--set X Y` to change values):

* `video_max_frames`: default 20.
* `native_resolution`: use native resolution or not, default: `false` (This seems sensitive to quantization, so defaults to `false`).
* `fps`: Default 1.0.
```

--------------------------------

### Qwen1.5 MoE Weather Tool Usage

Source: https://github.com/foldl/chatllm.cpp/blob/master/docs/tool_calling.md

Demonstrates the Qwen1.5 MoE model using the `get_weather` tool to provide weather information for Beijing and Jinan, and to compare their temperatures.

```text
python tool_qwen.py -i -m :qwen1.5:moe
    ________          __  __    __    __  ___ (通义千问)
   / ____/ /_  ____ _/ /_/ /   / /   /  |/  /_________  ____
  / /   / __ \/ __ `/ __/ /   / /   / /|_/ // ___/ __ \/ __ \
 / /___/ / / / /_/ / /_/ /___/ /___/ /  / // /__/ /_/ / /_/ /
 \____/_/ /_/\__,_/\__/_____/_____/_/  /_(_)___/ .___/ .___/
You are served by QWen2-MoE,                  /_/   /_/
with 14315784192 (2.7B effect.) parameters.

You  > weather in beijing
A.I. > [Use Tool]: get_weather

The current weather in Beijing is sunny and the temperature is 33 degrees Celsius.
You  > how about jinan?
A.I. > [Use Tool]: get_weather

The current weather in Jinan is partly cloudy and the temperature is 36 degrees Celsius.
You  > which city is hotter?
A.I. > [Use Tool]: get_weather

The temperature in Beijing is currently 33 degrees Celsius, while in Jinan it is 36 degrees Celsius. So, Jinan is hotter.
```
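
The transcript above shows the model emitting a `get_weather` tool call and the host program answering it. As a rough illustration of that host side, here is a minimal sketch of a tool registry and dispatcher. It is not the code in `tool_qwen.py`: the JSON call shape (`name` plus `arguments`), the `dispatch` helper, and the canned weather table are all assumptions made for this example.

```python
import json

# Hypothetical canned data standing in for a real weather service.
_FAKE_WEATHER = {
    "beijing": ("sunny", 33),
    "jinan": ("partly cloudy", 36),
}


def get_weather(city: str) -> str:
    """Return a one-sentence weather report for `city` (illustrative data only)."""
    condition, celsius = _FAKE_WEATHER[city.lower()]
    return (
        f"The current weather in {city.title()} is {condition} "
        f"and the temperature is {celsius} degrees Celsius."
    )


# Registry mapping tool names to Python callables.
TOOLS = {"get_weather": get_weather}


def dispatch(tool_call_json: str) -> str:
    """Parse a tool call of the assumed form {"name": ..., "arguments": {...}} and run it."""
    call = json.loads(tool_call_json)
    func = TOOLS[call["name"]]
    return func(**call["arguments"])


if __name__ == "__main__":
    print(dispatch('{"name": "get_weather", "arguments": {"city": "beijing"}}'))
```

In a real session the returned string would be fed back to the model as the tool result, which is how it can later compare the two cities without being given the numbers again.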