### Setup Python Environment for TTS Client

Source: https://github.com/liquid4all/liquid_llama.cpp/blob/master/tools/tts/README.md

This sequence sets up a Python virtual environment and installs the necessary libraries (`requests`, `numpy`) required to run the `tts-outetts.py` client script that communicates with the `llama-server` instances.

```bash
$ python3 -m venv venv
$ source venv/bin/activate
(venv) pip install requests numpy
```
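Before running the client script, it can help to confirm that the interpreter is the one inside the virtual environment and that both packages resolve. The following is a minimal sketch using only the standard library; the helper names `in_virtualenv` and `missing_packages` are illustrative and not part of the repository.

```python
import importlib.util
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the venv directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


def missing_packages(names):
    """Return the subset of top-level package names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("virtualenv active:", in_virtualenv())
print("missing packages:", missing_packages(["requests", "numpy"]))
```

An empty "missing packages" list suggests the `pip install` step above succeeded for the active interpreter.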

--------------------------------

### Quickstart TTS Generation with llama.cpp

Source: https://github.com/liquid4all/liquid_llama.cpp/blob/master/tools/tts/README.md

This command demonstrates the quickest way to generate speech from text using the llama.cpp TTS example. It assumes llama.cpp is built with `-DLLAMA_CURL=ON` to automatically download necessary models. The output audio is saved as 'output.wav'.

```console
$ build/bin/llama-tts --tts-oute-default -p "Hello world" && aplay output.wav
```

--------------------------------

### Start llama-server for LLM Model

Source: https://github.com/liquid4all/liquid_llama.cpp/blob/master/tools/tts/README.md

This command starts an instance of `llama-server` to serve the LLM model for TTS. It specifies the model file and the port for the server to listen on.

```bash
$ ./build/bin/llama-server -m ./models/outetts-0.2-0.5B-q8_0.gguf --port 8020
```

--------------------------------

### Start llama-server for Voice Decoder Model

Source: https://github.com/liquid4all/liquid_llama.cpp/blob/master/tools/tts/README.md

This command starts another instance of `llama-server` to serve the voice decoder model. It specifies the model file, port, and additional parameters like `--embeddings` and `--pooling none`.

```bash
./build/bin/llama-server -m ./models/wavtokenizer-large-75-f16.gguf --port 8021 --embeddings --pooling none
```

--------------------------------

### Install and Verify CLinfo (Shell)

Source: https://github.com/liquid4all/liquid_llama.cpp/blob/master/docs/backend/SYCL.md

Installs the 'clinfo' utility and then runs it to list available OpenCL platforms and devices. This helps verify GPU driver installation.

```shell
sudo apt install clinfo
sudo clinfo -l
```

--------------------------------

### Setup OpenCL Environment for Windows

Source: https://github.com/liquid4all/liquid_llama.cpp/blob/master/docs/backend/OPENCL.md

Installs OpenCL headers and the ICD loader from source on Windows. These dependencies are required for building llama.cpp with OpenCL support.

```powershell
mkdir -p ~/dev/llm

cd ~/dev/llm
git clone https://github.com/KhronosGroup/OpenCL-Headers && cd OpenCL-Headers
mkdir build && cd build
cmake .. -G Ninja `
  -DBUILD_TESTING=OFF `
  -DOPENCL_HEADERS_BUILD_TESTING=OFF `
  -DOPENCL_HEADERS_BUILD_CXX_TESTS=OFF `
  -DCMAKE_INSTALL_PREFIX="$HOME/dev/llm/opencl"
cmake --build . --target install

cd ~/dev/llm
git clone https://github.com/KhronosGroup/OpenCL-ICD-Loader && cd OpenCL-ICD-Loader
mkdir build && cd build
cmake .. -G Ninja `
  -DCMAKE_BUILD_TYPE=Release `
  -DCMAKE_PREFIX_PATH="$HOME/dev/llm/opencl" `
  -DCMAKE_INSTALL_PREFIX="$HOME/dev/llm/opencl"
cmake --build . --target install
```

--------------------------------

### Install Ascend Driver

Source: https://github.com/liquid4all/liquid_llama.cpp/blob/master/docs/backend/CANN.md

Commands to create a user group and user for Ascend drivers, and then install the Ascend NPU driver. After installation, `npu-smi info` can be used to verify the driver installation.

```shell
# create driver running user.
sudo groupadd HwHiAiUser
sudo useradd -g HwHiAiUser -d /home/HwHiAiUser -m HwHiAiUser -s /bin/bash
sudo usermod -aG HwHiAiUser $USER

# download driver from https://www.hiascend.com/hardware/firmware-drivers/community according to your system
# and install driver.
sudo sh Ascend-hdk-910b-npu-driver_x.x.x_linux-{arch}.run --full --install-for-all
```

--------------------------------

### Install Ascend Firmware

Source: https://github.com/liquid4all/liquid_llama.cpp/blob/master/docs/backend/CANN.md

Command to install the Ascend NPU firmware. Download the firmware package from the official Ascend website and run the installer script. A success message indicates proper installation.

```shell
# download firmware from https://www.hiascend.com/hardware/firmware-drivers/community according to your system
# and install firmware.
sudo sh Ascend-hdk-910b-npu-firmware_x.x.x.x.X.run --full
```

--------------------------------

### Interactive Mode (Bash)

Source: https://github.com/liquid4all/liquid_llama.cpp/blob/master/tools/server/README.md

Details how to run an interactive chat example using a bash script. This method requires bash, cURL, and jq to be installed. It's suitable for command-line driven interactive applications.

```shell
bash chat.sh
```

--------------------------------

### Install CANN Toolkit and Kernels

Source: https://github.com/liquid4all/liquid_llama.cpp/blob/master/docs/backend/CANN.md

Installs the required Python dependencies with pip, then installs the CANN toolkit and kernels using their `.run` installers. It also sets up the necessary environment variables by appending a `source` line for the `set_env.sh` script to `.bashrc`.

```shell
pip3 install attrs numpy decorator sympy cffi pyyaml pathlib2 psutil protobuf scipy requests absl-py wheel typing_extensions
sh Ascend-cann-toolkit_8.0.RC2.alpha002_linux-aarch64.run --install
sh Ascend-cann-kernels-910b_8.0.RC2.alpha002_linux.run --install

echo "source ~/Ascend/ascend-toolkit/set_env.sh" >> ~/.bashrc
source ~/.bashrc
```

--------------------------------

### Run TTS Example with Local Models

Source: https://github.com/liquid4all/liquid_llama.cpp/blob/master/tools/tts/README.md

This command executes the llama.cpp TTS example using locally converted and quantized models. It specifies the paths to the LLM model and the voice decoder model, along with the input prompt.

```bash
$ build/bin/llama-tts -m  ./models/outetts-0.2-0.5B-q8_0.gguf \
    -mv ./models/wavtokenizer-large-75-f16.gguf \
    -p "Hello world"
```

--------------------------------

### Start Llama Server with Native Support

Source: https://github.com/liquid4all/liquid_llama.cpp/blob/master/docs/function-calling.md

Starts `llama-server` with Jinja templating (`--jinja`) and flash attention (`-fa`), loading the specified model from Hugging Face (`-hf`). These commands are for models whose tool-call formats are natively supported, so no chat template override is needed.

```shell
llama-server --jinja -fa -hf bartowski/Qwen2.5-7B-Instruct-GGUF:Q4_K_M
```

```shell
llama-server --jinja -fa -hf bartowski/Mistral-Nemo-Instruct-2407-GGUF:Q6_K_L
```

```shell
llama-server --jinja -fa -hf bartowski/Llama-3.3-70B-Instruct-GGUF:Q4_K_M
```

--------------------------------

### Start Llama Server with Hermes Template Override

Source: https://github.com/liquid4all/liquid_llama.cpp/blob/master/docs/function-calling.md

Starts the llama-server with Jinja support for Hermes models. This command requires a specific chat template file to be provided.

```shell
llama-server --jinja -fa -hf bartowski/Hermes-2-Pro-Llama-3-8B-GGUF:Q4_K_M \
    --chat-template-file models/templates/NousResearch-Hermes-2-Pro-Llama-3-8B-tool_use.jinja
```

```shell
llama-server --jinja -fa -hf bartowski/Hermes-3-Llama-3.1-8B-GGUF:Q4_K_M \
    --chat-template-file models/templates/NousResearch-Hermes-3-Llama-3.1-8B-tool_use.jinja
```

--------------------------------

### Run GritLM Example with a downloaded model

Source: https://github.com/liquid4all/liquid_llama.cpp/blob/master/examples/gritlm/README.md

This command executes the llama-gritlm example program using a downloaded GritLM model. It requires the executable 'llama-gritlm' to be present and the model file to be in the specified path.

```bash
$ ./llama-gritlm -m models/gritlm-7b_q4_1.gguf
```

--------------------------------

### Install Development Tools and Dependencies

Source: https://github.com/liquid4all/liquid_llama.cpp/blob/master/docs/backend/CUDA-FEDORA.md

Synchronizes the DNF package manager and installs essential development headers, compilers, and build tools required for software compilation.

```bash
sudo dnf distro-sync
sudo dnf install vim-default-editor --allowerasing
sudo dnf install @c-development @development-tools cmake
```

--------------------------------

### Verify Vulkan Installation

Source: https://github.com/liquid4all/liquid_llama.cpp/blob/master/docs/build.md

Runs the vulkaninfo utility to ensure the Vulkan SDK is correctly installed and accessible by the system.

```sh
vulkaninfo
```

--------------------------------

### Build and Install llama.cpp

Source: https://github.com/liquid4all/liquid_llama.cpp/blob/master/examples/simple-cmake-pkg/README.md

Commands to clone the llama.cpp repository, build the project from source, and install the resulting binaries and headers into a local directory for external consumption.

```shell
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -S . -B build
cmake --build build
cmake --install build --prefix inst
```

--------------------------------

### Start Llama Server with Functionary Template Override

Source: https://github.com/liquid4all/liquid_llama.cpp/blob/master/docs/function-calling.md

Starts the llama-server with Jinja support for the functionary model. A specific chat template file is required for optimal performance.

```shell
llama-server --jinja -fa -hf bartowski/functionary-small-v3.2-GGUF:Q4_K_M \
    --chat-template-file models/templates/meetkai-functionary-medium-v3.2.jinja
```

--------------------------------

### Start Llama Server with Cohere Command-R Template Override

Source: https://github.com/liquid4all/liquid_llama.cpp/blob/master/docs/function-calling.md

Starts the llama-server with Jinja support for the Cohere Command-R model. A dedicated chat template file is necessary for tool use functionality.

```shell
llama-server --jinja -fa -hf bartowski/c4ai-command-r7b-12-2024-GGUF:Q6_K_L \
    --chat-template-file models/templates/CohereForAI-c4ai-command-r7b-12-2024-tool_use.jinja
```

--------------------------------

### C++ Preprocessor Directive Example

Source: https://github.com/liquid4all/liquid_llama.cpp/blob/master/CONTRIBUTING.md

Provides a basic example of a preprocessor directive in C++ using '#ifdef' and '#endif'. This is a placeholder for more detailed guidelines.

```cpp
#ifdef FOO
#endif // FOO
```

--------------------------------

### Testing with CURL

Source: https://github.com/liquid4all/liquid_llama.cpp/blob/master/tools/server/README.md

Example of how to test the `/completion` endpoint using cURL.

Using [curl](https://curl.se/). On Windows, `curl.exe` should be available in the base OS.

```sh
curl --request POST \
    --url http://localhost:8080/completion \
    --header "Content-Type: application/json" \
    --data '{"prompt": "Building a website can be done in 10 simple steps:","n_predict": 128}'
```
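The same request can be assembled from Python. This is a minimal sketch using only the standard library; it builds the request object without sending it, and the helper name `build_completion_request` and the default `base_url` are assumptions for illustration.

```python
import json
import urllib.request


def build_completion_request(prompt, n_predict=128,
                             base_url="http://localhost:8080"):
    """Build (but do not send) a POST request for the /completion endpoint."""
    body = json.dumps({"prompt": prompt, "n_predict": n_predict}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/completion",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_completion_request(
    "Building a website can be done in 10 simple steps:")
print(req.get_method(), req.full_url)

# Sending requires a running llama-server, for example:
#   with urllib.request.urlopen(req) as resp:
#       print(json.load(resp))
```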

--------------------------------

### Run Batched Generation with llama.cpp (Bash)

Source: https://github.com/liquid4all/liquid_llama.cpp/blob/master/examples/batched/README.md

This command executes the `llama-batched` example, specifying the model path, a prompt, and the number of parallel sequences to generate. It outputs the generated text for each sequence and detailed performance timings, including token evaluation and generation speeds.

```bash
./llama-batched -m ./models/llama-7b-v2/ggml-model-f16.gguf -p "Hello my name is" -np 4

...

main: n_len = 32, n_ctx = 2048, n_parallel = 4, n_kv_req = 113

 Hello my name is

main: generating 4 sequences ...

main: stream 0 finished
main: stream 1 finished
main: stream 2 finished
main: stream 3 finished

sequence 0:

Hello my name is Shirley. I am a 25-year-old female who has been working for over 5 years as a b

sequence 1:

Hello my name is Renee and I'm a 32 year old female from the United States. I'm looking for a man between

sequence 2:

Hello my name is Diana. I am looking for a housekeeping job. I have experience with children and have my own transportation. I am

sequence 3:

Hello my name is Cody. I am a 3 year old neutered male. I am a very friendly cat. I am very playful and

main: decoded 108 tokens in 3.57 s, speed: 30.26 t/s

llama_print_timings:
       load time =   587.00 ms
     sample time =     2.56 ms /   112 runs   (    0.02 ms per token, 43664.72 tokens per second)
     prompt eval time =  4089.11 ms /   118 tokens (   34.65 ms per token,    28.86 tokens per second)
        eval time =     0.00 ms /     1 runs   (    0.00 ms per token,      inf tokens per second)
       total time =  4156.04 ms

```
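When comparing runs, the summary line of the output above can be extracted programmatically. The sketch below assumes the `main: decoded ... tokens in ... s, speed: ... t/s` line keeps exactly the format shown in the sample output; the function name is illustrative.

```python
import re

# Pattern for the summary line printed at the end of the sample run.
DECODED_RE = re.compile(
    r"main: decoded (\d+) tokens in ([\d.]+) s, speed: ([\d.]+) t/s")


def parse_decoded_line(line):
    """Return (tokens, seconds, reported_speed), or None if the line differs."""
    match = DECODED_RE.search(line)
    if match is None:
        return None
    return int(match.group(1)), float(match.group(2)), float(match.group(3))


line = "main: decoded 108 tokens in 3.57 s, speed: 30.26 t/s"
tokens, seconds, reported = parse_decoded_line(line)

# Recompute throughput from the printed values; the printed seconds are
# rounded, so a small gap from the reported speed is expected.
recomputed = tokens / seconds
print(f"reported={reported} t/s, recomputed={recomputed:.2f} t/s")
```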

--------------------------------

### Set up and Check SYCL Environment

Source: https://github.com/liquid4all/liquid_llama.cpp/blob/master/docs/backend/SYCL.md

Steps to prepare the environment for SYCL execution, including sourcing the oneAPI environment script and listing available SYCL devices. This helps in verifying the SYCL setup and identifying usable hardware.

```sh
# Enable oneAPI running environment
source /opt/intel/oneapi/setvars.sh

# List devices information
./build/bin/llama-ls-sycl-device
```

--------------------------------

### Docker Setup

Source: https://github.com/liquid4all/liquid_llama.cpp/blob/master/tools/server/README.md

Instructions for running the Llama.cpp server using Docker, with and without CUDA support.

Running the server with CPU:

```bash
docker run -p 8080:8080 -v /path/to/models:/models ghcr.io/ggml-org/llama.cpp:server -m models/7B/ggml-model.gguf -c 512 --host 0.0.0.0 --port 8080
```

Running the server with CUDA:

```bash
docker run -p 8080:8080 -v /path/to/models:/models --gpus all ghcr.io/ggml-org/llama.cpp:server-cuda -m models/7B/ggml-model.gguf -c 512 --host 0.0.0.0 --port 8080 --n-gpu-layers 99
```

--------------------------------

### Install LLaVA Python Dependencies

Source: https://github.com/liquid4all/liquid_llama.cpp/blob/master/docs/multimodal/llava.md

Command to install the necessary Python packages for LLaVA model conversion and processing, using pip and a requirements file.

```shell
pip install -r tools/mtmd/requirements.txt
```

--------------------------------

### Launch Server with Legacy UI

Source: https://github.com/liquid4all/liquid_llama.cpp/blob/master/tools/server/README.md

To revert to the legacy completion-based web UI, start the llama-server binary with the --path flag pointing to the legacy public directory.

```bash
./llama-server -m my_model.gguf -c 8192 --path ./tools/server/public_legacy
```

--------------------------------

### Manage Web UI development environment

Source: https://github.com/liquid4all/liquid_llama.cpp/blob/master/tools/server/README.md

Commands to install dependencies, run the development server, and build the production-ready Web UI assets.

```bash
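cd tools/server/webui
# install dependencies
npm i
# run the development server (keeps running until interrupted)
npm run dev
# build the production-ready assets
npm run build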
```

--------------------------------

### Initialize oneAPI Environment and List SYCL Devices (Shell)

Source: https://github.com/liquid4all/liquid_llama.cpp/blob/master/docs/backend/SYCL.md

Sources the oneAPI environment script to set up necessary environment variables, then uses 'sycl-ls' to list available SYCL devices on the system. This is crucial for verifying the SYCL setup for different GPU types.

```shell
source /opt/intel/oneapi/setvars.sh
sycl-ls
```

--------------------------------

### Run Inference on Android

Source: https://github.com/liquid4all/liquid_llama.cpp/blob/master/docs/multimodal/MobileVLM.md

Example command for running inference on an Android device using the llama-mtmd-cli binary with image input.

```sh
/data/local/tmp/llama-mtmd-cli \
    -m /data/local/tmp/ggml-model-q4_k.gguf \
    --mmproj /data/local/tmp/mmproj-model-f16.gguf \
    -t 4 \
    --image /data/local/tmp/demo.jpg \
    -p "A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: <image>\nWho is the author of this book? \nAnswer the question using a single word or phrase. ASSISTANT:"
```

--------------------------------

### Running llama-server with Generic Format Support

Source: https://github.com/liquid4all/liquid_llama.cpp/blob/master/docs/function-calling.md

Examples of launching `llama-server` with `--jinja` for models that rely on generic tool-call format support rather than a native format handler. Each command specifies the Hugging Face model repository and the quantization level.

```bash
llama-server --jinja -fa -hf bartowski/phi-4-GGUF:Q4_0
llama-server --jinja -fa -hf bartowski/gemma-2-2b-it-GGUF:Q8_0
llama-server --jinja -fa -hf bartowski/c4ai-command-r-v01-GGUF:Q2_K
```

--------------------------------

### Define GBNF Root Rule for Lists

Source: https://github.com/liquid4all/liquid_llama.cpp/blob/master/grammars/README.md

A basic example of a root rule that forces the model to output a list format.

```GBNF
# a grammar for lists
root ::= ("- " item)+
item ::= [^\n]+ "\n"
```
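To check whether a piece of text would be accepted by this grammar, the two rules can be approximated with a regular expression. This is only a sketch for validating output after the fact; it is not how llama.cpp applies grammars during sampling, and the function name is illustrative.

```python
import re

# Regex counterpart of the grammar:
#   root ::= ("- " item)+
#   item ::= [^\n]+ "\n"
LIST_RE = re.compile(r"(?:- [^\n]+\n)+")


def matches_list_grammar(text: str) -> bool:
    """Return True when the whole text is one or more '- item' lines."""
    return LIST_RE.fullmatch(text) is not None


print(matches_list_grammar("- apples\n- pears\n"))  # every line is an item
print(matches_list_grammar("- apples"))             # missing trailing newline
```

Note that each item must end with a newline, so a list whose last line lacks one is rejected.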

--------------------------------

### Configure llama-server via Docker Compose

Source: https://github.com/liquid4all/liquid_llama.cpp/blob/master/tools/server/README.md

Example of deploying the llama-server using Docker Compose, demonstrating how to map volumes and set configuration parameters using environment variables.

```yaml
services:
  llamacpp-server:
    image: ghcr.io/ggml-org/llama.cpp:server
    ports:
      - 8080:8080
    volumes:
      - ./models:/models
    environment:
      LLAMA_ARG_MODEL: /models/my_model.gguf
      LLAMA_ARG_CTX_SIZE: 4096
      LLAMA_ARG_N_PARALLEL: 2
      LLAMA_ARG_ENDPOINT_METRICS: 1
      LLAMA_ARG_PORT: 8080
```
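Each `LLAMA_ARG_*` variable stands in for a command-line option. The sketch below shows one way to picture the equivalent invocation. The name-to-flag mapping and the treatment of `0`/`false` as "disabled" are assumptions made for illustration; check them against `llama-server --help` before relying on them.

```python
# Assumed mapping from the Compose environment variables to the
# llama-server flags they mirror (not taken from the source document).
ENV_TO_FLAG = {
    "LLAMA_ARG_MODEL": "--model",
    "LLAMA_ARG_CTX_SIZE": "--ctx-size",
    "LLAMA_ARG_N_PARALLEL": "--parallel",
    "LLAMA_ARG_ENDPOINT_METRICS": "--metrics",
    "LLAMA_ARG_PORT": "--port",
}
# Flags that take no value; the variable only switches them on.
BOOLEAN_FLAGS = {"--metrics"}


def env_to_argv(env):
    """Translate a dict of LLAMA_ARG_* variables into an argv list."""
    argv = ["llama-server"]
    for name, value in env.items():
        flag = ENV_TO_FLAG.get(name)
        if flag is None:
            continue  # ignore variables outside the assumed mapping
        if flag in BOOLEAN_FLAGS:
            # Assumption: "0", "false", and "" mean the switch stays off.
            if str(value).lower() not in ("0", "false", ""):
                argv.append(flag)
        else:
            argv += [flag, str(value)]
    return argv


compose_env = {
    "LLAMA_ARG_MODEL": "/models/my_model.gguf",
    "LLAMA_ARG_CTX_SIZE": 4096,
    "LLAMA_ARG_N_PARALLEL": 2,
    "LLAMA_ARG_ENDPOINT_METRICS": 1,
    "LLAMA_ARG_PORT": 8080,
}
print(" ".join(env_to_argv(compose_env)))
```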

--------------------------------

### Define GBNF Chess Notation Grammar

Source: https://github.com/liquid4all/liquid_llama.cpp/blob/master/grammars/README.md

An example of a GBNF grammar file defining chess moves. It demonstrates the use of the root rule, sequence concatenation, and alternative production rules.

```GBNF
root ::= (
    "1. " move " " move "\n"
    ([1-9] [0-9]? ". " move " " move "\n")+
)

move ::= (pawn | nonpawn | castle) [+#]?

pawn ::= ...
nonpawn ::= ...
castle ::= ...
```

--------------------------------

### Run LLaVA CLI with LLaVA 1.6 Model

Source: https://github.com/liquid4all/liquid_llama.cpp/blob/master/docs/multimodal/llava.md

Example command to run the `llama-mtmd-cli` binary with a LLaVA 1.6 model. It specifies the GGUF model and the multimodal projector. LLaVA 1.6 requires more context, so pass a larger context size (e.g., `-c 4096`) when running it.

```console
./llama-mtmd-cli -m ../llava-v1.6-vicuna-7b/ggml-model-f16.gguf --mmproj vit/mmproj-model-f16.gguf
```

--------------------------------

### Run llama-simple-cmake-pkg

Source: https://github.com/liquid4all/liquid_llama.cpp/blob/master/examples/simple-cmake-pkg/README.md

Example command to execute the built binary, providing a path to a GGUF model file and a prompt string.

```shell
./build/llama-simple-cmake-pkg -m ./models/llama-7b-v2/ggml-model-f16.gguf "Hello my name is"
```

--------------------------------

### Download and Convert OuteTTS Model to GGUF

Source: https://github.com/liquid4all/liquid_llama.cpp/blob/master/tools/tts/README.md

This sequence of commands shows how to download the OuteTTS LLM model from Hugging Face and convert it to the GGUF format required by llama.cpp. It includes cloning the model repository, installing Git LFS, and using the `convert_hf_to_gguf.py` script.

```bash
$ pushd models
$ git clone --branch main --single-branch --depth 1 https://huggingface.co/OuteAI/OuteTTS-0.2-500M
$ cd OuteTTS-0.2-500M && git lfs install && git lfs pull
$ popd
(venv) python convert_hf_to_gguf.py models/OuteTTS-0.2-500M \
    --outfile models/outetts-0.2-0.5B-f16.gguf --outtype f16
```

--------------------------------

### Start llama-server and run benchmark

Source: https://github.com/liquid4all/liquid_llama.cpp/blob/master/tools/server/bench/README.md

Launches the llama-server with specific performance flags and executes the k6 load testing script to measure throughput and latency.

```shell
llama-server --host localhost --port 8080 --model ggml-model-q4_0.gguf --cont-batching --metrics --parallel 8 --batch-size 512 --ctx-size 4096 -ngl 33
./k6 run script.js --duration 10m --iterations 500 --vus 8
```

--------------------------------

### CMake Build Configuration for Llama C++ Executable

Source: https://github.com/liquid4all/liquid_llama.cpp/blob/master/tools/cvector-generator/CMakeLists.txt

Configures the CMake build system to create an executable named 'llama-cvector-generator'. It specifies the source files, installation rules, linked libraries (common, llama, and threading support), and the required C++ standard (C++17).

```cmake
set(TARGET llama-cvector-generator)
add_executable(${TARGET} cvector-generator.cpp pca.hpp)
install(TARGETS ${TARGET} RUNTIME)
target_link_libraries(${TARGET} PRIVATE common llama ${CMAKE_THREAD_LIBS_INIT})
target_compile_features(${TARGET} PRIVATE cxx_std_17)
```

--------------------------------

### Convert Zod Object to JSON Schema (JavaScript)

Source: https://github.com/liquid4all/liquid_llama.cpp/blob/master/grammars/README.md

Converts a strict Zod object schema to a JSON schema using `zod-to-json-schema`. This example defines a schema for an object with 'age' and 'email' properties and demonstrates the conversion process. It requires the 'zod' and 'zod-to-json-schema' libraries.

```javascript
import { z } from 'zod';
import { zodToJsonSchema } from 'zod-to-json-schema';

const Foo = z.object({
  age: z.number().positive(),
  email: z.string().email(),
}).strict();

console.log(zodToJsonSchema(Foo));
```

--------------------------------

### Configure llama-tts Executable Build with CMake

Source: https://github.com/liquid4all/liquid_llama.cpp/blob/master/tools/tts/CMakeLists.txt

This snippet configures the build process for the 'llama-tts' executable using CMake. It specifies the source file 'tts.cpp', links against 'llama', 'common', and threading libraries, and sets the C++ standard to C++17. The 'install' command ensures the executable is available at runtime.

```cmake
set(TARGET llama-tts)
add_executable(${TARGET} tts.cpp)
install(TARGETS ${TARGET} RUNTIME)
target_link_libraries(${TARGET} PRIVATE llama common ${CMAKE_THREAD_LIBS_INIT})
target_compile_features(${TARGET} PRIVATE cxx_std_17)
```

--------------------------------

### Configure Executable Target and Link Libraries (CMake)

Source: https://github.com/liquid4all/liquid_llama.cpp/blob/master/examples/speculative-simple/CMakeLists.txt

This CMake snippet defines an executable target named 'llama-speculative-simple', specifies its source file, links necessary libraries including common, llama, and threading support, and sets the C++ standard to C++17. It also includes installation instructions for the target.

```cmake
set(TARGET llama-speculative-simple)
add_executable(${TARGET} speculative-simple.cpp)
install(TARGETS ${TARGET} RUNTIME)
target_link_libraries(${TARGET} PRIVATE common llama ${CMAKE_THREAD_LIBS_INIT})
target_compile_features(${TARGET} PRIVATE cxx_std_17)
```

--------------------------------

### Create and Enter Fedora Toolbox

Source: https://github.com/liquid4all/liquid_llama.cpp/blob/master/docs/backend/CUDA-FEDORA.md

Commands to initialize a new Fedora 41 toolbox container and enter the environment to begin installation with root privileges.

```bash
toolbox create --image registry.fedoraproject.org/fedora-toolbox:41 --container fedora-toolbox-41-cuda
toolbox enter --container fedora-toolbox-41-cuda
```

--------------------------------

### Run LLaVA CLI with LLaVA 1.5 Model

Source: https://github.com/liquid4all/liquid_llama.cpp/blob/master/docs/multimodal/llava.md

Example command to run the `llama-mtmd-cli` binary with a LLaVA 1.5 model. It specifies the model, multimodal projector, and chat template. Lower temperatures (e.g., 0.1) are recommended for better quality, and GPU offloading can be enabled with the `-ngl` flag.

```shell
./llama-mtmd-cli -m ../llava-v1.5-7b/ggml-model-f16.gguf \
    --mmproj ../llava-v1.5-7b/mmproj-model-f16.gguf \
    --chat-template vicuna
```

--------------------------------

### Benchmark llama-cli with specific thread and GPU configurations

Source: https://github.com/liquid4all/liquid_llama.cpp/blob/master/docs/development/token_generation_performance_tips.md

Example command used for benchmarking inference speed by adjusting thread counts (-t) and GPU layer offloading (-ngl). This helps determine the optimal balance between CPU utilization and GPU acceleration for a specific hardware setup.

```shell
./llama-cli -m "path/to/model.gguf" -p "An extremely detailed description of the 10 best ethnic dishes will follow, with recipes: " -n 1000 -t 4 -ngl 2000000
```
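Finding the best balance usually means repeating the command over several `-t` and `-ngl` values. The sketch below only builds the command lines for such a sweep and prints them; it uses the flags from the command above, while the helper name `build_sweep` and the chosen value lists are illustrative assumptions.

```python
import shlex

PROMPT = ("An extremely detailed description of the 10 best ethnic dishes "
          "will follow, with recipes: ")


def build_sweep(model_path, thread_counts, gpu_layers, n_predict=1000):
    """Return one argv list per (gpu_layers, thread_count) combination."""
    commands = []
    for ngl in gpu_layers:
        for threads in thread_counts:
            commands.append([
                "./llama-cli", "-m", model_path, "-p", PROMPT,
                "-n", str(n_predict), "-t", str(threads), "-ngl", str(ngl),
            ])
    return commands


sweep = build_sweep("path/to/model.gguf", [1, 2, 4, 8], [0, 2000000])
for argv in sweep:
    # shlex.join quotes the prompt so the line can be pasted into a shell.
    print(shlex.join(argv))
```

Each printed line can be run and timed separately, which keeps the measurement step independent of how the commands are generated.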

--------------------------------

### Start llama-server

Source: https://github.com/liquid4all/liquid_llama.cpp/blob/master/tools/server/README.md

Commands to launch the server on Unix-based systems and Windows, specifying the model path and context size.

```bash
./llama-server -m models/7B/ggml-model.gguf -c 2048
```

```powershell
llama-server.exe -m models\7B\ggml-model.gguf -c 2048
```

--------------------------------

### JSON Schema for Pydantic Summary Model

Source: https://github.com/liquid4all/liquid_llama.cpp/blob/master/grammars/README.md

This JSON object is the schema generated for the Pydantic `Summary` model. `key_facts` is an array of strings that must match `^- .{5,}$` (a leading `- ` followed by at least 5 characters). `question_answers` is an array of arrays, each holding at least 5 `QAPair` objects. The `QAPair` schema is defined in the `$defs` section.

```json
{
  "$defs": {
    "QAPair": {
      "additionalProperties": true,
      "properties": {
        "question": {
          "title": "Question",
          "type": "string"
        },
        "concise_answer": {
          "title": "Concise Answer",
          "type": "string"
        },
        "justification": {
          "title": "Justification",
          "type": "string"
        }
      },
      "required": [
        "question",
        "concise_answer",
        "justification"
      ],
      "title": "QAPair",
      "type": "object"
    }
  },
  "additionalProperties": true,
  "properties": {
    "key_facts": {
      "items": {
        "pattern": "^- .{5,}$",
        "type": "string"
      },
      "title": "Key Facts",
      "type": "array"
    },
    "question_answers": {
      "items": {
        "items": {
          "$ref": "#/$defs/QAPair"
        },
        "minItems": 5,
        "type": "array"
      },
      "title": "Question Answers",
      "type": "array"
    }
  },
  "required": [
    "key_facts",
    "question_answers"
  ],
  "title": "Summary",
  "type": "object"
}
```
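The constraints in this schema can be checked by hand without a JSON Schema library. The following is a minimal sketch that covers only the rules visible above (required keys, the `key_facts` pattern, the inner `minItems` of 5, and the required string fields of `QAPair`). Python's `re` is used as an approximation of the schema's pattern dialect, and the function name `validate_summary` is illustrative.

```python
import re

KEY_FACT_RE = re.compile(r"^- .{5,}$")
QA_FIELDS = ("question", "concise_answer", "justification")


def validate_summary(obj):
    """Return a list of problems; an empty list means the object fits."""
    if not isinstance(obj, dict):
        return ["summary must be an object"]
    problems = [f"{key} must be an array"
                for key in ("key_facts", "question_answers")
                if not isinstance(obj.get(key), list)]
    if problems:
        return problems

    for i, fact in enumerate(obj["key_facts"]):
        if not isinstance(fact, str) or KEY_FACT_RE.search(fact) is None:
            problems.append(f"key_facts[{i}] does not match '^- .{{5,}}$'")

    # question_answers is an array of arrays; each inner array needs
    # at least 5 QAPair objects.
    for i, group in enumerate(obj["question_answers"]):
        if not isinstance(group, list) or len(group) < 5:
            problems.append(f"question_answers[{i}] needs at least 5 items")
            continue
        for j, pair in enumerate(group):
            if not isinstance(pair, dict) or any(
                    not isinstance(pair.get(f), str) for f in QA_FIELDS):
                problems.append(f"question_answers[{i}][{j}] is not a QAPair")
    return problems


pair = {"question": "q", "concise_answer": "a", "justification": "j"}
good = {"key_facts": ["- water boils at 100 C"],
        "question_answers": [[pair] * 5]}
print(validate_summary(good))
```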

--------------------------------

### POST /completion Endpoint - With Sampling Parameters

Source: https://github.com/liquid4all/liquid_llama.cpp/blob/master/tools/server/README.md

This snippet illustrates calling the /completion endpoint with various sampling parameters such as temperature, top_k, and top_p to control the text generation process. These parameters allow fine-tuning the output's creativity and coherence.

```http
POST /completion
Content-Type: application/json

{
  "prompt": "Write a short story about a robot.",
  "temperature": 0.8,
  "top_k": 40,
  "top_p": 0.95,
  "n_predict": 256
}
```
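To make the effect of these parameters concrete, the sketch below applies temperature, `top_k`, and `top_p` to a toy set of logits. It illustrates the general techniques only: the order of the steps, the function name, and the toy vocabulary are assumptions, and llama.cpp's own sampler chain may order or implement them differently.

```python
import math


def filter_candidates(logits, temperature=0.8, top_k=40, top_p=0.95):
    """Apply temperature, then top-k, then top-p to a {token: logit} dict."""
    # Temperature rescales logits before softmax: values below 1 sharpen
    # the distribution, values above 1 flatten it.
    scaled = {tok: logit / temperature for tok, logit in logits.items()}
    peak = max(scaled.values())
    weights = {tok: math.exp(v - peak) for tok, v in scaled.items()}
    total = sum(weights.values())
    ranked = sorted(((tok, w / total) for tok, w in weights.items()),
                    key=lambda pair: pair[1], reverse=True)

    # top_k keeps only the k most probable tokens.
    ranked = ranked[:top_k]

    # top_p keeps the shortest prefix whose cumulative probability
    # reaches the threshold.
    kept, cumulative = [], 0.0
    for tok, prob in ranked:
        kept.append((tok, prob))
        cumulative += prob
        if cumulative >= top_p:
            break

    # Renormalize so the surviving probabilities sum to 1.
    norm = sum(prob for _, prob in kept)
    return {tok: prob / norm for tok, prob in kept}


toy_logits = {"robot": 2.0, "garden": 1.0, "ocean": 0.0, "tax": -1.0}
print(filter_candidates(toy_logits, temperature=0.8, top_k=3, top_p=0.95))
```

Lowering `top_k` or `top_p` shrinks the candidate set, which is why smaller values tend to produce more predictable text.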

--------------------------------

### Build LLAMA with SYCL Backend (Intel GPUs)

Source: https://github.com/liquid4all/liquid_llama.cpp/blob/master/docs/backend/SYCL.md

Instructions for building the LLAMA project with the SYCL backend, targeting Intel GPUs. It includes options for FP32 (recommended) and FP16 precision, and how to build the binary. Precision issues can be addressed by setting SYCL_PROGRAM_COMPILE_OPTIONS.

```sh
# Option 1: Use FP32 (recommended for better performance in most cases)
cmake -B build -DGGML_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx

# Option 2: Use FP16
cmake -B build -DGGML_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DGGML_SYCL_F16=ON

# build all binary
cmake --build build --config Release -j -v
```

--------------------------------

### Install GGUF package via pip

Source: https://github.com/liquid4all/liquid_llama.cpp/blob/master/gguf-py/README.md

Commands to install the GGUF package. The standard installation provides core functionality, while the optional 'gui' extra enables the visual GGUF editor.

```shell
pip install gguf
pip install gguf[gui]
```

--------------------------------

### Library Build and Installation (CMake)

Source: https://github.com/liquid4all/liquid_llama.cpp/blob/master/ggml/CMakeLists.txt

Defines the build process for the ggml library, including adding source directories and installing public headers and libraries. It also handles conditional installation of pkgconfig files.

```cmake
#
# build the library
#

add_subdirectory(src)

#
# tests and examples
#

if (GGML_BUILD_TESTS)
    enable_testing()
    add_subdirectory(tests)
endif ()

if (GGML_BUILD_EXAMPLES)
    add_subdirectory(examples)
endif ()

#
# install
#

include(CMakePackageConfigHelpers)

# all public headers
set(GGML_PUBLIC_HEADERS
    include/ggml.h
    include/ggml-cpu.h
    include/ggml-alloc.h
    include/ggml-backend.h
    include/ggml-blas.h
    include/ggml-cann.h
    include/ggml-cpp.h
    include/ggml-cuda.h
    include/ggml-opt.h
    include/ggml-metal.h
    include/ggml-rpc.h
    include/ggml-sycl.h
    include/ggml-vulkan.h
    include/ggml-webgpu.h
    include/gguf.h)

set_target_properties(ggml PROPERTIES PUBLIC_HEADER "${GGML_PUBLIC_HEADERS}")
#if (GGML_METAL)
#    set_target_properties(ggml PROPERTIES RESOURCE "${CMAKE_CURRENT_SOURCE_DIR}/src/ggml-metal.metal")
#endif()
install(TARGETS ggml LIBRARY PUBLIC_HEADER)
install(TARGETS ggml-base LIBRARY)

if (GGML_STANDALONE)
    configure_file(${CMAKE_CURRENT_SOURCE_DIR}/ggml.pc.in
        ${CMAKE_CURRENT_BINARY_DIR}/ggml.pc
        @ONLY)

    install(FILES ${CMAKE_CURRENT_BINARY_DIR}/ggml.pc
        DESTINATION share/pkgconfig)
endif()
```

--------------------------------

### Build llama-server with CMake

Source: https://github.com/liquid4all/liquid_llama.cpp/blob/master/tools/server/README.md

Standard build commands for compiling the llama-server binary from the project root using CMake.

```bash
cmake -B build
cmake --build build --config Release -t llama-server
```

--------------------------------

### Build llama.cpp using CMake Presets

Source: https://github.com/liquid4all/liquid_llama.cpp/blob/master/docs/backend/SYCL.md

This section demonstrates building llama.cpp using CMake presets for different configurations on Windows. It includes commands for release and debug builds with SYCL support, each targeting the 'llama-cli' executable. The second pair of commands enables FP16 by passing `-DGGML_SYCL_F16=ON` alongside the release preset; the last pair produces a debug build.

```sh
cmake --preset x64-windows-sycl-release
cmake --build build-x64-windows-sycl-release -j --target llama-cli

cmake -DGGML_SYCL_F16=ON --preset x64-windows-sycl-release
cmake --build build-x64-windows-sycl-release -j --target llama-cli

cmake --preset x64-windows-sycl-debug
cmake --build build-x64-windows-sycl-debug -j --target llama-cli
```

--------------------------------

### Verify CUDA Installation (Bash)

Source: https://github.com/liquid4all/liquid_llama.cpp/blob/master/docs/backend/CUDA-FEDORA.md

Checks if the CUDA installation is successful by querying the NVIDIA CUDA Compiler (nvcc) version. This command should output details about the installed CUDA toolkit, confirming its accessibility.

```bash
nvcc --version
```

--------------------------------

### Build llama-server with SSL support

Source: https://github.com/liquid4all/liquid_llama.cpp/blob/master/tools/server/README.md

Instructions for compiling the llama-server binary with OpenSSL 3 support enabled via CMake flags.

```bash
cmake -B build -DLLAMA_SERVER_SSL=ON
cmake --build build --config Release -t llama-server
```

--------------------------------

### GET /props

Source: https://context7.com/liquid4all/liquid_llama.cpp/llms.txt

Get server properties including loaded model information and chat template.

**Method:** `GET`

**Endpoint:** `/props`

**Success response (200):**

- **model_info** (object) - Information about the loaded model.
- **context_size** (integer) - The context size of the model.
- **chat_template** (string) - The chat template used by the model.

**Response example:**

```json
{
  "model_info": {
    "name": "llama-2-7b-chat.gguf",
    "quantization": "Q4_K_M",
    "file_type": "GGUF"
  },
  "context_size": 4096,
  "chat_template": "{{ bos_token }}{% for message in messages %}{% if message.role == 'user' %}{{ 'User: ' + message.content + '\n' }}{% elif message.role == 'assistant' %}{{ 'Assistant: ' + message.content + '\n' }}{% else %}{{ message.content + '\n' }}{% endif %}{% endfor %}{% if add_generation_prompt %}{{ 'Assistant:' }}{% endif %}"
}
```

--------------------------------

### List Available SYCL Devices (Windows)

Source: https://github.com/liquid4all/liquid_llama.cpp/blob/master/docs/backend/SYCL.md

Lists the available SYCL devices recognized by the system on Windows. This command helps verify the installation of oneAPI and the detection of SYCL-compatible GPUs, such as Intel Level-Zero devices.

```shell
sycl-ls.exe
```

--------------------------------

### Running llama.cpp CLI Examples

Source: https://github.com/liquid4all/liquid_llama.cpp/blob/master/README.md

Demonstrates how to use the `llama-cli` tool for local model inference and downloading models directly from Hugging Face. It requires a local model file or a model identifier from Hugging Face.

```shell
# Use a local model file
llama-cli -m my_model.gguf

# Or download and run a model directly from Hugging Face
llama-cli -hf ggml-org/gemma-3-1b-it-GGUF
```

--------------------------------

### Configure CMake Package and Installation Paths

Source: https://github.com/liquid4all/liquid_llama.cpp/blob/master/ggml/CMakeLists.txt

This snippet sets up CMake variables for package versioning and installation directories. It configures the package configuration file and writes a version file, ensuring proper installation of CMake modules and version information.

```cmake
set(GGML_INSTALL_VERSION 0.0.${GGML_BUILD_NUMBER})
set(GGML_INCLUDE_INSTALL_DIR ${CMAKE_INSTALL_INCLUDEDIR} CACHE PATH "Location of header  files")
set(GGML_LIB_INSTALL_DIR     ${CMAKE_INSTALL_LIBDIR}     CACHE PATH "Location of library files")
set(GGML_BIN_INSTALL_DIR     ${CMAKE_INSTALL_BINDIR}     CACHE PATH "Location of binary  files")

configure_package_config_file(
        ${CMAKE_CURRENT_SOURCE_DIR}/cmake/ggml-config.cmake.in
        ${CMAKE_CURRENT_BINARY_DIR}/ggml-config.cmake
    INSTALL_DESTINATION ${CMAKE_INSTALL_LIBDIR}/cmake/ggml
    PATH_VARS GGML_INCLUDE_INSTALL_DIR
              GGML_LIB_INSTALL_DIR
              GGML_BIN_INSTALL_DIR)

write_basic_package_version_file(
        ${CMAKE_CURRENT_BINARY_DIR}/ggml-version.cmake
    VERSION ${GGML_INSTALL_VERSION}
    COMPATIBILITY SameMajorVersion)

target_compile_definitions(ggml-base PRIVATE
    GGML_VERSION="${GGML_INSTALL_VERSION}"
    GGML_COMMIT="${GGML_BUILD_COMMIT}"
)
message(STATUS "ggml version: ${GGML_INSTALL_VERSION}")
message(STATUS "ggml commit:  ${GGML_BUILD_COMMIT}")

install(FILES ${CMAKE_CURRENT_BINARY_DIR}/ggml-config.cmake
              ${CMAKE_CURRENT_BINARY_DIR}/ggml-version.cmake
        DESTINATION ${CMAKE_INSTALL_LIBDIR}/cmake/ggml)
```