### Configure Project with CMake

Source: https://github.com/google/gemma.cpp/blob/main/examples/hello_world/README.md

Use CMake to configure the build environment for the hello_world example. This command fetches libgemma from a git commit hash on GitHub.

```sh
cmake -B build
```

--------------------------------

### Install CMake and Visual Studio Build Tools on Windows

Source: https://github.com/google/gemma.cpp/blob/main/README.md

Use winget to install CMake and Visual Studio 2022 Build Tools with the Clang/LLVM C++ frontend for native Windows development.

```sh
winget install --id Kitware.CMake
winget install --id Microsoft.VisualStudio.2022.BuildTools --force --override "--passive --wait --add Microsoft.VisualStudio.Workload.VCTools;installRecommended --add Microsoft.VisualStudio.Component.VC.Llvm.Clang --add Microsoft.VisualStudio.Component.VC.Llvm.ClangToolset"
```

--------------------------------

### Start Interactive gemma CLI Session

Source: https://context7.com/google/gemma.cpp/llms.txt

Launch the interactive REPL for Gemma models. Specify tokenizer and weights files. Adjust verbosity for output.

```sh
# Start interactive session with Gemma 2 2B instruction-tuned (SFP weights)
./gemma \
  --tokenizer tokenizer.spm \
  --weights gemma2-2b-it-sfp.sbs

# With single-file format (post-2025, tokenizer embedded in weights)
./gemma --weights gemma2-2b-it-sfp-single.sbs

# Verbosity levels: 0=silent, 1=default UI, 2=debug info
./gemma --tokenizer tokenizer.spm --weights gemma2-2b-it-sfp.sbs --verbosity 0
```

--------------------------------

### Start gemma_api_server

Source: https://context7.com/google/gemma.cpp/llms.txt

Exposes a Google Gemini-compatible REST API over HTTP with support for non-streaming and SSE streaming endpoints. Requires tokenizer and weights files.

```bash
# Start the server
./build/gemma_api_server \
  --tokenizer path/to/tokenizer.spm \
  --weights path/to/gemma2-2b-it-sfp.sbs \
  --port 8080 \
  --model gemma3-4b
```

--------------------------------

### List Models Response (JSON)

Source: https://github.com/google/gemma.cpp/blob/main/API_SERVER_README.md

Example JSON response from the `/v1beta/models` endpoint, listing available Gemma models.

```json
{
  "models": [
    {
      "name": "models/gemma3-4b",
      "displayName": "Gemma3 4B",
      "description": "Gemma3 4B model running locally"
    }
  ]
}
```

--------------------------------

### Start Local Gemma API Server

Source: https://github.com/google/gemma.cpp/blob/main/API_SERVER_README.md

Launch the local API server with specified tokenizer and weights files. The port can be customized, defaulting to 8080.

```bash
./build/gemma_api_server \
  --tokenizer path/to/tokenizer.spm \
  --weights path/to/model.sbs \
  --port 8080
```

--------------------------------

### Generate Content with Generation Configuration (Python)

Source: https://github.com/google/gemma.cpp/blob/main/API_SERVER_README.md

Example of generating content using the Python requests library, including generation configuration options.

```APIDOC
## POST /v1beta/models/gemma3-4b:generateContent (Python)

### Description
Generates content using the Gemma.cpp API with specified generation configurations.

### Method
POST

### Endpoint
/v1beta/models/gemma3-4b:generateContent

### Parameters
#### Request Body
- **contents** (array) - Required - The content to generate from.
  - **parts** (array) - Required - A list of content parts.
    - **text** (string) - Required - The input text.
- **generationConfig** (object) - Optional - Configuration for content generation.
  - **temperature** (number) - Optional - Controls randomness (0.0 to 2.0, default: 1.0).
  - **topK** (integer) - Optional - Top-K sampling parameter (default: 1).
  - **maxOutputTokens** (integer) - Optional - Maximum number of tokens to generate (default: 8192).

### Request Example
```python
import requests

response = requests.post(
    'http://localhost:8080/v1beta/models/gemma3-4b:generateContent',
    json={
        'contents': [{'parts': [{'text': 'Explain quantum computing in simple terms'}]}],
        'generationConfig': {
            'temperature': 0.9,
            'topK': 1,
            'maxOutputTokens': 1024
        }
    }
)
result = response.json()
if 'candidates' in result and result['candidates']:
    text = result['candidates'][0]['content']['parts'][0]['text']
    print(text)
```

### Response
#### Success Response (200)
- **candidates** (array) - Contains the generated content.
  - **content** (object) - The generated content.
    - **parts** (array) - A list of content parts.
      - **text** (string) - The generated text.
```

--------------------------------

### Build the Hello World Executable

Source: https://github.com/google/gemma.cpp/blob/main/examples/hello_world/README.md

Navigate to the build directory and use 'make' to compile the hello_world executable. The -j flag can be used for parallel builds.

```sh
cd build
make hello_world
```

--------------------------------

### GET /v1beta/models

Source: https://context7.com/google/gemma.cpp/llms.txt

Returns the list of models served by the local API server.

```APIDOC
## GET /v1beta/models

### Description
Returns the list of models served by the local API server.

### Method
GET

### Endpoint
/v1beta/models

### Response
#### Success Response (200)
- **models** (array) - A list of available models.
  - **name** (string) - The name of the model.
  - **displayName** (string) - The display name of the model.
  - **description** (string) - A description of the model.

#### Response Example
```json
{
  "models": [{
    "name": "models/gemma3-4b",
    "displayName": "Gemma3 4B",
    "description": "Gemma3 4B model running locally"
  }]
}
```
```

--------------------------------

### Curl: List Models

Source: https://github.com/google/gemma.cpp/blob/main/API_SERVER_README.md

Use curl to send a GET request to the `/v1beta/models` endpoint to retrieve a list of available models.

```bash
curl http://localhost:8080/v1beta/models
```

--------------------------------

### Generate Content with Conversation History (curl)

Source: https://github.com/google/gemma.cpp/blob/main/API_SERVER_README.md

Example of how to generate content using the /v1beta/models/gemma3-4b:generateContent endpoint with conversation history.

```APIDOC
## POST /v1beta/models/gemma3-4b:generateContent

### Description
Generates content based on the provided model and conversation history.

### Method
POST

### Endpoint
/v1beta/models/gemma3-4b:generateContent

### Request Body
- **contents** (array) - Required - An array of content parts, including conversation history.
  - **parts** (array) - Required - A list of content parts.
    - **text** (string) - Required - The text content.

### Request Example
```json
{
  "contents": [
    {"parts": [{"text": "Hi, my name is Alice"}]},
    {"parts": [{"text": "Hello Alice! Nice to meet you."}]},
    {"parts": [{"text": "What is my name?"}]}
  ]
}
```
```

--------------------------------

### Run Hello World Executable

Source: https://github.com/google/gemma.cpp/blob/main/examples/hello_world/README.md

Execute the compiled hello_world program, providing paths to the tokenizer, weights file, and specifying the model type.

```sh
./hello_world --tokenizer tokenizer.spm --weights 2b-it-sfp.sbs --model 2b-it
```

--------------------------------

### Generate Content Request (JSON)

Source: https://github.com/google/gemma.cpp/blob/main/API_SERVER_README.md

Example JSON payload for the `generateContent` endpoint. Includes input text and generation configuration.

```json
{
  "contents": [
    {
      "parts": [
        {"text": "Why is the sky blue?"}
      ]
    }
  ],
  "generationConfig": {
    "temperature": 0.9,
    "topK": 1,
    "maxOutputTokens": 1024
  }
}
```

--------------------------------

### Python API via requests

Source: https://context7.com/google/gemma.cpp/llms.txt

Example of using the Python `requests` library to interact with the Gemma API for non-streaming content generation.

```APIDOC
## Python API via `requests` (Full Example)

### Description
This example demonstrates how to use the Python `requests` library to perform non-streaming content generation with the Gemma API.

### Code Example
```python
import requests

BASE = "http://localhost:8080"
MODEL = "gemma3-4b"

def generate(prompt, temperature=0.9, top_k=1, max_tokens=1024):
    """Non-streaming generation."""
    resp = requests.post(
        f"{BASE}/v1beta/models/{MODEL}:generateContent",
        json={
            "contents": [{"parts": [{"text": prompt}]}],
            "generationConfig": {
                "temperature": temperature,
                "topK": top_k,
                "maxOutputTokens": max_tokens,
            },
        },
    )
    resp.raise_for_status()
    data = resp.json()
    if "candidates" in data and data["candidates"]:
        return data["candidates"][0]["content"]["parts"][0]["text"]
    raise RuntimeError(f"Unexpected response: {data}")

print(generate("What are the three laws of thermodynamics?"))
# Output: The three laws of thermodynamics are:
# 1. Energy cannot be created or destroyed...
```
```

--------------------------------

### Run Hello World with Constrained Decoding

Source: https://github.com/google/gemma.cpp/blob/main/examples/hello_world/README.md

Execute the hello_world program with the --reject flag to prevent specific words (identified by token IDs) from being generated. This flag must be the last one in the command.

```sh
./hello_world [...] --reject 32338 42360 78107 106837 132832 143859 154230 190205
```

--------------------------------

### Stream Generate Content Response (SSE)

Source: https://github.com/google/gemma.cpp/blob/main/API_SERVER_README.md

Example Server-Sent Events (SSE) response for the `streamGenerateContent` endpoint, showing incremental output.

```text
data: {"candidates":[{"content":{"parts":[{"text":"The"}],"role":"model"},"index":0}],"promptFeedback":{"safetyRatings":[]}}

data: {"candidates":[{"content":{"parts":[{"text":" sky"}],"role":"model"},"index":0}],"promptFeedback":{"safetyRatings":[]}}

data: [DONE]
```

--------------------------------

### Build gemma Executable with Bazel

Source: https://github.com/google/gemma.cpp/blob/main/README.md

Build the gemma executable using Bazel with optimized settings and C++20 standard.

```bash
bazel build -c opt --cxxopt=-std=c++20 :gemma
```

--------------------------------

### Build gemma Executable with CMake Presets (Windows)

Source: https://github.com/google/gemma.cpp/blob/main/README.md

Configure the build directory using the Windows CMake preset and then build the project using Visual Studio Build Tools. Specify the number of parallel threads.

```bash
# Configure `build` directory
cmake --preset windows

# Build project using Visual Studio Build Tools
cmake --build --preset windows -j [number of parallel threads to use]
```

--------------------------------

### Build gemma Executable with CMake Presets (Unix-like)

Source: https://github.com/google/gemma.cpp/blob/main/README.md

Configure the build directory using a CMake preset and then build the project. Specify the number of parallel threads for faster compilation.

```bash
# Configure `build` directory
cmake --preset make

# Build project using make
cmake --build --preset make -j [number of parallel threads to use]
```

--------------------------------

### Generate Content Response (JSON)

Source: https://github.com/google/gemma.cpp/blob/main/API_SERVER_README.md

Example JSON response from the `generateContent` endpoint, detailing the model's output, finish reason, and token usage.

```json
{
  "candidates": [
    {
      "content": {
        "parts": [
          {"text": "The sky appears blue because..."}
        ],
        "role": "model"
      },
      "finishReason": "STOP",
      "index": 0
    }
  ],
  "promptFeedback": {
    "safetyRatings": []
  },
  "usageMetadata": {
    "promptTokenCount": 5,
    "candidatesTokenCount": 25,
    "totalTokenCount": 30
  }
}
```

--------------------------------

### Build gemma.cpp with CMake (Unix)

Source: https://context7.com/google/gemma.cpp/llms.txt

Configure and build the gemma CLI binary or libgemma static library using CMake presets. Use the -j flag for parallel builds.

```sh
git clone https://github.com/google/gemma.cpp
cd gemma.cpp

# Configure and build (Release)
cmake --preset make
cmake --build --preset make -j$(nproc)

# Build only the library target
# (the subshell keeps later commands running from the repository root)
cmake -B build
(cd build && make -j$(nproc) libgemma)

# Build the API server and client
cmake --build build --target gemma_api_server gemma_api_client -j8
```

--------------------------------

### Use Unified Client with Public Google API

Source: https://github.com/google/gemma.cpp/blob/main/API_SERVER_README.md

Connect to the public Google API using the unified client. Requires setting the GOOGLE_API_KEY environment variable or passing the key directly.

```bash
# Set API key and use public API
export GOOGLE_API_KEY="your-api-key-here"
./build/gemma_api_client --interactive 1

# Or pass API key directly
./build/gemma_api_client --api_key "your-api-key" --interactive 1
```

--------------------------------

### gemma_api_client CLI

Source: https://context7.com/google/gemma.cpp/llms.txt

The `gemma_api_client` binary provides a unified CLI for interacting with Gemma models, supporting both local and public APIs.

```APIDOC
## gemma_api_client: Unified CLI Client

### Description
The `gemma_api_client` binary connects to either the local `gemma_api_server` or the public Google Generative Language API using identical flags.

### Usage Examples

**Interactive chat with local server:**
```bash
./build/gemma_api_client --interactive 1 --host localhost --port 8080
```

**Single prompt with local server:**
```bash
./build/gemma_api_client --prompt "Summarize the French Revolution in 3 sentences."
```

**Use public Google Gemini API (requires API key):**
```bash
export GOOGLE_API_KEY="your-api-key"
./build/gemma_api_client --interactive 1
```

**Use public Google Gemini API with custom model and API key:**
```bash
./build/gemma_api_client \
  --api_key "your-api-key" \
  --model gemini-1.5-flash \
  --prompt "Write a limerick about compilers."
```
```

--------------------------------

### Use gemma CLI in a Shell Pipeline

Source: https://context7.com/google/gemma.cpp/llms.txt

Pipe code or text into the gemma CLI for analysis or processing. This example pipes C++ code into the aliased gemma2b command.

```sh
# Pipe code into gemma for analysis
cat configs.h | tail -n 35 | tr '\n' ' ' | \
  xargs -0 echo "What does this C++ code do: " | gemma2b
```

--------------------------------

### Build Project with Make

Source: https://github.com/google/gemma.cpp/blob/main/examples/simplified_gemma/README.md

Build the simplified_gemma executable using make after configuring the project. The -j flag can be used for parallel builds.

```sh
cd build
make simplified_gemma
```

--------------------------------

### Build Gemma API Server and Client

Source: https://github.com/google/gemma.cpp/blob/main/API_SERVER_README.md

Configure and build the Gemma API server and client binaries using CMake. Ensure the build type is set to Release for optimal performance.

```bash
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build --target gemma_api_server gemma_api_client -j 8
```
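
--------------------------------

### Parse streamGenerateContent SSE Output (Python sketch)

Once the server is built and running, a client has to turn the `streamGenerateContent` SSE lines shown earlier into plain text. The following is a minimal sketch of that step, not part of gemma.cpp: the helper name `iter_sse_text` and the `SAMPLE` list are hypothetical, and the code assumes only the `data: {...}` / `data: [DONE]` framing from the SSE response example. It works on any iterable of lines, so it runs without a server.

```python
import json

def iter_sse_text(lines):
    """Yield text fragments from SSE lines of the form 'data: {...}'.

    Hypothetical helper: assumes the framing shown in the SSE response
    example (JSON payloads after 'data:', terminated by 'data: [DONE]').
    """
    for raw in lines:
        # Accept both bytes (as produced by HTTP clients) and str.
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank event separators and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        event = json.loads(payload)
        for candidate in event.get("candidates", []):
            for part in candidate.get("content", {}).get("parts", []):
                if "text" in part:
                    yield part["text"]

# The two events from the SSE response example, followed by the sentinel.
SAMPLE = [
    'data: {"candidates":[{"content":{"parts":[{"text":"The"}],"role":"model"},"index":0}],"promptFeedback":{"safetyRatings":[]}}',
    '',
    'data: {"candidates":[{"content":{"parts":[{"text":" sky"}],"role":"model"},"index":0}],"promptFeedback":{"safetyRatings":[]}}',
    '',
    'data: [DONE]',
]

print("".join(iter_sse_text(SAMPLE)))  # prints "The sky"
```

Against a live server, the same function could be fed `resp.iter_lines()` from a `requests.post(..., stream=True)` call. The streaming URL is not shown in this document; `/v1beta/models/gemma3-4b:streamGenerateContent`, mirroring the `generateContent` path, is an assumption to verify against API_SERVER_README.md.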