### Install litert-lm-builder

Source: https://github.com/google-ai-edge/litert-lm/blob/main/docs/litert_lm_builder.md

Install the `litert-lm-builder` package into a virtual environment using `uv`.

```bash
uv venv
source .venv/bin/activate
uv pip install litert-lm-builder
```

--------------------------------

### Run Example with Bazel

Source: https://github.com/google-ai-edge/litert-lm/blob/main/docs/api/kotlin/getting_started.md

Command to run the example chat application using Bazel. Replace `<abs_model_path>` with the absolute path to your LiteRT-LM model file.

```bash
bazel run -c opt //kotlin/java/com/google/ai/edge/litertlm/example:main -- <abs_model_path>
```

--------------------------------

### Example Preface with System Instruction and Tools

Source: https://github.com/google-ai-edge/litert-lm/blob/main/docs/api/cpp/conversation.md

Use this preface to provide initial system instructions and define tools for the LLM. It also demonstrates disabling the thinking mode.

```cpp
Preface preface = JsonPreface({
  .messages = {
      {"role", "system"},
      {"content", {"You are a model that can do function calling."}}
    },
  .tools = {
    {
      {"name", "get_weather"},
      {"description", "Returns the weather for a given location."},
      {"parameters", {
        {"type", "object"},
        {"properties", {
          {"location", {
            {"type", "string"},
            {"description", "The location to get the weather for."}
          }}
        }},
        {"required", {"location"}}
      }}
    },
    {
      {"name", "get_stock_price"},
      {"description", "Returns the stock price for a given stock symbol."},
      {"parameters", {
        {"type", "object"},
        {"properties", {
          {"stock_symbol", {
            {"type", "string"},
            {"description", "The stock symbol to get the price for."}
          }}
        }},
        {"required", {"stock_symbol"}}
      }}
    }
  },
  .extra_context = {
    {"enable_thinking", false}
  }
});
```
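The tool declarations above follow a JSON-Schema-like shape. As a rough illustration of how an application might use the `required` list before executing a call, the following Python sketch mirrors the `get_weather` declaration as a plain dictionary and checks a set of arguments against it. The helper `missing_required` is hypothetical and not part of LiteRT-LM.

```python
# Plain-Python mirror of the `get_weather` tool declaration shown above.
GET_WEATHER_TOOL = {
    "name": "get_weather",
    "description": "Returns the weather for a given location.",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {
                "type": "string",
                "description": "The location to get the weather for.",
            }
        },
        "required": ["location"],
    },
}


def missing_required(tool: dict, arguments: dict) -> list:
    """Hypothetical helper: names of required parameters absent from `arguments`."""
    required = tool["parameters"].get("required", [])
    return [name for name in required if name not in arguments]


print(missing_required(GET_WEATHER_TOOL, {"location": "Paris"}))  # []
print(missing_required(GET_WEATHER_TOOL, {}))  # ['location']
```

A check like this could run before dispatching a model-requested call, so that malformed arguments are reported back instead of raising inside the tool.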

--------------------------------

### Install LiteRT-LM API

Source: https://github.com/google-ai-edge/litert-lm/blob/main/python/colabs/Getting Started with LiteRT-LM Python API.ipynb

Install the necessary packages from PyPI to use the LiteRT-LM API and Hugging Face hub.

```python
!pip install litert-lm huggingface_hub
```

--------------------------------

### Install LiteRT-LM Core Package

Source: https://github.com/google-ai-edge/litert-lm/blob/main/js/packages/core/README.md

Instructions for installing the LiteRT-LM core package using npm or importing it directly from a CDN.

```shell
# From npm
npm i --save @litert-lm/core
```

```javascript
// From a CDN (in your JavaScript file)
import * as litertlm from 'https://cdn.jsdelivr.net/npm/@litert-lm/core/+esm';
```

--------------------------------

### Start and Attach to Container Session

Source: https://github.com/google-ai-edge/litert-lm/blob/main/docs/getting-started/cmake.md

Start the container and attach your shell to it. To re-enter an existing container, simply rerun this command.

```bash
podman start --attach litert_lm
```

--------------------------------

### Example Prompt for LiteRT-LM Android Demo App Skill

Source: https://github.com/google-ai-edge/litert-lm/blob/main/agents/skills/README.md

This is an example prompt to trigger the LiteRT-LM Android demo application creation skill. Ensure your prompt specifies the necessary parameters for the task.

```text
Please create a LiteRT-LM Android demo app

root: ~/litert_lm_litert_lm_maven_integration
Maven Integration scenario
Target: pixel 10
model: gemma 4
```

--------------------------------

### Example LiteRT-LM Output

Source: https://github.com/google-ai-edge/litert-lm/blob/main/docs/getting-started/cmake.md

This is an example of the expected output when running the LiteRT-LM verification test. It includes initialization messages, the model's response, and benchmark metrics.

```text
dev-sh@:LiteRT-LM$ cmake/build/litert_lm_main --model_path=$model_path/gemma-3n-E2B-it-int4.litertlm --backend=cpu --input_prompt="What is the tallest building in the world?"
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
input_prompt: What is the tallest building in the world?
The tallest building in the world is the **Burj Khalifa** in Dubai, United Arab Emirates.

It stands at a staggering **828 meters (2,717 feet)** tall.

It was completed in 2010 and continues to hold the record.

BenchmarkInfo:
  Init Phases (2):
    - Executor initialization: 844.54 ms
    - Tokenizer initialization: 66.70 ms
    Total init time: 911.25 ms
--------------------------------------------------
  Time to first token: 2.40 s
--------------------------------------------------
  Prefill Turns (Total 1 turns):
    Prefill Turn 1: Processed 18 tokens in 2.311920273s duration.
      Prefill Speed: 7.79 tokens/sec.
--------------------------------------------------
  Decode Turns (Total 1 turns):
    Decode Turn 1: Processed 62 tokens in 5.53092314s duration.
      Decode Speed: 11.21 tokens/sec.
--------------------------------------------------
--------------------------------------------------
```
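The reported speeds can be reproduced from the token counts and durations in the log. The short Python sketch below does that arithmetic; the function name `tokens_per_second` is illustrative only and is not part of any LiteRT-LM tool.

```python
def tokens_per_second(tokens: int, seconds: float) -> float:
    """Throughput as tokens processed divided by wall-clock duration."""
    return tokens / seconds


# Values copied from the "Prefill Turn 1" and "Decode Turn 1" lines above.
prefill = tokens_per_second(18, 2.311920273)
decode = tokens_per_second(62, 5.53092314)

print(f"Prefill Speed: {prefill:.2f} tokens/sec.")  # Prefill Speed: 7.79 tokens/sec.
print(f"Decode Speed: {decode:.2f} tokens/sec.")    # Decode Speed: 11.21 tokens/sec.
```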

--------------------------------

### Run LiteRT-LM from Terminal

Source: https://github.com/google-ai-edge/litert-lm/blob/main/README.md

Installs and runs a Gemma model directly from the terminal using the `uv` package manager and `litert-lm` CLI. Useful for quick testing without writing code.

```bash
uv tool install litert-lm

litert-lm run \
  --from-huggingface-repo=google/gemma-3n-E2B-it-litert-lm \
  gemma-3n-E2B-it-int4 \
  --prompt="What is the capital of France?"
```

--------------------------------

### Engine Initialization with Cascading Fallback

Source: https://github.com/google-ai-edge/litert-lm/blob/main/agents/skills/create-litert-lm-android-demo-app/references/inference_implementation.md

Implement a 3-stage cascading fallback strategy for engine initialization, starting with multi-modal on a selected backend, falling back to multi-modal on CPU, and finally to text-only on CPU. Ensure UI state is mapped accordingly.

```kotlin
val config = EngineConfig(
    modelPath = modelPath,
    backend = Backend.GPU(),
    visionBackend = Backend.GPU(),
    audioBackend = Backend.CPU()
)
try {
    engine.initialize(config)
    // Transition UI to Multi-modal Model state
} catch (e1: Exception) {
    Log.w("LiteRT-LM", "Multi-modal GPU initialization failed, falling back to CPU.", e1)
    val configCpu = EngineConfig(
        modelPath = modelPath,
        backend = Backend.CPU(),
        visionBackend = Backend.CPU(),
        audioBackend = Backend.CPU()
    )
    try {
        engine.initialize(configCpu)
        // Transition UI to Multi-modal Model state
    } catch (e2: Exception) {
        Log.e("LiteRT-LM", "Multi-modal CPU initialization failed, falling back to text-only.", e2)
        val configTextOnly = EngineConfig(modelPath = modelPath, backend = Backend.CPU())
        try {
            engine.initialize(configTextOnly)
            // Transition UI to Text-only Model state
        } catch (e3: Exception) {
            Log.e("LiteRT-LM", "Text-only CPU initialization failed.", e3)
            // Bubble error up to UI
        }
    }
}
```
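The nested `try`/`catch` above amounts to trying an ordered list of configurations until one succeeds. The following Python sketch shows that control flow in isolation, under the assumption that each stage can be wrapped in a zero-argument callable. `initialize_with_fallback` and the stand-in initializers are hypothetical and do not call the LiteRT-LM API.

```python
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")


def initialize_with_fallback(
    stages: Sequence[Tuple[str, Callable[[], T]]],
) -> Tuple[str, T]:
    """Try each (label, initializer) in order and return the first success."""
    errors = []
    for label, init in stages:
        try:
            return label, init()
        except Exception as exc:  # broad on purpose: any failure triggers fallback
            errors.append(f"{label}: {exc}")
    raise RuntimeError("All initialization stages failed: " + "; ".join(errors))


def fail_gpu() -> str:
    # Stand-in for a GPU initialization that throws.
    raise RuntimeError("GPU unavailable")


stage, engine = initialize_with_fallback([
    ("multi-modal GPU", fail_gpu),
    ("multi-modal CPU", lambda: "engine-on-cpu"),
    ("text-only CPU", lambda: "text-only-engine"),
])
print(stage)  # multi-modal CPU
```

The returned label tells the caller which stage succeeded, which is the information needed to map the UI to the multi-modal or text-only state.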

--------------------------------

### Configure NPU Backend with Native Library Directory

Source: https://github.com/google-ai-edge/litert-lm/blob/main/docs/api/kotlin/getting_started.md

Example of configuring the LiteRT-LM Engine to use the NPU backend, specifying the native library directory. This is particularly relevant for Android applications where libraries might be bundled.

```kotlin
val engineConfig = EngineConfig(
    modelPath = modelPath,
    backend = Backend.NPU(nativeLibraryDir = context.applicationInfo.nativeLibraryDir)
)
```

--------------------------------

### REPL Chat App Example

Source: https://github.com/google-ai-edge/litert-lm/blob/main/js/packages/core/README.md

A sample REPL chat application demonstrating the LiteRT-LM JavaScript API for interactive conversations in a web browser. It initializes the engine with a Gemma model and handles user input and AI responses.

```html
<div id="out" style="white-space: pre-wrap; font-family: monospace;"></div>
<input id="in" onkeydown="if(event.key === 'Enter') repl(this)">

<script type="module">
  import { Engine } from 'https://cdn.jsdelivr.net/npm/@litert-lm/core/+esm';
  const engine = await Engine.create({
    // Load the Gemma 4 E2B model
    model: 'https://huggingface.co/litert-community/gemma-4-E2B-it-litert-lm/resolve/main/gemma-4-E2B-it-web.litertlm'
    // Or use the E4B model by swapping in this line
    // model: 'https://huggingface.co/litert-community/gemma-4-E4B-it-litert-lm/resolve/main/gemma-4-E4B-it-web.litertlm'
  });
  const chat = await engine.createConversation();

  window.repl = async (el) => {
    const text = el.value;
    el.value = ''; // Clear immediately
    out.append(`
>>> ${text}
AI: `);

    for await (const chunk of chat.sendMessageStreaming(text)) {
      out.append(chunk.content[0].text);
    }
  };
</script>
```

--------------------------------

### Tool Execution Result

Source: https://github.com/google-ai-edge/litert-lm/blob/main/docs/api/cpp/tool-use.md

Example JSON object returned by the local tool execution function.

```json
{
  "tool_name": "get_weather",
  "location":"Paris",
  "temperature":72,
  "unit":"F",
  "humidity":50,
  "condition":"Sunny"
}
```

--------------------------------

### Initialize and Run LiteRT-LM Engine

Source: https://github.com/google-ai-edge/litert-lm/blob/main/docs/api/cpp/conversation.md

Demonstrates the full workflow of setting up the engine, creating a conversation, and sending messages synchronously or asynchronously.

```cpp
#include "runtime/engine/engine.h"

// ...

// 1. Define model assets and engine settings.
auto model_assets = ModelAssets::Create(model_path);
CHECK_OK(model_assets);

auto engine_settings = EngineSettings::CreateDefault(
    *std::move(model_assets),
    /*backend=*/litert::lm::Backend::CPU);
CHECK_OK(engine_settings);

// 2. Create the main Engine object.
absl::StatusOr<std::unique_ptr<Engine>> engine =
    Engine::CreateEngine(*std::move(engine_settings));
CHECK_OK(engine);

// 3. Create a Conversation
auto conversation_config = ConversationConfig::CreateDefault(**engine);
CHECK_OK(conversation_config);
absl::StatusOr<std::unique_ptr<Conversation>> conversation = Conversation::Create(**engine, *conversation_config);
CHECK_OK(conversation);

// 4. Send message to the LLM with blocking call.
absl::StatusOr<Message> model_message = (*conversation)->SendMessage(
    Message{
        {"role", "user"},
        {"content", "What is the tallest building in the world?"}
    });
CHECK_OK(model_message);

// 5. Print the model message.
std::cout << *model_message << std::endl;

// 6. Send message to the LLM with an asynchronous call.
// CreatePrintMessageCallback is a user-implemented callback that processes
// the message each time a chunk of output is received.
std::stringstream captured_output;
(*conversation)->SendMessageAsync(
    Message{
        {"role", "user"},
        {"content", "What is the tallest building in the world?"}
    },
    CreatePrintMessageCallback(captured_output)
);
// Wait until the asynchronous call finishes or times out.
(*engine)->WaitUntilDone(absl::Seconds(10));
```

--------------------------------

### Build LiteRT-LM Binary for Windows

Source: https://github.com/google-ai-edge/litert-lm/blob/main/docs/getting-started/build-and-run.md

Use Bazel to build the `litert_lm_main` binary for Windows. Invoke the build through Bazelisk and pass `--config=windows`.

```bash
# Build litert_lm_main for Windows.
bazelisk build //runtime/engine:litert_lm_main --config=windows
```

--------------------------------

### Deploy and Run LiteRT-LM on GPU

Source: https://github.com/google-ai-edge/litert-lm/blob/main/docs/getting-started/build-and-run.md

Running on the GPU backend requires additional prebuilt `.so` files for the arm64 architecture. Push them to the device and set `LD_LIBRARY_PATH` to the device folder that contains them.

```bash
# Skip model push if it is already there
adb push $MODEL_PATH $DEVICE_FOLDER/model.litertlm

adb push prebuilt/android_arm64/*.so $DEVICE_FOLDER
adb push bazel-bin/runtime/engine/litert_lm_main $DEVICE_FOLDER

adb shell LD_LIBRARY_PATH=$DEVICE_FOLDER \
    $DEVICE_FOLDER/litert_lm_main \
    --backend=gpu \
    --model_path=$DEVICE_FOLDER/model.litertlm
```

--------------------------------

### Deploy and Run LiteRT-LM on CPU

Source: https://github.com/google-ai-edge/litert-lm/blob/main/docs/getting-started/build-and-run.md

Use these commands to push the model and binary to an Android device and execute the model using the CPU backend.

```bash
# Skip model push if it is already there
adb push $MODEL_PATH $DEVICE_FOLDER/model.litertlm

adb push bazel-bin/runtime/engine/litert_lm_main $DEVICE_FOLDER

adb shell $DEVICE_FOLDER/litert_lm_main \
    --backend=cpu \
    --model_path=$DEVICE_FOLDER/model.litertlm
```

--------------------------------

### Prepare Android Device Directory

Source: https://github.com/google-ai-edge/litert-lm/blob/main/docs/getting-started/build-and-run.md

Create a target directory on the Android device for binary and model deployment.

```bash
export DEVICE_FOLDER=/data/local/tmp/
adb shell mkdir -p $DEVICE_FOLDER
```

--------------------------------

### Model Tool Call Response

Source: https://github.com/google-ai-edge/litert-lm/blob/main/docs/api/cpp/tool-use.md

Example JSON structure returned by the model when a tool call is requested.

```json
{
  "tool_calls": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "arguments": {
          "location": "Paris"
        }
      }
    }
  ]
}
```
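To make the round trip concrete, the Python sketch below parses a tool-call message of the shape shown above, dispatches it to a local function, and builds a result object shaped like the Tool Execution Result example. The registry `TOOLS`, the helper `run_tool_calls`, and the canned weather data are assumptions for illustration; the actual C++ tool-use API is not shown here.

```python
import json


def get_weather(location: str) -> dict:
    """Stand-in tool: returns canned data instead of calling a weather service."""
    return {
        "location": location,
        "temperature": 72,
        "unit": "F",
        "humidity": 50,
        "condition": "Sunny",
    }


# Hypothetical registry mapping tool names to local Python callables.
TOOLS = {"get_weather": get_weather}


def run_tool_calls(model_message: dict) -> list:
    """Execute every requested tool call and collect one result per call."""
    results = []
    for call in model_message.get("tool_calls", []):
        function = call["function"]
        tool = TOOLS[function["name"]]
        output = tool(**function["arguments"])
        results.append({"tool_name": function["name"], **output})
    return results


message = json.loads("""
{
  "tool_calls": [
    {
      "type": "function",
      "function": {"name": "get_weather", "arguments": {"location": "Paris"}}
    }
  ]
}
""")

results = run_tool_calls(message)
print(json.dumps(results[0], indent=2))
```

Each result would then be sent back to the model as a tool message so that it can produce its final natural-language answer.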

--------------------------------

### Initialize Conversation with Tools

Source: https://github.com/google-ai-edge/litert-lm/blob/main/docs/api/cpp/tool-use.md

Configure the Conversation object by passing the prepared Preface to the ConversationConfig builder.

```c++
// Set model file path and backend.
std::string model_path = absl::GetFlag(FLAGS_model_path);
ASSIGN_OR_RETURN(ModelAssets model_assets, ModelAssets::Create(model_path));
ASSIGN_OR_RETURN(
  EngineSettings engine_settings,
  EngineSettings::CreateDefault(std::move(model_assets), Backend::CPU));

// Create `Engine`.
ASSIGN_OR_RETURN(
    std::unique_ptr<litert::lm::Engine> engine,
    litert::lm::Engine::CreateEngine(std::move(engine_settings)));

// Create `Conversation`.
auto session_config = litert::lm::SessionConfig::CreateDefault();
ASSIGN_OR_RETURN(auto conversation_config,
                   ConversationConfig::Builder()
                       .SetSessionConfig(session_config)
                       .SetPreface(preface)
                       .Build(*engine));
ASSIGN_OR_RETURN(std::unique_ptr<Conversation> conversation,
                   Conversation::Create(*engine, conversation_config));
```

--------------------------------

### Model Final Response

Source: https://github.com/google-ai-edge/litert-lm/blob/main/docs/api/cpp/tool-use.md

Example JSON structure containing the model's natural language interpretation of the tool result.

```json
{
  "content": [
    {
      "type": "text",
      "text": "The weather in Paris is sunny with a temperature of 72°F and humidity of 50%."
    }
  ]
}
```

--------------------------------

### Build LiteRT-LM Binary

Source: https://github.com/google-ai-edge/litert-lm/blob/main/docs/getting-started/build-and-run.md

Compile the litert_lm_main executable using Bazel.

```bash
bazel build //runtime/engine:litert_lm_main
```

--------------------------------

### Initialize Multi-modal EngineConfig

Source: https://github.com/google-ai-edge/litert-lm/blob/main/agents/skills/create-litert-lm-android-demo-app/references/compliance_checklist_inference.md

Explicitly initialize visionBackend and audioBackend in EngineConfig. Ensure audioBackend is strictly set to Backend.CPU().

```kotlin
EngineConfig(..., audioBackend = Backend.CPU(), visionBackend = ...)
```

--------------------------------

### Resolving Filename from Content URI

Source: https://github.com/google-ai-edge/litert-lm/blob/main/agents/skills/create-litert-lm-android-demo-app/references/ui_layout_and_state.md

When handling content URIs, query ContentResolver for OpenableColumns.DISPLAY_NAME to get the actual filename, rather than relying on Uri.lastPathSegment.

```kotlin
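import android.content.ContentResolver
import android.net.Uri
import android.provider.OpenableColumns

fun getFileName(uri: Uri, contentResolver: ContentResolver): String? {
    var result: String? = null
    if (uri.scheme == "content") {
        contentResolver.query(uri, null, null, null, null)?.use { cursor ->
            // Read the display name only if a row exists and the column is present.
            val displayNameIndex = cursor.getColumnIndex(OpenableColumns.DISPLAY_NAME)
            if (cursor.moveToFirst() && displayNameIndex >= 0) {
                result = cursor.getString(displayNameIndex)
            }
        }
    }
    // Fall back to the last path segment for non-content URIs or failed queries.
    return result ?: uri.lastPathSegment
}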
```

--------------------------------

### Build LiteRT-LM file using TOML configuration

Source: https://github.com/google-ai-edge/litert-lm/blob/main/docs/litert_lm_builder.md

Dynamically drive the `litert-lm-builder` CLI by reading configuration from a TOML file.

```bash
litert-lm-builder toml --path example.toml output --path real_via_toml.litertlm
```

--------------------------------

### Run LiteRT-LM on Linux or MacOS

Source: https://github.com/google-ai-edge/litert-lm/blob/main/docs/getting-started/build-and-run.md

Execute the built binary on Unix-like systems.

```bash
bazel-bin/runtime/engine/litert_lm_main \
    --backend=cpu \
    --model_path=$MODEL_PATH
```

--------------------------------

### Send Message (Non-Streaming)

Source: https://github.com/google-ai-edge/litert-lm/blob/main/js/packages/core/README.md

Example of sending a message to the conversation without expecting a streamed response. The full response is returned at once. Supports both simple string input and structured message objects.

```typescript
// Simple string input
let response = await conversation.sendMessage("What is the capital of France?");
console.log(response.content[0].text);

// Or with full message structure
response = await conversation.sendMessage({role: 'user', content: '...'});
```

--------------------------------

### Example of Asynchronous Message Chunks

Source: https://github.com/google-ai-edge/litert-lm/blob/main/docs/api/cpp/conversation.md

Illustrates how the MessageCallback might be invoked multiple times with sequential parts of a model's response. Implementers need to accumulate these chunks for the complete message.

```json
{
  "role": "model",
  "content": [
    {
      "type": "text",
      "text": "He"
    }
  ]
}
```

```json
{
  "role": "model",
  "content": [
    {
      "type": "text",
      "text": "llo"
    }
  ]
}
```

```json
{
  "role": "model",
  "content": [
    {
      "type": "text",
      "text": " Wo"
    }
  ]
}
```

```json
{
  "role": "model",
  "content": [
    {
      "type": "text",
      "text": "rl"
    }
  ]
}
```

```json
{
  "role": "model",
  "content": [
    {
      "type": "text",
      "text": "d!"
    }
  ]
}
```
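Accumulating the chunks is a matter of concatenating the text parts in arrival order. The Python sketch below shows the idea on the five chunks above, assuming each chunk carries a `content` list of `{type, text}` items as in the other message examples in this document. `accumulate_text` is a hypothetical helper, not a LiteRT-LM function.

```python
def accumulate_text(chunks: list) -> str:
    """Concatenate the text parts of streamed message chunks in arrival order."""
    parts = []
    for chunk in chunks:
        for item in chunk.get("content", []):
            if item.get("type") == "text":
                parts.append(item["text"])
    return "".join(parts)


# The five chunks from the example, built in the order they would arrive.
chunks = [
    {"role": "model", "content": [{"type": "text", "text": piece}]}
    for piece in ["He", "llo", " Wo", "rl", "d!"]
]

print(accumulate_text(chunks))  # Hello World!
```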

--------------------------------

### Initialize LiteRT-LM Engine

Source: https://github.com/google-ai-edge/litert-lm/blob/main/js/packages/core/README.md

Demonstrates how to initialize the LiteRT-LM Engine, the primary entry point for model loading and session management. Be sure to delete the engine when it is no longer needed to release resources. Model initialization can take several seconds.

```typescript
import {Engine, EngineSettings} from '@litert-lm/core';

const engineSettings = {
  model: 'url/path/to/model.litertlm', // or a ReadableStream, or a Blob

  // You can configure context length and other settings here
  mainExecutorSettings: {
    maxNumTokens: 8192,
  },
} satisfies EngineSettings;

const engine = await Engine.create(engineSettings);

// ... Use the engine to create a conversation ...

// Delete the engine when done.
await engine.delete();
```

--------------------------------

### Create a Conversation Instance

Source: https://github.com/google-ai-edge/litert-lm/blob/main/js/packages/core/README.md

Shows how to create a Conversation instance from an initialized engine. Customization is possible using ConversationConfig, such as setting a system preface for the assistant's behavior.

```typescript
const conversation = await engine.createConversation({
  preface: {
    messages: [
      {role: 'system', content: 'You are a helpful assistant'}
    ]
  }
});

await conversation.sendMessage({
  role: 'user',
  content: 'Write a poem',
});
```

--------------------------------

### Configure LiteRT-LM Prebuilt Dependencies in BUILD

Source: https://github.com/google-ai-edge/litert-lm/blob/main/agents/skills/create-litert-lm-android-demo-app/references/dependency_source_build.md

This Starlark code defines `java_import` rules to register the LiteRT-LM Kotlin and native JNI JARs. It includes an example of exporting transitive dependencies required by the Kotlin bindings, ensuring they are available to downstream targets.

```starlark
load("@rules_java//java:defs.bzl", "java_import")

package(default_visibility = ["//visibility:public"])

java_import(
    name = "litertlm_kotlin",
    jars = ["litertlm-android.jar"],
    exports = [
        # Export external libraries used inside the Kotlin binding prebuilts
        # so they are automatically added to down-stream classpaths depending on this rule.
        "@maven//:com_example_library_transitive_dependency",
    ],
)

java_import(
    name = "litertlm_native",
    jars = ["litertlm_native.jar"],
)
```

--------------------------------

### Run LiteRT-LM on Windows

Source: https://github.com/google-ai-edge/litert-lm/blob/main/docs/getting-started/build-and-run.md

Execute the LiteRT-LM binary on Windows using PowerShell.

```powershell
bazel-bin\runtime\engine\litert_lm_main.exe `
    --backend=cpu `
    --model_path=$Env:MODEL_PATH
```

--------------------------------

### Run LLM with default prompt

Source: https://github.com/google-ai-edge/litert-lm/blob/main/docs/getting-started/build-and-run.md

Executes the model using the CPU backend with a default prompt.

```bash
<path to binary directory>/litert_lm_main \
    --backend=cpu \
    --model_path=$MODEL_PATH
```

--------------------------------

### Extract components from LiteRT-LM file with litert-lm-peek

Source: https://github.com/google-ai-edge/litert-lm/blob/main/docs/litert_lm_builder.md

Use the `litert-lm-peek` CLI to extract byte-for-byte components from a .litertlm file to a specified directory.

```bash
# Extract byte-for-byte components directly
litert-lm-peek --litertlm_file demo.litertlm --dump_files_dir ./extracted_files
```

--------------------------------

### Define Main Executable

Source: https://github.com/google-ai-edge/litert-lm/blob/main/runtime/engine/CMakeLists.txt

Sets up the primary litert_lm_main executable. It configures linker options to export LiteRt* symbols, specifies include directories including a JSON third-party path, and links against a comprehensive set of LiteRTLM runtime and executor libraries. Android-specific libraries (EGL, GLESv3) are linked if the ANDROID flag is set.

```cmake
add_litertlm_executable(litert_lm_main
    litert_lm_main.cc
  )
  target_link_options(litert_lm_main PRIVATE ${LITERTLM_UNIFIED_LINK_SPEC})
  target_link_options(litert_lm_main PRIVATE
    "LINKER:--export-dynamic-symbol=LiteRt*"
  )

  target_include_directories(litert_lm_main PRIVATE
    ${LITERTLM_INCLUDE_PATHS}
    ${LITERT_INCLUDE_PATHS}
    ${THIRD_PARTY_DIR}/json/include
  )

  target_link_libraries(litert_lm_main
    PUBLIC
      LiteRTLM::Runtime::Engine::Interface
      LiteRTLM::Runtime::Engine::Settings
      LiteRTLM::Runtime::Engine::IoTypes
      LiteRTLM::Runtime::Conversation
      LiteRTLM::Runtime::Conversation::IoTypes
      LiteRTLM::Runtime::Core::EngineImpl
      runtime_executor_executor_settings_base
      runtime_executor_llm_executor_settings
      runtime_util_litert_status_util
      LITERTLM_DEPS
  )

  # Android Specifics
  if(ANDROID)
    target_link_libraries(litert_lm_main PRIVATE EGL GLESv3)
  endif()
```

--------------------------------

### Clone LiteRT-LM Repository

Source: https://github.com/google-ai-edge/litert-lm/blob/main/docs/getting-started/build-and-run.md

Clone the LiteRT-LM repository to your local machine and navigate into the project directory. This is the first step to obtaining the source code.

```bash
git clone https://github.com/google-ai-edge/LiteRT-LM.git
cd LiteRT-LM
```

--------------------------------

### Configure Build Environment

Source: https://github.com/google-ai-edge/litert-lm/blob/main/cmake/packages/litert_lm/CMakeLists.txt

Sets standard C++ versions, compiler flags, and build options for the project.

```cmake
if(NOT CMAKE_CXX_STANDARD)
    set(CMAKE_CXX_STANDARD 20)
endif()
set(CMAKE_CXX_STANDARD_REQUIRED ON)
set(CMAKE_CXX_EXTENSIONS OFF)
set(CMAKE_POSITION_INDEPENDENT_CODE ON)
set(CMAKE_POLICY_VERSION_MINIMUM 3.20 CACHE STRING "Required CMake policy version." FORCE)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fpermissive" CACHE STRING "Required CXX flags" FORCE)

add_compile_options("-fpermissive")

option(LITERT_BUILD_CONFIG_DISABLE_GPU_VAL
    "LiteRT definition to disable GPU in build config"
    TRUE
)

option(LITERT_BUILD_CONFIG_DISABLE_NPU_VAL
    "LiteRT definition to disable NPU in build config"
    TRUE
)

set_property(GLOBAL PROPERTY LITERTLM_LOCAL_ARCHIVE_REGISTRY "")
set_property(GLOBAL PROPERTY LITERTLM_LOCAL_TARGET_REGISTRY "")
```

--------------------------------

### Include Project Modules

Source: https://github.com/google-ai-edge/litert-lm/blob/main/cmake/packages/litert_lm/CMakeLists.txt

Loads utility, macro, and dependency management modules.

```cmake
# --- Utilities
include("${LITERTLM_MODULES_DIR}/utils.cmake")
include("${LITERTLM_MODULES_DIR}/macros.cmake")

# --- Dependencies
include("${LITERTLM_MODULES_DIR}/fetch_content.cmake")
include("${LITERTLM_MODULES_DIR}/external_project.cmake")
include("${LITERTLM_MODULES_DIR}/collect_dependencies.cmake")
```

--------------------------------

### Run Gemma4-E4B with MTP using LiteRT-LM CLI

Source: https://github.com/google-ai-edge/litert-lm/blob/main/README.md

This command demonstrates how to run the Gemma4-E4B model with MTP on various platforms using the LiteRT-LM CLI. It specifies the model repository, backend, speculative decoding, and a sample prompt.

```bash
litert-lm run \
   --from-huggingface-repo=litert-community/gemma-4-E4B-it-litert-lm \
   gemma-4-E4B-it.litertlm \
   --backend=gpu \
   --enable-speculative-decoding=true \
   --prompt="What is the capital of France?"
```

--------------------------------

### Initialize LiteRT-LM Engine

Source: https://github.com/google-ai-edge/litert-lm/blob/main/agents/skills/create-litert-lm-android-demo-app/references/compliance_checklist_inference.md

Ensure that engine.initialize() is explicitly called before creating any conversations or sessions to properly set up the inference engine.

```java
engine.initialize()
```

--------------------------------

### Define Thread Options Interface

Source: https://github.com/google-ai-edge/litert-lm/blob/main/runtime/framework/CMakeLists.txt

Creates an interface library for thread options and sets up include directories.

```cmake
add_litertlm_library(runtime_framework_thread_options INTERFACE)
add_library(LiteRTLM::Framework::ThreadOptions ALIAS runtime_framework_thread_options)

target_include_directories(runtime_framework_thread_options
  INTERFACE
    ${PKG_ROOT}
    ${LITERTLM_INCLUDE_PATHS}
)
```

--------------------------------

### Build LiteRT-LM file programmatically with Python API

Source: https://github.com/google-ai-edge/litert-lm/blob/main/docs/litert_lm_builder.md

Demonstrates building a .litertlm file using the `LitertLmFileBuilder` Python class, adding metadata, TFLite models, and tokenizers, then serializing the output. It also shows how to use the `peek_litertlm_file` function.

```python
import os
import sys

# Core classes directly importable from the top level
from litert_lm_builder import (
    LitertLmFileBuilder,
    Metadata,
    DType,
    TfLiteModelType,
    Backend,
    peek_litertlm_file
)

def build_demo_model():
    """Builds a .litertlm file programmatically using the Python API."""
    # Paths to your assets
    model_path = "schema/testdata/attention.tflite"
    sp_path = "runtime/components/testdata/sentencepiece.model"
    output_path = "demo_api.litertlm"

    # Initialize the core builder object
    builder = LitertLmFileBuilder()

    # Add metadata
    builder.add_system_metadata(Metadata(key="Authors", value="ODML Team", dtype=DType.STRING))
    builder.add_system_metadata(Metadata(key="TargetBackend", value=Backend.CPU.name, dtype=DType.STRING))

    # Add main TfLite model
    builder.add_tflite_model(
        tflite_model_path=model_path,
        model_type=TfLiteModelType.PREFILL_DECODE
    )

    # Add auxiliary tokens & tokenizers
    builder.add_sentencepiece_tokenizer(
        sp_tokenizer_path=sp_path
    )

    # Serialize stream to your output file
    with open(output_path, "wb") as f:
        builder.build(f)
    print(f"Successfully built {output_path}")

    # You can also use the peek programmatic API identically
    print(f"\n--- Peeking at {output_path} ---")
    peek_litertlm_file(output_path, None, sys.stdout)

if __name__ == "__main__":
    build_demo_model()
```

--------------------------------

### Verify LiteRT-LM Binary

Source: https://github.com/google-ai-edge/litert-lm/blob/main/docs/getting-started/cmake.md

Run the LiteRT-LM main binary to verify its integrity and functionality. This command performs a CPU-based inference test with a specified model and input prompt.

```bash
./litert_lm_main \
  --model_path=/path/to/gemma-3n-E2B-it-int4.litertlm \
  --backend=cpu \
  --input_prompt="What is the tallest building in the world?"
```

--------------------------------

### Build LiteRT-LM Container Image

Source: https://github.com/google-ai-edge/litert-lm/blob/main/docs/getting-started/cmake.md

Build the container image for LiteRT-LM with Podman using the provided Containerfile. The build pulls the necessary dependencies and prepares the container environment.

```bash
podman build -f /path/to/repo/cmake/Containerfile -t litert_lm /path/to/repo
```

--------------------------------

### Initialize LiteRT-LM Engine

Source: https://github.com/google-ai-edge/litert-lm/blob/main/docs/api/kotlin/getting_started.md

This Kotlin code initializes the LiteRT-LM Engine with a specified model path and backend configuration. It's recommended to call `engine.initialize()` on a background thread to prevent UI blocking. Remember to close the engine when done.

```kotlin
import com.google.ai.edge.litertlm.Backend
import com.google.ai.edge.litertlm.Engine
import com.google.ai.edge.litertlm.EngineConfig

val engineConfig = EngineConfig(
    modelPath = "/path/to/your/model.litertlm", // Replace with your model path
    backend = Backend.CPU(), // Or Backend.GPU() and Backend.NPU("...")
    // Optional: Pick a writable dir. This can improve 2nd load time.
    // cacheDir = "/tmp/" or context.cacheDir.path (for Android)
)

val engine = Engine(engineConfig)
engine.initialize()
// ... Use the engine to create a conversation ...

// Close the engine when done
engine.close()
```

--------------------------------

### Verify APK Contents with zipinfo

Source: https://github.com/google-ai-edge/litert-lm/blob/main/agents/skills/create-litert-lm-android-demo-app/references/compliance_checklist_inference.md

Run this command in your terminal to verify that the `.so` files are correctly located in the compiled APK. Replace `<app_name>` with your application's name.

```bash
zipinfo bazel-bin/<app_name>.apk 'lib/*'
```
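
For scripted checks, the same inspection can be sketched in Python, since an APK is a ZIP archive. This is a minimal sketch using only the standard `zipfile` module; the function names `list_native_libs` and `group_by_abi` are illustrative and not part of LiteRT-LM.

```python
import zipfile


def list_native_libs(apk_path: str) -> list[str]:
    """Return the sorted names of all .so entries under lib/ in an APK."""
    with zipfile.ZipFile(apk_path) as apk:
        return sorted(
            name
            for name in apk.namelist()
            if name.startswith("lib/") and name.endswith(".so")
        )


def group_by_abi(entries: list[str]) -> dict[str, list[str]]:
    """Group entries shaped like lib/<abi>/<file>.so by their ABI directory."""
    grouped: dict[str, list[str]] = {}
    for entry in entries:
        parts = entry.split("/")
        if len(parts) == 3:  # lib/<abi>/<file>.so
            grouped.setdefault(parts[1], []).append(parts[2])
    return grouped
```

An empty result from `list_native_libs` would indicate that no native libraries were packaged under `lib/`.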

--------------------------------

### Configure Multi-modal Engine on CPU

Source: https://github.com/google-ai-edge/litert-lm/blob/main/agents/skills/create-litert-lm-android-demo-app/references/compliance_checklist_inference.md

If GPU initialization fails, fall back to initializing the multi-modal engine with the main, audio, and vision backends all set to CPU.

```kotlin
val engine = Engine(
    EngineConfig(
        modelPath = "/path/to/your/model.litertlm", // Replace with your model path
        backend = Backend.CPU(),
        audioBackend = Backend.CPU(),
        visionBackend = Backend.CPU(),
    )
)
engine.initialize()
```
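
The fallback order described above can be sketched independently of the real API. In this Python sketch, `init_engine` is a hypothetical callable standing in for engine construction and initialization, and the config dictionaries are illustrative stand-ins for `EngineConfig`; none of these names come from LiteRT-LM.

```python
from typing import Callable, Sequence


def initialize_with_fallback(
    init_engine: Callable[[dict], object],
    configs: Sequence[dict],
) -> tuple[object, dict]:
    """Try each config in order and return the first engine that initializes."""
    errors = []
    for config in configs:
        try:
            return init_engine(config), config
        except Exception as exc:  # a real app would catch a narrower type
            errors.append(f"{config['backend']}: {exc}")
    raise RuntimeError("all backends failed: " + "; ".join(errors))


# Hypothetical stand-ins for EngineConfig: preferred config first, CPU-only last.
GPU_CONFIG = {"backend": "GPU", "audio_backend": "CPU", "vision_backend": "CPU"}
CPU_CONFIG = {"backend": "CPU", "audio_backend": "CPU", "vision_backend": "CPU"}
```

Calling `initialize_with_fallback(init_engine, [GPU_CONFIG, CPU_CONFIG])` returns both the engine and the config that succeeded, so the app can tell the user which backend is in use.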

--------------------------------

### Run LLM with custom prompt

Source: https://github.com/google-ai-edge/litert-lm/blob/main/docs/getting-started/build-and-run.md

Executes the model using a user-provided input prompt.

```bash
<path to binary directory>/litert_lm_main \
    --backend=cpu \
    --input_prompt="Write me a song" \
    --model_path=$MODEL_PATH
```
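
When launching the binary from a script, passing arguments as a list avoids shell line-continuation and quoting mistakes. The following Python sketch assembles the same three flags; `build_run_command` is a hypothetical helper, and the binary and model paths are placeholders.

```python
import shlex


def build_run_command(binary: str, model_path: str, prompt: str,
                      backend: str = "cpu") -> list[str]:
    """Build the argv list for litert_lm_main from the flags shown above."""
    return [
        binary,
        f"--backend={backend}",
        f"--input_prompt={prompt}",
        f"--model_path={model_path}",
    ]


if __name__ == "__main__":
    argv = build_run_command(
        "./litert_lm_main", "/path/to/model.litertlm", "Write me a song"
    )
    # shlex.join renders the equivalent, correctly quoted shell command.
    print(shlex.join(argv))
    # To actually run it: subprocess.run(argv, check=True)
```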

--------------------------------

### Build LiteRT-LM for Android

Source: https://github.com/google-ai-edge/litert-lm/blob/main/docs/getting-started/build-and-run.md

Compile the binary specifically for the Android ARM64 architecture.

```bash
bazel build --config=android_arm64 //runtime/engine:litert_lm_main
```

--------------------------------

### Define Tools with OpenAPI Specification

Source: https://github.com/google-ai-edge/litert-lm/blob/main/docs/api/kotlin/getting_started.md

Implement the OpenApiTool interface and provide the tool's description as a JSON string conforming to the OpenAPI specification. This offers fine-grained control and is useful if you already have an OpenAPI schema.

```kotlin
import com.google.ai.edge.litertlm.OpenApiTool

class SampleOpenApiTool : OpenApiTool {

    override fun getToolDescriptionJsonString(): String {
        return """
        {
          "name": "addition",
          "description": "Add all numbers.",
          "parameters": {
            "type": "object",
            "properties": {
              "numbers": {
                "type": "array",
                "items": {
                  "type": "number"
                },
                "description": "The list of numbers to sum."
              }
            },
            "required": [
              "numbers"
            ]
          }
        }
        """.trimIndent() // Tip: trim to save tokens
    }

    override fun execute(paramsJsonString: String): String {
        // Parse paramsJsonString with your choice of parser/deserializer and
        // execute the tool.

        // Return the result as a JSON string
        return """{"result": 1.4142}"""
    }
}
```
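
The `execute` step (parse the arguments, run the tool, return JSON) might look like this for the `addition` tool. It is a language-neutral sketch in Python rather than the Kotlin API; `execute_addition` and the shape of the error payload are assumptions made for illustration.

```python
import json


def execute_addition(params_json: str) -> str:
    """Parse the tool-call arguments, sum `numbers`, and return a JSON string."""
    params = json.loads(params_json)
    numbers = params.get("numbers") if isinstance(params, dict) else None
    is_valid = isinstance(numbers, list) and all(
        isinstance(n, (int, float)) and not isinstance(n, bool) for n in numbers
    )
    if not is_valid:
        # Assumed error shape; report problems as JSON so the model can react.
        return json.dumps({"error": "'numbers' must be a list of numbers"})
    return json.dumps({"result": sum(numbers)})
```

Returning an error object instead of raising keeps the result a JSON string in every case, which matches the return type of `execute`.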

--------------------------------

### Configure Android Manifest for GPU Backend

Source: https://github.com/google-ai-edge/litert-lm/blob/main/docs/api/kotlin/getting_started.md

Add these lines to your AndroidManifest.xml within the `<application>` tag to enable the GPU backend for LiteRT-LM. These are optional native libraries.

```xml
  <application>
    <uses-native-library android:name="libvndksupport.so" android:required="false"/>
    <uses-native-library android:name="libOpenCL.so" android:required="false"/>
  </application>
```

--------------------------------

### Build Main Executable

Source: https://github.com/google-ai-edge/litert-lm/blob/main/cmake/packages/litert_lm/CMakeLists.txt

Adds subdirectories and defines the main executable target with necessary dependencies and definitions.

```cmake
add_subdirectory(${LITERTLM_PROJECT_ROOT}/c ${GENERATED_SRC_DIR}/c)
add_subdirectory(${LITERTLM_PROJECT_ROOT}/schema ${GENERATED_SRC_DIR}/schema)
add_subdirectory(${LITERTLM_PROJECT_ROOT}/runtime ${GENERATED_SRC_DIR}/runtime)

include("${LITERTLM_MODULES_DIR}/local_aggregate.cmake")
generate_local_aggregate()

add_executable(litert_lm_main
    "${LITERTLM_PROJECT_ROOT}/runtime/engine/litert_lm_main.cc"
)

if(TARGET litertlm_local_anchor)
    add_dependencies(litert_lm_main litertlm_local_anchor)
endif()

target_compile_definitions(litert_lm_main PRIVATE
    ENABLE_HUGGINGFACE_TOKENIZER
    ENABLE_SENTENCEPIECE_TOKENIZER
)

target_include_directories(litert_lm_main PRIVATE
    ${LITERTLM_INCLUDE_PATHS}
)

target_link_options(litert_lm_main PRIVATE
  "LINKER:--export-dynamic-symbol=LiteRt*"
)
```

--------------------------------

### Inspect LiteRT-LM file with litert-lm-peek

Source: https://github.com/google-ai-edge/litert-lm/blob/main/docs/litert_lm_builder.md

Use the `litert-lm-peek` CLI to dump diagnostic information of a .litertlm file to standard output.

```bash
# Dump diagnostic info to stdout
litert-lm-peek --litertlm_file demo.litertlm
```

--------------------------------

### Configure Application BUILD File Dependencies

Source: https://github.com/google-ai-edge/litert-lm/blob/main/agents/skills/create-litert-lm-android-demo-app/references/dependency_source_build.md

In your application's `BUILD` file, depend directly on the prebuilt LiteRT-LM bindings (`litertlm_kotlin` and `litertlm_native`). Transitive dependencies are exported by `litertlm_kotlin`, so no redundant target declarations are needed in the app target.

```starlark
# In your android_binary or kt_android_library target:
deps = [
    "//litert_lm_prebuilt:litertlm_kotlin",
    "//litert_lm_prebuilt:litertlm_native",
    # ... other dependencies
]
```

--------------------------------

### Build LiteRT-LM file with chained CLI commands

Source: https://github.com/google-ai-edge/litert-lm/blob/main/docs/litert_lm_builder.md

Use the `litert-lm-builder` CLI to chain subcommands for adding metadata, TFLite models, tokenizers, and specifying output.

```bash
litert-lm-builder \
  system_metadata --str Authors "ODML Team" \
  tflite_model --path schema/testdata/attention.tflite --model_type prefill_decode \
  sp_tokenizer --path runtime/components/testdata/sentencepiece.model \
  output --path demo.litertlm
```

--------------------------------

### Build a Terminal Chat App with Kotlin API

Source: https://github.com/google-ai-edge/litert-lm/blob/main/docs/api/kotlin/getting_started.md

This snippet demonstrates how to build a basic terminal chat application using the LiteRT-LM Kotlin API. It includes engine initialization, conversation creation, and message handling. Ensure the model path is correctly set.

```kotlin
import com.google.ai.edge.litertlm.*

suspend fun main() {
  Engine.setNativeMinLogSeverity(LogSeverity.ERROR) // Hide log for TUI app

  val engineConfig = EngineConfig(modelPath = "/path/to/model.litertlm")
  Engine(engineConfig).use { engine ->
    engine.initialize()

    engine.createConversation().use { conversation ->
      while (true) {
        print("\n>>> ")
        conversation.sendMessageAsync(readln()).collect { print(it) }
      }
    }
  }
}
```

--------------------------------

### Markdown Rendering Library

Source: https://github.com/google-ai-edge/litert-lm/blob/main/agents/skills/create-litert-lm-android-demo-app/references/ui_layout_and_state.md

Integrate the Markwon library to render Markdown in text messages. Use version 4.6.2 or a compatible release.

```gradle
implementation "io.noties.markwon:core:4.6.2"
```

--------------------------------

### Add LiteRTLM Top P CPU Sampler Library

Source: https://github.com/google-ai-edge/litert-lm/blob/main/runtime/components/CMakeLists.txt

Defines a static library for the Top P CPU sampler, specifying its source file, include directories, and linked libraries.

```cmake
add_litertlm_library(runtime_components_top_p_cpu_sampler STATIC
  top_p_cpu_sampler.cc
)
add_library(LiteRTLM::Runtime::Components::Sampler::TopP ALIAS runtime_components_top_p_cpu_sampler)

target_include_directories(runtime_components_top_p_cpu_sampler
  PUBLIC
    ${PKG_ROOT}
    ${LITERTLM_INCLUDE_PATHS}
)

target_link_libraries(runtime_components_top_p_cpu_sampler
  PUBLIC
    LiteRTLM::Runtime::Components::Sampler::Interface
    LiteRTLM::Runtime::Components::SamplingCpuUtil
    runtime_util_convert_tensor_buffer
    runtime_util_tensor_buffer_util

    LITERTLM_DEPS
)
```

--------------------------------

### Define Folder Facade Library

Source: https://github.com/google-ai-edge/litert-lm/blob/main/runtime/engine/CMakeLists.txt

Creates an INTERFACE library that acts as a facade for the LiteRTLM runtime engine components. This simplifies linking by providing a single target that re-exports the necessary interface, settings, I/O types, library, and shared flags libraries.

```cmake
add_library(runtime_engine_libs INTERFACE)
add_library(LiteRTLM::Runtime::Engine ALIAS runtime_engine_libs)

target_link_libraries(runtime_engine_libs INTERFACE
  LiteRTLM::Runtime::Engine::Interface
  LiteRTLM::Runtime::Engine::Settings
  LiteRTLM::Runtime::Engine::IoTypes
  LiteRTLM::Runtime::Engine::Lib
  LiteRTLM::Runtime::Engine::SharedFlags
)
```

--------------------------------

### Benchmark LLM performance

Source: https://github.com/google-ai-edge/litert-lm/blob/main/docs/getting-started/build-and-run.md

Runs the model with specific prefill and decode token counts for performance analysis.

```bash
<path to binary directory>/litert_lm_main \
    --backend=cpu \
    --model_path=$MODEL_PATH \
    --benchmark \
    --benchmark_prefill_tokens=1024 \
    --benchmark_decode_tokens=256 \
    --async=false
```

--------------------------------

### Declare Tools for LiteRT-LM

Source: https://github.com/google-ai-edge/litert-lm/blob/main/docs/api/cpp/tool-use.md

Define available tools using a JSON schema string and assign them to a JsonPreface object.

```c++
constexpr absl::string_view kToolString = R"([
{
  "name": "get_weather",
  "description": "Returns the weather for a given location.",
  "parameters": {
    "type": "object",
    "properties": {
      "location": {
        "type": "string",
        "description": "The location to get the weather for."
      }
    },
    "required": [
      "location"
    ]
  }
},
{
  "name": "get_stock_price",
  "description": "Returns the stock price for a given stock symbol.",
  "parameters": {
    "type": "object",
    "properties": {
      "stock_symbol": {
        "type": "string",
        "description": "The stock symbol to get the price for."
      }
    },
    "required": [
      "stock_symbol"
    ]
  }
}
])";

JsonPreface preface;
preface.tools = nlohmann::ordered_json::parse(kToolString);
```
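
Since tool declarations are plain JSON, it can help to sanity-check them before passing them to the runtime. The following is a minimal Python sketch, assuming the declaration shape shown above (`name`, `description`, and `parameters` with `properties` and `required`). `validate_tools` is a hypothetical helper, not a LiteRT-LM function, and it does not implement full JSON Schema validation.

```python
import json


def validate_tools(tools_json: str) -> list[str]:
    """Return a list of problems found in a JSON array of tool declarations."""
    tools = json.loads(tools_json)
    if not isinstance(tools, list):
        return ["top-level value must be a JSON array"]
    problems = []
    for index, tool in enumerate(tools):
        if not isinstance(tool, dict):
            problems.append(f"tool #{index}: must be a JSON object")
            continue
        label = tool.get("name", f"tool #{index}")
        for key in ("name", "description", "parameters"):
            if key not in tool:
                problems.append(f"{label}: missing '{key}'")
        params = tool.get("parameters")
        if not isinstance(params, dict):
            continue
        properties = params.get("properties", {})
        if not isinstance(properties, dict):
            problems.append(f"{label}: 'properties' must be an object")
            continue
        # Each property must map to a schema object, not a bare value.
        for prop, schema in properties.items():
            if not isinstance(schema, dict):
                problems.append(f"{label}: property '{prop}' must be an object")
        # Every required name must be declared under properties.
        for required in params.get("required", []):
            if required not in properties:
                problems.append(f"{label}: required '{required}' not in properties")
    return problems
```

An empty list means no problems were detected by these particular checks.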

--------------------------------

### Configure Multi-modal Engine on GPU

Source: https://github.com/google-ai-edge/litert-lm/blob/main/agents/skills/create-litert-lm-android-demo-app/references/compliance_checklist_inference.md

This code configures and initializes a multi-modal engine with the main backend set to GPU while keeping the audio and vision backends on CPU.

```kotlin
val engine = Engine(
    EngineConfig(
        modelPath = "/path/to/your/model.litertlm", // Replace with your model path
        backend = Backend.GPU(),
        audioBackend = Backend.CPU(),
        visionBackend = Backend.CPU(),
    )
)
engine.initialize()
```

--------------------------------

### Implement Callback and Print Helpers

Source: https://github.com/google-ai-edge/litert-lm/blob/main/docs/api/cpp/conversation.md

Provides the implementation for the asynchronous callback and the helper function to print message content to the console.

```cpp
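// Forward declaration so the callback below can call PrintMessage.
absl::Status PrintMessage(const Message& message,
                          std::stringstream& captured_output, bool streaming);

absl::AnyInvocable<void(absl::StatusOr<Message>)> CreatePrintMessageCallback(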
    std::stringstream& captured_output) {
  return [&captured_output](absl::StatusOr<Message> message) {
    if (!message.ok()) {
      std::cout << message.status().message() << std::endl;
      return;
    }
    if (message->empty()) {
      std::cout << std::endl << std::flush;
      return;
    }
    ABSL_CHECK_OK(PrintMessage(*message, captured_output,
                               /*streaming=*/true));
  };
}

absl::Status PrintMessage(const Message& message,
                          std::stringstream& captured_output,
                          bool streaming = false) {
  if (message["content"].is_array()) {
    for (const auto& content : message["content"]) {
      if (content["type"] == "text") {
        captured_output << content["text"].get<std::string>();
        std::cout << content["text"].get<std::string>();
      }
    }
    if (!streaming) {
      captured_output << std::endl << std::flush;
      std::cout << std::endl << std::flush;
    } else {
      captured_output << std::flush;
      std::cout << std::flush;
    }
  } else if (message["content"]["text"].is_string()) {
    if (!streaming) {
      captured_output << message["content"]["text"].get<std::string>()
                      << std::endl
                      << std::flush;
      std::cout << message["content"]["text"].get<std::string>() << std::endl
                << std::flush;
    } else {
      captured_output << message["content"]["text"].get<std::string>()
                      << std::flush;
      std::cout << message["content"]["text"].get<std::string>() << std::flush;
    }
  } else {
    return absl::InvalidArgumentError("Invalid message: " + message.dump());
  }
  return absl::OkStatus();
}
```
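
The branching in `PrintMessage` handles two content shapes: an array of typed parts, or a single object with a `text` field. The same logic is sketched below in Python on plain dictionaries; `extract_text` is an illustrative name, and the message layout is assumed from the C++ code above.

```python
def extract_text(message: dict) -> str:
    """Return the concatenated text of a message, or raise on an unknown shape."""
    content = message.get("content")
    if isinstance(content, list):
        # Array form: keep only the parts tagged as text, in order.
        return "".join(
            part["text"]
            for part in content
            if isinstance(part, dict)
            and part.get("type") == "text"
            and isinstance(part.get("text"), str)
        )
    if isinstance(content, dict) and isinstance(content.get("text"), str):
        # Object form: a single text field.
        return content["text"]
    raise ValueError(f"Invalid message: {message!r}")
```

Non-text parts in the array form are skipped, mirroring the `content["type"] == "text"` check in the C++ helper.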

--------------------------------

### Aggregate Framework Libraries

Source: https://github.com/google-ai-edge/litert-lm/blob/main/runtime/framework/CMakeLists.txt

Creates an interface library that aggregates core framework components.

```cmake
add_library(runtime_framework_libs INTERFACE)
add_library(LiteRTLM::Framework ALIAS runtime_framework_libs)

target_link_libraries(runtime_framework_libs INTERFACE
  LiteRTLM::Framework::ThreadOptions
  LiteRTLM::Framework::ThreadPool
)
```

--------------------------------

### Register Tools in ConversationConfig

Source: https://github.com/google-ai-edge/litert-lm/blob/main/docs/api/kotlin/getting_started.md

Include instances of your defined tools (SampleToolSet and SampleOpenApiTool) in the tools list within ConversationConfig to make them available to the language model. The model will automatically decide when to call these tools based on the conversation.

```kotlin
val conversation = engine.createConversation(
    ConversationConfig(
        tools = listOf(
            tool(SampleToolSet()),
            tool(SampleOpenApiTool()),
        ),
        // ... other configs
    )
)

// Send messages that might trigger the tool
conversation.sendMessageAsync("What's the weather like in London?", callback)
```

--------------------------------

### Perform Multi-Modal Inference

Source: https://github.com/google-ai-edge/litert-lm/blob/main/python/colabs/Getting Started with LiteRT-LM Python API.ipynb

Download an audio file and process it using the engine with an audio backend.

```python
!wget https://github.com/google-ai-edge/LiteRT-LM/raw/refs/heads/main/runtime/testdata/have_a_wonderful_day.wav -O have_a_wonderful_day.wav
```

```python
import litert_lm

user_message = {
    "role": "user",
    "content": [
        {"type": "audio", "path": "/content/have_a_wonderful_day.wav"},
        {"type": "text", "text": "Describe this audio. What does it say?"},
    ],
}

# Initialize the engine with Audio backend.
with litert_lm.Engine("model.litertlm", audio_backend=litert_lm.Backend.CPU) as engine:
  with engine.create_conversation() as conversation:
    for chunk in conversation.send_message_async(user_message):
      print(chunk["content"][0]["text"], end="", flush=True)
```
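
A small helper can assemble the message dictionary and catch a wrong audio path before the engine is involved. This sketch assumes the message layout shown above; `build_audio_message` is a hypothetical helper, not part of `litert_lm`.

```python
import os


def build_audio_message(audio_path: str, prompt: str) -> dict:
    """Build a user message with one audio part followed by one text part."""
    absolute_path = os.path.abspath(audio_path)
    if not os.path.isfile(absolute_path):
        raise FileNotFoundError(f"audio file not found: {absolute_path}")
    return {
        "role": "user",
        "content": [
            {"type": "audio", "path": absolute_path},
            {"type": "text", "text": prompt},
        ],
    }
```

Resolving the path up front makes the message independent of the working directory, which differs between Colab (`/content`) and a local checkout.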

--------------------------------

### Configure LiteRT-LM Build

Source: https://github.com/google-ai-edge/litert-lm/blob/main/docs/getting-started/cmake.md

Use CMake to configure the build directory for LiteRT-LM. This command sets the build type to Release and the C++ standard to C++20.

```bash
cmake -B cmake/build -G "Unix Makefiles" \
  -DCMAKE_BUILD_TYPE=Release \
  -DCMAKE_CXX_STANDARD=20
```

--------------------------------

### Create and Manage Conversations

Source: https://github.com/google-ai-edge/litert-lm/blob/main/docs/api/kotlin/getting_started.md

Initialize a conversation with optional configuration and manage its lifecycle using manual closing or the use block.

```kotlin
import com.google.ai.edge.litertlm.Contents
import com.google.ai.edge.litertlm.ConversationConfig
import com.google.ai.edge.litertlm.Message
import com.google.ai.edge.litertlm.SamplerConfig

// Optional: Configure the system instruction, initial messages, sampling
// parameters, etc.
val conversationConfig = ConversationConfig(
    systemInstruction = Contents.of("You are a helpful assistant."),
    initialMessages = listOf(
        Message.user("What is the capital city of the United States?"),
        Message.model("Washington, D.C."),
    ),
    samplerConfig = SamplerConfig(topK = 10, topP = 0.95, temperature = 0.8),
)

val conversation = engine.createConversation(conversationConfig)
// Or with default config:
// val conversation = engine.createConversation()

// ... Use the conversation ...

// Close the conversation when done
conversation.close()
```

```kotlin
engine.createConversation(conversationConfig).use { conversation ->
    // Interact with the conversation
}
```