### Install litert-lm-builder

Source: https://github.com/google-ai-edge/litert-lm/blob/main/docs/litert_lm_builder.md

Install the litert-lm-builder package using uv and pip within a virtual environment.

```bash
uv venv
source .venv/bin/activate
uv pip install litert-lm-builder
```

--------------------------------

### Run Example with Bazel

Source: https://github.com/google-ai-edge/litert-lm/blob/main/docs/api/kotlin/getting_started.md

Command to run the example chat application using Bazel. Pass the absolute path to your LiteRT-LM model file as the argument after `--`.

```bash
bazel run -c opt //kotlin/java/com/google/ai/edge/litertlm/example:main --
```

--------------------------------

### Example Preface with System Instruction and Tools

Source: https://github.com/google-ai-edge/litert-lm/blob/main/docs/api/cpp/conversation.md

Use this preface to provide initial system instructions and define tools for the LLM. It also demonstrates disabling the thinking mode.

```cpp
Preface preface = JsonPreface({
  .messages = {
    {"role", "system"},
    {"content", {"You are a model that can do function calling."}}
  },
  .tools = {
    {
      {"name", "get_weather"},
      {"description", "Returns the weather for a given location."},
      {"parameters", {
        {"type", "object"},
        {"properties", {
          {"location", {
            {"type", "string"},
            {"description", "The location to get the weather for."}
          }}
        }},
        {"required", {"location"}}
      }}
    },
    {
      {"name", "get_stock_price"},
      {"description", "Returns the stock price for a given stock symbol."},
      {"parameters", {
        {"type", "object"},
        {"properties", {
          {"stock_symbol", {
            {"type", "string"},
            {"description", "The stock symbol to get the price for."}
          }}
        }},
        {"required", {"stock_symbol"}}
      }}
    }
  },
  .extra_context = {
    // Key/value pairs use a comma, like every other entry in this initializer.
    {"enable_thinking", false}
  }
});
```

--------------------------------

### Install LiteRT-LM API

Source: https://github.com/google-ai-edge/litert-lm/blob/main/python/colabs/Getting Started with LiteRT-LM Python API.ipynb

Install the necessary packages from PyPI to use the LiteRT-LM API and Hugging
Face hub.

```python
!pip install litert-lm huggingface_hub
```

--------------------------------

### Install LiteRT-LM Core Package

Source: https://github.com/google-ai-edge/litert-lm/blob/main/js/packages/core/README.md

Instructions for installing the LiteRT-LM core package using npm or importing it directly from a CDN.

```shell
# From npm
npm i --save @litert-lm/core
```

```javascript
// From a CDN (in your JavaScript file)
import * as litertlm from 'https://cdn.jsdelivr.net/npm/@litert-lm/core/+esm';
```

--------------------------------

### Start and Attach to Container Session

Source: https://github.com/google-ai-edge/litert-lm/blob/main/docs/getting-started/cmake.md

Start the container and attach your shell to it. To re-enter an existing container, simply rerun this command.

```bash
podman start --attach litert_lm
```

--------------------------------

### Example Prompt for LiteRT-LM Android Demo App Skill

Source: https://github.com/google-ai-edge/litert-lm/blob/main/agents/skills/README.md

This is an example prompt to trigger the LiteRT-LM Android demo application creation skill. Ensure your prompt specifies the necessary parameters for the task.

```text
Please create a LiteRT-LM Android demo app
root: ~/litert_lm_litert_lm_maven_integration
Maven Integration scenario
Target: pixel 10
model: gemma 4
```

--------------------------------

### Example LiteRT-LM Output

Source: https://github.com/google-ai-edge/litert-lm/blob/main/docs/getting-started/cmake.md

This is an example of the expected output when running the LiteRT-LM verification test. It includes initialization messages, the model's response, and benchmark metrics.

```text
dev-sh@:LiteRT-LM$ cmake/build/litert_lm_main --model_path=$model_path/gemma-3n-E2B-it-int4.litertlm --backend=cpu --input_prompt="What is the tallest building in the world?"
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
input_prompt: What is the tallest building in the world?
The tallest building in the world is the **Burj Khalifa** in Dubai, United Arab Emirates. It stands at a staggering **828 meters (2,717 feet)** tall. It was completed in 2010 and continues to hold the record.
BenchmarkInfo:
  Init Phases (2):
    - Executor initialization: 844.54 ms
    - Tokenizer initialization: 66.70 ms
    Total init time: 911.25 ms
--------------------------------------------------
  Time to first token: 2.40 s
--------------------------------------------------
  Prefill Turns (Total 1 turns):
    Prefill Turn 1: Processed 18 tokens in 2.311920273s duration.
      Prefill Speed: 7.79 tokens/sec.
--------------------------------------------------
  Decode Turns (Total 1 turns):
    Decode Turn 1: Processed 62 tokens in 5.53092314s duration.
      Decode Speed: 11.21 tokens/sec.
--------------------------------------------------
--------------------------------------------------
```

--------------------------------

### Run LiteRT-LM from Terminal

Source: https://github.com/google-ai-edge/litert-lm/blob/main/README.md

Installs and runs a Gemma model directly from the terminal using the `uv` package manager and `litert-lm` CLI. Useful for quick testing without writing code.

```bash
uv tool install litert-lm

litert-lm run \
  --from-huggingface-repo=google/gemma-3n-E2B-it-litert-lm \
  gemma-3n-E2B-it-int4 \
  --prompt="What is the capital of France?"
```

--------------------------------

### Engine Initialization with Cascading Fallback

Source: https://github.com/google-ai-edge/litert-lm/blob/main/agents/skills/create-litert-lm-android-demo-app/references/inference_implementation.md

Implement a 3-stage cascading fallback strategy for engine initialization, starting with multi-modal on a selected backend, falling back to multi-modal on CPU, and finally to text-only on CPU. Ensure UI state is mapped accordingly.
```kotlin
val config = EngineConfig(
    modelPath = modelPath,
    backend = Backend.GPU(),
    visionBackend = Backend.GPU(),
    audioBackend = Backend.CPU()
)
try {
    engine.initialize(config)
    // Transition UI to Multi-modal Model state
} catch (e1: Exception) {
    Log.w("LiteRT-LM", "Multi-modal GPU initialization failed, falling back to CPU.", e1)
    val configCpu = EngineConfig(
        modelPath = modelPath,
        backend = Backend.CPU(),
        visionBackend = Backend.CPU(),
        audioBackend = Backend.CPU()
    )
    try {
        engine.initialize(configCpu)
        // Transition UI to Multi-modal Model state
    } catch (e2: Exception) {
        Log.e("LiteRT-LM", "Multi-modal CPU initialization failed, falling back to text-only.", e2)
        val configTextOnly = EngineConfig(modelPath = modelPath, backend = Backend.CPU())
        try {
            engine.initialize(configTextOnly)
            // Transition UI to Text-only Model state
        } catch (e3: Exception) {
            Log.e("LiteRT-LM", "Text-only CPU initialization failed.", e3)
            // Bubble error up to UI
        }
    }
}
```

--------------------------------

### Configure NPU Backend with Native Library Directory

Source: https://github.com/google-ai-edge/litert-lm/blob/main/docs/api/kotlin/getting_started.md

Example of configuring the LiteRT-LM Engine to use the NPU backend, specifying the native library directory. This is particularly relevant for Android applications where libraries might be bundled.

```kotlin
val engineConfig = EngineConfig(
    modelPath = modelPath,
    backend = Backend.NPU(nativeLibraryDir = context.applicationInfo.nativeLibraryDir)
)
```

--------------------------------

### REPL Chat App Example

Source: https://github.com/google-ai-edge/litert-lm/blob/main/js/packages/core/README.md

A sample REPL chat application demonstrating the LiteRT-LM JavaScript API for interactive conversations in a web browser. It initializes the engine with a Gemma model and handles user input and AI responses.

```html
```

--------------------------------

### Tool Execution Result

Source: https://github.com/google-ai-edge/litert-lm/blob/main/docs/api/cpp/tool-use.md

Example JSON object returned by the local tool execution function.

```json
{
  "tool_name": "get_weather",
  "location": "Paris",
  "temperature": 72,
  "unit": "F",
  "humidity": 50,
  "condition": "Sunny"
}
```

--------------------------------

### Initialize and Run LiteRT LLM Engine

Source: https://github.com/google-ai-edge/litert-lm/blob/main/docs/api/cpp/conversation.md

Demonstrates the full workflow of setting up the engine, creating a conversation, and sending messages synchronously or asynchronously.

```cpp
#include "runtime/engine/engine.h"

// ...

// 1. Define model assets and engine settings.
auto model_assets = ModelAssets::Create(model_path);
CHECK_OK(model_assets);

// Both factory calls return absl::StatusOr, so dereference before use.
auto engine_settings = EngineSettings::CreateDefault(
    *model_assets, /*backend=*/litert::lm::Backend::CPU);
CHECK_OK(engine_settings);

// 2. Create the main Engine object.
absl::StatusOr<std::unique_ptr<Engine>> engine =
    Engine::CreateEngine(*engine_settings);
CHECK_OK(engine);

// 3. Create a Conversation
auto conversation_config = ConversationConfig::CreateDefault(**engine);
CHECK_OK(conversation_config);
absl::StatusOr<std::unique_ptr<Conversation>> conversation =
    Conversation::Create(**engine, *conversation_config);
CHECK_OK(conversation);

// 4. Send message to the LLM with blocking call.
absl::StatusOr<Message> model_message = (*conversation)->SendMessage(
    Message{
      {"role", "user"},
      {"content", "What is the tallest building in the world?"}
    });
CHECK_OK(model_message);

// 5. Print the model message.
std::cout << *model_message << std::endl;

// 6. Send message to the LLM with asynchronous call
// where CreatePrintMessageCallback is a user-implemented callback that
// processes the message each time a chunk of message output is received.
std::stringstream captured_output;
(*conversation)->SendMessageAsync(
    Message{
      {"role", "user"},
      {"content", "What is the tallest building in the world?"}
    },
    CreatePrintMessageCallback(captured_output));

// Wait until asynchronous finish or timeout.
// Dereference the StatusOr first to reach the Engine pointer.
(*engine)->WaitUntilDone(absl::Seconds(10));
```

--------------------------------

### Build LiteRT-LM Binary for Windows

Source: https://github.com/google-ai-edge/litert-lm/blob/main/docs/getting-started/build-and-run.md

Use Bazel to build the litert_lm_main binary for Windows. Ensure you are using Bazelisk and the correct configuration.

```bash
# Build litert_lm_main for Windows.
bazelisk build //runtime/engine:litert_lm_main --config=windows
```

--------------------------------

### Deploy and Run LiteRT LM on GPU

Source: https://github.com/google-ai-edge/litert-lm/blob/main/docs/getting-started/build-and-run.md

Requires additional prebuilt .so files for the arm64 architecture. Ensure the LD_LIBRARY_PATH is set to the device folder containing the libraries.

```bash
# Skip model push if it is already there
adb push $MODEL_PATH $DEVICE_FOLDER/model.litertlm

adb push prebuilt/android_arm64/*.so $DEVICE_FOLDER
adb push bazel-bin/runtime/engine/litert_lm_main $DEVICE_FOLDER

adb shell LD_LIBRARY_PATH=$DEVICE_FOLDER \
    $DEVICE_FOLDER/litert_lm_main \
    --backend=gpu \
    --model_path=$DEVICE_FOLDER/model.litertlm
```

--------------------------------

### Deploy and Run LiteRT LM on CPU

Source: https://github.com/google-ai-edge/litert-lm/blob/main/docs/getting-started/build-and-run.md

Use these commands to push the model and binary to an Android device and execute the model using the CPU backend.
```bash
# Skip model push if it is already there
adb push $MODEL_PATH $DEVICE_FOLDER/model.litertlm

adb push bazel-bin/runtime/engine/litert_lm_main $DEVICE_FOLDER

adb shell $DEVICE_FOLDER/litert_lm_main \
    --backend=cpu \
    --model_path=$DEVICE_FOLDER/model.litertlm
```

--------------------------------

### Prepare Android Device Directory

Source: https://github.com/google-ai-edge/litert-lm/blob/main/docs/getting-started/build-and-run.md

Create a target directory on the Android device for binary and model deployment.

```bash
export DEVICE_FOLDER=/data/local/tmp/
adb shell mkdir -p $DEVICE_FOLDER
```

--------------------------------

### Model Tool Call Response

Source: https://github.com/google-ai-edge/litert-lm/blob/main/docs/api/cpp/tool-use.md

Example JSON structure returned by the model when a tool call is requested.

```json
{
  "tool_calls": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "arguments": {
          "location": "Paris"
        }
      }
    }
  ]
}
```

--------------------------------

### Initialize Conversation with Tools

Source: https://github.com/google-ai-edge/litert-lm/blob/main/docs/api/cpp/tool-use.md

Configure the Conversation object by passing the prepared Preface to the ConversationConfig builder.

```c++
// Set model file path and backend.
std::string model_path = absl::GetFlag(FLAGS_model_path);
ASSIGN_OR_RETURN(ModelAssets model_assets, ModelAssets::Create(model_path));
ASSIGN_OR_RETURN(
    EngineSettings engine_settings,
    EngineSettings::CreateDefault(std::move(model_assets), Backend::CPU));

// Create `Engine`.
ASSIGN_OR_RETURN(
    std::unique_ptr<Engine> engine,
    litert::lm::Engine::CreateEngine(std::move(engine_settings)));

// Create `Conversation`.
auto session_config = litert::lm::SessionConfig::CreateDefault();
ASSIGN_OR_RETURN(auto conversation_config,
                 ConversationConfig::Builder()
                     .SetSessionConfig(session_config)
                     .SetPreface(preface)
                     .Build(*engine));
ASSIGN_OR_RETURN(std::unique_ptr<Conversation> conversation,
                 Conversation::Create(*engine, conversation_config));
```

--------------------------------

### Model Final Response

Source: https://github.com/google-ai-edge/litert-lm/blob/main/docs/api/cpp/tool-use.md

Example JSON structure containing the model's natural language interpretation of the tool result.

```json
{
  "content": [
    {
      "type": "text",
      "text": "The weather in Paris is sunny with a temperature of 72°F and humidity of 50%."
    }
  ]
}
```

--------------------------------

### Build LiteRT-LM Binary

Source: https://github.com/google-ai-edge/litert-lm/blob/main/docs/getting-started/build-and-run.md

Compile the litert_lm_main executable using Bazel.

```bash
bazel build //runtime/engine:litert_lm_main
```

--------------------------------

### Initialize Multi-modal EngineConfig

Source: https://github.com/google-ai-edge/litert-lm/blob/main/agents/skills/create-litert-lm-android-demo-app/references/compliance_checklist_inference.md

Explicitly initialize visionBackend and audioBackend in EngineConfig. Ensure audioBackend is strictly set to Backend.CPU().

```java
new EngineConfig(..., audioBackend = Backend.CPU(), visionBackend = ...)
```

--------------------------------

### Resolving Filename from Content URI

Source: https://github.com/google-ai-edge/litert-lm/blob/main/agents/skills/create-litert-lm-android-demo-app/references/ui_layout_and_state.md

When handling content URIs, query ContentResolver for OpenableColumns.DISPLAY_NAME to get the actual filename, rather than relying on Uri.lastPathSegment.

```kotlin
fun getFileName(uri: Uri, contentResolver: ContentResolver): String? {
    var result: String? = null
    if (uri.scheme == "content") {
        contentResolver.query(uri, null, null, null, null)?.use {
            // Read the name only if the cursor has a row and the column exists.
            if (it.moveToFirst()) {
                val displayNameIndex = it.getColumnIndex(OpenableColumns.DISPLAY_NAME)
                if (displayNameIndex >= 0) {
                    result = it.getString(displayNameIndex)
                }
            }
        }
    }
    // Fall back to the last path segment for non-content URIs or failed lookups.
    if (result == null) {
        result = uri.lastPathSegment
    }
    return result
}
```

--------------------------------

### Build LiteRT-LM file using TOML configuration

Source: https://github.com/google-ai-edge/litert-lm/blob/main/docs/litert_lm_builder.md

Dynamically drive the `litert-lm-builder` CLI by reading configuration from a TOML file.

```bash
litert-lm-builder toml --path example.toml output --path real_via_toml.litertlm
```

--------------------------------

### Run LiteRT-LM on Linux or macOS

Source: https://github.com/google-ai-edge/litert-lm/blob/main/docs/getting-started/build-and-run.md

Execute the built binary on Unix-like systems.

```bash
bazel-bin/runtime/engine/litert_lm_main \
    --backend=cpu \
    --model_path=$MODEL_PATH
```

--------------------------------

### Send Message (Non-Streaming)

Source: https://github.com/google-ai-edge/litert-lm/blob/main/js/packages/core/README.md

Example of sending a message to the conversation without expecting a streamed response. The full response is returned at once. Supports both simple string input and structured message objects.

```typescript
// Simple string input
let response = await conversation.sendMessage("What is the capital of France?");
console.log(response.content[0].text);

// Or with full message structure
response = await conversation.sendMessage({role: 'user', content: '...'});
```

--------------------------------

### Example of Asynchronous Message Chunks

Source: https://github.com/google-ai-edge/litert-lm/blob/main/docs/api/cpp/conversation.md

Illustrates how the MessageCallback might be invoked multiple times with sequential parts of a model's response. Implementers need to accumulate these chunks for the complete message.
```json
{
  "role": "model",
  "content": [
    { "type": "text", "text": "He" }
  ]
}
```

```json
{
  "role": "model",
  "content": [
    { "type": "text", "text": "llo" }
  ]
}
```

```json
{
  "role": "model",
  "content": [
    { "type": "text", "text": " Wo" }
  ]
}
```

```json
{
  "role": "model",
  "content": [
    { "type": "text", "text": "rl" }
  ]
}
```

```json
{
  "role": "model",
  "content": [
    { "type": "text", "text": "d!" }
  ]
}
```

--------------------------------

### Initialize LiteRT-LM Engine

Source: https://github.com/google-ai-edge/litert-lm/blob/main/js/packages/core/README.md

Demonstrates how to initialize the LiteRT-LM Engine, which is the primary entry point for model loading and session management. Be sure to delete the engine when it is no longer needed to release resources. Model initialization can take several seconds.

```typescript
import {Engine, EngineSettings} from '@litert-lm/core';

const engineSettings = {
  model: 'url/path/to/model.litertlm', // or a ReadableStream, or a Blob
  // You can configure context length and other settings here
  mainExecutorSettings: {
    maxNumTokens: 8192,
  },
} satisfies EngineSettings;

const engine = await Engine.create(engineSettings);

// ... Use the engine to create a conversation ...

// Delete the engine when done.
await engine.delete();
```

--------------------------------

### Create a Conversation Instance

Source: https://github.com/google-ai-edge/litert-lm/blob/main/js/packages/core/README.md

Shows how to create a Conversation instance from an initialized engine. Customization is possible using ConversationConfig, such as setting a system preface for the assistant's behavior.
```typescript
const conversation = await engine.createConversation({
  preface: {
    messages: [
      {role: 'system', content: 'You are a helpful assistant'}
    ]
  }
});

// sendMessage returns a promise; await it so errors are not silently dropped.
await conversation.sendMessage({
  role: 'user',
  content: 'Write a poem',
});
```

--------------------------------

### Configure LiteRT-LM Prebuilt Dependencies in BUILD

Source: https://github.com/google-ai-edge/litert-lm/blob/main/agents/skills/create-litert-lm-android-demo-app/references/dependency_source_build.md

This Starlark code defines `java_import` rules to register the LiteRT-LM Kotlin and native JNI JARs. It includes an example of exporting transitive dependencies required by the Kotlin bindings, ensuring they are available to downstream targets.

```starlark
load("@rules_java//java:defs.bzl", "java_import")

package(default_visibility = ["//visibility:public"])

java_import(
    name = "litertlm_kotlin",
    jars = ["litertlm-android.jar"],
    exports = [
        # Export external libraries used inside the Kotlin binding prebuilts
        # so they are automatically added to down-stream classpaths depending on this rule.
        "@maven//:com_example_library_transitive_dependency",
    ],
)

java_import(
    name = "litertlm_native",
    jars = ["litertlm_native.jar"],
)
```

--------------------------------

### Run LiteRT-LM on Windows

Source: https://github.com/google-ai-edge/litert-lm/blob/main/docs/getting-started/build-and-run.md

Execute the LiteRT-LM binary on Windows using PowerShell.

```powershell
bazel-bin\runtime\engine\litert_lm_main.exe `
    --backend=cpu `
    --model_path=$Env:MODEL_PATH
```

--------------------------------

### Run LLM with default prompt

Source: https://github.com/google-ai-edge/litert-lm/blob/main/docs/getting-started/build-and-run.md

Executes the model using the CPU backend with a default prompt.
```bash
/litert_lm_main \
    --backend=cpu \
    --model_path=$MODEL_PATH
```

--------------------------------

### Extract components from LiteRT-LM file with litert-lm-peek

Source: https://github.com/google-ai-edge/litert-lm/blob/main/docs/litert_lm_builder.md

Use the `litert-lm-peek` CLI to extract byte-for-byte components from a .litertlm file to a specified directory.

```bash
# Extract byte-for-byte components directly
litert-lm-peek --litertlm_file demo.litertlm --dump_files_dir ./extracted_files
```

--------------------------------

### Define Main Executable

Source: https://github.com/google-ai-edge/litert-lm/blob/main/runtime/engine/CMakeLists.txt

Sets up the primary litert_lm_main executable. It configures linker options to export LiteRt* symbols, specifies include directories including a JSON third-party path, and links against a comprehensive set of LiteRTLM runtime and executor libraries. Android-specific libraries (EGL, GLESv3) are linked if the ANDROID flag is set.

```cmake
add_litertlm_executable(litert_lm_main
    litert_lm_main.cc
)

target_link_options(litert_lm_main PRIVATE ${LITERTLM_UNIFIED_LINK_SPEC})

target_link_options(litert_lm_main PRIVATE
    "LINKER:--export-dynamic-symbol=LiteRt*"
)

target_include_directories(litert_lm_main
    PRIVATE
        ${LITERTLM_INCLUDE_PATHS}
        ${LITERT_INCLUDE_PATHS}
        ${THIRD_PARTY_DIR}/json/include
)

target_link_libraries(litert_lm_main
    PUBLIC
        LiteRTLM::Runtime::Engine::Interface
        LiteRTLM::Runtime::Engine::Settings
        LiteRTLM::Runtime::Engine::IoTypes
        LiteRTLM::Runtime::Conversation
        LiteRTLM::Runtime::Conversation::IoTypes
        LiteRTLM::Runtime::Core::EngineImpl
        runtime_executor_executor_settings_base
        runtime_executor_llm_executor_settings
        runtime_util_litert_status_util
        LITERTLM_DEPS
)

add_litertlm_executable(litert_lm_main
    litert_lm_main.cc
)

target_link_options(litert_lm_main PRIVATE
    "LINKER:--export-dynamic-symbol=LiteRt*"
)

target_include_directories(litert_lm_main
    PRIVATE
        ${LITERTLM_INCLUDE_PATHS}
        ${LITERT_INCLUDE_PATHS}
        ${THIRD_PARTY_DIR}/json/include
)

# Android Specifics
if(ANDROID)
    target_link_libraries(litert_lm_main PRIVATE EGL GLESv3)
endif()
endif()
```

--------------------------------

### Clone LiteRT-LM Repository

Source: https://github.com/google-ai-edge/litert-lm/blob/main/docs/getting-started/build-and-run.md

Clone the LiteRT-LM repository to your local machine and navigate into the project directory. This is the first step to obtaining the source code.

```bash
git clone https://github.com/google-ai-edge/LiteRT-LM.git
cd LiteRT-LM
```

--------------------------------

### Configure Build Environment

Source: https://github.com/google-ai-edge/litert-lm/blob/main/cmake/packages/litert_lm/CMakeLists.txt

Sets standard C++ versions, compiler flags, and build options for the project.

```cmake
if(NOT CMAKE_CXX_STANDARD)
    set(CMAKE_CXX_STANDARD 20)
endif()
set(CMAKE_CXX_STANDARD_REQUIRED ON)
set(CMAKE_CXX_EXTENSIONS OFF)
set(CMAKE_POSITION_INDEPENDENT_CODE ON)
set(CMAKE_POLICY_VERSION_MINIMUM 3.20 CACHE STRING "Required CMake policy version." FORCE)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fpermissive" CACHE STRING "Required CXX flags" FORCE)
add_compile_options("-fpermissive")

option(LITERT_BUILD_CONFIG_DISABLE_GPU_VAL
    "LiteRT definition to disable GPU in build config"
    TRUE
)
option(LITERT_BUILD_CONFIG_DISABLE_NPU_VAL
    "LiteRT definition to disable NPU in build config"
    TRUE
)

set_property(GLOBAL PROPERTY LITERTLM_LOCAL_ARCHIVE_REGISTRY "")
set_property(GLOBAL PROPERTY LITERTLM_LOCAL_TARGET_REGISTRY "")
```

--------------------------------

### Include Project Modules

Source: https://github.com/google-ai-edge/litert-lm/blob/main/cmake/packages/litert_lm/CMakeLists.txt

Loads utility, macro, and dependency management modules.
```cmake
# --- Utilities
include("${LITERTLM_MODULES_DIR}/utils.cmake")
include("${LITERTLM_MODULES_DIR}/macros.cmake")

# --- Dependencies
include("${LITERTLM_MODULES_DIR}/fetch_content.cmake")
include("${LITERTLM_MODULES_DIR}/external_project.cmake")
include("${LITERTLM_MODULES_DIR}/collect_dependencies.cmake")
```

--------------------------------

### Run Gemma4-E4B with MTP using LiteRT-LM CLI

Source: https://github.com/google-ai-edge/litert-lm/blob/main/README.md

This command demonstrates how to run the Gemma4-E4B model with MTP on various platforms using the LiteRT-LM CLI. It specifies the model repository, backend, speculative decoding, and a sample prompt.

```bash
litert-lm run \
  --from-huggingface-repo=litert-community/gemma-4-E2B-it-litert-lm \
  gemma-4-E4B-it.litertlm \
  --backend=gpu \
  --enable-speculative-decoding=true \
  --prompt="What is the capital of France?"
```

--------------------------------

### Engine Initialization

Source: https://github.com/google-ai-edge/litert-lm/blob/main/js/packages/core/README.md

Initializes the LiteRT-LM Engine, which is the primary entry point for interacting with the API. It handles model loading and session management. Remember to delete the engine to free up resources.

```APIDOC
## Initialize the Engine

The `Engine` is the entry point to the API. It handles model loading, session creation, and resource management. Remember to `delete` the engine to release resources when the model is no longer needed.

**Note:** Initializing the engine can take several seconds to load the model.

```ts
import {Engine, EngineSettings} from '@litert-lm/core';

const engineSettings = {
  model: 'url/path/to/model.litertlm', // or a ReadableStream, or a Blob
  // You can configure context length and other settings here
  mainExecutorSettings: {
    maxNumTokens: 8192,
  },
} satisfies EngineSettings;

const engine = await Engine.create(engineSettings);

// ... Use the engine to create a conversation ...

// Delete the engine when done.
await engine.delete();
```
```

--------------------------------

### Initialize LiteRT-LM Engine

Source: https://github.com/google-ai-edge/litert-lm/blob/main/agents/skills/create-litert-lm-android-demo-app/references/compliance_checklist_inference.md

Ensure that engine.initialize() is explicitly called before creating any conversations or sessions to properly set up the inference engine.

```java
engine.initialize()
```

--------------------------------

### Define Thread Options Interface

Source: https://github.com/google-ai-edge/litert-lm/blob/main/runtime/framework/CMakeLists.txt

Creates an interface library for thread options and sets up include directories.

```cmake
add_litertlm_library(runtime_framework_thread_options INTERFACE)
add_library(LiteRTLM::Framework::ThreadOptions ALIAS runtime_framework_thread_options)

target_include_directories(runtime_framework_thread_options
    INTERFACE
        ${PKG_ROOT}
        ${LITERTLM_INCLUDE_PATHS}
)
```

--------------------------------

### Build LiteRT-LM file programmatically with Python API

Source: https://github.com/google-ai-edge/litert-lm/blob/main/docs/litert_lm_builder.md

Demonstrates building a .litertlm file using the `LitertLmFileBuilder` Python class, adding metadata, TFLite models, and tokenizers, then serializing the output. It also shows how to use the `peek_litertlm_file` function.
```python
import os
import sys

# Core classes directly importable from the top level
from litert_lm_builder import (
    LitertLmFileBuilder,
    Metadata,
    DType,
    TfLiteModelType,
    Backend,
    peek_litertlm_file
)


def build_demo_model():
    """Builds a .litertlm file programmatically using the Python API."""
    # Paths to your assets
    model_path = "schema/testdata/attention.tflite"
    sp_path = "runtime/components/testdata/sentencepiece.model"
    output_path = "demo_api.litertlm"

    # Initialize the core builder object
    builder = LitertLmFileBuilder()

    # Add metadata
    builder.add_system_metadata(Metadata(key="Authors", value="ODML Team", dtype=DType.STRING))
    builder.add_system_metadata(Metadata(key="TargetBackend", value=Backend.CPU.name, dtype=DType.STRING))

    # Add main TfLite model
    builder.add_tflite_model(
        tflite_model_path=model_path,
        model_type=TfLiteModelType.PREFILL_DECODE
    )

    # Add auxiliary tokens & tokenizers
    builder.add_sentencepiece_tokenizer(
        sp_tokenizer_path=sp_path
    )

    # Serialize stream to your output file
    with open(output_path, "wb") as f:
        builder.build(f)

    print(f"Successfully built {output_path}")

    # You can also use the peek programmatic API identically
    print(f"\n--- Peeking at {output_path} ---")
    peek_litertlm_file(output_path, None, sys.stdout)


if __name__ == "__main__":
    build_demo_model()
```

--------------------------------

### Verify LiteRT-LM Binary

Source: https://github.com/google-ai-edge/litert-lm/blob/main/docs/getting-started/cmake.md

Run the LiteRT-LM main binary to verify its integrity and functionality. This command performs a CPU-based inference test with a specified model and input prompt.

```bash
./litert_lm_main \
    --model_path=/path/to/gemma-3n-E2B-it-int4.litertlm \
    --backend=cpu \
    --input_prompt="What is the tallest building in the world?"
```

--------------------------------

### Build LiteRT-LM Container Image

Source: https://github.com/google-ai-edge/litert-lm/blob/main/docs/getting-started/cmake.md

Build the Docker image for LiteRT-LM using the provided Containerfile. This command pulls necessary build dependencies and prepares the container environment.

```bash
podman build -f /path/to/repo/cmake/Containerfile -t litert_lm /path/to/repo
```

--------------------------------

### Initialize LiteRT-LM Engine

Source: https://github.com/google-ai-edge/litert-lm/blob/main/docs/api/kotlin/getting_started.md

This Kotlin code initializes the LiteRT-LM Engine with a specified model path and backend configuration. It's recommended to call `engine.initialize()` on a background thread to prevent UI blocking. Remember to close the engine when done.

```kotlin
import com.google.ai.edge.litertlm.Backend
import com.google.ai.edge.litertlm.Engine
import com.google.ai.edge.litertlm.EngineConfig

val engineConfig = EngineConfig(
    modelPath = "/path/to/your/model.litertlm", // Replace with your model path
    backend = Backend.CPU(), // Or Backend.GPU() and Backend.NPU("...")
    // Optional: Pick a writable dir. This can improve 2nd load time.
    // cacheDir = "/tmp/" or context.cacheDir.path (for Android)
)

val engine = Engine(engineConfig)
engine.initialize()
// ... Use the engine to create a conversation ...

// Close the engine when done
engine.close()
```

--------------------------------

### Verify APK Contents with zipinfo

Source: https://github.com/google-ai-edge/litert-lm/blob/main/agents/skills/create-litert-lm-android-demo-app/references/compliance_checklist_inference.md

Run this command in your terminal to verify that .so files are correctly located in the compiled APK. Replace with your application's name.
```bash
zipinfo bazel-bin/.apk 'lib/*'
```

--------------------------------

### Configure Multi-modal Engine on CPU

Source: https://github.com/google-ai-edge/litert-lm/blob/main/agents/skills/create-litert-lm-android-demo-app/references/compliance_checklist_inference.md

If GPU initialization fails, this code attempts to initialize the multi-modal engine strictly on a CPU-only configuration.

```java
engine.initialize(new EngineConfig(backend = Backend.CPU(), audioBackend = Backend.CPU(), visionBackend = Backend.CPU()))
```

--------------------------------

### Run LLM with custom prompt

Source: https://github.com/google-ai-edge/litert-lm/blob/main/docs/getting-started/build-and-run.md

Executes the model using a user-provided input prompt.

```bash
/litert_lm_main \
    --backend=cpu \
    --input_prompt="Write me a song" \
    --model_path=$MODEL_PATH
```

--------------------------------

### Build LiteRT-LM for Android

Source: https://github.com/google-ai-edge/litert-lm/blob/main/docs/getting-started/build-and-run.md

Compile the binary specifically for the Android ARM64 architecture.

```bash
bazel build --config=android_arm64 //runtime/engine:litert_lm_main
```

--------------------------------

### Define Tools with OpenAPI Specification

Source: https://github.com/google-ai-edge/litert-lm/blob/main/docs/api/kotlin/getting_started.md

Implement the OpenApiTool interface and provide the tool's description as a JSON string conforming to the OpenAPI specification. This offers fine-grained control and is useful if you already have an OpenAPI schema.

```kotlin
import com.google.ai.edge.litertlm.OpenApiTool

class SampleOpenApiTool : OpenApiTool {

  override fun getToolDescriptionJsonString(): String {
    return """
      {
        "name": "addition",
        "description": "Add all numbers.",
        "parameters": {
          "type": "object",
          "properties": {
            "numbers": {
              "type": "array",
              "items": {
                "type": "number"
              },
              "description": "The list of numbers to sum."
            }
}, "required": [ "numbers" ] } } """.trimIndent() // Tip: trim to save tokens } override fun execute(paramsJsonString: String): String { // Parse paramsJsonString with your choice of parser/deserializer and // execute the tool. // Return the result as a JSON string return """{"result": 1.4142}""" } } ``` -------------------------------- ### Configure Android Manifest for GPU Backend Source: https://github.com/google-ai-edge/litert-lm/blob/main/docs/api/kotlin/getting_started.md Add these lines to your AndroidManifest.xml within the `` tag to enable the GPU backend for LiteRT-LM. These are optional native libraries. ```xml ``` -------------------------------- ### Build Main Executable Source: https://github.com/google-ai-edge/litert-lm/blob/main/cmake/packages/litert_lm/CMakeLists.txt Adds subdirectories and defines the main executable target with necessary dependencies and definitions. ```cmake add_subdirectory(${LITERTLM_PROJECT_ROOT}/c ${GENERATED_SRC_DIR}/c) add_subdirectory(${LITERTLM_PROJECT_ROOT}/schema ${GENERATED_SRC_DIR}/schema) add_subdirectory(${LITERTLM_PROJECT_ROOT}/runtime ${GENERATED_SRC_DIR}/runtime) include("${LITERTLM_MODULES_DIR}/local_aggregate.cmake") generate_local_aggregate() add_executable(litert_lm_main "${LITERTLM_PROJECT_ROOT}/runtime/engine/litert_lm_main.cc" ) if(TARGET litertlm_local_anchor) add_dependencies(litert_lm_main litertlm_local_anchor) endif() target_compile_definitions(litert_lm_main PRIVATE ENABLE_HUGGINGFACE_TOKENIZER ENABLE_SENTENCEPIECE_TOKENIZER ) target_include_directories(litert_lm_main PRIVATE ${LITERTLM_INCLUDE_PATHS} ) target_link_options(litert_lm_main PRIVATE "LINKER:--export-dynamic-symbol=LiteRt*" ) ``` -------------------------------- ### Inspect LiteRT-LM file with litert-lm-peek Source: https://github.com/google-ai-edge/litert-lm/blob/main/docs/litert_lm_builder.md Use the `litert-lm-peek` CLI to dump diagnostic information of a .litertlm file to standard output. 
```bash
# Dump diagnostic info to stdout
litert-lm-peek --litertlm_file demo.litertlm
```

--------------------------------

### Configure Application BUILD File Dependencies

Source: https://github.com/google-ai-edge/litert-lm/blob/main/agents/skills/create-litert-lm-android-demo-app/references/dependency_source_build.md

In your application's `BUILD` file, depend directly on the prebuilt LiteRT-LM bindings (`litertlm_kotlin` and `litertlm_native`). Transitive dependencies are exported by `litertlm_kotlin`, so no redundant target declarations are needed in the app target.

```starlark
# In your android_binary or kt_android_library target:
deps = [
    "//litert_lm_prebuilt:litertlm_kotlin",
    "//litert_lm_prebuilt:litertlm_native",
    # ... other dependencies
]
```

--------------------------------

### Build LiteRT-LM file with chained CLI commands

Source: https://github.com/google-ai-edge/litert-lm/blob/main/docs/litert_lm_builder.md

Use the `litert-lm-builder` CLI to chain subcommands for adding metadata, TFLite models, tokenizers, and specifying output.

```bash
litert-lm-builder \
  system_metadata --str Authors "ODML Team" \
  tflite_model --path schema/testdata/attention.tflite --model_type prefill_decode \
  sp_tokenizer --path runtime/components/testdata/sentencepiece.model \
  output --path demo.litertlm
```

--------------------------------

### Build a Terminal Chat App with Kotlin API

Source: https://github.com/google-ai-edge/litert-lm/blob/main/docs/api/kotlin/getting_started.md

This snippet demonstrates how to build a basic terminal chat application using the LiteRT-LM Kotlin API. It includes engine initialization, conversation creation, and message handling. Ensure the model path is correctly set.
```kotlin
import com.google.ai.edge.litertlm.*

suspend fun main() {
  Engine.setNativeMinLogSeverity(LogSeverity.ERROR) // Hide log for TUI app

  val engineConfig = EngineConfig(modelPath = "/path/to/model.litertlm")
  Engine(engineConfig).use { engine ->
    engine.initialize()

    engine.createConversation().use { conversation ->
      while (true) {
        print("\n>>> ")
        conversation.sendMessageAsync(readln()).collect { print(it) }
      }
    }
  }
}
```

--------------------------------

### Markdown Rendering Library

Source: https://github.com/google-ai-edge/litert-lm/blob/main/agents/skills/create-litert-lm-android-demo-app/references/ui_layout_and_state.md

Integrate the Markwon library to support rendering Markdown in text messages. Ensure version 4.6.2 or compatible.

```gradle
implementation "io.noties.markwon:core:4.6.2"
```

--------------------------------

### Add LiteRTLM Top P CPU Sampler Library

Source: https://github.com/google-ai-edge/litert-lm/blob/main/runtime/components/CMakeLists.txt

Defines a static library for the Top P CPU sampler, specifying its source file, include directories, and linked libraries.

```cmake
add_litertlm_library(runtime_components_top_p_cpu_sampler STATIC
    top_p_cpu_sampler.cc
)
add_library(LiteRTLM::Runtime::Components::Sampler::TopP ALIAS runtime_components_top_p_cpu_sampler)

target_include_directories(runtime_components_top_p_cpu_sampler
    PUBLIC
        ${PKG_ROOT}
        ${LITERTLM_INCLUDE_PATHS}
)

target_link_libraries(runtime_components_top_p_cpu_sampler
    PUBLIC
        LiteRTLM::Runtime::Components::Sampler::Interface
        LiteRTLM::Runtime::Components::SamplingCpuUtil
        runtime_util_convert_tensor_buffer
        runtime_util_tensor_buffer_util
        LITERTLM_DEPS
)
```

--------------------------------

### Define Folder Facade Library

Source: https://github.com/google-ai-edge/litert-lm/blob/main/runtime/engine/CMakeLists.txt

Creates an INTERFACE library that acts as a facade for the LiteRTLM runtime engine components.
This simplifies linking by providing a single target that re-exports the necessary interface, settings, I/O types, library, and shared flags libraries.

```cmake
add_library(runtime_engine_libs INTERFACE)
add_library(LiteRTLM::Runtime::Engine ALIAS runtime_engine_libs)

target_link_libraries(runtime_engine_libs
    INTERFACE
        LiteRTLM::Runtime::Engine::Interface
        LiteRTLM::Runtime::Engine::Settings
        LiteRTLM::Runtime::Engine::IoTypes
        LiteRTLM::Runtime::Engine::Lib
        LiteRTLM::Runtime::Engine::SharedFlags
)
```

--------------------------------

### Benchmark LLM performance

Source: https://github.com/google-ai-edge/litert-lm/blob/main/docs/getting-started/build-and-run.md

Runs the model with specific prefill and decode token counts for performance analysis.

```bash
/litert_lm_main \
    --backend=cpu \
    --model_path=$MODEL_PATH \
    --benchmark \
    --benchmark_prefill_tokens=1024 \
    --benchmark_decode_tokens=256 \
    --async=false
```

--------------------------------

### Declare Tools for LiteRT-LM

Source: https://github.com/google-ai-edge/litert-lm/blob/main/docs/api/cpp/tool-use.md

Define available tools using a JSON schema string and assign them to a JsonPreface object.

```c++
constexpr absl::string_view kToolString = R"([
  {
    "name": "get_weather",
    "description": "Returns the weather for a given location.",
    "parameters": {
      "type": "object",
      "properties": {
        "location": {
          "type": "string",
          "description": "The location to get the weather for."
        }
      },
      "required": [
        "location"
      ]
    }
  },
  {
    "name": "get_stock_price",
    "description": "Returns the stock price for a given stock symbol.",
    "parameters": {
      "type": "object",
      "properties": {
        "stock_symbol": {
          "type": "string",
          "description": "The stock symbol to get the price for."
        }
      },
      "required": [
        "stock_symbol"
      ]
    }
  }
])";

JsonPreface preface;
preface.tools = nlohmann::ordered_json::parse(kToolString);
```

--------------------------------

### Configure Multi-modal Engine on GPU

Source: https://github.com/google-ai-edge/litert-lm/blob/main/agents/skills/create-litert-lm-android-demo-app/references/compliance_checklist_inference.md

This code attempts to configure and initialize a multi-modal engine on a selected general backend like GPU, setting CPU-locked audio and unified vision backends.

```java
engine.initialize(new EngineConfig(backend = Backend.GPU(), audioBackend = Backend.CPU(), visionBackend = Backend.CPU()))
```

--------------------------------

### Implement Callback and Print Helpers

Source: https://github.com/google-ai-edge/litert-lm/blob/main/docs/api/cpp/conversation.md

Provides the implementation for the asynchronous callback and the helper function to print message content to the console.

```cpp
absl::AnyInvocable<void(absl::StatusOr<Message>)> CreatePrintMessageCallback(
    std::stringstream& captured_output) {
  return [&captured_output](absl::StatusOr<Message> message) {
    if (!message.ok()) {
      std::cout << message.status().message() << std::endl;
      return;
    }
    if (message->empty()) {
      std::cout << std::endl << std::flush;
      return;
    }
    ABSL_CHECK_OK(PrintMessage(*message, captured_output, /*streaming=*/true));
  };
}

absl::Status PrintMessage(const Message& message,
                          std::stringstream& captured_output,
                          bool streaming = false) {
  if (message["content"].is_array()) {
    for (const auto& content : message["content"]) {
      if (content["type"] == "text") {
        captured_output << content["text"].get<std::string>();
        std::cout << content["text"].get<std::string>();
      }
    }
    if (!streaming) {
      captured_output << std::endl << std::flush;
      std::cout << std::endl << std::flush;
    } else {
      captured_output << std::flush;
      std::cout << std::flush;
    }
  } else if (message["content"]["text"].is_string()) {
    if (!streaming) {
      captured_output << message["content"]["text"].get<std::string>() << std::endl
                      << std::flush;
      std::cout << message["content"]["text"].get<std::string>() << std::endl
                << std::flush;
    } else {
      captured_output << message["content"]["text"].get<std::string>() << std::flush;
      std::cout << message["content"]["text"].get<std::string>() << std::flush;
    }
  } else {
    return absl::InvalidArgumentError("Invalid message: " + message.dump());
  }
  return absl::OkStatus();
}
```

--------------------------------

### Aggregate Framework Libraries

Source: https://github.com/google-ai-edge/litert-lm/blob/main/runtime/framework/CMakeLists.txt

Creates an interface library that aggregates core framework components.

```cmake
add_library(runtime_framework_libs INTERFACE)
add_library(LiteRTLM::Framework ALIAS runtime_framework_libs)

target_link_libraries(runtime_framework_libs
    INTERFACE
        LiteRTLM::Framework::ThreadOptions
        LiteRTLM::Framework::ThreadPool
)
```

--------------------------------

### Register Tools in ConversationConfig

Source: https://github.com/google-ai-edge/litert-lm/blob/main/docs/api/kotlin/getting_started.md

Include instances of your defined tools (SampleToolSet and SampleOpenApiTool) in the tools list within ConversationConfig to make them available to the language model. The model will automatically decide when to call these tools based on the conversation.

```kotlin
val conversation = engine.createConversation(
    ConversationConfig(
        tools = listOf(
            tool(SampleToolSet()),
            tool(SampleOpenApiTool()),
        ),
        // ... other configs
    )
)

// Send messages that might trigger the tool
conversation.sendMessageAsync("What's the weather like in London?", callback)
```

--------------------------------

### Perform Multi-Modal Inference

Source: https://github.com/google-ai-edge/litert-lm/blob/main/python/colabs/Getting Started with LiteRT-LM Python API.ipynb

Download an audio file and process it using the engine with an audio backend.
```python
!wget https://github.com/google-ai-edge/LiteRT-LM/raw/refs/heads/main/runtime/testdata/have_a_wonderful_day.wav -O have_a_wonderful_day.wav
```

```python
import litert_lm

user_message = {
    "role": "user",
    "content": [
        {"type": "audio", "path": "/content/have_a_wonderful_day.wav"},
        {"type": "text", "text": "Describe this audio. What does it say?"},
    ],
}

# Initialize the engine with Audio backend.
with litert_lm.Engine("model.litertlm", audio_backend=litert_lm.Backend.CPU) as engine:
    with engine.create_conversation() as conversation:
        for chunk in conversation.send_message_async(user_message):
            print(chunk["content"][0]["text"], end="", flush=True)
```

--------------------------------

### Configure LiteRT-LM Build

Source: https://github.com/google-ai-edge/litert-lm/blob/main/docs/getting-started/cmake.md

Use CMake to configure the build directory for LiteRT-LM. This command sets the build type to Release and the C++ standard to C++20.

```bash
cmake -B cmake/build -G "Unix Makefiles" \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_CXX_STANDARD=20
```

--------------------------------

### Create and Manage Conversations

Source: https://github.com/google-ai-edge/litert-lm/blob/main/docs/api/kotlin/getting_started.md

Initialize a conversation with optional configuration and manage its lifecycle either by closing it manually or with a `use` block.

```kotlin
// Assumption: Contents lives in the same package as Message.
import com.google.ai.edge.litertlm.Contents
import com.google.ai.edge.litertlm.ConversationConfig
import com.google.ai.edge.litertlm.Message
import com.google.ai.edge.litertlm.SamplerConfig

// Optional: Configure the system instruction, initial messages, sampling
// parameters, etc.
val conversationConfig = ConversationConfig(
    systemInstruction = Contents.of("You are a helpful assistant."),
    initialMessages = listOf(
        Message.user("What is the capital city of the United States?"),
        Message.model("Washington, D.C."),
    ),
    samplerConfig = SamplerConfig(topK = 10, topP = 0.95, temperature = 0.8),
)

val conversation = engine.createConversation(conversationConfig)

// Or with default config:
// val conversation = engine.createConversation()

// ... Use the conversation ...

// Close the conversation when done
conversation.close()
```

```kotlin
engine.createConversation(conversationConfig).use { conversation ->
    // Interact with the conversation
}
```
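
The `SamplerConfig` values above (`topK`, `topP`, `temperature`) are common sampling controls. As a rough illustration of what such parameters usually mean, the Python sketch below applies temperature scaling, then top-k, then top-p (nucleus) filtering to a list of logits. It is a generic textbook version, not LiteRT-LM's sampler: the function name `sample_token`, its signature, and the order in which the filters are applied are assumptions made for this example.

```python
import math
import random


def sample_token(logits, top_k=10, top_p=0.95, temperature=0.8, rng=random):
    """Pick one token index using temperature, top-k, then top-p filtering.

    Hypothetical helper for illustration only; not part of any LiteRT-LM API.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")

    # Temperature scaling: values below 1.0 sharpen the distribution.
    scaled = [x / temperature for x in logits]

    # Numerically stable softmax over the scaled logits.
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Top-k: keep only the k most likely tokens, most likely first.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]

    # Top-p: keep the shortest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Draw from the survivors; random.choices renormalizes the weights.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


# Example call with the values used in the SamplerConfig above.
token_index = sample_token([2.0, 1.0, 0.1], top_k=10, top_p=0.95, temperature=0.8)
```

In this sketch, lowering `temperature` or `top_p` narrows the set of candidate tokens, and `top_k=1` reduces the function to greedy decoding because only the most likely token survives.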