### Install LM Studio Python SDK

Source: https://github.com/lmstudio-ai/lmstudio-python/blob/main/README.md

Install the LM Studio Python SDK from PyPI with pip. This is the primary installation method and provides the `lmstudio` package.

```console
pip install lmstudio
```

--------------------------------

### Basic Text Completion with LM Studio

Source: https://github.com/lmstudio-ai/lmstudio-python/blob/main/README.md

Basic text completion using the synchronous API from the `lmstudio` package. Requires an LLM instance that is already loaded in LM Studio; the SDK handles the websocket connection.

```python
import lmstudio as lms

model = lms.llm()
model.complete("Once upon a time,")
```

--------------------------------

### Clone LM Studio Repository

Source: https://github.com/lmstudio-ai/lmstudio-python/blob/main/README.md

Clone the LM Studio Python SDK source with git. Development requires the submodules to be initialized recursively.

```console
git clone https://github.com/lmstudio-ai/lmstudio-python
cd lmstudio-python
```

```console
git submodule update --init --recursive
```

--------------------------------

### Repository Development Commands

Source: https://github.com/lmstudio-ai/lmstudio-python/blob/main/README.md

tox commands for repository maintenance: running the project checks and synchronizing with the SDK schema. Used for development environment setup and updates.

```console
tox -m check
```

```console
tox -e sync-sdk-schema
```

--------------------------------

### Custom Callbacks for Prediction Progress Tracking

Source: https://context7.com/lmstudio-ai/lmstudio-python/llms.txt

This example shows how to implement custom callbacks to monitor and react to different stages of an LLM prediction process. It covers callbacks for receiving the first token, processing prediction fragments, tracking prompt progress, and managing round starts and ends. This allows for real-time feedback and dynamic behavior during generation.

```python
import lmstudio as lms
import time

# Define callbacks
def on_first_token(round_num: int):
    print(f"[Round {round_num}] First token received")

def on_fragment(fragment: lms.LlmPredictionFragment, round_num: int):
    # Print each token as it arrives
    print(fragment.content, end="", flush=True)

def on_prompt_progress(progress: float, round_num: int):
    # Progress is 0.0 to 1.0
    if progress == 1.0:
        print(f"[Round {round_num}] Prompt processing complete")

def on_round_start(round_num: int):
    print(f"\n{'='*50}")
    print(f"Starting round {round_num}")
    print('='*50)

def on_round_end(round_num: int):
    print(f"\nRound {round_num} completed")

# Initialize
model = lms.llm()

# Non-streaming with callbacks
start_time = time.time()
result = model.complete(
    "Write a haiku about programming",
    config=lms.LlmPredictionConfig(temperature=0.8),
    on_first_token=lambda: print("Generating..."),
    on_prediction_fragment=lambda f: print(f.content, end="", flush=True)
)
elapsed = time.time() - start_time
print(f"\n\nCompleted in {elapsed:.2f}s")
# stats is an object with attributes, not a dict
print(f"Predicted tokens: {result.stats.predicted_tokens_count}")

# Agent with full callback suite
chat = lms.Chat()

def my_tool(x: int) -> int:
    """Multiply by 2."""
    return x * 2

result = model.act(
    "What is 42 multiplied by 2?",
    tools=[my_tool],
    on_message=chat.append,
    on_first_token=on_first_token,
    on_prediction_fragment=on_fragment,
    on_prompt_processing_progress=on_prompt_progress,
    on_round_start=on_round_start,
    on_round_end=on_round_end,
    on_prediction_completed=lambda result: print(f"\nPrediction stats: {result}")
)

print(f"\nTotal rounds: {result.rounds}")
print(f"Total time: {result.total_time_seconds:.2f}s")
```
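
The callbacks above only print. When the streamed text or timing is needed afterwards, a small collector object can hold that state. The following is a minimal sketch under stated assumptions: `StreamTimer` and `FakeFragment` are hypothetical names introduced here, and `FakeFragment` merely stands in for an SDK fragment (only its `content` attribute is used) so the example runs without a server.

```python
import time
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FakeFragment:
    """Hypothetical stand-in for a prediction fragment; only `content` is used."""
    content: str


@dataclass
class StreamTimer:
    """Collects fragment text and records the time to the first token."""
    started: float = field(default_factory=time.perf_counter)
    first_token_at: Optional[float] = None
    parts: List[str] = field(default_factory=list)

    def on_first_token(self) -> None:
        # Record when the first token arrived
        self.first_token_at = time.perf_counter()

    def on_fragment(self, fragment) -> None:
        # Accumulate text instead of printing it
        self.parts.append(fragment.content)

    @property
    def text(self) -> str:
        return "".join(self.parts)

    @property
    def time_to_first_token(self) -> Optional[float]:
        if self.first_token_at is None:
            return None
        return self.first_token_at - self.started


# Simulate a prediction by invoking the callbacks by hand
timer = StreamTimer()
timer.on_first_token()
for piece in ("Code ", "flows ", "like water"):
    timer.on_fragment(FakeFragment(piece))
print(timer.text)
```

With the SDK, the bound methods would presumably be passed as `on_first_token=timer.on_first_token` and `on_prediction_fragment=timer.on_fragment` in the `model.complete(...)` call shown above.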

--------------------------------

### LM Studio Plugin Development with ToolsProvider

Source: https://context7.com/lmstudio-ai/lmstudio-python/llms.txt

This section details how to develop custom plugins for LM Studio. It includes defining configuration schemas (per-chat and global), implementing tools that can be used by the LLM, and structuring the plugin files (manifest.json and Python script). The example demonstrates creating tools for fetching weather and calculating distances.

`plugin_dir/manifest.json` (JSON does not allow comments, so the path is given here rather than inside the file):

```json
{
    "name": "My Custom Plugin",
    "version": "1.0.0",
    "description": "Provides custom tools",
    "hooks": {
        "toolsProvider": "./src/plugin.py:create_tools_provider"
    }
}
```

```python
# plugin_dir/src/plugin.py
from lmstudio.plugin import ToolsProviderController, BaseConfigSchema, config_field
import requests

# Define configuration schema
class ConfigSchema(BaseConfigSchema):
    """Per-chat configuration."""
    api_key: str = config_field(
        label="API Key",
        hint="Your API key for the service",
        default=""
    )

class GlobalConfigSchema(BaseConfigSchema):
    """Global plugin configuration."""
    timeout: int = config_field(
        label="Request Timeout",
        hint="Timeout in seconds",
        default=30
    )

def create_tools_provider():
    """Create the tools provider hook."""
    controller = ToolsProviderController[ConfigSchema, GlobalConfigSchema]()

    @controller.tool()
    def fetch_weather(city: str) -> str:
        """Fetch weather information for a city."""
        # Access configuration
        config = controller.chat_config
        global_config = controller.global_config

        # Make API call (example)
        try:
            response = requests.get(
                f"https://api.weather.com/v1/{city}",
                headers={"Authorization": f"Bearer {config.api_key}"},
                timeout=global_config.timeout
            )
            return response.json()["description"]
        except Exception as e:
            return f"Error fetching weather: {e}"

    @controller.tool()
    def calculate_distance(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
        """Calculate distance between two coordinates in kilometers."""
        from math import radians, sin, cos, sqrt, atan2

        R = 6371  # Earth radius in km

        lat1, lon1, lat2, lon2 = map(radians, [lat1, lon1, lat2, lon2])
        dlat = lat2 - lat1
        dlon = lon2 - lon1

        a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
        c = 2 * atan2(sqrt(a), sqrt(1-a))

        return R * c

    return controller

# Run plugin
if __name__ == "__main__":
    from lmstudio.plugin import run_plugin
    run_plugin(plugin_dir=".")
```
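
The distance tool uses the haversine formula, which can be checked on its own before it is wired into a plugin. This sketch repeats the same arithmetic as a plain function; `haversine_km` and `EARTH_RADIUS_KM` are names chosen here for illustration, and no plugin API is involved.

```python
from math import atan2, cos, isclose, pi, radians, sin, sqrt

EARTH_RADIUS_KM = 6371  # Same mean radius as the plugin example


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometers between two points given in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, [lat1, lon1, lat2, lon2])
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return EARTH_RADIUS_KM * 2 * atan2(sqrt(a), sqrt(1 - a))


# Identical points are zero kilometers apart
same_point = haversine_km(48.85, 2.35, 48.85, 2.35)

# A quarter of the equator should equal R * pi / 2
quarter = haversine_km(0.0, 0.0, 0.0, 90.0)
print(same_point, isclose(quarter, EARTH_RADIUS_KM * pi / 2, rel_tol=1e-9))
```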

--------------------------------

### LM Studio Client Usage with Error Handling in Python

Source: https://context7.com/lmstudio-ai/lmstudio-python/llms.txt

This snippet demonstrates using a context manager for LM Studio client initialization, model loading with fallbacks, and prediction with error handling for timeouts and failures. It requires the LM Studio server running locally and the `lmstudio` package installed. Inputs include prompts and configuration; outputs are chat responses or error messages. Limitations include dependency on local server availability and potential timeouts under high load.

```python
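import lmstudio as lms
from lmstudio import (
    LMStudioClientError,
    LMStudioError,
    LMStudioPredictionError,
    LMStudioTimeoutError,
)

# Use context manager for cleanup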
try:
    with lms.Client() as client:
        try:
            # Try to get specific model
            model = client.llm.model("my-preferred-model")
        except LMStudioError:
            # Fallback to any loaded model
            loaded = client.list_loaded_models(namespace="llm")
            if loaded:
                model = client.llm.model(loaded[0].identifier)
            else:
                # Load default
                model = client.llm.load_new_instance("default-model")

        # Handle prediction errors
        chat = lms.Chat()
        try:
            result = model.respond(
                "Explain quantum entanglement",
                config=lms.LlmPredictionConfig(
                    temperature=0.7,
                    max_tokens=1000
                )
            )
            chat.add_assistant_response(result)
            print(result.content)

        except LMStudioTimeoutError:
            print("Request timed out - model may be overloaded")

        except LMStudioPredictionError as e:
            print(f"Prediction failed: {e}")
            chat.add_assistant_response(
                lms.AssistantResponse(content="I apologize, I encountered an error.")
            )

except LMStudioClientError as e:
    print(f"Failed to connect to LM Studio: {e}")
    print("Make sure LM Studio is running locally")

except LMStudioError as e:
    print(f"SDK error: {e}")
```
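
The nested `try`/`except` fallback for choosing a model can also be expressed as an ordered list of loaders that are tried in turn. This is a minimal sketch of that pattern only; `first_successful`, `preferred`, and `fallback` are hypothetical names, and the loaders here are plain functions rather than SDK calls.

```python
from typing import Callable, Iterable, List, Tuple, Type, TypeVar

T = TypeVar("T")


def first_successful(
    loaders: Iterable[Callable[[], T]],
    catch: Tuple[Type[BaseException], ...] = (Exception,),
) -> T:
    """Return the result of the first loader that does not raise."""
    errors: List[BaseException] = []
    for loader in loaders:
        try:
            return loader()
        except catch as exc:
            errors.append(exc)  # Remember the failure and try the next loader
    raise RuntimeError(f"All {len(errors)} loaders failed: {errors}")


def preferred() -> str:
    # Stand-in for a lookup that fails because the model is not loaded
    raise LookupError("preferred model is not loaded")


def fallback() -> str:
    return "fallback-model"


chosen = first_successful([preferred, fallback])
print(chosen)
```

In the SDK example, each loader would wrap one of the three strategies (preferred model, first loaded model, newly loaded default) and `catch` would be narrowed to `(LMStudioError,)`.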

--------------------------------

### Chat Interface Development

Source: https://github.com/lmstudio-ai/lmstudio-python/blob/main/README.md

Chat responses using the `Chat` helper to manage the chat history and include it in response prediction requests. Supports multi-turn conversations by appending each user message and assistant response to the same history.

```python
import lmstudio as lms

EXAMPLE_MESSAGES = (
    "My hovercraft is full of eels!",
    "I will not buy this record, it is scratched."
)

model = lms.llm()
chat = lms.Chat("You are a helpful shopkeeper assisting a foreign traveller")
for message in EXAMPLE_MESSAGES:
    chat.add_user_message(message)
    print(f"Customer: {message}")
    response = model.respond(chat)
    chat.add_assistant_response(response)
    print(f"Shopkeeper: {response}")
```

--------------------------------

### Tool Use Error Handling in LM Studio Python SDK

Source: https://context7.com/lmstudio-ai/lmstudio-python/llms.txt

This example defines a tool function that can fail, a custom error handler for tool calls, and a `model.act` call for agent-like interaction with tools. It depends on the `lmstudio` package and a loaded model. Inputs are user requests and tool functions; outputs are processed results or error messages. Limitations include the cap on prediction rounds (`max_prediction_rounds`) and the handling of invalid tool requests.

```python
import lmstudio as lms

# Tool use error handling
def risky_tool(value: int) -> int:
    """A tool that might fail."""
    if value < 0:
        raise ValueError("Value must be non-negative")
    return value * 2

def handle_tool_error(error: lms.LMStudioPredictionError, request) -> str:
    """Handle tool call failures."""
    if request:
        return f"Tool '{request.tool_name}' failed: {error}. Please try a different approach."
    return "A tool call failed. Please rephrase your request."

model = lms.llm()
result = model.act(
    "Process the value -5",
    tools=[risky_tool],
    handle_invalid_tool_request=handle_tool_error,
    max_prediction_rounds=3
)
```
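
An alternative to a central handler is to make each tool report its own failure as text, so the caller always gets a value back. The decorator below is a sketch of that idea and is not part of the SDK; `report_errors` is a hypothetical name, and whether the SDK treats a wrapped function exactly like the original is an assumption that should be checked. Note that the wrapped tool can now return either an `int` or an error string.

```python
import functools


def report_errors(tool):
    """Wrap a tool so exceptions are returned as an error message string."""

    @functools.wraps(tool)  # Preserve the name, docstring and annotations
    def wrapper(*args, **kwargs):
        try:
            return tool(*args, **kwargs)
        except Exception as exc:
            return f"Tool '{tool.__name__}' failed: {exc}"

    return wrapper


@report_errors
def risky_tool(value: int) -> int:
    """A tool that might fail."""
    if value < 0:
        raise ValueError("Value must be non-negative")
    return value * 2


print(risky_tool(4))
print(risky_tool(-5))
```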

--------------------------------

### Model Management and Configuration with LM Studio

Source: https://context7.com/lmstudio-ai/lmstudio-python/llms.txt

Demonstrates how to load, configure, and manage LLM model instances using the lmstudio-python library. This includes listing downloaded and loaded models, specifying custom configurations like context length and GPU layers, tokenization utilities, and applying prompt templates.

```python
import lmstudio as lms

# Create explicit client
client = lms.Client(api_host="localhost:1234")

# List available models
downloaded = client.list_downloaded_models()
print("Downloaded models:")
for model in downloaded:
    print(f"  - {model.path}")

loaded = client.list_loaded_models()
print(f"\nCurrently loaded: {len(loaded)} models")

# Load model with custom configuration
model = client.llm.load_new_instance(
    model_key="qwen2.5-7b-instruct",
    config=lms.LlmLoadModelConfig(
        context_length=8192,
        gpu_split_strategy="layers",
        max_gpu_layers=32
    ),
    ttl=300,  # Idle time before unloading: 5 minutes, in seconds
    on_load_progress=lambda progress: print(f"Loading: {progress*100:.1f}%")
)

# Get model information
info = model.get_info()
print(f"Model: {info['identifier']}")
print(f"Context length: {model.get_context_length()}")

# Tokenization utilities
text = "Hello, world!"
tokens = model.tokenize(text)
count = model.count_tokens(text)
print(f"Text: '{text}'")
print(f"Tokens: {tokens}")
print(f"Count: {count}")

# Batch tokenization
texts = ["First text", "Second text", "Third text"]
token_lists = model.tokenize(texts)
for text, tokens in zip(texts, token_lists):
    print(f"{text}: {len(tokens)} tokens")

# Apply model's prompt template
chat = lms.Chat("You are helpful")
chat.add_user_message("Hello!")
formatted = model.apply_prompt_template(chat)
print(f"Formatted prompt:\n{formatted}")

# Unload when done
model.unload()
```
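
Token counts are typically used to keep inputs within a context length. The sketch below groups texts into batches that each fit a token budget. `pack_by_token_budget` and `whitespace_count` are hypothetical helpers written for this illustration; the whitespace counter is only a stand-in so the example runs offline, and with the SDK a counter such as `model.count_tokens` would be passed instead.

```python
from typing import Callable, List


def pack_by_token_budget(
    texts: List[str],
    count_tokens: Callable[[str], int],
    budget: int,
) -> List[List[str]]:
    """Greedily group texts so each batch stays within `budget` tokens."""
    batches: List[List[str]] = []
    current: List[str] = []
    used = 0
    for text in texts:
        cost = count_tokens(text)
        if cost > budget:
            raise ValueError(f"text needs {cost} tokens, budget is {budget}")
        if current and used + cost > budget:
            # Close the current batch before it would overflow
            batches.append(current)
            current, used = [], 0
        current.append(text)
        used += cost
    if current:
        batches.append(current)
    return batches


def whitespace_count(text: str) -> int:
    """Stand-in tokenizer: counts whitespace-separated words."""
    return len(text.split())


batches = pack_by_token_budget(
    ["one two three", "four five", "six", "seven eight nine ten"],
    whitespace_count,
    budget=5,
)
print(batches)
```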

--------------------------------

### Pretty Print JSON and Make Multiple Requests

Source: https://context7.com/lmstudio-ai/lmstudio-python/llms.txt

Demonstrates how to pretty print JSON output and handle multiple requests to an LLM model, parsing responses into a list of structured objects. Assumes a 'movie' object for JSON printing and a 'model' object with a 'respond' method for book information.

```python
import json

# Pretty print full JSON
print(json.dumps(movie, indent=2))

# Multiple requests
books = []
for title in ["The Hobbit", "1984", "Pride and Prejudice"]:
    result = model.respond(f"Tell me about {title}", response_format=BookInfo)
    books.append(result.parsed)

for book in books:
    print(f"{book['title']} by {book['author']} ({book['year']})")
```
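
Since the parsed results are accessed with dictionary keys, it can be convenient to convert them into records with attribute access before further processing. This is a sketch under stated assumptions: `BookRecord` is a hypothetical dataclass, and `sample` is a hand-written dictionary shaped like the parsed output, not an actual model response.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class BookRecord:
    """The three fields the example prints; extra keys are ignored."""
    title: str
    author: str
    year: int

    @classmethod
    def from_parsed(cls, parsed: dict) -> "BookRecord":
        names = [f.name for f in fields(cls)]
        missing = [name for name in names if name not in parsed]
        if missing:
            raise KeyError(f"missing fields: {missing}")
        return cls(**{name: parsed[name] for name in names})


# Hand-written sample standing in for `result.parsed`
sample = {
    "title": "The Hobbit",
    "author": "J.R.R. Tolkien",
    "year": 1937,
    "genres": ["fantasy"],
}
book = BookRecord.from_parsed(sample)
print(f"{book.title} by {book.author} ({book.year})")
```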

--------------------------------

### Simple Text Completion with LM Studio Python SDK

Source: https://context7.com/lmstudio-ai/lmstudio-python/llms.txt

Generates text completions from prompts using the synchronous API of the LM Studio Python SDK. It requires the 'lmstudio' library and connects to a local LM Studio instance. The function accepts a prompt string and an optional configuration object for prediction parameters.

```python
import lmstudio as lms

# Connect to local LM Studio and get a model
model = lms.llm()

# Generate completion
result = model.complete("Once upon a time in a distant land,")
print(result.content)

# With configuration
result = model.complete(
    "Explain quantum computing",
    config=lms.LlmPredictionConfig(
        temperature=0.7,
        top_p_sampling=0.9,
        max_tokens=500
    )
)
print(result.content)
```

--------------------------------

### Agent Tool Use with LM Studio Python SDK

Source: https://context7.com/lmstudio-ai/lmstudio-python/llms.txt

Creates agents that can call Python functions to solve multi-step tasks using the LM Studio Python SDK. It requires the 'lmstudio' package and the standard-library 'math' module. Tools are defined as Python functions with type hints and docstrings; `model.act` executes the requested tool calls automatically and can run several of them in parallel.

```python
import lmstudio as lms
import math

# Define tools as Python functions with type hints
def add(a: int, b: int) -> int:
    """Add two numbers together."""
    return a + b

def multiply(a: float, b: float) -> float:
    """Multiply two numbers."""
    return a * b

def is_prime(n: int) -> bool:
    """Check if a number is prime."""
    if n < 2:
        return False
    for i in range(2, int(math.sqrt(n)) + 1):
        if n % i == 0:
            return False
    return True

# Initialize
chat = lms.Chat()
model = lms.llm("qwen2.5-7b-instruct-1m")

# Run multi-round agent with automatic tool execution
result = model.act(
    "Is the result of (123 + 456) multiplied by 2 a prime number? Think step by step.",
    tools=[add, multiply, is_prime],
    max_prediction_rounds=10,
    max_parallel_tool_calls=2,  # Allow parallel execution
    on_message=chat.append,  # Track all messages
    on_round_start=lambda round_num: print(f"\n=== Round {round_num} ==="),
    on_round_end=lambda round_num: print(f"=== Round {round_num} complete ===\n")
)

print(f"\nCompleted in {result.rounds} rounds ({result.total_time_seconds:.2f}s)")
print("\nFull conversation:")
print(chat)
```
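
Because the tools are ordinary Python functions, the answer the agent is expected to reach can be checked by calling them directly, without a model. The following simply replays the question's arithmetic with the same three functions.

```python
import math


def add(a: int, b: int) -> int:
    """Add two numbers together."""
    return a + b


def multiply(a: float, b: float) -> float:
    """Multiply two numbers."""
    return a * b


def is_prime(n: int) -> bool:
    """Check if a number is prime."""
    if n < 2:
        return False
    for i in range(2, int(math.sqrt(n)) + 1):
        if n % i == 0:
            return False
    return True


# Replay the steps the agent is asked to perform
total = add(123, 456)
doubled = multiply(total, 2)
answer = is_prime(int(doubled))
print(total, doubled, answer)
```

The doubled value is even, so the expected final answer is that it is not prime.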

--------------------------------

### Asynchronous Concurrent LLM Operations

Source: https://context7.com/lmstudio-ai/lmstudio-python/llms.txt

Utilizes Python's async/await for efficient concurrent LLM operations. This snippet shows how to initialize an async client, execute multiple completion tasks concurrently using asyncio.gather, and handle streaming responses.

```python
import asyncio
import lmstudio as lms

async def main():
    # Must use context manager for async client (structured concurrency)
    async with lms.AsyncClient() as client:
        # Get model handle
        model = await client.llm.model("qwen2.5-7b-instruct-1m")

        # Define tasks
        questions = [
            "What is the capital of France?",
            "Explain photosynthesis in one sentence.",
            "What is the largest planet?",
            "Who wrote Romeo and Juliet?",
            "What is the speed of light?"
        ]

        # Execute concurrently
        results = await asyncio.gather(
            *[model.complete(q) for q in questions]
        )

        # Process results
        for question, result in zip(questions, results):
            print(f"Q: {question}")
            print(f"A: {result.content}\n")

        # Streaming with async
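        chat = lms.Chat("You are concise")
        chat.add_user_message("Why is the sky blue?")
        # In the async API, respond_stream is awaited to obtain the stream
        stream = await model.respond_stream(chat)
        async for fragment in stream:
            print(fragment.content, end="", flush=True)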

# Run async application
asyncio.run(main())
```
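
`asyncio.gather` starts every request at once. When the number of in-flight requests should be capped, a semaphore is a common way to do it. This is a minimal sketch using only the standard library: `gather_limited` is a hypothetical helper, and `fake_complete` is a stand-in coroutine that takes the place of `model.complete` so the example runs without a server.

```python
import asyncio

# Tracks how many fake requests run at the same time
stats = {"active": 0, "peak": 0}


async def gather_limited(coro_fns, limit):
    """Run zero-argument coroutine functions with at most `limit` in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def run(fn):
        async with semaphore:
            return await fn()

    # gather preserves the input order of results
    return await asyncio.gather(*(run(fn) for fn in coro_fns))


async def fake_complete(prompt):
    """Stand-in for an async completion call."""
    stats["active"] += 1
    stats["peak"] = max(stats["peak"], stats["active"])
    await asyncio.sleep(0.01)
    stats["active"] -= 1
    return prompt.upper()


async def demo():
    prompts = ["a", "b", "c", "d", "e"]
    # Bind each prompt as a default argument so every lambda keeps its own value
    return await gather_limited(
        [lambda p=p: fake_complete(p) for p in prompts], limit=2
    )


results = asyncio.run(demo())
print(results, stats["peak"])
```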

--------------------------------

### LM Studio SDK Error Handling and Timeout Configuration

Source: https://context7.com/lmstudio-ai/lmstudio-python/llms.txt

This snippet illustrates best practices for error handling within the LM Studio Python SDK. It shows how to import specific exception classes like LMStudioError, LMStudioClientError, LMStudioTimeoutError, and LMStudioPredictionError. Additionally, it demonstrates how to configure the synchronous API timeout globally using `set_sync_api_timeout`.

```python
import lmstudio as lms
from lmstudio import (
    LMStudioError,
    LMStudioClientError,
    LMStudioTimeoutError,
    LMStudioPredictionError
)

# Configure timeout
lms.set_sync_api_timeout(120.0)  # 2 minutes

```

--------------------------------

### Structured JSON Responses with LM Studio Python SDK

Source: https://context7.com/lmstudio-ai/lmstudio-python/llms.txt

Requests model output that conforms to a schema using the LM Studio Python SDK. Response schemas are defined as classes inheriting from 'lms.BaseModel'. The parsed result is available as `result.parsed` and is accessed in this example with dictionary-style keys. The snippet imports the 'lmstudio' package and the standard-library 'json' module.

```python
import lmstudio as lms
import json

# Define response schema
class BookInfo(lms.BaseModel):
    """Structured information about a book."""
    title: str
    author: str
    year: int
    genres: list[str]
    summary: str

class MovieInfo(lms.BaseModel):
    """Structured information about a movie."""
    title: str
    director: str
    year: int
    cast: list[str]
    rating: float

# Get model
model = lms.llm()

# Request structured response
result = model.respond(
    "Tell me about The Lord of the Rings: The Fellowship of the Ring",
    response_format=MovieInfo
)

# Access parsed data using dictionary-style keys
movie = result.parsed
print(f"Title: {movie['title']}")
print(f"Director: {movie['director']}")
print(f"Year: {movie['year']}")
print(f"Cast: {', '.join(movie['cast'])}")
print(f"Rating: {movie['rating']}/10")
```
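
When parsed output is handled as a dictionary, a lightweight check of the expected keys and value types can catch surprises before the data is used. The helper below is a sketch, not an SDK feature: `check_fields`, `type_label`, and `MOVIE_SPEC` are hypothetical names, and both sample dictionaries are hand-written placeholders rather than model output.

```python
def type_label(expected) -> str:
    """Readable name for a type or a tuple of types."""
    if isinstance(expected, tuple):
        return " or ".join(t.__name__ for t in expected)
    return expected.__name__


def check_fields(parsed: dict, spec: dict) -> list:
    """Return a list of problems; an empty list means the dict matches the spec."""
    problems = []
    for name, expected in spec.items():
        if name not in parsed:
            problems.append(f"{name}: missing")
        elif not isinstance(parsed[name], expected):
            got = type(parsed[name]).__name__
            problems.append(f"{name}: expected {type_label(expected)}, got {got}")
    return problems


# Mirrors the MovieInfo fields; JSON numbers may arrive as int or float
MOVIE_SPEC = {
    "title": str,
    "director": str,
    "year": int,
    "cast": list,
    "rating": (int, float),
}

good = {
    "title": "Example Film",
    "director": "A. Director",
    "year": 2001,
    "cast": ["First Actor"],
    "rating": 8.5,
}
bad = {"title": "Example Film", "year": "2001"}

print(check_fields(good, MOVIE_SPEC))
print(check_fields(bad, MOVIE_SPEC))
```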

--------------------------------

### Image Handling in Chat for Multimodal Models

Source: https://context7.com/lmstudio-ai/lmstudio-python/llms.txt

Illustrates how to include images in chat messages for multimodal LLM interactions. This involves preparing an image file, adding it to a chat message, and receiving a response from a vision-capable model. The conversation can then be continued with text-only messages.

```python
import lmstudio as lms

# Initialize
client = lms.Client()
model = client.llm.model()  # Use a vision-capable model from the same client
chat = lms.Chat("You are an image analysis assistant")

# Prepare image
image_handle = client.prepare_image(
    src="/path/to/image.jpg",
    name="photo.jpg"
)

# Add message with image
chat.add_user_message([
    "What do you see in this image?",
    image_handle
])

# Get response
result = model.respond(chat)
print(result.content)

# Continue conversation with text only
chat.add_assistant_response(result)
chat.add_user_message("Can you describe the colors?")
result = model.respond(chat)
print(result.content)
```

--------------------------------

### Streaming Chat with History using LM Studio Python SDK

Source: https://context7.com/lmstudio-ai/lmstudio-python/llms.txt

Builds interactive chatbots with streaming responses and conversation history management using the LM Studio Python SDK. It requires the 'lmstudio' and 'json' libraries. The chat history is managed via a 'Chat' object, and responses are streamed with a callback to automatically append messages.

```python
import lmstudio as lms

# Initialize chat with system prompt
chat = lms.Chat("You are a helpful AI assistant specializing in Python programming")
model = lms.llm()

# Interactive loop
while True:
    user_input = input("You: ")
    if not user_input:
        break

    # Add user message to history
    chat.add_user_message(user_input)

    # Stream response with callback to auto-append to history
    stream = model.respond_stream(chat, on_message=chat.append)

    print("Bot: ", end="", flush=True)
    for fragment in stream:
        print(fragment.content, end="", flush=True)
    print()  # New line after response

# Export conversation
import json
print(json.dumps(chat.to_dict(), indent=2))
```

--------------------------------

### Embedding Generation with LM Studio

Source: https://context7.com/lmstudio-ai/lmstudio-python/llms.txt

Shows how to generate vector embeddings for text using embedding models provided by lmstudio-python. This includes generating embeddings for single text strings and batches of strings, and calculating cosine similarity between embeddings using NumPy.

```python
import lmstudio as lms
import numpy as np

# Get embedding model
embedding_model = lms.embedding_model("nomic-embed-text-v1.5")

# Single text embedding
text = "The quick brown fox jumps over the lazy dog"
embedding = embedding_model.embed(text)
print(f"Embedding dimension: {len(embedding)}")
print(f"First 5 values: {embedding[:5]}")

# Batch embeddings (more efficient)
texts = [
    "Machine learning is fascinating",
    "Deep learning uses neural networks",
    "Natural language processing enables AI to understand text"
]

embeddings = embedding_model.embed(texts)  # embed() also accepts a list of strings
print(f"Generated {len(embeddings)} embeddings")

# Calculate similarity
def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Compare embeddings
sim_0_1 = cosine_similarity(embeddings[0], embeddings[1])
sim_0_2 = cosine_similarity(embeddings[0], embeddings[2])
print(f"Similarity [0-1]: {sim_0_1:.4f}")
print(f"Similarity [0-2]: {sim_0_2:.4f}")
```
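
Cosine similarity is usually applied to rank candidates against a query vector. The sketch below does this with the standard library only, so it runs without NumPy or an embedding model. `rank_by_similarity` is a hypothetical helper, and the two-dimensional vectors are toy values chosen for illustration, not real embeddings.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / norm


def rank_by_similarity(
    query: Sequence[float], candidates: List[Sequence[float]]
) -> List[int]:
    """Return candidate indices ordered from most to least similar."""
    scored = [(cosine_similarity(query, vec), idx) for idx, vec in enumerate(candidates)]
    return [idx for _, idx in sorted(scored, key=lambda pair: pair[0], reverse=True)]


# Toy vectors: orthogonal, diagonal, and parallel to the query
query = [1.0, 0.0]
candidates = [[0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
order = rank_by_similarity(query, candidates)
print(order)
```

Vector length does not affect the score, which is why the candidate `[2.0, 0.0]` ranks first even though it is twice as long as the query.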

--------------------------------

### Tokenization for Embeddings

Source: https://context7.com/lmstudio-ai/lmstudio-python/llms.txt

This snippet demonstrates how to tokenize text and count the number of tokens using an embedding model. This is useful for checking how much of the model's input limit a text will consume before embedding it.

```python
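# Assumes `embedding_model` and `text` from the embedding example above
tokens = embedding_model.tokenize(text)
count = embedding_model.count_tokens(text)
print(f"Token IDs: {tokens}")
print(f"Token count: {count}")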
```
