### Quick Example: Install, Configure, and Train

Source: https://github.com/microsoft/skillopt/blob/main/docs/index.md

This example shows how to install SkillOpt, configure Azure OpenAI credentials, train on the SearchQA benchmark, and evaluate the best skill.

```bash
# Install
pip install -e .

# Configure credentials
export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com/"
export AZURE_OPENAI_API_KEY="your-key"

# Train on SearchQA
python scripts/train.py --config configs/searchqa/default.yaml

# Evaluate best skill
python scripts/eval_only.py \
  --config configs/searchqa/default.yaml \
  --skill outputs/best_skill.md
```

--------------------------------

### Copy Environment Example

Source: https://github.com/microsoft/skillopt/blob/main/docs/guide/installation.md

Copies the example `.env.example` file to `.env`, where you fill in your own configuration.

```bash
cp .env.example .env
```

--------------------------------

### Quick Start - Training on SearchQA

Source: https://github.com/microsoft/skillopt/blob/main/README.md

Minimal command to train on SearchQA with a given configuration, data split directory, Azure OpenAI endpoint, and optimizer and target model deployments.

```bash
python scripts/train.py \
    --config configs/searchqa/default.yaml \
    --split_dir /path/to/your/searchqa_split \
    --azure_openai_endpoint https://your-resource.openai.azure.com/ \
    --optimizer_model gpt-5.5 \
    --target_model gpt-5.5
```

--------------------------------

### Quick Start - Training CLI Arguments

Source: https://github.com/microsoft/skillopt/blob/main/README.md

Table detailing key command-line arguments for the training script, including their descriptions and examples.

```markdown
| Argument | Description | Example |
|---|---|---|
| `--config` | Benchmark config YAML | `configs/searchqa/default.yaml` |
| `--split_dir` | Path to data split directory | `/path/to/split` |
| `--azure_openai_endpoint` | Azure OpenAI endpoint URL | `https://your-resource.openai.azure.com/` |
| `--optimizer_model` | Optimizer model deployment name | `gpt-5.5` |
| `--target_model` | Target model deployment name | `gpt-5.5` |
| `--num_epochs` | Number of training epochs | `4` |
| `--batch_size` | Batch size per step | `40` |
| `--workers` | Parallel rollout workers | `8` |
| `--out_root` | Output directory | `outputs/my_run` |
```

--------------------------------

### Environment Variables Example

Source: https://github.com/microsoft/skillopt/blob/main/docs/guide/installation.md

Example configuration for environment variables, including Azure OpenAI, OpenAI, and Anthropic API keys.

```ini
# Azure OpenAI (default backend)
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/
AZURE_OPENAI_API_KEY=your-key

# Or use OpenAI directly
OPENAI_API_KEY=sk-...

# Or Anthropic Claude
ANTHROPIC_API_KEY=sk-ant-...
```
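
The `.env` file above is a list of `KEY=value` lines with `#` comments, and each backend needs its own variables. The sketch below uses only the standard library to parse such text and report which backends have every variable set. `parse_env_file`, `configured_backends`, and the backend-to-variable mapping are hypothetical, inferred from the comments in the example. SkillOpt's own credential handling may differ.

```python
# Hypothetical mapping from backend label to the variables it needs,
# inferred from the comments in the example .env file.
BACKEND_VARS = {
    "azure_openai": ["AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_API_KEY"],
    "openai": ["OPENAI_API_KEY"],
    "anthropic": ["ANTHROPIC_API_KEY"],
}


def parse_env_file(text: str) -> dict[str, str]:
    """Parse KEY=value lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip optional surrounding quotes, e.g. KEY="value".
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def configured_backends(values: dict[str, str]) -> list[str]:
    """Return the backends whose required variables are all non-empty."""
    return [
        name
        for name, keys in BACKEND_VARS.items()
        if all(values.get(k) for k in keys)
    ]


sample = """
# Azure OpenAI (default backend)
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/
AZURE_OPENAI_API_KEY=your-key
"""
backends = configured_backends(parse_env_file(sample))
```

`partition("=")` splits only at the first `=`, so values that contain `=` survive intact.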

--------------------------------

### Install and launch WebUI

Source: https://github.com/microsoft/skillopt/blob/main/docs/guide/first-experiment.md

Commands to install the WebUI dependencies and launch the application.

```bash
pip install -e ".[webui]"
python -m skillopt_webui.app
```

--------------------------------

### Install All Extras

Source: https://github.com/microsoft/skillopt/blob/main/docs/guide/installation.md

Installs the ALFWorld, Claude, Qwen, WebUI, and development extras in one step. The documentation extra (`docs`) is not included.

```bash
pip install -e ".[alfworld,claude,qwen,webui,dev]"
```

--------------------------------

### CLI Overrides Example

Source: https://github.com/microsoft/skillopt/blob/main/docs/guide/configuration.md

Example of how to override configuration values from the command line.

```bash
python scripts/train.py \
  --config configs/searchqa/default.yaml \
  optimizer.learning_rate=16 \
  optimizer.lr_scheduler=linear \
  gradient.analyst_workers=8
```
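
Dotted overrides such as `optimizer.learning_rate=16` address nested keys in the YAML config. The sketch below shows one way such overrides could be applied to a nested dictionary. `apply_overrides` is a hypothetical helper, and the source does not show how SkillOpt itself parses overrides or coerces their types. Here, values are parsed with `ast.literal_eval` when they look like Python literals, `true`/`false` become booleans, and everything else stays a string.

```python
import ast
import copy
from typing import Any


def parse_value(text: str) -> Any:
    """Turn '16' into 16 and 'false' into False; keep 'linear' as a string."""
    lowered = text.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return text


def apply_overrides(cfg: dict, overrides: list[str]) -> dict:
    """Return a copy of cfg with each 'a.b.c=value' override applied."""
    result = copy.deepcopy(cfg)
    for item in overrides:
        dotted_key, sep, raw_value = item.partition("=")
        if not sep:
            raise ValueError(f"override must look like key=value: {item!r}")
        *parents, leaf = dotted_key.split(".")
        node = result
        for key in parents:
            # Create intermediate sections that do not exist yet.
            node = node.setdefault(key, {})
        node[leaf] = parse_value(raw_value)
    return result


base = {
    "optimizer": {"learning_rate": 4, "lr_scheduler": "cosine"},
    "gradient": {"analyst_workers": 16},
}
cfg = apply_overrides(base, [
    "optimizer.learning_rate=16",
    "optimizer.lr_scheduler=linear",
    "gradient.analyst_workers=8",
])
```

The input config is deep-copied, so the loaded defaults are never mutated.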

--------------------------------

### Quick Install

Source: https://github.com/microsoft/skillopt/blob/main/docs/guide/installation.md

Clones the SkillOpt repository, navigates into the directory, and installs the package in editable mode.

```bash
git clone https://github.com/microsoft/SkillOpt.git
cd SkillOpt
pip install -e .
```

--------------------------------

### Install Qwen (Local) Extras

Source: https://github.com/microsoft/skillopt/blob/main/docs/guide/installation.md

Installs optional dependencies for the local Qwen backend.

```bash
pip install -e ".[qwen]"
```

--------------------------------

### Example configuration parameters

Source: https://github.com/microsoft/skillopt/blob/main/docs/guide/first-experiment.md

Key parameters in the configuration file with analogies to deep learning concepts.

```yaml
train:
  num_epochs: 4           # (epochs)
  batch_size: 40          # (batch size)

optimizer:
  learning_rate: 4        # (max edits per step)
  lr_scheduler: cosine    # (learning rate schedule)
  use_slow_update: true   # (momentum at epoch boundary)
  use_meta_skill: true    # (cross-epoch optimizer memory)

gradient:
  analyst_workers: 16     # (parallel reflection workers)

evaluation:
  use_gate: true          # (validation gating)
```

--------------------------------

### Install Documentation Dependencies and Serve

Source: https://github.com/microsoft/skillopt/blob/main/CONTRIBUTING.md

Commands to install documentation dependencies and preview the documentation locally.

```bash
pip install -e ".[docs]"
mkdocs serve   # Preview at http://localhost:8000
```

--------------------------------

### Training Examples

Source: https://github.com/microsoft/skillopt/blob/main/docs/reference/cli.md

Examples demonstrating how to use the training command with different configurations and overrides.

```bash
# Basic training
python scripts/train.py --config configs/searchqa/default.yaml
```

```bash
# With overrides
python scripts/train.py \
  --config configs/searchqa/default.yaml \
  --cfg-options optimizer.learning_rate=16 optimizer.lr_scheduler=linear
```

```bash
# With custom initial skill
python scripts/train.py \
  --config configs/searchqa/default.yaml \
  --cfg-options env.skill_init=skills/my_seed.md
```

--------------------------------

### Configure API Credentials - Example

Source: https://github.com/microsoft/skillopt/blob/main/README.md

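Copies the example environment file. Edit `.env` with your API credentials, then load it into your shell.

```bash
cp .env.example .env
# Edit .env with your API credentials, then export its variables:
set -a   # auto-export every variable assigned while sourcing
source .env
set +a
```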

--------------------------------

### Install WebUI Extras

Source: https://github.com/microsoft/skillopt/blob/main/docs/guide/installation.md

Installs optional dependencies for the WebUI.

```bash
pip install -e ".[webui]"
```

--------------------------------

### Verify Installation

Source: https://github.com/microsoft/skillopt/blob/main/docs/guide/installation.md

Verifies the SkillOpt installation by importing the library and printing a success message.

```python
import skillopt

print('SkillOpt ready!')
```

--------------------------------

### Install Development Extras

Source: https://github.com/microsoft/skillopt/blob/main/docs/guide/installation.md

Installs optional dependencies for development.

```bash
pip install -e ".[dev]"
```

--------------------------------

### Install SkillOpt

Source: https://github.com/microsoft/skillopt/blob/main/README.md

Clones the repository, installs the package, and optionally installs dependencies for the ALFWorld benchmark.

```bash
git clone https://github.com/microsoft/SkillOpt.git
cd SkillOpt
pip install -e .

# For ALFWorld benchmark (optional):
pip install -e ".[alfworld]"
alfworld-download
```

--------------------------------

### Install Claude Backend Extras

Source: https://github.com/microsoft/skillopt/blob/main/docs/guide/installation.md

Installs optional dependencies for the Claude backend.

```bash
pip install -e ".[claude]"
```

--------------------------------

### Quick Start - Training on LiveMathematicianBench

Source: https://github.com/microsoft/skillopt/blob/main/README.md

Command to train on LiveMathematicianBench with a given configuration, data split directory, Azure OpenAI endpoint, and optimizer and target model deployments.

```bash
python scripts/train.py \
    --config configs/livemathematicianbench/default.yaml \
    --split_dir /path/to/your/livemath_split \
    --azure_openai_endpoint https://your-resource.openai.azure.com/ \
    --optimizer_model gpt-5.5 \
    --target_model gpt-5.5
```

--------------------------------

### Install ALFWorld Extras

Source: https://github.com/microsoft/skillopt/blob/main/docs/guide/installation.md

Installs optional dependencies for the ALFWorld benchmark.

```bash
pip install -e ".[alfworld]"
```

--------------------------------

### Train the model

Source: https://github.com/microsoft/skillopt/blob/main/docs/guide/first-experiment.md

Command to start the training process using the specified configuration file.

```bash
python scripts/train.py --config configs/searchqa/default.yaml
```

--------------------------------

### Solution.py Initialization

Source: https://github.com/microsoft/skillopt/blob/main/skillopt/envs/spreadsheetbench/prompts/react_system.md

The required first lines of the `solution.py` script, which define the input and output paths. Replace each placeholder with the exact path given in the task.

```python
INPUT_PATH  = "<exact input path given in the task>"
OUTPUT_PATH = "<exact output path given in the task>"
```

--------------------------------

### Create Config

Source: https://github.com/microsoft/skillopt/blob/main/docs/guide/new-benchmark.md

Example YAML configuration file for the new benchmark.

```yaml
_base_: ['../_base_/default.yaml']

env:
  name: my_benchmark
  data_path: data/my_benchmark
  split_mode: ratio
  split_ratio: "2:1:7"

train:
  num_epochs: 4
  batch_size: 40

optimizer:
  learning_rate: 4
  lr_scheduler: cosine
  use_slow_update: true
  use_meta_skill: true

gradient:
  analyst_workers: 16
```

--------------------------------

### Configuration Example

Source: https://github.com/microsoft/skillopt/blob/main/docs/guide/new-backend.md

This YAML snippet demonstrates how to configure SkillOpt to use the new backend.

```yaml
model:
  backend: your_backend
  model_name: your-model-id
  temperature: 0.7
  max_tokens: 4096
```

--------------------------------

### Initial Skill Configuration

Source: https://github.com/microsoft/skillopt/blob/main/docs/guide/skill-document.md

Example YAML configuration for setting the initial skill for training.

```yaml
train:
  init_skill: "path/to/initial_skill.md"  # or omit for empty
```

--------------------------------

### Evaluation Examples

Source: https://github.com/microsoft/skillopt/blob/main/docs/reference/cli.md

Examples demonstrating how to use the evaluation command to assess skills on different data splits.

```bash
# Evaluate best skill on test set
python scripts/eval_only.py \
  --config configs/searchqa/default.yaml \
  --skill outputs/searchqa/run_001/skills/best_skill.md
```

```bash
# Evaluate on validation set
python scripts/eval_only.py \
  --config configs/searchqa/default.yaml \
  --skill outputs/searchqa/run_001/skills/best_skill.md \
  --split valid
```

--------------------------------

### Example training output

Source: https://github.com/microsoft/skillopt/blob/main/docs/guide/first-experiment.md

Sample output observed during the training process.

```text
[Step 1/8] Rollout: 20 items, 4 workers...
[Step 1/8] Score: 0.65 → Reflect...
[Step 1/8] 6 edit patches generated
[Step 1/8] Selected 4 edits (lr=8, cosine → 7.7)
[Step 1/8] Gate: val score 0.68 > 0.65 ✓ ACCEPT
[Step 2/8] ...
```

--------------------------------

### Clone and Install Development Dependencies

Source: https://github.com/microsoft/skillopt/blob/main/CONTRIBUTING.md

Steps to clone the SkillOpt repository and install development dependencies.

```bash
git clone https://github.com/microsoft/SkillOpt.git
cd SkillOpt
pip install -e ".[dev]"
```

--------------------------------

### Quick Start - Training on ALFWorld

Source: https://github.com/microsoft/skillopt/blob/main/README.md

Command to train on ALFWorld with a given configuration, data split directory, Azure OpenAI endpoint, and optimizer and target model deployments.

```bash
python scripts/train.py \
    --config configs/alfworld/default.yaml \
    --split_dir /path/to/your/alfworld_split \
    --azure_openai_endpoint https://your-resource.openai.azure.com/ \
    --optimizer_model gpt-5.5 \
    --target_model gpt-5.5
```

--------------------------------

### Quick Start - Eval Only

Source: https://github.com/microsoft/skillopt/blob/main/README.md

Command to evaluate a trained skill on specific data splits without further training.

```bash
# Evaluate a trained skill on specific data splits without training:

```

--------------------------------

### Model Parameters

Source: https://github.com/microsoft/skillopt/blob/main/docs/guide/configuration.md

Configuration parameters for the model backend and for the optimizer and target model deployments.

```yaml
model:
  backend: azure_openai          # azure_openai | openai_chat | claude_code_exec | qwen
  optimizer: gpt-5.5             # Optimizer model (for reflection)
  target: gpt-5.5                # Target model (for rollout)
```

--------------------------------

### Optimizer Parameters

Source: https://github.com/microsoft/skillopt/blob/main/docs/guide/configuration.md

Configuration parameters for the optimizer, including learning rate, scheduler, and meta-skill usage.

```yaml
optimizer:
  learning_rate: 4               # Max edits per step (edit budget)
  min_learning_rate: 2           # Min edits for decay schedulers
  lr_scheduler: cosine           # constant | linear | cosine | autonomous
  use_slow_update: true          # Momentum-like blending at epoch boundary
  slow_update_samples: 20        # Samples for slow update evaluation
  use_meta_skill: true           # Cross-epoch strategy memory
```
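
Here `learning_rate` is an edit budget, and a scheduler can decay it toward `min_learning_rate`. The sketch below computes a per-step budget for the `constant`, `linear`, and `cosine` options. It uses the standard linear-decay and cosine-annealing formulas and rounds the result to a whole number of edits. The formulas, the rounding, and the name `edit_budget` are assumptions, not SkillOpt's confirmed implementation, and the `autonomous` scheduler is not modeled.

```python
import math


def edit_budget(step: int, total_steps: int, max_lr: float = 4,
                min_lr: float = 2, scheduler: str = "cosine") -> int:
    """Edit budget for a 0-indexed step under an assumed schedule."""
    if scheduler == "constant" or total_steps <= 1:
        value = max_lr
    else:
        # 0.0 at the first step, 1.0 at the last step.
        progress = step / (total_steps - 1)
        if scheduler == "linear":
            value = max_lr - (max_lr - min_lr) * progress
        elif scheduler == "cosine":
            value = min_lr + (max_lr - min_lr) * 0.5 * (1 + math.cos(math.pi * progress))
        else:
            raise ValueError(f"scheduler not modeled in this sketch: {scheduler}")
    # Budgets are whole edits, and at least one edit per step.
    return max(1, round(value))


cosine_budgets = [edit_budget(s, 8) for s in range(8)]
```

Cosine decay keeps the budget near its maximum for longer than linear decay does, then drops off toward `min_learning_rate` in later steps.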

--------------------------------

### Gradient (Reflection) Parameters

Source: https://github.com/microsoft/skillopt/blob/main/docs/guide/configuration.md

Configuration parameters for gradient reflection, including minibatch size and analyst workers.

```yaml
gradient:
  minibatch_size: 8              # Reflect minibatch size
  analyst_workers: 16            # Parallel reflection workers
  max_analyst_rounds: 3          # Max rounds of analyst reflection
  failure_only: false            # Only reflect on failures
```
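
The `minibatch_size`, `analyst_workers`, and `failure_only` settings suggest that rollout results are grouped into minibatches and reflected on in parallel. The sketch below shows that grouping with `concurrent.futures`. `reflect_on` is a hypothetical stand-in for the analyst call, which in SkillOpt would query the optimizer model. The result-record shape and the rule that a score below 1.0 counts as a failure are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def make_minibatches(results: list[dict], minibatch_size: int = 8,
                     failure_only: bool = False) -> list[list[dict]]:
    """Split rollout results into minibatches, optionally keeping only failures."""
    if failure_only:
        # Assumption: a result with score < 1.0 counts as a failure.
        results = [r for r in results if r["score"] < 1.0]
    return [results[i:i + minibatch_size]
            for i in range(0, len(results), minibatch_size)]


def reflect_on(minibatch: list[dict]) -> str:
    """Hypothetical analyst: summarize a minibatch instead of calling a model."""
    failed = sum(1 for r in minibatch if r["score"] < 1.0)
    return f"{failed}/{len(minibatch)} failed"


def reflect_all(results: list[dict], minibatch_size: int = 8,
                analyst_workers: int = 16, failure_only: bool = False) -> list[str]:
    """Run the analyst over every minibatch with a bounded thread pool."""
    batches = make_minibatches(results, minibatch_size, failure_only)
    # pool.map preserves input order, so reports line up with minibatches.
    with ThreadPoolExecutor(max_workers=analyst_workers) as pool:
        return list(pool.map(reflect_on, batches))
```

Threads suit this step because a real analyst call would spend most of its time waiting on network I/O.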

--------------------------------

### Config Structure

Source: https://github.com/microsoft/skillopt/blob/main/docs/guide/configuration.md

Directory structure for configuration files, showing global defaults and benchmark-specific overrides.

```text
configs/
├── _base_/
│   └── default.yaml          # Global defaults
├── searchqa/
│   └── default.yaml          # SearchQA overrides
├── docvqa/
│   └── default.yaml          # DocVQA overrides
└── alfworld/
    └── default.yaml          # ALFWorld overrides
```
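
Benchmark configs inherit the global defaults; the new-benchmark config, for example, declares `_base_: ['../_base_/default.yaml']`. The sketch below shows a recursive dictionary merge in which override values win and nested sections are merged key by key. It works on already-loaded dicts. Whether SkillOpt merges exactly this way (for instance, replacing lists rather than concatenating them) is an assumption, and `deep_merge` is a hypothetical name. The sample values are illustrative only.

```python
import copy


def deep_merge(base: dict, override: dict) -> dict:
    """Return base updated with override; nested dicts merge recursively."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if key == "_base_":
            continue  # inheritance marker, not a setting
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars and lists replace the base value outright.
            merged[key] = copy.deepcopy(value)
    return merged


# Illustrative inputs: global defaults plus a benchmark-specific override.
defaults = {"train": {"num_epochs": 4, "batch_size": 40, "seed": 42},
            "model": {"backend": "azure_openai"}}
benchmark = {"_base_": ["../_base_/default.yaml"],
             "env": {"name": "my_benchmark"},
             "train": {"batch_size": 20}}
cfg = deep_merge(defaults, benchmark)
```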

--------------------------------

### Environment (Data) Parameters

Source: https://github.com/microsoft/skillopt/blob/main/docs/guide/configuration.md

Configuration parameters for the environment and data, including benchmark name, split mode, and data path.

```yaml
env:
  name: searchqa                 # Benchmark name
  split_mode: ratio              # ratio | split_dir
  split_ratio: "2:1:7"           # train:val:test ratio
  data_path: ""                  # Path to dataset
  exec_timeout: 120              # Per-task timeout (seconds)
```
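
With `split_mode: ratio`, `split_ratio: "2:1:7"` divides the dataset into train, validation, and test portions in a 2:1:7 proportion (20%, 10%, and 70%). The sketch below performs such a split. The seeded shuffle, the truncation of fractional counts, and letting the test set absorb the remainder are assumptions rather than confirmed SkillOpt behavior. `ratio_split` is a hypothetical name.

```python
import random


def ratio_split(items: list, split_ratio: str = "2:1:7",
                seed: int = 42) -> dict[str, list]:
    """Shuffle items with a fixed seed and cut them into train/val/test."""
    parts = [float(p) for p in split_ratio.split(":")]
    if len(parts) != 3 or sum(parts) <= 0:
        raise ValueError(f"expected 'train:val:test', got {split_ratio!r}")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    total = sum(parts)
    n_train = int(len(shuffled) * parts[0] / total)
    n_val = int(len(shuffled) * parts[1] / total)
    return {
        "train": shuffled[:n_train],
        "val": shuffled[n_train:n_train + n_val],
        # The test split takes whatever remains after truncation.
        "test": shuffled[n_train + n_val:],
    }
```

Using a dedicated `random.Random(seed)` keeps the split reproducible without touching the global random state.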

--------------------------------

### Training Parameters

Source: https://github.com/microsoft/skillopt/blob/main/docs/guide/configuration.md

Configuration parameters for the training process, including epochs, batch size, and seed.

```yaml
train:
  num_epochs: 4                  # Number of training epochs
  batch_size: 40                 # Tasks per step (batch size)
  accumulation: 1                # Gradient accumulation
  seed: 42
```

--------------------------------

### Evaluation Parameters

Source: https://github.com/microsoft/skillopt/blob/main/docs/guide/configuration.md

Configuration parameters for evaluation, including validation gating and test evaluation.

```yaml
evaluation:
  use_gate: true                 # Validation gating (accept/reject updates)
  eval_test: true                # Run test evaluation after training
```
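
With `use_gate: true`, each proposed skill update is accepted or rejected based on its validation score, as in the sample log line `Gate: val score 0.68 > 0.65 ✓ ACCEPT`. The sketch below encodes that rule. It assumes a candidate must strictly beat the current score, matching the `>` in the log. `gate` is a hypothetical name, and the returned tuple is an illustrative shape.

```python
def gate(current_skill: str, current_score: float,
         candidate_skill: str, candidate_score: float,
         use_gate: bool = True) -> tuple[str, float, bool]:
    """Return (skill, score, accepted) after the validation gate."""
    if not use_gate:
        # Without gating, every update is accepted.
        return candidate_skill, candidate_score, True
    if candidate_score > current_score:
        return candidate_skill, candidate_score, True
    # Reject: keep the previous skill and its score.
    return current_skill, current_score, False
```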

--------------------------------

### Data Preparation - Example JSON Structure (SearchQA)

Source: https://github.com/microsoft/skillopt/blob/main/README.md

Shows the JSON format for SearchQA task items, including id, question, context, and answers.

```json
[
  {
    "id": "unique_item_id",
    "question": "Who wrote the novel ...",
    "context": "[DOC] relevant passage text ...",
    "answers": ["expected answer"]
  }
]
```
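
Before training, it can help to check that each item matches the SearchQA shape above: string `id`, `question`, and `context` fields and a list of `answers`. `validate_items` is a hypothetical helper built only from that example. SkillOpt's loader may accept additional or optional fields.

```python
import json

REQUIRED_FIELDS = {"id": str, "question": str, "context": str, "answers": list}


def validate_items(items: list) -> list[str]:
    """Return human-readable problems; an empty list means the items look valid."""
    problems = []
    seen_ids = set()
    for index, item in enumerate(items):
        if not isinstance(item, dict):
            problems.append(f"item {index}: not a JSON object")
            continue
        for field, expected_type in REQUIRED_FIELDS.items():
            if not isinstance(item.get(field), expected_type):
                problems.append(
                    f"item {index}: '{field}' missing or not {expected_type.__name__}")
        if isinstance(item.get("answers"), list) and not item["answers"]:
            problems.append(f"item {index}: 'answers' is empty")
        item_id = item.get("id")
        if isinstance(item_id, str):
            if item_id in seen_ids:
                problems.append(f"item {index}: duplicate id {item_id!r}")
            seen_ids.add(item_id)
    return problems


sample = json.loads('[{"id": "unique_item_id", "question": "Who wrote the novel ...",'
                    ' "context": "[DOC] relevant passage text ...",'
                    ' "answers": ["expected answer"]}]')
```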

--------------------------------

### Review the config file

Source: https://github.com/microsoft/skillopt/blob/main/docs/guide/first-experiment.md

View the default configuration file for the SearchQA benchmark.

```bash
cat configs/searchqa/default.yaml
```

--------------------------------

### Usage

Source: https://github.com/microsoft/skillopt/blob/main/skillopt/envs/_template/README.md

Steps to copy, rename, implement, register, and create a configuration for a new benchmark. The first step copies the template package:

```bash
cp -r skillopt/envs/_template skillopt/envs/your_benchmark
```

--------------------------------

### Typical Skill Document Structure

Source: https://github.com/microsoft/skillopt/blob/main/docs/guide/skill-document.md

An example of the structure of a typical skill document in Markdown format.

```markdown
# Task Strategy

## General Approach
- Break complex problems into sub-steps
- Always verify intermediate results

## Common Patterns
- When you see X, try approach Y
- Avoid Z because it leads to errors

## Edge Cases
- If the input contains A, handle it specially by...
- Watch out for B — it requires C

## Output Format
- Always include reasoning before the answer
- Format numbers with proper units
```
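
Because a skill document is ordinary Markdown organized under `##` headings, it can be inspected programmatically, for example to see which sections an optimization run has grown. `split_sections` is a hypothetical helper, not part of SkillOpt. It assumes section titles start with `## `, ignores the top-level `# ` title, and keeps deeper headings as body lines.

```python
def split_sections(skill_md: str) -> dict[str, list[str]]:
    """Map each '## ' section title to its non-empty body lines."""
    sections: dict[str, list[str]] = {}
    current = None
    for line in skill_md.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif current is not None and line.strip():
            sections[current].append(line.strip())
    return sections


example = """# Task Strategy

## General Approach
- Break complex problems into sub-steps
- Always verify intermediate results

## Output Format
- Always include reasoning before the answer
"""
sections = split_sections(example)
```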

--------------------------------

### Backend Architecture Directory Structure

Source: https://github.com/microsoft/skillopt/blob/main/docs/guide/new-backend.md

The directory structure shows where new backends should be placed, with `your_backend.py` as an example.

```text
skillopt/model/
├── base.py           # Abstract base class
├── azure_openai.py   # Azure OpenAI backend
├── openai_model.py   # Direct OpenAI backend
├── claude.py         # Anthropic Claude backend
├── qwen.py           # Local Qwen (vLLM) backend
└── your_backend.py   # Your new backend
```

--------------------------------

### Configure API Credentials - Qwen (local vLLM)

Source: https://github.com/microsoft/skillopt/blob/main/README.md

Sets the environment variables for the Qwen chat base URL and model name when serving Qwen locally with vLLM.

```bash
export QWEN_CHAT_BASE_URL="http://localhost:8000/v1"
export QWEN_CHAT_MODEL="Qwen/Qwen3.5-4B"
```

--------------------------------

### Example Custom Backend Implementation

Source: https://github.com/microsoft/skillopt/blob/main/docs/guide/new-backend.md

This Python code defines a custom backend `YourBackend` that inherits from `ModelBackend` and implements the `generate` method.

```python
import os

from skillopt.model.base import ModelBackend, ModelResponse

class YourBackend(ModelBackend):
    """Your custom model backend."""
    
    def __init__(self, cfg: dict):
        super().__init__(cfg)
        self.model_name = cfg.get('model_name', 'your-default-model')
        self.api_key = os.environ.get('YOUR_API_KEY', '')
        self.client = self._init_client()
    
    def _init_client(self):
        """Initialize API client."""
        # TODO: Set up your API client
        pass
    
    async def generate(
        self,
        messages: list[dict],
        temperature: float = 0.7,
        max_tokens: int = 4096,
        **kwargs
    ) -> ModelResponse:
        """
        Generate a completion.
        
        Args:
            messages: Chat messages [{"role": "...", "content": "..."}]
            temperature: Sampling temperature
            max_tokens: Maximum tokens in response
            
        Returns:
            ModelResponse with content, usage, and metadata
        """
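        # Placeholder call: replace with your client's actual API and map its
        # response fields (text, token usage) onto ModelResponse below.
        response = await self.client.chat(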
            model=self.model_name,
            messages=messages,
            temperature=temperature,
            max_tokens=max_tokens,
        )
        
        return ModelResponse(
            content=response.text,
            usage={
                'prompt_tokens': response.usage.input,
                'completion_tokens': response.usage.output,
            },
            model=self.model_name,
        )
    
    async def generate_with_tools(
        self,
        messages: list[dict],
        tools: list[dict],
        **kwargs
    ) -> ModelResponse:
        """Generate with tool/function calling support."""
        # Optional: implement if your model supports tool use
        raise NotImplementedError("Tool use not supported")
```

--------------------------------

### Data Preparation - Directory Structure

Source: https://github.com/microsoft/skillopt/blob/main/README.md

Illustrates the expected directory structure for data preparation, with train, val, and test subdirectories each containing a JSON file.

```text
data/my_split/
├── train/items.json
├── val/items.json
└── test/items.json
```
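
A split directory in this layout can be read with the standard library alone. `load_split_dir` is a hypothetical helper, not SkillOpt's own data loader. It assumes each `items.json` holds a JSON array like the SearchQA example.

```python
import json
from pathlib import Path


def load_split_dir(split_dir: str) -> dict[str, list[dict]]:
    """Read train/val/test items.json files from a split directory."""
    root = Path(split_dir)
    splits = {}
    for name in ("train", "val", "test"):
        path = root / name / "items.json"
        if not path.is_file():
            raise FileNotFoundError(f"missing {path}")
        with path.open(encoding="utf-8") as f:
            items = json.load(f)
        if not isinstance(items, list):
            raise ValueError(f"{path} should contain a JSON array")
        splits[name] = items
    return splits
```

Failing fast on a missing or malformed file surfaces layout mistakes before any model calls are made.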

--------------------------------

### WebUI Command

Source: https://github.com/microsoft/skillopt/blob/main/docs/reference/cli.md

Launches the SkillOpt WebUI. The bracketed `--port` and `--share` flags are optional.

```bash
python -m skillopt_webui.app [--port PORT] [--share]
```

--------------------------------

### Create Benchmark Package

Source: https://github.com/microsoft/skillopt/blob/main/docs/guide/new-benchmark.md

Create the directory structure for a new benchmark.

```bash
mkdir -p skillopt/envs/my_benchmark
touch skillopt/envs/my_benchmark/__init__.py
```

--------------------------------

### Launch WebUI with public share link

Source: https://github.com/microsoft/skillopt/blob/main/README.md

Command to launch the WebUI with a public share link, useful for remote servers.

```bash
python -m skillopt_webui.app --share
```

--------------------------------

### Select Stage Analogy

Source: https://github.com/microsoft/skillopt/blob/main/docs/guide/training-loop.md

Pseudocode for the select stage, framed as an analogy to gradient clipping and optimizer step size: only the top `learning_rate` edits are kept.

```python
# Analogy: gradient clipping + optimizer step size
selected = top_k(edits, k=learning_rate)
```
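
The `top_k(edits, k=learning_rate)` line is shorthand. The runnable version below makes two assumptions. First, each candidate edit carries a numeric `priority`, which is a hypothetical field. Second, a fractional, scheduled budget is truncated to an integer. It uses `heapq.nlargest`, which returns the selected items ordered from highest to lowest key.

```python
import heapq


def top_k(edits: list[dict], k: float) -> list[dict]:
    """Keep at most int(k) edits with the highest 'priority' (hypothetical field)."""
    budget = max(0, int(k))
    return heapq.nlargest(budget, edits, key=lambda edit: edit["priority"])
```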

--------------------------------

### Run Training

Source: https://github.com/microsoft/skillopt/blob/main/docs/guide/new-benchmark.md

Command to run the training script with the new benchmark configuration.

```bash
python scripts/train.py --config configs/my_benchmark/default.yaml
```

--------------------------------

### Optional: Qwen local model (via vLLM) Dependencies

Source: https://github.com/microsoft/skillopt/blob/main/requirements.txt

This package is optional and only needed to serve Qwen models locally via vLLM. It is commented out in `requirements.txt`.

```text
# vllm>=0.4.0
```

--------------------------------

### Configure API Credentials - OpenAI

Source: https://github.com/microsoft/skillopt/blob/main/README.md

Sets the environment variable for the OpenAI API key.

```bash
export OPENAI_API_KEY="sk-..."
```

--------------------------------

### Rollout Stage Analogy

Source: https://github.com/microsoft/skillopt/blob/main/docs/guide/training-loop.md

Pseudocode for the rollout stage, framed as a forward pass: the model runs with the current skill document, and its predictions are scored against the ground truth.

```python
# Analogy: forward pass through the network
predictions = model(input, skill_document)
scores = evaluate(predictions, ground_truth)
```
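
Read as code, the rollout stage runs the target model on each batch item with the current skill and scores the outputs. The sketch below makes that concrete with a caller-supplied `model` function and case-insensitive exact-match scoring. Both are placeholders: a real rollout would call the target model backend, possibly in parallel across the `--workers` rollout workers. The item fields `input` and `answer` are hypothetical.

```python
from typing import Callable


def rollout(batch: list[dict], skill_document: str,
            model: Callable[[str, str], str]) -> list[dict]:
    """Run the model on each item and attach an exact-match score."""
    results = []
    for item in batch:
        prediction = model(item["input"], skill_document)
        score = float(prediction.strip().lower() == item["answer"].strip().lower())
        results.append({"id": item["id"], "prediction": prediction, "score": score})
    return results


def mean_score(results: list[dict]) -> float:
    """Average score over a batch; 0.0 for an empty batch."""
    return sum(r["score"] for r in results) / len(results) if results else 0.0
```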

--------------------------------

### Implement Environment Adapter

Source: https://github.com/microsoft/skillopt/blob/main/docs/guide/new-benchmark.md

Python code for a custom environment adapter that builds a prompt from the skill document, queries the target model, and scores the prediction.

```python
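# Postponed annotations keep the DataItem hint from raising NameError at
# import time; DataItem's import location is not shown in this example.
from __future__ import annotations

from skillopt.envs.base import EnvAdapter, TaskResult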

class MyBenchmarkEnv(EnvAdapter):
    """Execute tasks and evaluate results."""
    
    def __init__(self, cfg: dict):
        super().__init__(cfg)
    
    async def execute(self, item: DataItem, skill: str, model) -> TaskResult:
        """
        Execute a single task.
        
        Args:
            item: The data item to process
            skill: Current skill document content
            model: The target model instance
            
        Returns:
            TaskResult with prediction, score, and trajectory
        """
        # Build prompt with skill document
        prompt = self.build_prompt(item, skill)
        
        # Get model response. Assumes `model` follows the ModelBackend
        # interface: chat messages in, ModelResponse (with .content) out.
        response = await model.generate([{"role": "user", "content": prompt}])
        response_text = response.content
        
        # Extract prediction
        prediction = self.parse_response(response_text)
        
        # Score against ground truth
        score = self.evaluate(prediction, item.ground_truth)
        
        return TaskResult(
            item_id=item.id,
            prediction=prediction,
            score=score,
            trajectory=[
                {"role": "system", "content": skill},
                {"role": "user", "content": item.input},
                {"role": "assistant", "content": response_text}
            ]
        )
    
    def evaluate(self, prediction: str, ground_truth: str) -> float:
        """
        Score a prediction against ground truth. 
        
        Returns:
            Float between 0.0 and 1.0
        """
        # TODO: Implement your scoring logic
        # Examples: exact match, F1, ANLS, etc.
        return float(prediction.strip() == ground_truth.strip())
    
    def build_prompt(self, item, skill: str) -> str:
        """Combine skill document with task input."""
        return f"{skill}\n\n---\n\nQuestion: {item.input}"
    
    def parse_response(self, response: str) -> str:
        """Extract the answer from model response."""
        return response.strip()
```
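
The `evaluate` TODO lists exact match and F1 among the scoring options. The sketch below implements SQuAD-style token F1 and takes the best score over several reference answers, which fits items whose `answers` field is a list, as in the SearchQA JSON format. The normalization (lowercasing, then removing punctuation and the English articles "a", "an", "the") follows the common SQuAD convention. It is an assumption here, not SkillOpt's confirmed metric.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase, drop punctuation and articles, then split into tokens."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def token_f1(prediction: str, ground_truth: str) -> float:
    """Harmonic mean of token precision and recall after normalization."""
    pred, gold = normalize(prediction), normalize(ground_truth)
    if not pred or not gold:
        # Both empty counts as a match; otherwise there can be no overlap.
        return float(pred == gold)
    overlap = sum((Counter(pred) & Counter(gold)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


def best_f1(prediction: str, answers: list[str]) -> float:
    """Score against every reference answer and keep the best match."""
    return max((token_f1(prediction, a) for a in answers), default=0.0)
```

Unlike exact match, token F1 gives partial credit, for example when a prediction adds extra words around the correct answer.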

--------------------------------

### Evaluate the best skill

Source: https://github.com/microsoft/skillopt/blob/main/docs/guide/first-experiment.md

Command to evaluate the best skill on the test split.

```bash
python scripts/eval_only.py \
  --config configs/searchqa/default.yaml \
  --skill outputs/searchqa/<run_id>/skills/best_skill.md
```

--------------------------------

### Comparison Grid CSS

Source: https://github.com/microsoft/skillopt/blob/main/index.html

CSS for a three-column comparison grid with equal-width columns and a 12px gap.

```css
.comparison-grid {
  display: grid;
  grid-template-columns: repeat(3, minmax(0, 1fr));
  gap: 12px;
}
```