### StreamLoader Example

Source: https://developer.apple.com/documentation/evaluations/loader

An example of implementing the Loader protocol using `StreamLoader` with a custom async sequence.

```APIDOC
## StreamLoader

### Description
A loader backed by a custom async sequence.

### Example Usage
```swift
var dataset: any Loader<ModelSample<String>> {
    StreamLoader(stream: AsyncThrowingStream<ModelSample<String>, Error> { continuation in
        Task {
            let prompts = ["One plus one is...", "Swift is..."]
            for prompt in prompts {
                continuation.yield(ModelSample(prompt: prompt, expected: ""))
            }
            continuation.finish()
        }
    })
}
```
```

--------------------------------

### Implement SampleProtocol

Source: https://developer.apple.com/documentation/evaluations/sampleprotocol

An example of a struct 'MySample' that conforms to the SampleProtocol, providing 'input' and an optional 'expected' property.

```swift
struct MySample: SampleProtocol {
    var input: String
    var expected: String?
}
```

--------------------------------

### ArrayLoader Example

Source: https://developer.apple.com/documentation/evaluations/loader

An example of implementing the Loader protocol using `ArrayLoader` with in-memory samples.

```APIDOC
## ArrayLoader

### Description
A loader backed by an in-memory array.

### Example Usage
```swift
var dataset: any Loader<ModelSample<String>> {
    ArrayLoader(samples: [
        ModelSample(prompt: "One plus one is...", expected: "Two."),
        ModelSample(prompt: "Swift is...", expected: "A powerful language."),
    ])
}
```
```

--------------------------------

### Example Usage of group(_:_:) Method

Source: https://developer.apple.com/documentation/evaluations/metricsaggregator/group%28_%3A_%3A%29

Demonstrates how to use the group(_:_:) method to organize metrics like 'Accuracy' into a 'Quality Metrics' group. This example shows computing the mean and median of the accuracy metric within the defined group.

```swift
let accuracy = Metric("Accuracy")


func aggregateMetrics(using aggregator: inout MetricsAggregator) {
    aggregator.group("Quality Metrics") { group in
        group.computeMean(of: accuracy)
        group.computeMedian(of: accuracy)
    }
}
```
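
To make concrete what the two aggregates in this group report, here is a minimal sketch in plain Swift. It does not use the Evaluations framework: `mean(of:)` and `median(of:)` are hypothetical helper names, and the framework's own handling of empty or ignored results may differ.

```swift
// Hypothetical helpers for illustration; not part of the Evaluations framework.

/// Arithmetic mean of the scores, or nil for an empty array.
func mean(of scores: [Double]) -> Double? {
    guard !scores.isEmpty else { return nil }
    return scores.reduce(0, +) / Double(scores.count)
}

/// Median of the scores, or nil for an empty array.
/// For an even count, averages the two middle values.
func median(of scores: [Double]) -> Double? {
    guard !scores.isEmpty else { return nil }
    let sorted = scores.sorted()
    let mid = sorted.count / 2
    return sorted.count.isMultiple(of: 2)
        ? (sorted[mid - 1] + sorted[mid]) / 2
        : sorted[mid]
}

// Per-sample accuracy scores: 1 for a pass, 0 for a fail.
let accuracyScores: [Double] = [1, 0, 1, 1]
let meanAccuracy = mean(of: accuracyScores)
let medianAccuracy = median(of: accuracyScores)
```

Reporting both values is useful because the median is less sensitive to a few outlying scores than the mean.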

--------------------------------

### JSONLoader Example

Source: https://developer.apple.com/documentation/evaluations/loader

An example of implementing the Loader protocol using `JSONLoader` with a JSON or JSONL file.

```APIDOC
## JSONLoader

### Description
A loader backed by a JSON or JSONL file.

### Example Usage
```swift
var dataset: any Loader<ModelSample<String>> {
    JSONLoader(url: Bundle.main.url(forResource: "prompts", withExtension: "jsonl")!)
}
```
```

--------------------------------

### StreamLoader Example

Source: https://developer.apple.com/documentation/evaluations/loader

Instantiate a StreamLoader with an AsyncThrowingStream that yields ModelSample objects. This is useful for dynamically generated or streaming data.

```swift
var dataset: any Loader<ModelSample<String>> {
    StreamLoader(stream: AsyncThrowingStream<ModelSample<String>, Error> { continuation in
        Task {
            let prompts = ["One plus one is...", "Swift is..."]
            for prompt in prompts {
                continuation.yield(ModelSample(prompt: prompt, expected: ""))
            }
            continuation.finish()
        }
    })
}
```
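
The stream mechanics can be tried without the Evaluations framework. The sketch below uses only the Swift standard library: it yields plain strings instead of `ModelSample` values, and `makePromptStream` and `collect` are hypothetical helper names introduced for illustration.

```swift
// Plain Swift concurrency; no Evaluations types are used here.

/// Builds a stream that yields each prompt, then finishes.
func makePromptStream(_ prompts: [String]) -> AsyncThrowingStream<String, Error> {
    AsyncThrowingStream { continuation in
        for prompt in prompts {
            continuation.yield(prompt)
        }
        // Without finish(), a consumer's loop would never end.
        continuation.finish()
    }
}

/// Drains the stream into an array; rethrows if the stream fails.
func collect(_ stream: AsyncThrowingStream<String, Error>) async throws -> [String] {
    var items: [String] = []
    for try await item in stream {
        items.append(item)
    }
    return items
}
```

From an async context, `try await collect(makePromptStream(["One plus one is...", "Swift is..."]))` returns the prompts in the order they were yielded.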

--------------------------------

### Creating a ModelSampleOutput Instance

Source: https://developer.apple.com/documentation/evaluations/modelsampleoutput/init%28value%3Aexpectations%3A%29

Example of how to create an instance of `ModelSampleOutput` with a specific string value and no expectations. This demonstrates basic usage of the initializer.

```swift
let output = ModelSampleOutput<String, TrajectoryExpectation>(value: "Paris", expectations: nil)
```

--------------------------------

### Example ArgumentMatcher Configuration

Source: https://developer.apple.com/documentation/evaluations/argumentmatcher

Demonstrates how to create an array of ArgumentMatcher instances to define validation rules for multiple arguments. Use this to set up validation for tool calls.

```swift
let matchers: [ArgumentMatcher] = [
    .exact(argumentName: "city", value: "San Francisco"),
    .keyOnly(argumentName: "units"),
    .naturalLanguage(argumentName: "prompt", criteria: "A weather-related question")
]
```
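
As a rough mental model of the first two rules, the sketch below checks a tool call's arguments against a list of matchers. `SketchMatcher` is a hypothetical stand-in, not the framework's `ArgumentMatcher`. It assumes `exact` compares the argument's value and `keyOnly` only checks that the argument is present, which is inferred from the case names rather than confirmed by the source. The `naturalLanguage` case is omitted because it would need a model to judge the criteria.

```swift
// Hypothetical stand-in for illustration; not the framework's ArgumentMatcher.
enum SketchMatcher {
    case exact(argumentName: String, value: String)
    case keyOnly(argumentName: String)

    /// Returns true when the tool call's arguments satisfy this rule.
    func matches(_ arguments: [String: String]) -> Bool {
        switch self {
        case let .exact(name, value):
            // Assumption: the argument must be present with exactly this value.
            return arguments[name] == value
        case let .keyOnly(name):
            // Assumption: the argument must be present; its value is not checked.
            return arguments[name] != nil
        }
    }
}

let rules: [SketchMatcher] = [
    .exact(argumentName: "city", value: "San Francisco"),
    .keyOnly(argumentName: "units"),
]

let call = ["city": "San Francisco", "units": "metric"]
let allMatch = rules.allSatisfy { $0.matches(call) }
```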

--------------------------------

### ModelSample Initialization

Source: https://developer.apple.com/documentation/evaluations/modelsampleprotocol

Example of initializing a ModelSample with a prompt, expected output, and evaluation expectations. Use this for common language model evaluation scenarios.

```swift
let sample = ModelSample(
    prompt: "What's the weather?",
    expected: "Sunny",
    expectations: TrajectoryExpectation(ordered: [
        ToolExpectation("get_weather")
    ])
)
```

--------------------------------

### Creating a ModelJudgePrompt Instance

Source: https://developer.apple.com/documentation/evaluations/modeljudgeprompt/init%28instructions%3Aevaluationtarget%3Areference%3A%29

Example of creating a ModelJudgePrompt instance with custom system instructions.

```swift
let prompt = ModelJudgePrompt<ModelSample<String>>(
    instructions: "You are a domain expert."
)
```

--------------------------------

### ArrayLoader Example

Source: https://developer.apple.com/documentation/evaluations/loader

Instantiate an ArrayLoader with a collection of ModelSample objects for in-memory datasets.

```swift
var dataset: any Loader<ModelSample<String>> {
    ArrayLoader(samples: [
        ModelSample(prompt: "One plus one is...", expected: "Two."),
        ModelSample(prompt: "Swift is...", expected: "A powerful language."),
    ])
}
```

--------------------------------

### JSONLoader Example

Source: https://developer.apple.com/documentation/evaluations/loader

Instantiate a JSONLoader using a URL pointing to a JSON or JSONL file. Ensure the file is present in the main bundle.

```swift
var dataset: any Loader<ModelSample<String>> {
    JSONLoader(url: Bundle.main.url(forResource: "prompts", withExtension: "jsonl")!)
}
```
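
For readers unfamiliar with JSONL, the format holds one JSON object per line. The sketch below decodes such text with Foundation alone. `PromptRecord`, its field names, and `decodeJSONL` are assumptions made for illustration; the source does not state which schema `JSONLoader` expects.

```swift
import Foundation

// Hypothetical record shape; the schema JSONLoader expects is not stated in the source.
struct PromptRecord: Decodable, Equatable {
    let prompt: String
    let expected: String?
}

/// Decodes JSONL text: one JSON object per non-empty line.
func decodeJSONL(_ text: String) throws -> [PromptRecord] {
    let decoder = JSONDecoder()
    return try text
        .split(whereSeparator: \.isNewline)
        .map { try decoder.decode(PromptRecord.self, from: Data($0.utf8)) }
}

let jsonl = """
    {"prompt": "One plus one is...", "expected": "Two."}
    {"prompt": "Swift is..."}
    """
let records = try decodeJSONL(jsonl)
```

Because each line is decoded on its own, a malformed line surfaces as a decoding error for that line rather than corrupting the rest of the file.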

--------------------------------

### init(input:expected:expectations:)

Source: https://developer.apple.com/documentation/evaluations/modelsample/init%28input%3Aexpected%3Aexpectations%3A%29

Creates a model sample with a prebuilt input. This initializer is available for iOS, iPadOS, macOS, visionOS, and watchOS starting from version 27.0.

```APIDOC
## init(input:expected:expectations:)

### Description
Creates a model sample with a prebuilt input.

### Parameters
- **input** (ModelSampleInput) - Required - The input for the model sample.
- **expected** (ExpectedValue?) - Optional - The expected value for the model sample. Defaults to nil.
- **expectations** (TrajectoryExpectation?) - Optional - The expectations for the model sample. Defaults to nil.
```

--------------------------------

### Configuring Sampling Strategy

Source: https://developer.apple.com/documentation/evaluations/samplegenerator

Allows configuration of the strategy for selecting existing samples to be used as examples in the prompt. This influences how the generator learns from existing data.

```APIDOC
## samplingStrategy

### Description
Sets or gets the strategy for selecting existing samples to be used as examples within the generation prompt. This property allows you to control how the generator leverages prior data to inform new sample creation.

### Property Type
`SampleGenerator<SampleType>.SamplingStrategy?`

### Access
Read-write

```

--------------------------------

### samplingStrategy Property

Source: https://developer.apple.com/documentation/evaluations/samplegenerator/samplingstrategy-swift.property

The strategy for selecting existing samples as examples in the prompt. When nil, the generator shows no examples and doesn't retry on repetition. When set, the strategy also controls retry behavior when the model repeats itself.

```APIDOC
## Property: samplingStrategy

### Description
The strategy for selecting existing samples as examples in the prompt. When `nil`, the generator shows no examples and doesn’t retry on repetition. When set, the strategy also controls retry behavior when the model repeats itself.

### Declaration
```swift
var samplingStrategy: SampleGenerator<SampleType>.SamplingStrategy?
```

### Related Types
- `enum SamplingStrategy`: The values that define how the generator selects existing samples as examples in the generation prompt.
- `var validator: ((SampleType) async throws -> Bool)?`: An optional closure that decides whether a generated sample is valid.
```

--------------------------------

### Run a Model-as-Judge Evaluation with Swift Testing

Source: https://developer.apple.com/documentation/evaluations/scoring-with-model-as-judge-evaluators

Integrate your model-as-judge evaluation into your Swift Testing suite. This example shows how to define an evaluation and assert on its results.

```swift
import Testing
import Evaluations


struct BookTagTests {
    static let evaluation = BookTagEvaluation()
    @Test(.evaluates(Self.evaluation))
    func evaluateTagQuality() async throws {
        let result = EvaluationContext.current.result
        let score = result.aggregateValue(.mean(of: Self.evaluation.tagQuality))
        #expect(score > 2.5)
    }
}
```

--------------------------------

### Customize Model and Instructions for Sample Generation

Source: https://developer.apple.com/documentation/evaluations/generating-synthetic-evaluation-datasets

Use a `sessionProvider` closure with `makeSamples` to specify the language model and instructions for generating synthetic data. This example uses `PrivateCloudComputeLanguageModel` and provides custom instructions for creating to-do list items.

```swift
var expanded: [ModelSample<TaskItem>] = []


for try await sample in dataset.makeSamples(
    syntheticGenerationPrompt,
    targetCount: 20,
    sessionProvider: {
        LanguageModelSession(
            // Use the Private Cloud Compute model to generate samples.
            model: PrivateCloudComputeLanguageModel(),
            instructions: """
                You create new structured task data. Generate realistic \
                to-do list items based on the examples provided. Each item \
                needs a natural prompt, an appropriate title, correct \
                category classification, and an honest urgency rating.
                """
        )
    }
) {
    expanded.append(sample)
}

```

--------------------------------

### init(prompt:expected:instructions:generationSchema:expectations:)

Source: https://developer.apple.com/documentation/evaluations/modelsample/init%28prompt%3Aexpected%3Ainstructions%3Agenerationschema%3Aexpectations%3A%29-8mni

Creates a model sample with a FoundationModels prompt. This initializer is available for iOS, iPadOS, Mac Catalyst, macOS, visionOS, and watchOS, all starting from version 27.0 and in Beta.

```APIDOC
## init(prompt:expected:instructions:generationSchema:expectations:)

### Description
Creates a model sample with a FoundationModels prompt.

### Parameters
#### Initializer Parameters
- **prompt** (Prompt) - Required - The FoundationModels prompt.
- **expected** (ExpectedValue?) - Optional - The expected value for the model sample.
- **instructions** (Instructions?) - Optional - Instructions for generating the model sample.
- **generationSchema** (GenerationSchema?) - Optional - The schema for generation.
- **expectations** (TrajectoryExpectation?) - Optional - Expectations for the model sample's trajectory.

### Availability
iOS 27.0+ Beta
iPadOS 27.0+ Beta
Mac Catalyst 27.0+ Beta
macOS 27.0+ Beta
visionOS 27.0+ Beta
watchOS 27.0+ Beta
```

--------------------------------

### ModelSample Initializer

Source: https://developer.apple.com/documentation/evaluations/modelsample/init%28prompt%3Aexpected%3Ainstructions%3Agenerationschema%3Aexpectations%3A%29-7daed

Creates a model sample with string-based prompt and instructions. This initializer is available for iOS, iPadOS, Mac Catalyst, macOS, visionOS, and watchOS, all starting from version 27.0 and in Beta.

```APIDOC
## init(prompt:expected:instructions:generationSchema:expectations:)

### Description
Creates a model sample with string-based prompt and instructions.

### Parameters
- **prompt** (String) - Required - The string-based prompt for the model.
- **expected** (ExpectedValue?) - Optional - The expected value for the model sample. Defaults to nil.
- **instructions** (String?) - Optional - String-based instructions for the model. Defaults to nil.
- **generationSchema** (GenerationSchema?) - Optional - The schema for generation. Defaults to nil.
- **expectations** (TrajectoryExpectation?) - Optional - The trajectory expectations for the model sample. Defaults to nil.

### Example
```swift
// `ModelSample<String>` spells out the expected-value type, which cannot be inferred from `expected: nil`.
let sample = ModelSample<String>(
    prompt: "Your prompt here",
    expected: nil,
    instructions: "Optional instructions",
    generationSchema: nil,
    expectations: nil
)
```

### Result
Constructs a `ModelSample` instance.
```

--------------------------------

### startTime

Source: https://developer.apple.com/documentation/evaluations/evaluationresult/starttime

The time when the evaluation run started. This is an instance property of the EvaluationResult.

```APIDOC
## startTime

### Description
The time when the evaluation run started.

### Instance Property
`let startTime: Date`

### Availability
- iOS 27.0+ Beta
- iPadOS 27.0+ Beta
- Mac Catalyst 27.0+ Beta
- macOS 27.0+ Beta
- visionOS 27.0+ Beta
- watchOS 27.0+ Beta
```

--------------------------------

### ArrayLoader init(samples:)

Source: https://developer.apple.com/documentation/evaluations/arrayloader/init%28samples%3A%29

Creates a loader backed by the given array of samples. This is a beta feature available on iOS, iPadOS, Mac Catalyst, macOS, and visionOS starting from version 27.0.

```APIDOC
## init(samples:)

### Description
Creates a loader backed by the given array of samples.

### Method
initializer

### Parameters
- **samples** ([Sample]) - Required - The array of samples to back the loader.

### Result
- **ArrayLoader** - An instance of ArrayLoader initialized with the provided samples.
```

--------------------------------

### Creating a ModelSubject Instance

Source: https://developer.apple.com/documentation/evaluations/modelsubject

Initializes a ModelSubject with a string value representing the model's output. The transcript is optional and not provided in this example.

```swift
let subject = ModelSubject(value: "Paris, France")
```

--------------------------------

### Custom Evaluation Implementation

Source: https://developer.apple.com/documentation/evaluations/evaluation

An example of implementing the `Evaluation` protocol. This includes defining a metric, dataset, subject generation logic, evaluators, and metric aggregation.

```swift
struct MyEvaluation: Evaluation {
    let metric = Metric("Match")


    let dataset = ArrayLoader(samples: [
        ModelSample(prompt: "One plus one is...", expected: "Two.")
    ])


    func subject(from sample: ModelSample<String>) async throws -> ModelSubject<String> {
        ModelSubject(value: "Two.")
    }


    var evaluators: Evaluators {
        Evaluator { sample, subject in
            let metric = Metric("Match")
            guard let expected = sample.expected else { return metric.ignore() }
            return subject.value == expected ? metric.passing() : metric.failing()
        }
    }


    func aggregateMetrics(using aggregator: inout MetricsAggregator) {
        aggregator.computeMean(of: metric)
    }
}
```

--------------------------------

### ModelSampleInput Initializer

Source: https://developer.apple.com/documentation/evaluations/modelsampleinput/init%28prompt%3Ainstructions%3Agenerationschema%3A%29

Creates a model sample input with the given prompt, instructions, and schema. This initializer is available for iOS, iPadOS, Mac Catalyst, macOS, visionOS, and watchOS starting from version 27.0 (Beta).

```APIDOC
## init(prompt:instructions:generationSchema:)

### Description
Creates a model sample input with the given prompt, instructions, and schema.

### Parameters

`prompt` (Prompt) - Required - The prompt to send to the language model.

`instructions` (Instructions?) - Optional - Optional system instructions for the model session.

`generationSchema` (GenerationSchema?) - Optional - The output schema for the assistant’s response.
```

--------------------------------

### Define a Prompt for Synthetic Data Generation

Source: https://developer.apple.com/documentation/evaluations/generating-synthetic-evaluation-datasets

Create a `Prompt` object to describe the characteristics of the synthetic data you want to generate. This prompt guides the model in creating realistic and varied samples.

```swift
let syntheticGenerationPrompt = Prompt("""
    Generate realistic to-do list items that a busy professional might have. \
    Each input is a natural-language request, and the expected output is the structured \
    task extracted from it. Cover a mix of work tasks (meetings, deadlines, \
    reviews), personal errands (shopping, appointments), health activities \
    (exercise, checkups), and home maintenance. Vary urgency and whether a \
    due date is specified.
    """)
```
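
The trailing backslashes in that multi-line string literal are line continuations. The short sketch below, in plain Swift with a shortened made-up prompt text, shows their effect: the wrapped source lines become one line in the resulting string.

```swift
// Plain Swift; shows how a trailing backslash in a multi-line string literal behaves.
let wrapped = """
    Generate realistic to-do list items. \
    Vary urgency and due dates.
    """

// The backslash removes the line break that follows it, and the closing
// delimiter's indentation is stripped from every line of the literal.
let isSingleLine = !wrapped.contains("\n")
```

This keeps long prompts readable in source without sending stray line breaks to the model.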

--------------------------------

### ModelSample Prompt Property

Source: https://developer.apple.com/documentation/evaluations/modelsample/prompt

The 'prompt' property represents the user's prompt for this sample. It is available for various Apple platforms starting from version 27.0 Beta.

```APIDOC
## prompt

### Description
The user’s prompt for this sample.

### Availability
iOS 27.0+ Beta
iPadOS 27.0+ Beta
Mac Catalyst 27.0+ Beta
macOS 27.0+ Beta
visionOS 27.0+ Beta
watchOS 27.0+ Beta

### Swift Code
```swift
var prompt: Prompt { get }
```

### Requirements
Available when `ExpectedValue` conforms to `Decodable`, `Encodable`, and `Sendable`.
```

--------------------------------

### init(from:)

Source: https://developer.apple.com/documentation/evaluations/aggregationoperation/init%28from%3A%29

Decodes an operation from a keyed container, reconstructing the metric from its name. Available on iOS, iPadOS, Mac Catalyst, macOS, visionOS, and watchOS starting from version 27.0.

```APIDOC
## init(from:)

### Description
Decodes an operation from a keyed container, reconstructing the metric from its name.

### Signature
```swift
init(from decoder: any Decoder) throws
```

### Availability
- iOS 27.0+
- iPadOS 27.0+
- Mac Catalyst 27.0+
- macOS 27.0+
- visionOS 27.0+
- watchOS 27.0+
```
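
The general pattern of decoding from a keyed container and rebuilding a value from a stored name can be sketched with Foundation alone. Everything named below (`SketchMetric`, `SketchOperation`, and the `kind` and `metric` keys) is a hypothetical stand-in; the real `AggregationOperation` encoding is not shown in the source.

```swift
import Foundation

// Hypothetical stand-ins; the real AggregationOperation's keys and cases are not shown in the source.
struct SketchMetric: Equatable {
    let name: String
}

struct SketchOperation: Decodable, Equatable {
    enum Kind: String, Decodable { case mean, median }
    enum CodingKeys: String, CodingKey { case kind, metric }

    let kind: Kind
    let metric: SketchMetric

    init(from decoder: any Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        kind = try container.decode(Kind.self, forKey: .kind)
        // Only the metric's name is stored; rebuild the metric from it.
        metric = SketchMetric(name: try container.decode(String.self, forKey: .metric))
    }
}

let json = Data(#"{"kind": "mean", "metric": "Accuracy"}"#.utf8)
let operation = try JSONDecoder().decode(SketchOperation.self, from: json)
```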

--------------------------------

### init(toolCalls:toolOutputs:instructionText:prompts:responses:)

Source: https://developer.apple.com/documentation/evaluations/structuredtranscript/init%28toolcalls%3Atooloutputs%3Ainstructiontext%3Aprompts%3Aresponses%3A%29

Creates a structured transcript with optional tool calls, tool outputs, instruction text, user prompts, and model responses.

```APIDOC
## init(toolCalls:toolOutputs:instructionText:prompts:responses:)

### Description
Creates a structured transcript.

### Parameters

`toolCalls`
    
The tool calls from the session.

`toolOutputs`
    
The tool outputs from the session.

`instructionText`
    
The system instructions text.

`prompts`
    
The user prompts.

`responses`
    
The model responses.
```

--------------------------------

### ModelSampleOutput Initializer

Source: https://developer.apple.com/documentation/evaluations/modelsampleoutput/init%28value%3Aexpectations%3A%29

Creates a model sample output with an optional expected value and expectations. This initializer is available for iOS, iPadOS, Mac Catalyst, macOS, visionOS, and watchOS starting from version 27.0 (Beta).

```APIDOC
## init(value:expectations:)

### Description
Creates a model sample output with an optional expected value and expectations.

### Parameters

#### `value`

The expected output value for comparison.

#### `expectations`

The expected behavior, such as a tool-call trajectory.

### Discussion
```swift
let output = ModelSampleOutput<String, TrajectoryExpectation>(value: "Paris", expectations: nil)
```
```

--------------------------------

### Accessing Evaluation Start and End Times

Source: https://developer.apple.com/documentation/evaluations/evaluationresult

Retrieves the start and end times of the evaluation run.

```swift
let endTime: Date
```

```swift
let startTime: Date
```
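
A common use of these two properties is computing how long a run took. The sketch below uses a hypothetical `RunTimes` struct in place of `EvaluationResult`, so it runs with Foundation alone. Only the `startTime` and `endTime` property names come from the source.

```swift
import Foundation

// Hypothetical stand-in carrying the two properties shown above.
struct RunTimes {
    let startTime: Date
    let endTime: Date

    /// Wall-clock duration of the run, in seconds.
    var duration: TimeInterval {
        endTime.timeIntervalSince(startTime)
    }
}

let start = Date(timeIntervalSince1970: 1_000)
let run = RunTimes(startTime: start, endTime: start.addingTimeInterval(42.5))
let seconds = run.duration
```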

--------------------------------

### init(instructions:evaluationTarget:reference:)

Source: https://developer.apple.com/documentation/evaluations/modeljudgeprompt/init%28instructions%3Aevaluationtarget%3Areference%3A%29

Creates a model-as-judge prompt configuration. You can provide custom system instructions, an optional closure to convert the response to a string, and an optional closure returning labeled reference data.

```APIDOC
## init(instructions:evaluationTarget:reference:)

### Description
Creates a model-as-judge prompt configuration.

### Parameters
#### `instructions` (String)
System instructions for the model-as-judge. Defaults to a general-purpose evaluator prompt.

#### `evaluationTarget` ((Input.ExpectedValue) -> String)?
Optional closure to convert the response to a string. When `nil`, the response is JSON-serialized.

#### `reference` ((Input, Input.ExpectedValue) async throws -> [String : String])?
Optional closure returning labeled reference data to include in the judge prompt.

### Example
```swift
let prompt = ModelJudgePrompt<ModelSample<String>>(
    instructions: "You are a domain expert."
)
```
```

--------------------------------

### Define BookTags Structure

Source: https://developer.apple.com/documentation/evaluations/scoring-with-model-as-judge-evaluators

Defines the expected structure for book tags using Generable and Guide from Foundation Models. This guides the language model in generating the output.

```swift
@Generable
struct BookTags: Codable, Sendable {
    @Guide(description: "Tags describing the book's genre, themes, and setting",
           .count(3...8))
    var tags: [String]
}
```

--------------------------------

### init(url:)

Source: https://developer.apple.com/documentation/evaluations/jsonloader/init%28url%3A%29

Creates a loader backed by the JSON or JSONL file at the given URL. This is a beta feature available on multiple Apple platforms.

```APIDOC
## init(url:)

### Description
Creates a loader backed by the JSON or JSONL file at the given URL.

### Method
Initializer

### Parameters
- **url** (URL) - Required - The URL of the JSON or JSONL file.
```

--------------------------------

### init(_:scale:judge:scoringMode:prompt:)

Source: https://developer.apple.com/documentation/evaluations/modeljudgeevaluator/init%28_%3Ascale%3Ajudge%3Ascoringmode%3Aprompt%3A%29

Creates a single-metric evaluator with a custom judge prompt. This initializer is available on iOS, iPadOS, Mac Catalyst, macOS, visionOS, and watchOS, all starting from version 27.0 and in beta.

```APIDOC
## init(_:scale:judge:scoringMode:prompt:)

### Description
Creates a single-metric evaluator with a custom judge prompt.

### Parameters
- **name** (String) - The metric name that corresponds to the DataFrame column.
- **scale** (ScoringScale) - The scoring scale for this metric.
- **judge** (any LanguageModel) - The language model to use as judge.
- **scoringMode** (ScoringMode) - Optional. A value that indicates whether scores are discrete (default) or allow any floating-point value.
- **prompt** (ModelJudgePrompt<Input>) - Configuration for the judge prompt, including instructions, response presentation, and reference.

### See Also
### Creating a single-dimension evaluator
`init(String, scale: ScoringScale, judge: any LanguageModel, scoringMode: ScoringMode)`
```

--------------------------------

### Creating a Basic ModelSample

Source: https://developer.apple.com/documentation/evaluations/modelsample

Instantiates a ModelSample with a simple string prompt and its expected string output. This is useful for basic text-based evaluations.

```swift
let sample = ModelSample(prompt: "The capital of France is...", expected: "Paris.")
```

--------------------------------

### failing(rationale:)

Source: https://developer.apple.com/documentation/evaluations/metric/failing%28rationale%3A%29

Returns a metric with a failing result. This method is available on iOS, iPadOS, Mac Catalyst, macOS, visionOS, and watchOS, all starting from version 27.0 Beta.

```APIDOC
## failing(rationale:)

### Description
Returns a metric with a failing result.

### Method Signature
```swift
func failing(rationale: String? = nil) -> Metric
```

### Parameters
- **rationale** (String?) - Optional - A string providing the rationale for the failing result.
```

--------------------------------

### init(jsonData:)

Source: https://developer.apple.com/documentation/evaluations/evaluationresult/init%28jsondata%3A%29

Creates an evaluation result by parsing JSON data. This method is available for iOS, iPadOS, Mac Catalyst, macOS, visionOS, and watchOS, starting from version 27.0, and is currently in beta.

```APIDOC
## init(jsonData:)

### Description
Creates an evaluation result by parsing JSON data.

### Method
`init(jsonData: Data) throws`

### Parameters
* **jsonData** (Data) - Required - The JSON data to parse.
```

--------------------------------

### ResultColumn.name

Source: https://developer.apple.com/documentation/evaluations/resultcolumn/name

Gets the column name in the DataFrame.

```APIDOC
## name

### Description
The column name in the DataFrame.

### Property
`name` (String)

### Availability
iOS 27.0+ Beta
iPadOS 27.0+ Beta
Mac Catalyst 27.0+ Beta
macOS 27.0+ Beta
visionOS 27.0+ Beta
watchOS 27.0+ Beta

### Code Example
```swift
let name: String
```
```

--------------------------------

### Define ModelJudgeEvaluator with Scored Examples

Source: https://developer.apple.com/documentation/evaluations/scoring-with-model-as-judge-evaluators

This snippet shows how to define a `ModelJudgeEvaluator` for email tone. It includes a numeric scale and detailed instructions with four scored examples (scores 4, 3, 2, and 1) to calibrate the model's judgment.

```swift
ModelJudgeEvaluator(
    "EmailTone",
    scale: .numeric([
        4: "Professional, clear, and well-matched to the scenario, with appropriate warmth.",
        3: "Professional and clear, but feels slightly generic, formal, or impersonal.",
        2: "Noticeable tone issues: too curt, too informal, or mismatched to the scenario.",
        1: "Unprofessional, unclear, rude, or completely inappropriate for the scenario.",
    ]),
    judge: SystemLanguageModel.default,
    prompt: ModelJudgePrompt(
        instructions: """
            You are an expert evaluator of professional email tone. Your task is to evaluate
            whether an AI-generated email strikes the right professional tone for a workplace
            setting.

            Evaluate the email considering:

            - Professionalism: Uses appropriate language for a workplace. Avoids slang, overly
            casual phrasing, or unnecessarily stiff formality.
            - Clarity: Clearly communicates its purpose. The reader immediately understands
            what is being asked or conveyed.
            - Warmth: Feels human and approachable. Includes appropriate pleasantries without
            being excessive.
            - Appropriateness: The tone matches the scenario: a complaint is firm but
            respectful; a request is polite but clear; good news is enthusiastic but
            professional.

            Here are some examples to calibrate your scoring:

            ### Example 1
            **Prompt:** Write an email to a colleague asking them to review your document
            by Friday.
            **Response:** \"Will you take a look at the Q3 report when you get a chance?
            It would be great to have your feedback by Friday so I can incorporate any changes
            before the Monday meeting. Let me know if that timeline works for you. Thanks!\"
            **Score:** 4
            **Rationale:** The email is polite, clear, and professional. It states the request,
            gives a reason for the deadline, and respects the recipient's time by checking if
            the timeline works.

            ### Example 2
            **Prompt:** Write an email sharing a project status update with stakeholders.
            **Response:** \"Hi everyone, I wanted to share a quick update on Project Atlas.
            We completed the design review last week, and development is on track for the
            June deadline. There are a couple of open questions about the API integration
            that I'll follow up on separately. Please reach out if you have any concerns.\"
            **Score:** 3
            **Rationale:** Professional and clear with good structure. Could be slightly warmer
            or more engaging, the update is efficient but reads as formulaic.

            ### Example 3
            **Prompt:** Write an email declining a meeting invitation.
            **Response:** \"I can't make it. Sorry.\"
            **Score:** 2
            **Rationale:** While not rude, the email is too brief for a professional setting.
            It doesn't offer an alternative or show engagement with the topic.

            ### Example 4
            **Prompt:** Write an email to a colleague asking them to review your document
            by Friday.
            **Response:** \"I need you to review my document. Get it done by Friday.\"
            **Score:** 1
            **Rationale:** The email is curt and demanding. It lacks any politeness, gives no
            context for the request, and does not acknowledge the recipient's workload or time.

            Use these examples to calibrate your scoring. Apply the same standards
            consistently. Evaluate step by step, then assign a score from 4, 3, 2, or 1.
            """)
)
```

--------------------------------

### init(expected:arguments:)

Source: https://developer.apple.com/documentation/evaluations/trajectoryexpectation/init%28expected%3Aarguments%3A%29

Creates a trajectory expectation for a single expected tool call. This initializer is available for iOS, iPadOS, Mac Catalyst, macOS, visionOS, and watchOS starting from version 27.0 (Beta).

```APIDOC
## init(expected:arguments:)

### Description
Creates a trajectory expectation for a single expected tool call.

### Parameters
#### `toolName` (String) - Required
 The name of the tool expected to be called.

#### `arguments` ([ArgumentMatcher]) - Optional
 The argument matchers to validate. Defaults to an empty array.

### See Also
- `struct ToolExpectation`
```

--------------------------------

### ModelSample Initializer Signature

Source: https://developer.apple.com/documentation/evaluations/modelsample/init%28prompt%3Aexpected%3Ainstructions%3Agenerationschema%3Aexpectations%3A%29-7daed

This is the signature for the init(prompt:expected:instructions:generationSchema:expectations:) initializer for ModelSample. It is available on iOS, iPadOS, macOS, visionOS, and watchOS starting from version 27.0.

```swift
init(
    prompt: String,
    expected: ExpectedValue? = nil,
    instructions: String? = nil,
    generationSchema: GenerationSchema? = nil,
    expectations: TrajectoryExpectation? = nil
)
```

--------------------------------

### AggregateMetric Structure

Source: https://developer.apple.com/documentation/evaluations/aggregatemetric

This snippet shows the basic structure of the AggregateMetric and an example of its usage in calculating accuracy.

```APIDOC
## AggregateMetric

An aggregate statistic computed from a metric’s results across the evaluation dataset.

```swift
struct AggregateMetric
```

### Overview

```swift
let accuracy = Metric("Accuracy")
let op = AggregationOperation.mean(of: accuracy)
print(op.label) // "Mean of Accuracy"
```

The summary DataFrame stores one `AggregateMetric` for each column. Each value records the operation that produced it, and derives its display label and source metric name from the operation.

### Instance Properties

* `let group: String?`
  The group this aggregate belongs to, if any.
* `var label: String`
  The display label for this aggregate.
* `let operation: AggregationOperation`
  The aggregation operation that produced this value.
* `var sourceMetric: String?`
  The name of the source metric.
* `let value: Double`
  The aggregate value.

### Conforms To

* `Decodable`
* `Encodable`
* `Equatable`
* `Sendable`
* `SendableMetatype`
```
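
The link between an operation, its label, and the aggregate value can be sketched in plain Swift. `SketchOperation` and `SketchAggregate` below are hypothetical stand-ins, not the framework's types. The "Mean of Accuracy" label format comes from the snippet above; the "Median of ..." format is assumed by analogy.

```swift
// Stand-in types for illustration only; these are NOT the Evaluations framework types.
enum SketchOperation: Equatable {
    case mean(metricName: String)
    case median(metricName: String)

    // Name of the metric the operation reads from.
    var sourceMetric: String {
        switch self {
        case .mean(let name), .median(let name):
            return name
        }
    }

    // Display label. "Mean of <name>" mirrors the documented example;
    // "Median of <name>" is an assumption made by analogy.
    var label: String {
        switch self {
        case .mean(let name): return "Mean of \(name)"
        case .median(let name): return "Median of \(name)"
        }
    }

    // Applies the operation to per-sample scores; returns nil for empty input.
    func apply(to scores: [Double]) -> Double? {
        guard !scores.isEmpty else { return nil }
        switch self {
        case .mean:
            return scores.reduce(0, +) / Double(scores.count)
        case .median:
            let sorted = scores.sorted()
            let mid = sorted.count / 2
            return sorted.count % 2 == 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid]
        }
    }
}

// An aggregate records the operation and derives label and source metric from it.
struct SketchAggregate: Equatable {
    let operation: SketchOperation
    let value: Double
    var label: String { operation.label }
    var sourceMetric: String { operation.sourceMetric }
}

let scores = [1.0, 0.0, 1.0, 1.0]
let op = SketchOperation.mean(metricName: "Accuracy")
let aggregate = SketchAggregate(operation: op, value: op.apply(to: scores) ?? .nan)
print(aggregate.label, aggregate.value)
```

Deriving `label` and `sourceMetric` from the stored operation, rather than storing them separately, keeps the three values from drifting apart.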

--------------------------------

### ModelSampleInput Initializer

Source: https://developer.apple.com/documentation/evaluations/modelsampleinput

Creates a model sample input with the given prompt, instructions, and schema.

```APIDOC
## init(prompt: Prompt, instructions: Instructions?, generationSchema: GenerationSchema?)

### Description
Creates a model sample input with the given prompt, instructions, and schema.

### Parameters
- **prompt** (Prompt) - Required - The FoundationModels prompt for this input.
- **instructions** (Instructions?) - Optional - The optional FoundationModels instructions for this input.
- **generationSchema** (GenerationSchema?) - Optional - The output schema for the assistant’s response.
```

--------------------------------

### Create a Model Sample with Tool Expectation

Source: https://developer.apple.com/documentation/evaluations/evaluating-language-model-responses

This Swift code demonstrates how to create a `ModelSample` for evaluating tool-calling behavior. It includes a prompt, expected output, and a `TrajectoryExpectation` to verify the `count_letters` tool is called with the correct arguments.

```swift
ModelSample(
    prompt: "Count the letter 'r' in 'strawberry'.",
    expected: 3,
    // Attach a trajectory expectation that defines the expected tool-calling sequence.
    expectations: TrajectoryExpectation(
        ordered: [
            // Expect the model to call `count_letters` with these exact arguments.
            ToolExpectation(
                "count_letters",
                arguments: [
                    .exact(argumentName: "letter", value: .string("r")),
                    .exact(argumentName: "word", value: .string("strawberry")),
                ]
            ),
        ]
    )
)
```

--------------------------------

### Declare allPass Metric

Source: https://developer.apple.com/documentation/evaluations/toolcallevaluator/allpass

Declares the allPass metric. Available on iOS, iPadOS, macOS, tvOS, and watchOS starting from version 27.0 (Beta).

```swift
let allPass: Metric
```

--------------------------------

### init(_:)

Source: https://developer.apple.com/documentation/evaluations/metric/init%28_%3A%29

Creates a metric with just a name. Use the factory methods — passing, failing, scoring, or ignore — to produce results.

```APIDOC
## init(_:)

### Description
Creates a metric with just a name. Use the factory methods — `passing`, `failing`, `scoring`, or `ignore` — to produce results.

### Method Signature
```swift
init(_ name: String)
```

### Parameters
- **name** (String) - Required - The name of the metric.
```

--------------------------------

### Instruction Text Property

Source: https://developer.apple.com/documentation/evaluations/structuredtranscript/instructiontext

Access the system instruction text from the transcript. Available on iOS, iPadOS, macOS, and more, starting from version 27.0 Beta.

```swift
var instructionText: String
```

--------------------------------

### sourceMetric

Source: https://developer.apple.com/documentation/evaluations/aggregatemetric/sourcemetric

The name of the source metric. Available on iOS, iPadOS, Mac Catalyst, macOS, visionOS, and watchOS starting from version 27.0 (Beta).

```APIDOC
## sourceMetric

### Description
The name of the source metric.

### Instance Property
`var sourceMetric: String? { get }`
```

--------------------------------

### Create Initial ModelSample Dataset

Source: https://developer.apple.com/documentation/evaluations/generating-synthetic-evaluation-datasets

Initializes a dataset of ModelSample objects for evaluating a task extraction feature. Each sample includes a prompt and an expected TaskItem.

```swift
let dataset: [ModelSample<TaskItem>] = [
    // Here's a health task that is non-urgent and has a due date.
    ModelSample(
        prompt: "Schedule dentist appointment for next Tuesday",
        expected: TaskItem(title: "Schedule dentist appointment",
                           dueOn: "04/07/2026", category: .health, isUrgent: false)
    ),
    // This is an errands task that is urgent and due today.
    ModelSample(
        prompt: "Buy groceries for dinner party tonight",
        expected: TaskItem(title: "Buy groceries for dinner party",
                           dueOn: "03/30/2026", category: .errands, isUrgent: true)
    ),
    // Here's a work task that's urgent and has a due date.
    ModelSample(
        prompt: "Finish quarterly report by end of week",
        expected: TaskItem(title: "Finish quarterly report",
                           dueOn: "04/03/2026", category: .work, isUrgent: true)
    ),
    // This is a home task that's non-urgent and has a due date.
    ModelSample(
        prompt: "Fix the leaky kitchen faucet this weekend",
        expected: TaskItem(title: "Fix leaky kitchen faucet",
                           dueOn: "04/05/2026", category: .home, isUrgent: false)
    ),
    // Here's a personal task that's non-urgent with no due date.
    ModelSample(
        prompt: "Learn to cook Thai food",
        expected: TaskItem(title: "Learn to cook Thai food",
                           dueOn: nil, category: .personal, isUrgent: false)
    ),
]
```

--------------------------------

### ModelSample Initializers

Source: https://developer.apple.com/documentation/evaluations/modelsample

Provides initializers for creating ModelSample instances with different configurations, including string-based prompts, FoundationModels prompts, and prebuilt inputs.

```APIDOC
## Initializers

### `init(prompt: String, expected: ExpectedValue?, instructions: String?, generationSchema: GenerationSchema?, expectations: TrajectoryExpectation?)`

Creates a model sample with string-based prompt and instructions.

### `init(prompt: Prompt, expected: ExpectedValue?, instructions: Instructions?, generationSchema: GenerationSchema?, expectations: TrajectoryExpectation?)`

Creates a model sample with a FoundationModels prompt.

### `init(input: ModelSampleInput, expected: ExpectedValue?, expectations: TrajectoryExpectation?)`

Creates a model sample with a prebuilt input.
```

--------------------------------

### Defining a Custom SafetyLevel Enum

Source: https://developer.apple.com/documentation/evaluations/scorelevel

Example of conforming an enum to ScoreLevel to create a 'SafetyLevel' scoring scale. Overrides 'guideDescription' and 'value' for specific cases.

```swift
enum SafetyLevel: ScoreLevel {
    case safe, unsafe

    var guideDescription: String {
        switch self {
        case .safe:
            return "The response is safe and appropriate"
        case .unsafe:
            return "The response contains harmful content"
        }
    }

    var value: Double {
        switch self {
        case .safe:
            return 1
        case .unsafe:
            return 0
        }
    }
}

let dimension = ScoreDimension("Safety", scale: .custom(SafetyLevel.self))
```

--------------------------------

### Custom Evaluator Implementation

Source: https://developer.apple.com/documentation/evaluations/evaluatorprotocol

An example of a custom evaluator struct conforming to EvaluatorProtocol. It defines a Metric and implements the metrics function to return a scoring value.

```swift
struct MyEvaluator<Input: SampleProtocol>: EvaluatorProtocol
where Input.ExpectedValue: Sendable & Codable {
    let metric = Metric("Quality")

    func metrics(
        subject: ModelSubject<Input.ExpectedValue>,
        input: Input
    ) async throws -> [Metric] {
        return [metric.scoring(1.0)]
    }
}
```

--------------------------------

### Creating an Inline Evaluator

Source: https://developer.apple.com/documentation/evaluations/evaluator

An example of creating an inline evaluator using a closure. The closure receives the input sample and ModelSubject to determine a metric result.

```swift
Evaluator { sample, subject in
    let metric = Metric("TitleMatch")
    guard let expected = sample.expected else { return metric.ignore() }
    return subject.value == expected ? metric.passing() : metric.failing()
}
```
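
The pass/fail/ignore decision in the closure above can be exercised without the framework. This is a minimal sketch with hypothetical stand-ins (`SketchOutcome`, `titleMatch`, `passRate`). Leaving ignored samples out of the pass rate is an assumption about how a summary might treat them, not documented behavior.

```swift
// Stand-in types for illustration only; NOT the Evaluations framework types.
enum SketchOutcome: Equatable {
    case passing, failing, ignored
}

// Mirrors the closure above: skip samples without an expected value,
// otherwise compare the produced value with the expected one.
func titleMatch(produced: String, expected: String?) -> SketchOutcome {
    guard let expected = expected else { return .ignored }
    return produced == expected ? .passing : .failing
}

// Pass rate over the samples that were actually judged.
// Assumption: ignored outcomes are excluded from the denominator.
func passRate(_ outcomes: [SketchOutcome]) -> Double? {
    let judged = outcomes.filter { $0 != .ignored }
    guard !judged.isEmpty else { return nil }
    let passed = judged.filter { $0 == .passing }.count
    return Double(passed) / Double(judged.count)
}

let outcomes = [
    titleMatch(produced: "Fix leaky kitchen faucet", expected: "Fix leaky kitchen faucet"),
    titleMatch(produced: "Fix faucet", expected: "Fix leaky kitchen faucet"),
    titleMatch(produced: "Learn to cook Thai food", expected: nil),
]
print(outcomes)
if let rate = passRate(outcomes) {
    print("Pass rate over judged samples:", rate)
}
```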

--------------------------------

### Default Instructions

Source: https://developer.apple.com/documentation/evaluations/modeljudgeprompt

Accesses the default system instructions used when no custom instructions are provided for a ModelJudgePrompt.

```APIDOC
## static var defaultInstructions: String

### Description
The default system instructions used when no custom instructions are provided.

### Properties
* **defaultInstructions** (String) - The default system instructions.
```

--------------------------------

### ModelSample Initializer with Prebuilt Input

Source: https://developer.apple.com/documentation/evaluations/modelsample/init%28input%3Aexpected%3Aexpectations%3A%29

Use this initializer to create a ModelSample instance when you have a pre-defined input. It allows for optional expected values and trajectory expectations.

```swift
init(
    input: ModelSampleInput,
    expected: ExpectedValue? = Optional<String>(nilLiteral: ()),
    expectations: TrajectoryExpectation? = nil
)
```

--------------------------------

### startTime Property Declaration

Source: https://developer.apple.com/documentation/evaluations/evaluationresult/starttime

Declares the startTime property, which returns a Date object representing the start time of the evaluation run. Available in beta versions.

```swift
let startTime: Date
```

--------------------------------

### ModelSampleInput Initializer

Source: https://developer.apple.com/documentation/evaluations/modelsampleinput/init%28prompt%3Ainstructions%3Agenerationschema%3A%29

Use this initializer to create a ModelSampleInput object. It requires a prompt and can optionally include system instructions and an output generation schema.

```swift
init(
    prompt: Prompt,
    instructions: Instructions? = nil,
    generationSchema: GenerationSchema? = nil
)
```

--------------------------------

### Define SampleProtocol

Source: https://developer.apple.com/documentation/evaluations/sampleprotocol

This snippet shows the basic definition of the SampleProtocol, which conforms to Decodable, Encodable, and Sendable.

```swift
protocol SampleProtocol : Decodable, Encodable, Sendable
```
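
Because the protocol refines `Decodable` and `Encodable`, a conforming sample can be round-tripped through an encoder. A minimal sketch using Foundation's JSON coders follows; `StoredSample` is a hypothetical stand-in that adopts `Codable` and `Sendable` directly instead of `SampleProtocol`, so it runs without the framework.

```swift
import Foundation

// Stand-in sample for illustration; it adopts Codable and Sendable directly
// rather than the framework's SampleProtocol.
struct StoredSample: Codable, Sendable, Equatable {
    var input: String
    var expected: String?
}

let original = StoredSample(input: "Swift is...", expected: nil)

// Encode to JSON and decode again; a nil `expected` is omitted from the JSON
// by the synthesized conformance and decodes back to nil.
let data = try JSONEncoder().encode(original)
let decoded = try JSONDecoder().decode(StoredSample.self, from: data)
print(decoded == original)
```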

--------------------------------

### ArgumentValue.int(_:)

Source: https://developer.apple.com/documentation/evaluations/argumentvalue/int%28_%3A%29

Represents an integer value. This is a beta feature available on iOS, iPadOS, Mac Catalyst, macOS, visionOS, and watchOS starting from version 27.0.

```APIDOC
## ArgumentValue.int(_:)

### Description
An integer value.

### Method Signature
`case int(Int)`

### Availability
- iOS 27.0+
- iPadOS 27.0+
- Mac Catalyst 27.0+
- macOS 27.0+
- visionOS 27.0+
- watchOS 27.0+

```

--------------------------------

### Default System Instructions

Source: https://developer.apple.com/documentation/evaluations/modeljudgeevaluator

Provides the default system instructions used by the judge model when no custom instructions are specified.

```swift
static var defaultInstructions: String
```

--------------------------------

### Setting ScoringMode to Discrete

Source: https://developer.apple.com/documentation/evaluations/scoringmode

Example of setting the scoring mode to 'discrete' in Swift. This mode constrains the judge model to return one of the defined scale values.

```swift
let mode: ScoringMode = .discrete
```

--------------------------------

### init(ordered:unordered:disallowed:)

Source: https://developer.apple.com/documentation/evaluations/trajectoryexpectation/init%28ordered%3Aunordered%3Adisallowed%3A%29

Creates a trajectory expectation with ordered and unordered requirements, plus specific tools that must not be called.

```APIDOC
## init(ordered:unordered:disallowed:)

### Description
Creates a trajectory expectation with ordered and unordered requirements, plus specific tools that must not be called.

### Parameters
#### `ordered`
- Type: `[ToolExpectation]`
- Required: No (defaults to `[]`)
- Description: Steps that must be satisfied in sequential order.

#### `unordered`
- Type: `[ToolExpectation]`
- Required: No (defaults to `[]`)
- Description: Tool calls that must occur at some point, regardless of position.

#### `disallowed`
- Type: `[ToolExpectation]`
- Required: Yes
- Description: Tools that must NOT be called.

### Discussion
Additional tool calls beyond the expected ones are always allowed when using disallowed expectations — the disallowed list targets specific tools while permitting everything else. To disallow _all_ unexpected calls instead, use `init(ordered:unordered:allowsAdditionalToolCalls:)` with `allowsAdditionalToolCalls: false`.
```
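
How the three requirement kinds combine can be sketched in plain Swift. `SketchTrajectory` is a hypothetical stand-in that compares tool names only and ignores arguments. Treating `ordered` as a subsequence of the recorded calls is an assumption, not the framework's documented algorithm. The tool names are invented for the example.

```swift
// Stand-in model for illustration only; NOT the Evaluations framework implementation.
struct SketchTrajectory {
    var ordered: [String] = []      // tool names that must appear in this order
    var unordered: [String] = []    // tool names that must appear somewhere
    var disallowed: [String]        // tool names that must never appear

    // Returns true when the recorded calls satisfy all three requirement kinds.
    // Assumption: extra calls are permitted unless they are listed in `disallowed`.
    func isSatisfied(by calls: [String]) -> Bool {
        // Disallowed: no recorded call may match.
        if calls.contains(where: { disallowed.contains($0) }) { return false }

        // Unordered: every required name must occur at least once.
        if !unordered.allSatisfy({ calls.contains($0) }) { return false }

        // Ordered: walk the calls once, advancing through the expected steps.
        var nextStep = 0
        for call in calls where nextStep < ordered.count {
            if call == ordered[nextStep] { nextStep += 1 }
        }
        return nextStep == ordered.count
    }
}

let expectation = SketchTrajectory(
    ordered: ["search_notes", "summarize"],
    unordered: ["log_usage"],
    disallowed: ["delete_note"]
)

// Satisfied: ordered steps appear in sequence, with an extra call in between.
print(expectation.isSatisfied(by: ["search_notes", "log_usage", "summarize"]))
// Not satisfied: the ordered steps are reversed.
print(expectation.isSatisfied(by: ["summarize", "search_notes", "log_usage"]))
// Not satisfied: a disallowed tool was called.
print(expectation.isSatisfied(by: ["search_notes", "summarize", "log_usage", "delete_note"]))
```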

--------------------------------

### ModelJudgePrompt Initialization

Source: https://developer.apple.com/documentation/evaluations/modeljudgeprompt

Initializes a ModelJudgePrompt configuration with custom instructions, an optional evaluation target closure, and an optional reference data closure.

```APIDOC
## init(instructions: String, evaluationTarget: ((Input.ExpectedValue) -> String)?, reference: ((Input, Input.ExpectedValue) async throws -> [String : String])?)

### Description
Creates a model-as-judge prompt configuration.

### Parameters
* **instructions** (String) - Required - The system instructions for the judge model.
* **evaluationTarget** ((Input.ExpectedValue) -> String)? - Optional - A closure that converts the model’s response to a string for the judge prompt.
* **reference** ((Input, Input.ExpectedValue) async throws -> [String : String])? - Optional - A closure that provides labeled reference data to include in the model-as-judge prompt.
```
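
The two closure shapes named in the signature can be sketched with local types. `SketchTask` and `SketchInput` are hypothetical stand-ins, and the dictionary labels ("Prompt", "Expected title") are invented for the example. Only the closure shapes, `(Value) -> String` and `(Input, Value) async throws -> [String: String]`, follow the documented signature. The top-level `await` assumes the code runs as a script or in `main.swift`.

```swift
// Stand-in types for illustration only; NOT the Evaluations framework types.
struct SketchTask: Sendable {
    let title: String
    let dueOn: String?
    let isUrgent: Bool
}

struct SketchInput: Sendable {
    let prompt: String
    let expected: SketchTask?
}

// Same shape as the documented `evaluationTarget` closure: value in, string out.
let evaluationTarget: (SketchTask) -> String = { task in
    "title=\(task.title); dueOn=\(task.dueOn ?? "none"); urgent=\(task.isUrgent)"
}

// Same shape as the documented `reference` closure: labeled strings for the judge prompt.
let reference: (SketchInput, SketchTask) async throws -> [String: String] = { input, _ in
    var labeled = ["Prompt": input.prompt]
    if let expected = input.expected {
        labeled["Expected title"] = expected.title
    }
    return labeled
}

let produced = SketchTask(title: "Buy groceries for dinner party", dueOn: nil, isUrgent: true)
let input = SketchInput(prompt: "Buy groceries for dinner party tonight", expected: produced)

let targetText = evaluationTarget(produced)
let labels = try await reference(input, produced)
print(targetText)
print(labels)
```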

--------------------------------

### StreamLoader Struct Declaration

Source: https://developer.apple.com/documentation/evaluations/streamloader

Declares the StreamLoader struct, which is generic over a Sample type conforming to SampleProtocol. It is available on various Apple platforms starting from beta versions.

```swift
struct StreamLoader<Sample> where Sample : SampleProtocol
```

--------------------------------

### ModelSubject.value

Source: https://developer.apple.com/documentation/evaluations/modelsubject/value

The typed value produced by the model. This property is available on iOS, iPadOS, Mac Catalyst, macOS, visionOS, and watchOS, starting from version 27.0 (Beta).

```APIDOC
## ModelSubject.value

### Description
The typed value produced by the model.

### Availability
iOS 27.0+ Beta
iPadOS 27.0+ Beta
Mac Catalyst 27.0+ Beta
macOS 27.0+ Beta
visionOS 27.0+ Beta
watchOS 27.0+ Beta

### Instance Property
```swift
var value: Value
```
```

--------------------------------

### Aggregate Metrics Summary

Source: https://developer.apple.com/documentation/evaluations/evaluating-language-model-responses

Implement `aggregateMetrics(using:)` to define how metrics are summarized into high-level statistics like the mean.

```swift
func aggregateMetrics(using aggregator: inout MetricsAggregator) {
    aggregator.computeMean(of: exactMatch)
    aggregator.computeMean(of: absoluteError)
}
```
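
What the two means summarize can be worked through with plain arithmetic. This sketch does not use the framework. The (expected, produced) pairs are made-up results for a letter-counting task, and the scoring rules (1 or 0 for exact match, distance for absolute error) are assumptions about what metrics with these names would record.

```swift
// Illustration only: plain Swift arithmetic, not the Evaluations framework.
// Hypothetical (expected, produced) letter counts for five samples.
let results: [(expected: Int, produced: Int)] = [
    (3, 3), (3, 2), (4, 4), (2, 2), (3, 5),
]

// Exact match scores 1 when the counts agree and 0 otherwise.
let exactMatchScores = results.map { $0.expected == $0.produced ? 1.0 : 0.0 }

// Absolute error is the distance between the two counts.
let absoluteErrors = results.map { Double(abs($0.expected - $0.produced)) }

// Arithmetic mean; returns 0 for an empty list to keep the sketch simple.
func mean(_ values: [Double]) -> Double {
    values.isEmpty ? 0 : values.reduce(0, +) / Double(values.count)
}

print("Mean exact match:", mean(exactMatchScores))
print("Mean absolute error:", mean(absoluteErrors))
```

The two summaries answer different questions: the exact-match mean is the fraction of samples answered correctly, while the mean absolute error shows how far off the wrong answers were.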

--------------------------------

### Declare ScoringMode Variable

Source: https://developer.apple.com/documentation/evaluations/modeljudgeevaluator/scoringmode

The scoring mode the evaluator uses, declared as a constant (`let`) property. This property is available on iOS, iPadOS, macOS, and visionOS starting from version 27.0.

```swift
let scoringMode: ScoringMode
```

--------------------------------

### Define Dataset with Model Samples

Source: https://developer.apple.com/documentation/evaluations/evaluating-language-model-responses

Define a dataset for evaluation using `ArrayLoader` and `ModelSample`. Each sample includes a prompt and an expected output.

```swift
import Evaluations
import FoundationModels

struct LetterCountEvaluation: Evaluation {
    let dataset = ArrayLoader(samples: [
        ModelSample(prompt: "Count the letter 'r' in 'strawberry'.", expected: 3),
        ModelSample(prompt: "How many a's are in 'banana'?", expected: 3),
        ModelSample(prompt: "Mississippi contains how many s?", expected: 4),
        ModelSample(prompt: "What's the number of l in hello?", expected: 2),
        ModelSample(prompt: "The letter 'e' in 'bookkeeper' appears how many times?", expected: 3),
    ])

    // The remaining members of the evaluation are not part of this excerpt.
}
```

--------------------------------

### resultID

Source: https://developer.apple.com/documentation/evaluations/evaluationresult/resultid

A unique identifier for this particular result. This property is available on iOS, iPadOS, Mac Catalyst, macOS, visionOS, and watchOS starting from version 27.0 Beta.

```APIDOC
## resultID

### Description
A unique identifier for this particular result.

### Property
`let resultID: UUID`

### Availability
- iOS 27.0+ Beta
- iPadOS 27.0+ Beta
- Mac Catalyst 27.0+ Beta
- macOS 27.0+ Beta
- visionOS 27.0+ Beta
- watchOS 27.0+ Beta
```

--------------------------------

### ArgumentValue.double(_:)

Source: https://developer.apple.com/documentation/evaluations/argumentvalue/double%28_%3A%29

Represents a double-precision floating-point value. This is a beta feature available on iOS, iPadOS, Mac Catalyst, macOS, visionOS, and watchOS starting from version 27.0.

```APIDOC
## ArgumentValue.double(_:)

### Description
Represents a double-precision floating-point value.

### Method Signature
`case double(Double)`

### Platforms
iOS 27.0+ Beta
iPadOS 27.0+ Beta
Mac Catalyst 27.0+ Beta
macOS 27.0+ Beta
visionOS 27.0+ Beta
watchOS 27.0+ Beta

### See Also
* `case string(String)`
* `case int(Int)`
* `case bool(Bool)`
```
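
Exact argument matching over the four value cases can be sketched with a local enum. `SketchArgumentValue` and `SketchExactMatcher` are hypothetical stand-ins. The case names follow the list above, while the rule that values of different cases never match is an assumption.

```swift
// Stand-in types for illustration only; NOT the Evaluations framework types.
enum SketchArgumentValue: Equatable {
    case string(String)
    case int(Int)
    case double(Double)
    case bool(Bool)
}

struct SketchExactMatcher {
    let argumentName: String
    let value: SketchArgumentValue

    // True when the call carries the named argument with an equal value.
    // Assumption: values of different cases never match, so .int(3) != .double(3.0).
    func matches(_ arguments: [String: SketchArgumentValue]) -> Bool {
        arguments[argumentName] == value
    }
}

// Arguments recorded for one hypothetical tool call.
let call: [String: SketchArgumentValue] = [
    "letter": .string("r"),
    "word": .string("strawberry"),
]

let matchers = [
    SketchExactMatcher(argumentName: "letter", value: .string("r")),
    SketchExactMatcher(argumentName: "word", value: .string("strawberry")),
]

// The call satisfies the expectation only when every matcher agrees.
let allMatch = matchers.allSatisfy { $0.matches(call) }
print(allMatch)
```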

--------------------------------

### defaultInstructions

Source: https://developer.apple.com/documentation/evaluations/modeljudgeevaluator/defaultinstructions

Retrieves the default system instructions used by the ModelJudgeEvaluator when no custom instructions are provided. This is a read-only property.

```APIDOC
## defaultInstructions

### Description
Provides the default system instructions for the ModelJudgeEvaluator. This property is used when a user does not supply custom instructions.

### Property
`static var defaultInstructions: String { get }`

### Returns
A `String` representing the default system instructions.
```

--------------------------------

### SampleGenerator Sampling Strategy Property

Source: https://developer.apple.com/documentation/evaluations/samplegenerator/samplingstrategy-swift.property

Declares the optional samplingStrategy property for the SampleGenerator. This property determines how existing samples are chosen as examples for prompts and influences retry behavior.

```swift
var samplingStrategy: SampleGenerator<SampleType>.SamplingStrategy?
```

--------------------------------

### Default Instructions

Source: https://developer.apple.com/documentation/evaluations/modeljudgeevaluator

Provides the default system instructions that the judge model uses when no custom instructions are explicitly provided. This is useful for understanding the baseline behavior of the judge.

```APIDOC
## static var defaultInstructions: String

### Description
The default system instructions the model uses when no custom instructions are provided.

### Returns
- **String** - The default system instructions.
```

--------------------------------

### Define Custom EvaluationSubject

Source: https://developer.apple.com/documentation/evaluations/evaluationsubject

Conform to the EvaluationSubject protocol to create your own subject types. This example shows a basic struct MySubject that holds a codable value and an optional transcript.

```swift
// Protocol declaration, as documented (requirements omitted here):
// protocol EvaluationSubject<Value>

struct MySubject<Value: Codable>: EvaluationSubject {
    var value: Value
    var transcript: StructuredTranscript?
}
```

--------------------------------

### run()

Source: https://developer.apple.com/documentation/evaluations/samplegenerator/run%28%29

Runs the generator and returns a stream of newly synthesized samples. Each element in the returned stream is a newly generated sample. After iteration completes, access `samples` to retrieve the full dataset (initial + generated), or `invalidSamples` to see samples the validator rejected.

```APIDOC
## run()

### Description
Runs the generator and returns a stream of newly synthesized samples. Each element in the returned stream is a newly generated sample. After iteration completes, access `samples` to retrieve the full dataset (initial + generated), or `invalidSamples` to see samples the validator rejected.

### Method
`run()`

### Return Value
An async throwing stream of individual samples (`AsyncSequence<SampleType, any Error>`).
```

--------------------------------

### Run Sample Generation

Source: https://developer.apple.com/documentation/evaluations/samplegenerator

Executes the sample generation process. Returns an asynchronous stream of newly synthesized samples. Ensure the generator is configured before calling this method.

```swift
func run() -> some AsyncSequence<SampleType, any Error>
```
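
Consuming an async throwing sequence like the one `run()` returns can be sketched with the standard library alone. Here a hand-built `AsyncThrowingStream` stands in for the generator's output. `GeneratedSample`, `makeSampleStream`, and `collect` are hypothetical names, and the top-level `await` assumes the code runs as a script or in `main.swift`.

```swift
// Illustration only: a hand-built stream stands in for the generator's output.
struct GeneratedSample: Sendable, Equatable {
    let prompt: String
}

// Hypothetical helper that yields three samples and then finishes.
func makeSampleStream() -> AsyncThrowingStream<GeneratedSample, Error> {
    AsyncThrowingStream { continuation in
        for index in 1...3 {
            continuation.yield(GeneratedSample(prompt: "Generated prompt \(index)"))
        }
        continuation.finish()
    }
}

// Collects every element; `for try await` rethrows any error the sequence reports.
func collect<S: AsyncSequence>(_ stream: S) async throws -> [S.Element] {
    var collected: [S.Element] = []
    for try await element in stream {
        collected.append(element)
    }
    return collected
}

let generated = try await collect(makeSampleStream())
print(generated.map(\.prompt))
```

Writing `collect` against `AsyncSequence` rather than a concrete stream type means the same loop would apply to any sequence with this element and error shape.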

--------------------------------

### Disallowed Tools

Source: https://developer.apple.com/documentation/evaluations/trajectoryexpectation/disallowed

Declare a list of tools that the model must not call. This property is available on iOS, iPadOS, macOS, tvOS, and watchOS starting from version 27.0 (Beta).

```swift
var disallowed: [ToolExpectation]
```