### StreamLoader Example

Source: https://developer.apple.com/documentation/evaluations/loader

An example of implementing the Loader protocol using `StreamLoader` with a custom async sequence.

```APIDOC
## StreamLoader

### Description
A loader backed by a custom async sequence.

### Example Usage
```swift
var dataset: any Loader<ModelSample<String>> {
    StreamLoader(stream: AsyncThrowingStream<ModelSample<String>, Error> { continuation in
        Task {
            let prompts = ["One plus one is...", "Swift is..."]
            for prompt in prompts {
                continuation.yield(ModelSample(prompt: prompt, expected: ""))
            }
            continuation.finish()
        }
    })
}
```
```

--------------------------------

### Implement SampleProtocol

Source: https://developer.apple.com/documentation/evaluations/sampleprotocol

An example of a struct, `MySample`, that conforms to `SampleProtocol` by providing an `input` property and an optional `expected` property.

```swift
struct MySample: SampleProtocol {
    var input: String
    var expected: String?
}
```

--------------------------------

### ArrayLoader Example

Source: https://developer.apple.com/documentation/evaluations/loader

An example of implementing the Loader protocol using `ArrayLoader` with in-memory samples.

```APIDOC
## ArrayLoader

### Description
A loader backed by an in-memory array.

### Example Usage
```swift
var dataset: any Loader<ModelSample<String>> {
    ArrayLoader(samples: [
        ModelSample(prompt: "One plus one is...", expected: "Two."),
        ModelSample(prompt: "Swift is...", expected: "A powerful language."),
    ])
}
```
```

--------------------------------

### Example Usage of group(_:_:) Method

Source: https://developer.apple.com/documentation/evaluations/metricsaggregator/group%28_%3A_%3A%29

Demonstrates how to use the `group(_:_:)` method to organize metrics such as `Accuracy` into a "Quality Metrics" group. The example computes the mean and median of the accuracy metric within that group.
```swift
let accuracy = Metric("Accuracy")

func aggregateMetrics(using aggregator: inout MetricsAggregator) {
    aggregator.group("Quality Metrics") { group in
        group.computeMean(of: accuracy)
        group.computeMedian(of: accuracy)
    }
}
```

--------------------------------

### JSONLoader Example

Source: https://developer.apple.com/documentation/evaluations/loader

An example of implementing the Loader protocol using `JSONLoader` with a JSON or JSONL file.

```APIDOC
## JSONLoader

### Description
A loader backed by a JSON or JSONL file.

### Example Usage
```swift
var dataset: any Loader<ModelSample<String>> {
    JSONLoader(url: Bundle.main.url(forResource: "prompts", withExtension: "jsonl")!)
}
```
```

--------------------------------

### StreamLoader Example

Source: https://developer.apple.com/documentation/evaluations/loader

Instantiate a `StreamLoader` with an `AsyncThrowingStream` that yields `ModelSample` values. This is useful for dynamically generated or streaming data.

```swift
var dataset: any Loader<ModelSample<String>> {
    StreamLoader(stream: AsyncThrowingStream<ModelSample<String>, Error> { continuation in
        Task {
            let prompts = ["One plus one is...", "Swift is..."]
            for prompt in prompts {
                continuation.yield(ModelSample(prompt: prompt, expected: ""))
            }
            continuation.finish()
        }
    })
}
```

--------------------------------

### Creating a ModelSampleOutput Instance

Source: https://developer.apple.com/documentation/evaluations/modelsampleoutput/init%28value%3Aexpectations%3A%29

Creates a `ModelSampleOutput` with a string value and no expectations, which demonstrates basic use of the initializer.

```swift
let output = ModelSampleOutput(value: "Paris", expectations: nil)
```

--------------------------------

### Example ArgumentMatcher Configuration

Source: https://developer.apple.com/documentation/evaluations/argumentmatcher

Demonstrates how to create an array of `ArgumentMatcher` instances that define validation rules for multiple arguments. Use this to set up validation for tool calls.
```swift
let matchers: [ArgumentMatcher] = [
    .exact(argumentName: "city", value: "San Francisco"),
    .keyOnly(argumentName: "units"),
    .naturalLanguage(argumentName: "prompt", criteria: "A weather-related question")
]
```

--------------------------------

### ModelSample Initialization

Source: https://developer.apple.com/documentation/evaluations/modelsampleprotocol

Initializes a `ModelSample` with a prompt, an expected output, and evaluation expectations. Use this for common language model evaluation scenarios.

```swift
let sample = ModelSample(
    prompt: "What's the weather?",
    expected: "Sunny",
    expectations: TrajectoryExpectation(ordered: [
        ToolExpectation("get_weather")
    ])
)
```

--------------------------------

### Creating a ModelJudgePrompt Instance

Source: https://developer.apple.com/documentation/evaluations/modeljudgeprompt/init%28instructions%3Aevaluationtarget%3Areference%3A%29

Creates a `ModelJudgePrompt` instance with custom system instructions.

```swift
let prompt = ModelJudgePrompt<ModelSample<String>>(
    instructions: "You are a domain expert."
)
```

--------------------------------

### ArrayLoader Example

Source: https://developer.apple.com/documentation/evaluations/loader

Instantiate an `ArrayLoader` with a collection of `ModelSample` values for in-memory datasets.

```swift
var dataset: any Loader<ModelSample<String>> {
    ArrayLoader(samples: [
        ModelSample(prompt: "One plus one is...", expected: "Two."),
        ModelSample(prompt: "Swift is...", expected: "A powerful language."),
    ])
}
```

--------------------------------

### JSONLoader Example

Source: https://developer.apple.com/documentation/evaluations/loader

Instantiate a `JSONLoader` with a URL that points to a JSON or JSONL file. Make sure the file is included in the main bundle.

```swift
var dataset: any Loader<ModelSample<String>> {
    JSONLoader(url: Bundle.main.url(forResource: "prompts", withExtension: "jsonl")!)
}
```

--------------------------------

### init(input:expected:expectations:)

Source: https://developer.apple.com/documentation/evaluations/modelsample/init%28input%3Aexpected%3Aexpectations%3A%29

Creates a model sample with a prebuilt input. This initializer is available on iOS, iPadOS, macOS, visionOS, and watchOS 27.0 and later.

```APIDOC
## init(input:expected:expectations:)

### Description
Creates a model sample with a prebuilt input.

### Parameters
- **input** (ModelSampleInput) - Required - The input for the model sample.
- **expected** (ExpectedValue?) - Optional - The expected value for the model sample. Defaults to nil.
- **expectations** (TrajectoryExpectation?) - Optional - The expectations for the model sample. Defaults to nil.
```

--------------------------------

### Configuring Sampling Strategy

Source: https://developer.apple.com/documentation/evaluations/samplegenerator

Configures the strategy for selecting existing samples to use as examples in the prompt. This setting influences how the generator learns from existing data.

```APIDOC
## samplingStrategy

### Description
Sets or gets the strategy for selecting existing samples to use as examples within the generation prompt. Use this property to control how the generator draws on existing data when it creates new samples.

### Property Type
`SampleGenerator.SamplingStrategy?`

### Access
Read-write

### Declaration
```swift
var samplingStrategy: SampleGenerator.SamplingStrategy?
```
```

--------------------------------

### samplingStrategy Property

Source: https://developer.apple.com/documentation/evaluations/samplegenerator/samplingstrategy-swift.property

The strategy for selecting existing samples as examples in the prompt. When `nil`, the generator shows no examples and doesn't retry on repetition. When set, the strategy also controls retry behavior when the model repeats itself.
```APIDOC
## Property: samplingStrategy

### Description
The strategy for selecting existing samples as examples in the prompt. When `nil`, the generator shows no examples and doesn't retry on repetition. When set, the strategy also controls retry behavior when the model repeats itself.

### Declaration
```swift
var samplingStrategy: SampleGenerator.SamplingStrategy?
```

### Related Types
- `enum SamplingStrategy`: The values that define how the generator selects existing samples as examples in the generation prompt.
- `var validator: ((SampleType) async throws -> Bool)?`: An optional closure that decides whether a generated sample is valid.
```

--------------------------------

### Run a Model-as-Judge Evaluation with Swift Testing

Source: https://developer.apple.com/documentation/evaluations/scoring-with-model-as-judge-evaluators

Integrate your model-as-judge evaluation into your Swift Testing suite. This example shows how to define an evaluation and assert on its results.

```swift
import Testing
import Evaluations

struct BookTagTests {
    static let evaluation = BookTagEvaluation()

    @Test(.evaluates(Self.evaluation))
    func evaluateTagQuality() async throws {
        let result = EvaluationContext.current.result
        let score = result.aggregateValue(.mean(of: Self.evaluation.tagQuality))
        #expect(score > 2.5)
    }
}
```

--------------------------------

### Customize Model and Instructions for Sample Generation

Source: https://developer.apple.com/documentation/evaluations/generating-synthetic-evaluation-datasets

Use a `sessionProvider` closure with `makeSamples` to specify the language model and instructions for generating synthetic data. This example uses `PrivateCloudComputeLanguageModel` and provides custom instructions for creating to-do list items.

```swift
var expanded: [ModelSample<TaskItem>] = []
for try await sample in dataset.makeSamples(
    syntheticGenerationPrompt,
    targetCount: 20,
    sessionProvider: {
        LanguageModelSession(
            // Use the Private Cloud Compute model to generate samples.
            model: PrivateCloudComputeLanguageModel(),
            instructions: """
                You create new structured task data. Generate realistic \
                to-do list items based on the examples provided. Each item \
                needs a natural prompt, an appropriate title, correct \
                category classification, and an honest urgency rating.
                """
        )
    }
) {
    expanded.append(sample)
}
```

--------------------------------

### init(prompt:expected:instructions:generationSchema:expectations:)

Source: https://developer.apple.com/documentation/evaluations/modelsample/init%28prompt%3Aexpected%3Ainstructions%3Agenerationschema%3Aexpectations%3A%29-8mni

Creates a model sample with a FoundationModels prompt. This beta initializer is available on iOS, iPadOS, Mac Catalyst, macOS, visionOS, and watchOS 27.0 and later.

```APIDOC
## init(prompt:expected:instructions:generationSchema:expectations:)

### Description
Creates a model sample with a FoundationModels prompt.

### Parameters
- **prompt** (Prompt) - Required - The FoundationModels prompt.
- **expected** (ExpectedValue?) - Optional - The expected value for the model sample.
- **instructions** (Instructions?) - Optional - Instructions for generating the model sample.
- **generationSchema** (GenerationSchema?) - Optional - The schema for generation.
- **expectations** (TrajectoryExpectation?) - Optional - Expectations for the model sample's trajectory.

### Availability
- iOS 27.0+ Beta
- iPadOS 27.0+ Beta
- Mac Catalyst 27.0+ Beta
- macOS 27.0+ Beta
- visionOS 27.0+ Beta
- watchOS 27.0+ Beta
```

--------------------------------

### ModelSample Initializer

Source: https://developer.apple.com/documentation/evaluations/modelsample/init%28prompt%3Aexpected%3Ainstructions%3Agenerationschema%3Aexpectations%3A%29-7daed

Creates a model sample with a string-based prompt and instructions. This beta initializer is available on iOS, iPadOS, Mac Catalyst, macOS, visionOS, and watchOS 27.0 and later.
```APIDOC
## init(prompt:expected:instructions:generationSchema:expectations:)

### Description
Creates a model sample with a string-based prompt and instructions.

### Parameters
- **prompt** (String) - Required - The string-based prompt for the model.
- **expected** (ExpectedValue?) - Optional - The expected value for the model sample. Defaults to nil.
- **instructions** (String?) - Optional - String-based instructions for the model. Defaults to nil.
- **generationSchema** (GenerationSchema?) - Optional - The schema for generation. Defaults to nil.
- **expectations** (TrajectoryExpectation?) - Optional - The trajectory expectations for the model sample. Defaults to nil.

### Example
```swift
let sample = ModelSample<String>(
    prompt: "Your prompt here",
    expected: nil,
    instructions: "Optional instructions",
    generationSchema: nil,
    expectations: nil
)
```

### Returns
A new `ModelSample` instance.
```

--------------------------------

### startTime

Source: https://developer.apple.com/documentation/evaluations/evaluationresult/starttime

The time when the evaluation run started. This is an instance property of `EvaluationResult`.

```APIDOC
## startTime

### Description
The time when the evaluation run started.

### Instance Property
`let startTime: Date`

### Availability
- iOS 27.0+ Beta
- iPadOS 27.0+ Beta
- Mac Catalyst 27.0+ Beta
- macOS 27.0+ Beta
- visionOS 27.0+ Beta
- watchOS 27.0+ Beta
```

--------------------------------

### ArrayLoader init(samples:)

Source: https://developer.apple.com/documentation/evaluations/arrayloader/init%28samples%3A%29

Creates a loader backed by the given array of samples. This beta initializer is available on iOS, iPadOS, Mac Catalyst, macOS, and visionOS 27.0 and later.

```APIDOC
## init(samples:)

### Description
Creates a loader backed by the given array of samples.
### Method
Initializer

### Parameters
- **samples** ([Sample]) - Required - The array of samples to back the loader.

### Returns
- **ArrayLoader** - An instance of `ArrayLoader` initialized with the provided samples.
```

--------------------------------

### Creating a ModelSubject Instance

Source: https://developer.apple.com/documentation/evaluations/modelsubject

Initializes a `ModelSubject` with a string value that represents the model's output. The transcript is optional and is omitted in this example.

```swift
let subject = ModelSubject(value: "Paris, France")
```

--------------------------------

### Custom Evaluation Implementation

Source: https://developer.apple.com/documentation/evaluations/evaluation

An example of implementing the `Evaluation` protocol. It defines a metric, a dataset, subject-generation logic, evaluators, and metric aggregation.

```swift
struct MyEvaluation: Evaluation {
    let metric = Metric("Match")

    let dataset = ArrayLoader(samples: [
        ModelSample(prompt: "One plus one is...", expected: "Two.")
    ])

    func subject(from sample: ModelSample<String>) async throws -> ModelSubject {
        ModelSubject(value: "Two.")
    }

    var evaluators: Evaluators {
        Evaluator { sample, subject in
            let metric = Metric("Match")
            // Ignore samples that have no expected value.
            guard let expected = sample.expected else { return metric.ignore() }
            return subject.value == expected ? metric.passing() : metric.failing()
        }
    }

    func aggregateMetrics(using aggregator: inout MetricsAggregator) {
        aggregator.computeMean(of: metric)
    }
}
```

--------------------------------

### ModelSampleInput Initializer

Source: https://developer.apple.com/documentation/evaluations/modelsampleinput/init%28prompt%3Ainstructions%3Agenerationschema%3A%29

Creates a model sample input with the given prompt, instructions, and schema. This beta initializer is available on iOS, iPadOS, Mac Catalyst, macOS, visionOS, and watchOS 27.0 and later.
```APIDOC
## init(prompt:instructions:generationSchema:)

### Description
Creates a model sample input with the given prompt, instructions, and schema.

### Parameters
- `prompt` (Prompt) - Required - The prompt to send to the language model.
- `instructions` (Instructions?) - Optional - System instructions for the model session.
- `generationSchema` (GenerationSchema?) - Optional - The output schema for the assistant's response.
```

--------------------------------

### Define a Prompt for Synthetic Data Generation

Source: https://developer.apple.com/documentation/evaluations/generating-synthetic-evaluation-datasets

Create a `Prompt` that describes the characteristics of the synthetic data you want to generate. This prompt guides the model toward realistic and varied samples.

```swift
let syntheticGenerationPrompt = Prompt("""
    Generate realistic to-do list items that a busy professional might have. \
    Each input is a natural-language request, and the expected output is the structured \
    task extracted from it. Cover a mix of work tasks (meetings, deadlines, \
    reviews), personal errands (shopping, appointments), health activities \
    (exercise, checkups), and home maintenance. Vary urgency and whether a \
    due date is specified.
    """)
```

--------------------------------

### ModelSample Prompt Property

Source: https://developer.apple.com/documentation/evaluations/modelsample/prompt

The `prompt` property represents the user's prompt for this sample. It is available on Apple platforms starting with version 27.0 (beta).

```APIDOC
## prompt

### Description
The user's prompt for this sample.

### Availability
- iOS 27.0+ Beta
- iPadOS 27.0+ Beta
- Mac Catalyst 27.0+ Beta
- macOS 27.0+ Beta
- visionOS 27.0+ Beta
- watchOS 27.0+ Beta

### Swift Code
```swift
var prompt: Prompt { get }
```

### Requirements
Available when `ExpectedValue` conforms to `Decodable`, `Encodable`, and `Sendable`.
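Swift commonly expresses a requirement like this with a constrained extension, which makes a member exist only on specializations whose generic argument meets every listed conformance. The sketch below assumes that mechanism. It uses a hypothetical `SketchSample` type with a plain `String` prompt rather than a FoundationModels `Prompt`, and it does not claim to show how the framework actually implements `ModelSample`.

```swift
// Hypothetical stand-in for a generic sample type; not the framework's ModelSample.
struct SketchSample<ExpectedValue> {
    let promptText: String
    let expected: ExpectedValue?
}

// Members declared in a constrained extension exist only for specializations
// whose ExpectedValue satisfies every listed requirement.
extension SketchSample where ExpectedValue: Decodable & Encodable & Sendable {
    // Available on SketchSample<String> or SketchSample<Int>, but not on a
    // specialization whose ExpectedValue lacks Codable or Sendable conformance.
    var prompt: String { promptText }
}
```

For example, `SketchSample(promptText: "The capital of France is...", expected: "Paris.").prompt` compiles because `String` is `Codable` and `Sendable`.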
```

--------------------------------

### init(from:)

Source: https://developer.apple.com/documentation/evaluations/aggregationoperation/init%28from%3A%29

Decodes an operation from a keyed container, reconstructing the metric from its name. Available on iOS, iPadOS, Mac Catalyst, macOS, visionOS, and watchOS 27.0 and later.

```APIDOC
## init(from:)

### Description
Decodes an operation from a keyed container, reconstructing the metric from its name.

### Signature
```swift
init(from decoder: any Decoder) throws
```

### Availability
- iOS 27.0+
- iPadOS 27.0+
- Mac Catalyst 27.0+
- macOS 27.0+
- visionOS 27.0+
- watchOS 27.0+
```

--------------------------------

### init(toolCalls:toolOutputs:instructionText:prompts:responses:)

Source: https://developer.apple.com/documentation/evaluations/structuredtranscript/init%28toolcalls%3Atooloutputs%3Ainstructiontext%3Aprompts%3Aresponses%3A%29

Creates a structured transcript with optional tool calls, tool outputs, instruction text, user prompts, and model responses.

```APIDOC
## init(toolCalls:toolOutputs:instructionText:prompts:responses:)

### Description
Creates a structured transcript.

### Parameters
- `toolCalls`: The tool calls from the session.
- `toolOutputs`: The tool outputs from the session.
- `instructionText`: The system instructions text.
- `prompts`: The user prompts.
- `responses`: The model responses.
```

--------------------------------

### ModelSampleOutput Initializer

Source: https://developer.apple.com/documentation/evaluations/modelsampleoutput/init%28value%3Aexpectations%3A%29

Creates a model sample output with an optional expected value and expectations. This beta initializer is available on iOS, iPadOS, Mac Catalyst, macOS, visionOS, and watchOS 27.0 and later.

```APIDOC
## init(value:expectations:)

### Description
Creates a model sample output with an optional expected value and expectations.

### Parameters
#### `value`
The expected output value for comparison.
#### `expectations`
The expected behavior, such as a tool-call trajectory.

### Discussion
```swift
let output = ModelSampleOutput(value: "Paris", expectations: nil)
```
```

--------------------------------

### Accessing Evaluation Start and End Times

Source: https://developer.apple.com/documentation/evaluations/evaluationresult

Declarations of the `EvaluationResult` properties that record when the evaluation run started and ended.

```swift
let endTime: Date
```

```swift
let startTime: Date
```

--------------------------------

### init(instructions:evaluationTarget:reference:)

Source: https://developer.apple.com/documentation/evaluations/modeljudgeprompt/init%28instructions%3Aevaluationtarget%3Areference%3A%29

Creates a model-as-judge prompt configuration. You can provide custom system instructions, an optional closure that converts the response to a string, and an optional closure that returns labeled reference data.

```APIDOC
## init(instructions:evaluationTarget:reference:)

### Description
Creates a model-as-judge prompt configuration.

### Parameters
#### `instructions` (String)
System instructions for the model-as-judge. Defaults to a general-purpose evaluator prompt.

#### `evaluationTarget` ((Input.ExpectedValue) -> String)?
Optional closure that converts the response to a string. When `nil`, the response is JSON-serialized.

#### `reference` ((Input, Input.ExpectedValue) async throws -> [String : String])?
Optional closure that returns labeled reference data to include in the judge prompt.

### Example
```swift
let prompt = ModelJudgePrompt<ModelSample<String>>(
    instructions: "You are a domain expert."
)
```
```

--------------------------------

### Define BookTags Structure

Source: https://developer.apple.com/documentation/evaluations/scoring-with-model-as-judge-evaluators

Defines the expected structure for book tags using `@Generable` and `@Guide` from Foundation Models. These annotations guide the language model as it generates the output.
```swift
import FoundationModels

@Generable
struct BookTags: Codable, Sendable {
    @Guide(description: "Tags describing the book's genre, themes, and setting", .count(3...8))
    var tags: [String]
}
```

--------------------------------

### init(url:)

Source: https://developer.apple.com/documentation/evaluations/jsonloader/init%28url%3A%29

Creates a loader backed by the JSON or JSONL file at the given URL. This is a beta feature available on multiple Apple platforms.

```APIDOC
## init(url:)

### Description
Creates a loader backed by the JSON or JSONL file at the given URL.

### Method
Initializer

### Parameters
- **url** (URL) - Required - The URL of the JSON or JSONL file.
```

--------------------------------

### init(_:scale:judge:scoringMode:prompt:)

Source: https://developer.apple.com/documentation/evaluations/modeljudgeevaluator/init%28_%3Ascale%3Ajudge%3Ascoringmode%3Aprompt%3A%29

Creates a single-metric evaluator with a custom judge prompt. This beta initializer is available on iOS, iPadOS, Mac Catalyst, macOS, visionOS, and watchOS 27.0 and later.

```APIDOC
## init(_:scale:judge:scoringMode:prompt:)

### Description
Creates a single-metric evaluator with a custom judge prompt.

### Parameters
- **name** (String) - The metric name that corresponds to the DataFrame column.
- **scale** (ScoringScale) - The scoring scale for this metric.
- **judge** (any LanguageModel) - The language model to use as judge.
- **scoringMode** (ScoringMode) - Optional - Whether scores are discrete (the default) or may take any floating-point value.
- **prompt** (ModelJudgePrompt) - Configuration for the judge prompt, including instructions, response presentation, and reference data.
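The `scoringMode` parameter distinguishes discrete scores from arbitrary floating-point scores. The following plain-Swift sketch illustrates that distinction for a numeric scale. `SketchScoringMode` and `isValidScore(_:scale:mode:)` are hypothetical names, not part of the Evaluations API. Treating continuous scores as bounded by the scale's lowest and highest points is also an assumption.

```swift
// Hypothetical mirror of the two scoring modes described above;
// not the framework's ScoringMode type.
enum SketchScoringMode {
    case discrete    // The score must equal one of the scale's defined points.
    case continuous  // Assumption: any value between the lowest and highest point.
}

// Checks a raw judge score against a numeric scale whose points are the
// dictionary keys (for example, 1 through 4).
func isValidScore(_ score: Double, scale: [Int: String], mode: SketchScoringMode) -> Bool {
    guard let lowest = scale.keys.min(), let highest = scale.keys.max() else {
        return false  // An empty scale accepts no score.
    }
    switch mode {
    case .discrete:
        return scale.keys.contains { Double($0) == score }
    case .continuous:
        return score >= Double(lowest) && score <= Double(highest)
    }
}
```

Under these assumptions, a score of 2.5 on a 1-to-4 scale is rejected in discrete mode but accepted in continuous mode.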
### See Also

#### Creating a single-dimension evaluator
`init(String, scale: ScoringScale, judge: any LanguageModel, scoringMode: ScoringMode)`
```

--------------------------------

### Creating a Basic ModelSample

Source: https://developer.apple.com/documentation/evaluations/modelsample

Instantiates a `ModelSample` with a simple string prompt and its expected string output. This is useful for basic text-based evaluations.

```swift
let sample = ModelSample(prompt: "The capital of France is...", expected: "Paris.")
```

--------------------------------

### failing(rationale:)

Source: https://developer.apple.com/documentation/evaluations/metric/failing%28rationale%3A%29

Returns a metric with a failing result. This beta method is available on iOS, iPadOS, Mac Catalyst, macOS, visionOS, and watchOS 27.0 and later.

```APIDOC
## failing(rationale:)

### Description
Returns a metric with a failing result.

### Method Signature
```swift
func failing(rationale: String? = nil) -> Metric
```

### Parameters
- **rationale** (String?) - Optional - A string that explains why the result is failing.
```

--------------------------------

### init(jsonData:)

Source: https://developer.apple.com/documentation/evaluations/evaluationresult/init%28jsondata%3A%29

Creates an evaluation result by parsing JSON data. This beta initializer is available on iOS, iPadOS, Mac Catalyst, macOS, visionOS, and watchOS 27.0 and later.

```APIDOC
## init(jsonData:)

### Description
Creates an evaluation result by parsing JSON data.

### Method
`init(jsonData: Data) throws`

### Parameters
- **data** (Data) - Required - The JSON data to parse.
```

--------------------------------

### ResultColumn.name

Source: https://developer.apple.com/documentation/evaluations/resultcolumn/name

Gets the column name in the DataFrame.
```APIDOC
## name

### Description
The column name in the DataFrame.

### Property
`name` (String)

### Availability
- iOS 27.0+ Beta
- iPadOS 27.0+ Beta
- Mac Catalyst 27.0+ Beta
- macOS 27.0+ Beta
- visionOS 27.0+ Beta
- watchOS 27.0+ Beta

### Declaration
```swift
let name: String
```
```

--------------------------------

### Define ModelJudgeEvaluator with Scored Examples

Source: https://developer.apple.com/documentation/evaluations/scoring-with-model-as-judge-evaluators

This snippet defines a `ModelJudgeEvaluator` for email tone. It uses a numeric scale and detailed instructions with four scored examples (scores 4, 3, 2, and 1) to calibrate the model's judgment.

```swift
ModelJudgeEvaluator(
    "EmailTone",
    scale: .numeric([
        4: "Professional, clear, and well-matched to the scenario, with appropriate warmth.",
        3: "Professional and clear, but feels slightly generic, formal, or impersonal.",
        2: "Noticeable tone issues: too curt, too informal, or mismatched to the scenario.",
        1: "Unprofessional, unclear, rude, or completely inappropriate for the scenario.",
    ]),
    judge: SystemLanguageModel.default,
    prompt: ModelJudgePrompt(
        instructions: """
            You are an expert evaluator of professional email tone. Your task is to evaluate whether an AI-generated email strikes the right professional tone for a workplace setting.

            Evaluate the email considering:
            - Professionalism: Uses appropriate language for a workplace. Avoids slang, overly casual phrasing, or unnecessarily stiff formality.
            - Clarity: Clearly communicates its purpose. The reader immediately understands what is being asked or conveyed.
            - Warmth: Feels human and approachable. Includes appropriate pleasantries without being excessive.
            - Appropriateness: The tone matches the scenario: a complaint is firm but respectful; a request is polite but clear; good news is enthusiastic but professional.

            Here are some examples to calibrate your scoring:

            ### Example 1
            **Prompt:** Write an email to a colleague asking them to review your document by Friday.
            **Response:** "Will you take a look at the Q3 report when you get a chance? It would be great to have your feedback by Friday so I can incorporate any changes before the Monday meeting. Let me know if that timeline works for you. Thanks!"
            **Score:** 4
            **Rationale:** The email is polite, clear, and professional. It states the request, gives a reason for the deadline, and respects the recipient's time by checking if the timeline works.

            ### Example 2
            **Prompt:** Write an email sharing a project status update with stakeholders.
            **Response:** "Hi everyone, I wanted to share a quick update on Project Atlas. We completed the design review last week, and development is on track for the June deadline. There are a couple of open questions about the API integration that I'll follow up on separately. Please reach out if you have any concerns."
            **Score:** 3
            **Rationale:** Professional and clear with good structure. It could be slightly warmer or more engaging; the update is efficient but reads as formulaic.

            ### Example 3
            **Prompt:** Write an email declining a meeting invitation.
            **Response:** "I can't make it. Sorry."
            **Score:** 2
            **Rationale:** While not rude, the email is too brief for a professional setting. It doesn't offer an alternative or show engagement with the topic.

            ### Example 4
            **Prompt:** Write an email to a colleague asking them to review your document by Friday.
            **Response:** "I need you to review my document. Get it done by Friday."
            **Score:** 1
            **Rationale:** The email is curt and demanding. It lacks any politeness, gives no context for the request, and does not acknowledge the recipient's workload or time.

            Use these examples to calibrate your scoring. Apply the same standards consistently. Evaluate step by step, then assign a score of 4, 3, 2, or 1.
            """)
)
```

--------------------------------

### init(expected:arguments:)

Source: https://developer.apple.com/documentation/evaluations/trajectoryexpectation/init%28expected%3Aarguments%3A%29

Creates a trajectory expectation for a single expected tool call. This beta initializer is available on iOS, iPadOS, Mac Catalyst, macOS, visionOS, and watchOS 27.0 and later.

```APIDOC
## init(expected:arguments:)

### Description
Creates a trajectory expectation for a single expected tool call.

### Parameters
#### `toolName` (String) - Required
The name of the tool expected to be called.

#### `arguments` ([ArgumentMatcher]) - Optional
The argument matchers to validate. Defaults to an empty array.

### See Also
- `struct ToolExpectation`
```

--------------------------------

### ModelSample Initializer Signature

Source: https://developer.apple.com/documentation/evaluations/modelsample/init%28prompt%3Aexpected%3Ainstructions%3Agenerationschema%3Aexpectations%3A%29-7daed

The signature of the `init(prompt:expected:instructions:generationSchema:expectations:)` initializer for `ModelSample`. It is available on iOS, iPadOS, macOS, visionOS, and watchOS 27.0 and later.

```swift
init(
    prompt: String,
    expected: ExpectedValue? = nil,
    instructions: String? = nil,
    generationSchema: GenerationSchema? = nil,
    expectations: TrajectoryExpectation? = nil
)
```

--------------------------------

### AggregateMetric Structure

Source: https://developer.apple.com/documentation/evaluations/aggregatemetric

The basic structure of `AggregateMetric`, with an example of an aggregation operation over an accuracy metric.

```APIDOC
## AggregateMetric

An aggregate statistic computed from a metric's results across the evaluation dataset.
```swift
struct AggregateMetric
```

### Overview
```swift
let accuracy = Metric("Accuracy")
let op = AggregationOperation.mean(of: accuracy)
print(op.label) // "Mean of Accuracy"
```

The summary DataFrame stores one `AggregateMetric` for each column. Each value records the operation that produced it and derives its display label and source metric name from that operation.

### Instance Properties
* `let group: String?`: The group this aggregate belongs to, if any.
* `var label: String`: The display label for this aggregate.
* `let operation: AggregationOperation`: The aggregation operation that produced this value.
* `var sourceMetric: String?`: The name of the source metric.
* `let value: Double`: The aggregate value.

### Conforms To
* `Decodable`
* `Encodable`
* `Equatable`
* `Sendable`
* `SendableMetatype`
```

--------------------------------

### ModelSampleInput Initializer

Source: https://developer.apple.com/documentation/evaluations/modelsampleinput

Creates a model sample input with the given prompt, instructions, and schema.

```APIDOC
## init(prompt: Prompt, instructions: Instructions?, generationSchema: GenerationSchema?)

### Description
Creates a model sample input with the given prompt, instructions, and schema.

### Parameters
- **prompt** (Prompt) - Required - The FoundationModels prompt for this input.
- **instructions** (Instructions?) - Optional - The FoundationModels instructions for this input.
- **generationSchema** (GenerationSchema?) - Optional - The output schema for the assistant's response.
```

--------------------------------

### Create a Model Sample with Tool Expectation

Source: https://developer.apple.com/documentation/evaluations/evaluating-language-model-responses

This Swift code creates a `ModelSample` for evaluating tool-calling behavior. It includes a prompt, an expected output, and a `TrajectoryExpectation` that verifies the model calls the `count_letters` tool with the correct arguments.
```swift
ModelSample(
    prompt: "Count the letter 'r' in 'strawberry'.",
    expected: 3,
    // Attach a trajectory expectation that defines the expected tool-calling sequence.
    expectations: TrajectoryExpectation(
        ordered: [
            // Expect the model to call `count_letters` with these exact arguments.
            ToolExpectation(
                "count_letters",
                arguments: [
                    .exact(argumentName: "letter", value: .string("r")),
                    .exact(argumentName: "word", value: .string("strawberry")),
                ]
            ),
        ]
    )
)
```

--------------------------------

### Declare allPass Metric

Source: https://developer.apple.com/documentation/evaluations/toolcallevaluator/allpass

Declares the `allPass` metric. Available in beta on iOS, iPadOS, macOS, tvOS, and watchOS starting from version 27.0.

```swift
let allPass: Metric
```

--------------------------------

### init(_:)

Source: https://developer.apple.com/documentation/evaluations/metric/init%28_%3A%29

Creates a metric with just a name. Use the factory methods (`passing`, `failing`, `scoring`, or `ignore`) to produce results.

```APIDOC
## init(_:)

### Description
Creates a metric with just a name. Use the factory methods (`passing`, `failing`, `scoring`, or `ignore`) to produce results.

### Method Signature
```swift
init(_ name: String)
```

### Parameters
- **name** (String) - Required - The name of the metric.
```

--------------------------------

### Instruction Text Property

Source: https://developer.apple.com/documentation/evaluations/structuredtranscript/instructiontext

Accesses the system instruction text from the transcript. Available on iOS, iPadOS, macOS, and other platforms starting from version 27.0 (Beta).

```swift
var instructionText: String
```

--------------------------------

### sourceMetric

Source: https://developer.apple.com/documentation/evaluations/aggregatemetric/sourcemetric

The name of the source metric. Available on iOS, iPadOS, Mac Catalyst, macOS, visionOS, and watchOS starting from version 27.0 (Beta).
```APIDOC
## sourceMetric

### Description
The name of the source metric.

### Instance Property
`var sourceMetric: String? { get }`
```

--------------------------------

### Create Initial ModelSample Dataset

Source: https://developer.apple.com/documentation/evaluations/generating-synthetic-evaluation-datasets

Initializes a dataset of `ModelSample` objects for evaluating a task extraction feature. Each sample includes a prompt and an expected `TaskItem`.

```swift
let dataset: [ModelSample] = [
    // Here's a health task that is non-urgent and has a due date.
    ModelSample(
        prompt: "Schedule dentist appointment for next Tuesday",
        expected: TaskItem(title: "Schedule dentist appointment", dueOn: "04/07/2026", category: .health, isUrgent: false)
    ),
    // This is an errands task that is urgent and due today.
    ModelSample(
        prompt: "Buy groceries for dinner party tonight",
        expected: TaskItem(title: "Buy groceries for dinner party", dueOn: "03/30/2026", category: .errands, isUrgent: true)
    ),
    // Here's a work task that's urgent and has a due date.
    ModelSample(
        prompt: "Finish quarterly report by end of week",
        expected: TaskItem(title: "Finish quarterly report", dueOn: "04/03/2026", category: .work, isUrgent: true)
    ),
    // This is a home task that's non-urgent and has a due date.
    ModelSample(
        prompt: "Fix the leaky kitchen faucet this weekend",
        expected: TaskItem(title: "Fix leaky kitchen faucet", dueOn: "04/05/2026", category: .home, isUrgent: false)
    ),
    // Here's a personal task that's non-urgent with no due date.
    ModelSample(
        prompt: "Learn to cook Thai food",
        expected: TaskItem(title: "Learn to cook Thai food", dueOn: nil, category: .personal, isUrgent: false)
    ),
]
```

--------------------------------

### ModelSample Initializers

Source: https://developer.apple.com/documentation/evaluations/modelsample

Provides initializers for creating `ModelSample` instances with different configurations, including string-based prompts, FoundationModels prompts, and prebuilt inputs.
```APIDOC
## Initializers

### `init(prompt: String, expected: ExpectedValue?, instructions: String?, generationSchema: GenerationSchema?, expectations: TrajectoryExpectation?)`
Creates a model sample with a string-based prompt and instructions.

### `init(prompt: Prompt, expected: ExpectedValue?, instructions: Instructions?, generationSchema: GenerationSchema?, expectations: TrajectoryExpectation?)`
Creates a model sample with a FoundationModels prompt.

### `init(input: ModelSampleInput, expected: ExpectedValue?, expectations: TrajectoryExpectation?)`
Creates a model sample with a prebuilt input.
```

--------------------------------

### Defining a Custom SafetyLevel Enum

Source: https://developer.apple.com/documentation/evaluations/scorelevel

Example of conforming an enum to `ScoreLevel` to create a `SafetyLevel` scoring scale. The enum provides a `guideDescription` and a numeric `value` for each case.

```swift
enum SafetyLevel: ScoreLevel {
    case safe, unsafe

    var guideDescription: String {
        switch self {
        case .safe: return "The response is safe and appropriate"
        case .unsafe: return "The response contains harmful content"
        }
    }

    var value: Double {
        switch self {
        case .safe: return 1
        case .unsafe: return 0
        }
    }
}

let dimension = ScoreDimension("Safety", scale: .custom(SafetyLevel.self))
```

--------------------------------

### Custom Evaluator Implementation

Source: https://developer.apple.com/documentation/evaluations/evaluatorprotocol

An example of a custom evaluator struct conforming to `EvaluatorProtocol`. It defines a `Metric` and implements the `metrics` function to return a scoring value.
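A metric produces one of four kinds of result: `passing`, `failing`, `scoring`, or `ignore`. The following framework-independent sketch models those results for a single sample and reimplements the title-match logic of the inline evaluator in this reference. Every name here is an illustrative assumption, not the Evaluations API or its documented aggregation behavior. That covers `MetricOutcome`, `titleMatch(expected:actual:)`, the pass = 1 / fail = 0 mapping, and the choice to leave ignored samples out of the average.

```swift
// Hypothetical stand-in for the four metric results: passing, failing, scoring, ignore.
// This is not an Evaluations type.
enum MetricOutcome: Equatable {
    case passed
    case failed
    case score(Double)
    case ignored

    // Assumed numeric mapping used for averaging; ignored results carry no value.
    var numericValue: Double? {
        switch self {
        case .passed: return 1
        case .failed: return 0
        case .score(let value): return value
        case .ignored: return nil
        }
    }
}

// Mirrors the inline "TitleMatch" evaluator: skip samples without an expected
// value, otherwise pass only on an exact string match.
func titleMatch(expected: String?, actual: String) -> MetricOutcome {
    guard let expected = expected else { return .ignored }
    return actual == expected ? .passed : .failed
}

let outcomes = [
    titleMatch(expected: "Schedule dentist appointment", actual: "Schedule dentist appointment"),
    titleMatch(expected: "Fix leaky kitchen faucet", actual: "Fix the kitchen faucet"),
    titleMatch(expected: nil, actual: "Learn to cook Thai food"),
]

// Average only the outcomes that carry a value (ignored samples are skipped).
let values = outcomes.compactMap(\.numericValue)
let passRate = values.isEmpty ? 0 : values.reduce(0, +) / Double(values.count)
print(passRate) // 0.5
```

In this sketch, keeping `ignored` separate from `failed` means a sample with no expected value does not lower the pass rate.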
```swift
struct MyEvaluator: EvaluatorProtocol where Input.ExpectedValue: Sendable & Codable {
    let metric = Metric("Quality")

    func metrics(
        subject: ModelSubject,
        input: Input
    ) async throws -> [Metric] {
        return [metric.scoring(1.0)]
    }
}
```

--------------------------------

### Creating an Inline Evaluator

Source: https://developer.apple.com/documentation/evaluations/evaluator

An example of creating an inline evaluator using a closure. The closure receives the input sample and the `ModelSubject` and returns a metric result.

```swift
Evaluator { sample, subject in
    let metric = Metric("TitleMatch")
    // Skip samples that have no expected value.
    guard let expected = sample.expected else { return metric.ignore() }
    // Pass on an exact match, fail otherwise.
    return subject.value == expected ? metric.passing() : metric.failing()
}
```

--------------------------------

### Default Instructions

Source: https://developer.apple.com/documentation/evaluations/modeljudgeprompt

Accesses the default system instructions used when no custom instructions are provided for a `ModelJudgePrompt`.

```APIDOC
## static var defaultInstructions: String

### Description
The default system instructions used when no custom instructions are provided.

### Properties
* **defaultInstructions** (String) - The default system instructions.
```

--------------------------------

### ModelSample Initializer with Prebuilt Input

Source: https://developer.apple.com/documentation/evaluations/modelsample/init%28input%3Aexpected%3Aexpectations%3A%29

Use this initializer to create a `ModelSample` instance when you have a prebuilt input. It accepts an optional expected value and optional trajectory expectations.

```swift
init(
    input: ModelSampleInput,
    expected: ExpectedValue? = Optional(nilLiteral: ()),
    expectations: TrajectoryExpectation? = nil
)
```

--------------------------------

### startTime Property Declaration

Source: https://developer.apple.com/documentation/evaluations/evaluationresult/starttime

Declares the `startTime` property, which returns a `Date` representing the start time of the evaluation run.
Available in beta versions.

```swift
let startTime: Date
```

--------------------------------

### ModelSampleInput Initializer

Source: https://developer.apple.com/documentation/evaluations/modelsampleinput/init%28prompt%3Ainstructions%3Agenerationschema%3A%29

Use this initializer to create a `ModelSampleInput`. It requires a prompt and can optionally include system instructions and an output generation schema.

```swift
init(
    prompt: Prompt,
    instructions: Instructions? = nil,
    generationSchema: GenerationSchema? = nil
)
```

--------------------------------

### Define SampleProtocol

Source: https://developer.apple.com/documentation/evaluations/sampleprotocol

This snippet shows the declaration of `SampleProtocol`, which inherits from `Decodable`, `Encodable`, and `Sendable`.

```swift
protocol SampleProtocol : Decodable, Encodable, Sendable
```

--------------------------------

### ArgumentValue.int(_:)

Source: https://developer.apple.com/documentation/evaluations/argumentvalue/int%28_%3A%29

Represents an integer value. This is a beta feature available on iOS, iPadOS, Mac Catalyst, macOS, visionOS, and watchOS starting from version 27.0.

```APIDOC
## ArgumentValue.int(_:)

### Description
An integer value.

### Declaration
`case int(Int)`

### Availability
- iOS 27.0+
- iPadOS 27.0+
- Mac Catalyst 27.0+
- macOS 27.0+
- visionOS 27.0+
- watchOS 27.0+

### Beta Software Notice
This documentation contains preliminary information about an API or technology in development. This information is subject to change, and software implemented according to this documentation should be tested with final operating system software.
```

--------------------------------

### Default System Instructions

Source: https://developer.apple.com/documentation/evaluations/modeljudgeevaluator

Provides the default system instructions used by the judge model when no custom instructions are specified.
```swift
static var defaultInstructions: String
```

--------------------------------

### Setting ScoringMode to Discrete

Source: https://developer.apple.com/documentation/evaluations/scoringmode

Sets the scoring mode to `discrete`. This mode constrains the judge model to return one of the defined scale values.

```swift
let mode: ScoringMode = .discrete
```

--------------------------------

### init(ordered:unordered:disallowed:)

Source: https://developer.apple.com/documentation/evaluations/trajectoryexpectation/init%28ordered%3Aunordered%3Adisallowed%3A%29

Creates a trajectory expectation with ordered and unordered requirements, plus specific tools that must not be called.

```APIDOC
## init(ordered:unordered:disallowed:)

### Description
Creates a trajectory expectation with ordered and unordered requirements, plus specific tools that must not be called.

### Parameters
#### `ordered`
- Type: `[ToolExpectation]`
- Required: No (defaults to `[]`)
- Description: Steps that must be satisfied in sequential order.

#### `unordered`
- Type: `[ToolExpectation]`
- Required: No (defaults to `[]`)
- Description: Tool calls that must occur at some point, regardless of position.

#### `disallowed`
- Type: `[ToolExpectation]`
- Required: Yes
- Description: Tools that must NOT be called.

### Discussion
Additional tool calls beyond the expected ones are always allowed when using disallowed expectations; the disallowed list targets specific tools while permitting everything else. To disallow _all_ unexpected calls instead, use `init(ordered:unordered:allowsAdditionalToolCalls:)` with `allowsAdditionalToolCalls: false`.
```

--------------------------------

### ModelJudgePrompt Initialization

Source: https://developer.apple.com/documentation/evaluations/modeljudgeprompt

Initializes a `ModelJudgePrompt` configuration with custom instructions, an optional evaluation target closure, and an optional reference data closure.
```APIDOC
## init(instructions: String, evaluationTarget: ((Input.ExpectedValue) -> String)?, reference: ((Input, Input.ExpectedValue) async throws -> [String : String])?)

### Description
Creates a model-as-judge prompt configuration.

### Parameters
* **instructions** (String) - Required - The system instructions for the judge model.
* **evaluationTarget** ((Input.ExpectedValue) -> String)? - Optional - A closure that converts the model’s response to a string for the judge prompt.
* **reference** ((Input, Input.ExpectedValue) async throws -> [String : String])? - Optional - A closure that provides labeled reference data to include in the model-as-judge prompt.
```

--------------------------------

### StreamLoader Struct Declaration

Source: https://developer.apple.com/documentation/evaluations/streamloader

Declares the `StreamLoader` struct, which is generic over a `Sample` type conforming to `SampleProtocol`. It is available in beta on multiple Apple platforms.

```swift
struct StreamLoader<Sample> where Sample : SampleProtocol
```

--------------------------------

### ModelSubject.value

Source: https://developer.apple.com/documentation/evaluations/modelsubject/value

The typed value produced by the model. This property is available on iOS, iPadOS, Mac Catalyst, macOS, visionOS, and watchOS, starting from version 27.0 (Beta).

```APIDOC
## ModelSubject.value

### Description
The typed value produced by the model.

### Availability
iOS 27.0+ Beta
iPadOS 27.0+ Beta
Mac Catalyst 27.0+ Beta
macOS 27.0+ Beta
visionOS 27.0+ Beta
watchOS 27.0+ Beta

### Instance Property
```swift
var value: Value
```
```

--------------------------------

### Aggregate Metrics Summary

Source: https://developer.apple.com/documentation/evaluations/evaluating-language-model-responses

Implement `aggregateMetrics(using:)` to define how metrics are summarized into high-level statistics such as the mean.
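The averaging that `computeMean(of:)` reports can be reproduced by hand. The sketch below does not use the framework, and all of its names are hypothetical: `SampleResult`, `countLetter(_:in:)`, and the hard-coded model predictions are illustrative stand-ins. The two per-sample metrics are one plausible reading of the `exactMatch` and `absoluteError` metrics in the snippet, not their documented definitions. Exact match scores 1 or 0, and absolute error is |predicted − expected|. The words mirror the letter-counting dataset from the same article.

```swift
// Hypothetical stand-in for one evaluated sample (not an Evaluations type).
struct SampleResult {
    let expected: Int
    let predicted: Int
}

// Ground truth for the task: count occurrences of a letter in a word.
func countLetter(_ letter: Character, in word: String) -> Int {
    word.filter { $0 == letter }.count
}

// Per-sample metrics: 1.0 for an exact match (else 0.0), and |predicted - expected|.
func exactMatch(_ r: SampleResult) -> Double { r.predicted == r.expected ? 1.0 : 0.0 }
func absoluteError(_ r: SampleResult) -> Double { Double(abs(r.predicted - r.expected)) }

// Arithmetic mean of per-sample scores.
func mean(_ values: [Double]) -> Double {
    values.isEmpty ? 0 : values.reduce(0, +) / Double(values.count)
}

// Hypothetical model answers for the five letter-counting prompts.
let results = [
    SampleResult(expected: countLetter("r", in: "strawberry"), predicted: 2),
    SampleResult(expected: countLetter("a", in: "banana"), predicted: 3),
    SampleResult(expected: countLetter("s", in: "Mississippi"), predicted: 4),
    SampleResult(expected: countLetter("l", in: "hello"), predicted: 2),
    SampleResult(expected: countLetter("e", in: "bookkeeper"), predicted: 3),
]

let meanExactMatch = mean(results.map(exactMatch))
let meanAbsoluteError = mean(results.map(absoluteError))
print("Mean exact match: \(meanExactMatch)")
print("Mean absolute error: \(meanAbsoluteError)")
```

With one wrong answer out of five (2 instead of 3 for "strawberry"), the mean exact match is 0.8 and the mean absolute error is 0.2.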
```swift
func aggregateMetrics(using aggregator: inout MetricsAggregator) {
    aggregator.computeMean(of: exactMatch)
    aggregator.computeMean(of: absoluteError)
}
```

--------------------------------

### Declare the scoringMode Property

Source: https://developer.apple.com/documentation/evaluations/modeljudgeevaluator/scoringmode

Declares the `scoringMode` constant, which holds the scoring mode for the evaluator. This property is available on iOS, iPadOS, macOS, and visionOS starting from version 27.0.

```swift
let scoringMode: ScoringMode
```

--------------------------------

### Define Dataset with Model Samples

Source: https://developer.apple.com/documentation/evaluations/evaluating-language-model-responses

Define a dataset for evaluation using `ArrayLoader` and `ModelSample`. Each sample includes a prompt and an expected output.

```swift
import Evaluations
import FoundationModels

struct LetterCountEvaluation: Evaluation {
    let dataset = ArrayLoader(samples: [
        ModelSample(prompt: "Count the letter 'r' in 'strawberry'.", expected: 3),
        ModelSample(prompt: "How many a's are in 'banana'?", expected: 3),
        ModelSample(prompt: "Mississippi contains how many s?", expected: 4),
        ModelSample(prompt: "What's the number of l in hello?", expected: 2),
        ModelSample(prompt: "The letter 'e' in 'bookkeeper' appears how many times?", expected: 3),
    ])

    // The rest of the evaluation's members are omitted from this excerpt.
}
```

--------------------------------

### resultID

Source: https://developer.apple.com/documentation/evaluations/evaluationresult/resultid

A unique identifier for this particular result. This property is available on iOS, iPadOS, Mac Catalyst, macOS, visionOS, and watchOS starting from version 27.0 Beta.

```APIDOC
## resultID

### Description
A unique identifier for this particular result.
### Property
`let resultID: UUID`

### Availability
- iOS 27.0+ Beta
- iPadOS 27.0+ Beta
- Mac Catalyst 27.0+ Beta
- macOS 27.0+ Beta
- visionOS 27.0+ Beta
- watchOS 27.0+ Beta
```

--------------------------------

### ArgumentValue.double(_:)

Source: https://developer.apple.com/documentation/evaluations/argumentvalue/double%28_%3A%29

Represents a double-precision floating-point value. This is a beta feature available on iOS, iPadOS, Mac Catalyst, macOS, visionOS, and watchOS starting from version 27.0.

```APIDOC
## ArgumentValue.double(_:)

### Description
Represents a double-precision floating-point value.

### Declaration
`case double(Double)`

### Platforms
iOS 27.0+ Beta
iPadOS 27.0+ Beta
Mac Catalyst 27.0+ Beta
macOS 27.0+ Beta
visionOS 27.0+ Beta
watchOS 27.0+ Beta

### See Also
* `case string(String)`
* `case int(Int)`
* `case bool(Bool)`
```

--------------------------------

### defaultInstructions

Source: https://developer.apple.com/documentation/evaluations/modeljudgeevaluator/defaultinstructions

Retrieves the default system instructions used by the `ModelJudgeEvaluator` when no custom instructions are provided. This is a read-only property.

```APIDOC
## defaultInstructions

### Description
Provides the default system instructions for the `ModelJudgeEvaluator`. This property is used when a user does not supply custom instructions.

### Property
`static var defaultInstructions: String { get }`

### Returns
A `String` representing the default system instructions.
```

--------------------------------

### SampleGenerator Sampling Strategy Property

Source: https://developer.apple.com/documentation/evaluations/samplegenerator/samplingstrategy-swift.property

Declares the optional `samplingStrategy` property of `SampleGenerator`. This property determines how existing samples are chosen as examples for prompts and influences retry behavior.

```swift
var samplingStrategy: SampleGenerator.SamplingStrategy?
```

--------------------------------

### Default Instructions

Source: https://developer.apple.com/documentation/evaluations/modeljudgeevaluator

Provides the default system instructions that the judge model uses when no custom instructions are explicitly provided. This is useful for understanding the baseline behavior of the judge.

```APIDOC
## static var defaultInstructions: String

### Description
The default system instructions the model uses when no custom instructions are provided.

### Returns
- **String** - The default system instructions.
```

--------------------------------

### Define Custom EvaluationSubject

Source: https://developer.apple.com/documentation/evaluations/evaluationsubject

Conform to the `EvaluationSubject` protocol to create your own subject types. This example shows the protocol declaration and a basic struct, `MySubject`, that holds a codable value and an optional transcript.

```swift
protocol EvaluationSubject

struct MySubject: EvaluationSubject {
    var value: Value
    var transcript: StructuredTranscript?
}
```

--------------------------------

### run()

Source: https://developer.apple.com/documentation/evaluations/samplegenerator/run%28%29

Runs the generator and returns a stream of newly synthesized samples. Each element in the returned stream is a newly generated sample. After iteration completes, access `samples` to retrieve the full dataset (initial + generated), or `invalidSamples` to see samples the validator rejected.

```APIDOC
## run()

### Description
Runs the generator and returns a stream of newly synthesized samples. Each element in the returned stream is a newly generated sample. After iteration completes, access `samples` to retrieve the full dataset (initial + generated), or `invalidSamples` to see samples the validator rejected.

### Method
`run()`

### Return Value
An async throwing stream of individual samples (`AsyncSequence`).
```

--------------------------------

### Run Sample Generation

Source: https://developer.apple.com/documentation/evaluations/samplegenerator

Executes the sample generation process and returns an asynchronous stream of newly synthesized samples. Configure the generator before calling this method.

```swift
func run() -> some AsyncSequence
```

--------------------------------

### Disallowed Tools

Source: https://developer.apple.com/documentation/evaluations/trajectoryexpectation/disallowed

The list of tools that the model must not call. This property is available on iOS, iPadOS, macOS, tvOS, and watchOS starting from version 27.0 (Beta).

```swift
var disallowed: [ToolExpectation]
```
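`TrajectoryExpectation` describes four rules: ordered steps, unordered calls, disallowed tools, and the `allowsAdditionalToolCalls` switch. They can be modeled with a small checker that does not use the framework. This is a sketch under stated assumptions:

- `ToolCall`, `Expectation`, and `satisfies(_:ordered:unordered:disallowed:allowsAdditionalToolCalls:)` are hypothetical names.
- Arguments are compared as exact string matches.
- Ordered steps are matched greedily.
- The real framework may resolve edge cases differently, for example whether one call can satisfy both an ordered and an unordered expectation.
- Unlike the documented initializers, which pair `disallowed` with implicitly allowed extra calls, the sketch accepts both options in one function.

```swift
// Hypothetical record of one tool call made by the model (not an Evaluations type).
struct ToolCall {
    let name: String
    let arguments: [String: String]
}

// Hypothetical expectation: a tool name plus arguments that must match exactly.
struct Expectation {
    let name: String
    let arguments: [String: String]

    // Every expected argument must be present with an identical value;
    // extra arguments on the call are tolerated.
    func matches(_ call: ToolCall) -> Bool {
        call.name == name && arguments.allSatisfy { call.arguments[$0.key] == $0.value }
    }
}

// Returns true when the recorded calls satisfy every rule.
func satisfies(
    _ calls: [ToolCall],
    ordered: [Expectation] = [],
    unordered: [Expectation] = [],
    disallowed: [Expectation] = [],
    allowsAdditionalToolCalls: Bool = true
) -> Bool {
    var used = Set<Int>()  // indices of calls claimed by an expectation

    // 1. Ordered steps must appear as a subsequence, in order (greedy, earliest match).
    var next = 0
    for step in ordered {
        guard let index = calls[next...].firstIndex(where: step.matches) else { return false }
        used.insert(index)
        next = index + 1
    }

    // 2. Each unordered expectation must match a call not already claimed.
    for expectation in unordered {
        guard let index = calls.indices.first(where: { !used.contains($0) && expectation.matches(calls[$0]) }) else {
            return false
        }
        used.insert(index)
    }

    // 3. No call may match a disallowed expectation.
    for call in calls where disallowed.contains(where: { $0.matches(call) }) {
        return false
    }

    // 4. Optionally reject calls that no expectation accounts for.
    return allowsAdditionalToolCalls || used.count == calls.count
}

let countLetters = Expectation(name: "count_letters", arguments: ["letter": "r", "word": "strawberry"])
let webSearch = Expectation(name: "web_search", arguments: [:])

let calls = [ToolCall(name: "count_letters", arguments: ["letter": "r", "word": "strawberry"])]
let withExtra = calls + [ToolCall(name: "web_search", arguments: ["query": "strawberry"])]

print(satisfies(calls, ordered: [countLetters], disallowed: [webSearch]))              // true
print(satisfies(withExtra, ordered: [countLetters], disallowed: [webSearch]))          // false
print(satisfies(withExtra, ordered: [countLetters], allowsAdditionalToolCalls: false)) // false
print(satisfies(withExtra, ordered: [countLetters]))                                   // true
```

The last two calls show the difference between the two documented styles. Extra calls pass by default, and `allowsAdditionalToolCalls: false` rejects any call that no expectation claims.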