=============== LIBRARY RULES ===============
From library maintainers:
- Use Pickled v2 schema only.
- Do not use scenarios, toolsets, access, or checks.
- MCP uses contexts with mode: mcp and servers.
- Questions use facts and misstatements.
- Builds use verifier.failToPass and passToPass.

### Build Configuration Example

Source: https://docs.pickled.dev/getting-started

Define a build job that includes a goal, agents, contexts, workspace setup, and a verifier to check the agent's work.

```yaml
builds:
  - id: add-endpoint
    goal: Add a health-check endpoint using my-product.
    agents: [quick]
    contexts: [from_readme]
    trials: 3
    workspace:
      path: ./fixtures/app
      setup: [bun install]
    verifier:
      failToPass:
        - { run: bun test }
```

--------------------------------

### Pickled Check Output Example

Source: https://docs.pickled.dev/getting-started

Example output from a Pickled check run, showing task details, agent and context performance, and overall score.

```text
Task: How do I install my-product?
  [quick · from_readme] ✓ Well grounded 1/1

Overall: 100 / 100 · threshold 80 · run passes
Every question met its checks.
```

--------------------------------

### Configuring Build Verification

Source: https://docs.pickled.dev/pickled-yml

Define build configurations to verify agent implementation capabilities. Builds specify goals, workspaces, setup commands, and verifier rules for success and failure.

```yaml
builds:
  - id: toolbar
    goal: Add a custom toolbar using my-product.
    agents: [builder]
    contexts: [given_docs]
    trials: 3
    requires: [install_command]
    workspace:
      path: ./fixtures/react-app
      setup:
        - bun install --frozen-lockfile
    verifier:
      failToPass:
        - { name: toolbar behavior, run: bun test tests/toolbar.test.ts }
      passToPass:
        - { run: bun run typecheck }
    referenceSolution:
      patch: ./fixtures/solutions/toolbar.patch
```

--------------------------------

### Score Example Answers Offline

Source: https://docs.pickled.dev/getting-started

Use 'pickled test' to score example answers against your fact contract without making any model calls.

```bash
bunx @pickled-dev/cli test .
```

--------------------------------

### Initialize Pickled Project

Source: https://docs.pickled.dev/

Use this command to initialize a new Pickled project. It sets up the necessary configuration files for your project.

```bash
bunx @pickled-dev/cli init
```

--------------------------------

### Compare Context Paths Configuration

Source: https://docs.pickled.dev/getting-started

Configure multiple contexts for a question to compare how different sources or model memory perform.

```yaml
contexts:
  memory: { mode: memory }                       # model memory only
  from_readme: { mode: inject, source: readme }  # your README injected

questions:
  - id: install
    question: How do I install my-product?
    agents: [quick]
    contexts: [memory, from_readme]
    expects: [install_command]
```

--------------------------------

### Product Information Configuration

Source: https://docs.pickled.dev/pickled-yml

The product section is required and must include both 'name' and 'description'. These fields are used in the system prompt to inform the agent about the product.

```yaml
product:
  name: my-product
  description: short one-liner about what your product does
```

--------------------------------

### Preview Pickled Check Run

Source: https://docs.pickled.dev/getting-started

Use the --plan flag to see the execution plan of your checks without invoking any models.

```bash
bunx @pickled-dev/cli check . --plan
```

--------------------------------

### Smallest Pickled Configuration

Source: https://docs.pickled.dev/getting-started

This is a minimal configuration file for Pickled, defining schema version, product details, sources, agents, contexts, and a basic question with expected facts.

```yaml
schemaVersion: 2

product:
  name: my-product
  description: short one-liner about what your product does

sources:
  readme: { path: ./README.md }

agents:
  quick:
    provider: claude-code
    model: claude-haiku-4-5

contexts:
  from_readme: { mode: inject, source: readme }

facts:
  install_command:
    statement: my-product installs with bunx my-product.
    match:
      allOf: ["bunx my-product"]

questions:
  - id: install
    question: How do I install my-product?
    agents: [quick]
    contexts: [from_readme]
    expects: [install_command]

thresholds:
  questions: 80
```

--------------------------------

### Web Context Configuration

Source: https://docs.pickled.dev/getting-started

Add a 'web' context to allow the agent to discover answers by browsing the web. Answers without tool invocation are vetoed.

```yaml
contexts:
  web_open: { mode: web }
```

--------------------------------

### Pull-Request-Safe GitHub Actions Workflow

Source: https://docs.pickled.dev/github-actions

This workflow runs Pickled tests, checks, and builds on pull requests and pushes to the main branch. It uses `actions/checkout` and `oven-sh/setup-bun` to set up the environment and then executes Pickled commands for deterministic checks.

```yaml
name: pickled
on:
  pull_request:
  push:
    branches: [main]
  workflow_dispatch:
jobs:
  deterministic:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v6
      - uses: oven-sh/setup-bun@v2
      - run: bun install
      - run: bunx @pickled-dev/cli test .
      - run: bunx @pickled-dev/cli check . --plan
      - run: bunx @pickled-dev/cli build . --plan
```

--------------------------------

### Defining Product Facts

Source: https://docs.pickled.dev/pickled-yml

Configure reusable product truths that an answer must cover. Facts use a statement and a match condition with 'allOf' and/or 'anyOf' substrings.

```yaml
facts:
  install_command:
    statement: my-product is installed with bunx my-product.
    match:
      allOf: ["bunx my-product"]
  entry_point:
    statement: Names the documented entry point.
    match:
      anyOf: ["quickstart", "getting-started"]
```

--------------------------------

### Setting Thresholds for Checks and Builds

Source: https://docs.pickled.dev/pickled-yml

Configure optional per-kind gates for questions and builds. These thresholds (1-100) determine pass/fail criteria for 'pickled check' and 'pickled build' commands.

```yaml
thresholds:
  questions: 80
  builds: 80
```

--------------------------------

### Run Pickled Check

Source: https://docs.pickled.dev/

Execute a Pickled check against your project. This command analyzes agent responses based on the defined configuration and sources.

```bash
bunx @pickled-dev/cli check .
```

--------------------------------

### Sample Larger Suites in GitHub Actions

Source: https://docs.pickled.dev/github-actions

This snippet demonstrates how to sample a larger suite of tests in a GitHub Actions workflow. It uses a seed derived from the pull request number or commit SHA to ensure reproducibility and captures detailed execution information.

```yaml
- run: bunx @pickled-dev/cli check . --sample 2 --seed pull-${{ github.event.pull_request.number || github.sha }}
  env:
    ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
    OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
```

--------------------------------

### Schema Version Configuration

Source: https://docs.pickled.dev/pickled-yml

The schemaVersion is a required field and must be set to 2. Version 1 configurations are rejected with a migration hint.

```yaml
schemaVersion: 2
```

--------------------------------

### Agent Configuration

Source: https://docs.pickled.dev/pickled-yml

Configure agents with their provider, model, and specific parameters like maxTurns, temperature, or maxTokens. Providers include claude-code, codex-cli, anthropic, and openai.

```yaml
agents:
  quick:
    provider: claude-code
    model: claude-haiku-4-5
    maxTurns: 5
  api:
    provider: anthropic
    model: claude-haiku-4-5
    temperature: 0
    maxTokens: 4096
```

--------------------------------

### Defining Agent Questions

Source: https://docs.pickled.dev/pickled-yml

Define questions to probe agent capabilities, specifying expected facts and rejected misstatements. Questions link to agents, contexts, and expected/rejected IDs.

```yaml
questions:
  - id: install
    question: How do I install my-product?
    agents: [quick, api]
    contexts: [memory, given_docs, web_open]
    expects: [install_command, entry_point]
    rejects: [npm_install]
    examples:
      pass:
        - "Install it with `bunx my-product`, then run the quickstart."
      fail:
        - "Run `npm install my-product-cli`."
```

--------------------------------

### Context Definitions for Pickled

Source: https://docs.pickled.dev/pickled-yml

Define contexts for agents to interact with sources. Modes include memory, inject, web, and mcp, each with specific configuration options like source, servers, and headers.

```yaml
contexts:
  memory: { mode: memory }                    # prior knowledge, nothing injected
  given_docs: { mode: inject, source: docs }  # source content placed in the prompt
  web_open: { mode: web }                     # open web discovery, no declared source
  web_docs: { mode: web, source: docs }       # docs handed over, reached with web tools
  mcp_mintlify:
    mode: mcp
    servers:
      mintlify:
        url: https://docs.example.com/mcp
        headers:
          AUTH_TOKEN: ${AUTH_TOKEN}
```

--------------------------------

### Environment Variable Expansion

Source: https://docs.pickled.dev/pickled-yml

Configure string values to expand environment variables matching the ${UPPER_SNAKE_CASE} pattern. Missing variables are replaced with empty strings.

```text
String values matching "${UPPER_SNAKE_CASE}" are replaced with the
corresponding `process.env` entry at load. Missing env vars become empty
strings so the failure surfaces at the call site (e.g. a 401 from the MCP
server) rather than at config load. Bun auto-loads ".env".
```

--------------------------------

### GitHub Actions Workflow for Real Agent Runs

Source: https://docs.pickled.dev/github-actions

This workflow is designed for real agent runs that may consume tokens and edit workspaces. It is triggered manually or on a schedule and includes environment variables for API keys. It uses `--max-cells` to limit the scope of executions.

```yaml
on:
  workflow_dispatch:
  schedule:
    - cron: "17 8 * * 1"
jobs:
  real-agent-benchmark:
    runs-on: ubuntu-latest
    if: github.event_name == 'workflow_dispatch' || github.event_name == 'schedule'
    steps:
      - uses: actions/checkout@v6
      - uses: oven-sh/setup-bun@v2
      - run: bun install
      - run: bunx @pickled-dev/cli check . --max-cells 20
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
      - run: bunx @pickled-dev/cli build . --max-cells 6
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
```

--------------------------------

### Source Definitions for Pickled

Source: https://docs.pickled.dev/pickled-yml

Define sources for Pickled to use, such as local files, remote URLs, or codebases. Supports path, url, codebase, exclude, and maxBytes options.

```yaml
sources:
  readme: { path: ./README.md }
  docs: { url: https://example.com/llms-full.txt }
  code: { codebase: ./src/**/*.ts, exclude: ["**/*.test.ts"], maxBytes: 262144 }
```

--------------------------------

### YAML Language Server Directive

Source: https://docs.pickled.dev/pickled-yml

Use this directive to enable editor autocomplete and inline validation for your pickled.yml file by pointing to the published JSON Schema.

```yaml
# yaml-language-server: $schema=https://pickled.dev/schema/pickled.schema.json
```

--------------------------------

### Defining Misstatements

Source: https://docs.pickled.dev/pickled-yml

Configure reusable wrong claims that an answer must not make. Misstatements use the same 'statement' and 'match' structure as facts and act as hard vetoes.

```yaml
misstatements:
  npm_install:
    statement: The answer recommends installing with npm.
    match:
      anyOf: ["npm install my-product-cli"]
```
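
--------------------------------

### Combined Configuration Sketch

The snippets above each show one section of pickled.yml in isolation. As a rough sketch of how they might fit together in one file, the example below extends the smallest configuration with a memory context, a misstatement wired in through `rejects`, a build, and a builds threshold. Every key and value is copied from the snippets above. The combination itself is an assumption and has not been validated against the published schema, and the `./fixtures/app` workspace is assumed to exist in your repository.

```yaml
# yaml-language-server: $schema=https://pickled.dev/schema/pickled.schema.json
# Sketch only: assembled from the individual snippets, not taken from the docs as a whole file.
schemaVersion: 2

product:
  name: my-product
  description: short one-liner about what your product does

sources:
  readme: { path: ./README.md }

agents:
  quick:
    provider: claude-code
    model: claude-haiku-4-5

contexts:
  memory: { mode: memory }                       # model memory only
  from_readme: { mode: inject, source: readme }  # your README injected

facts:
  install_command:
    statement: my-product installs with bunx my-product.
    match:
      allOf: ["bunx my-product"]

misstatements:
  npm_install:
    statement: The answer recommends installing with npm.
    match:
      anyOf: ["npm install my-product-cli"]

questions:
  - id: install
    question: How do I install my-product?
    agents: [quick]
    contexts: [memory, from_readme]
    expects: [install_command]   # facts the answer must cover
    rejects: [npm_install]       # misstatements that veto the answer

builds:
  - id: add-endpoint
    goal: Add a health-check endpoint using my-product.
    agents: [quick]
    contexts: [from_readme]
    trials: 3
    workspace:
      path: ./fixtures/app       # assumed fixture directory
      setup: [bun install]
    verifier:
      failToPass:
        - { run: bun test }

thresholds:
  questions: 80
  builds: 80
```

Running `bunx @pickled-dev/cli check . --plan` and `bunx @pickled-dev/cli build . --plan` against a file like this should preview the planned runs without invoking any models, which is a low-cost way to confirm the sections are accepted together.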