### Install @plust/datasleuth with npm or yarn

Source: https://github.com/plustorg/datasleuth/blob/main/docs/getting-started.md

Install the @plust/datasleuth package with either npm or yarn. This is the first step toward using the library.

```bash
npm install @plust/datasleuth
```

```bash
yarn add @plust/datasleuth
```

--------------------------------

### Quick Start Research Example with @plust/datasleuth

Source: https://github.com/plustorg/datasleuth/blob/main/docs/getting-started.md

A minimal TypeScript example demonstrating how to perform a research query using @plust/datasleuth. It shows schema definition, search provider configuration (Google), and execution of the research function with default settings. It includes error handling for the research process.

```typescript
import { research } from '@plust/datasleuth';
import { z } from 'zod';
import { openai } from '@ai-sdk/openai';
import { google } from '@plust/search-sdk';

// Define your output schema
const outputSchema = z.object({
  summary: z.string(),
  keyFindings: z.array(z.string()),
  sources: z.array(z.string().url()),
});

// Configure Google search provider
const searchProvider = google.configure({
  apiKey: process.env.GOOGLE_API_KEY,
  cx: process.env.GOOGLE_CX, // Your search engine ID
});

// Execute research with the default pipeline
async function runResearch() {
  try {
    const results = await research({
      query: 'Latest advancements in quantum computing',
      outputSchema,
      defaultLLM: openai('gpt-4o'),
      defaultSearchProvider: searchProvider,
    });

    console.log('Research Summary:', results.summary);
    console.log('Key Findings:', results.keyFindings);
    console.log('Sources:', results.sources);
  } catch (error) {
    console.error('Research failed:', error);
  }
}

runResearch();
```

--------------------------------

### Configure Default Search Provider for Research

Source: https://github.com/plustorg/datasleuth/blob/main/docs/getting-started.md

TypeScript example showing how to configure and use a default search provider (Google) within the @plust/datasleuth research function. This simplifies research by applying the same provider to all search steps unless overridden.

```typescript
import { research } from '@plust/datasleuth';
import { z } from 'zod';
import { google } from '@plust/search-sdk';
import { openai } from '@ai-sdk/openai';

// Configure Google search
const searchProvider = google.configure({
  apiKey: process.env.GOOGLE_API_KEY,
  cx: process.env.GOOGLE_CX
});

// Execute research with default steps
const results = await research({
  query: "Renewable energy trends 2025",
  outputSchema: z.object({ /* ... */ }), // Placeholder for actual schema
  defaultLLM: openai('gpt-4o'),
  defaultSearchProvider: searchProvider
});
```

--------------------------------

### Specify Search Providers in Individual Research Steps

Source: https://github.com/plustorg/datasleuth/blob/main/docs/getting-started.md

TypeScript example demonstrating how to specify different search providers (Google and Bing) for individual `searchWeb` steps within a @plust/datasleuth research pipeline. This offers granular control over search sources for different stages of research.

```typescript
import { research, searchWeb, extractContent } from '@plust/datasleuth';
import { z } from 'zod';
import { google, bing } from '@plust/search-sdk';
import { openai } from '@ai-sdk/openai';

// Configure search providers
const googleSearch = google.configure({
  apiKey: process.env.GOOGLE_API_KEY,
  cx: process.env.GOOGLE_CX
});

const bingSearch = bing.configure({
  apiKey: process.env.BING_API_KEY
});

// Execute research with custom steps
const results = await research({
  query: "Renewable energy trends 2025",
  outputSchema: z.object({ /* ... */ }), // Placeholder for actual schema
  defaultLLM: openai('gpt-4o'),
  steps: [
    searchWeb({ provider: googleSearch, maxResults: 10 }),
    extractContent(),
    searchWeb({ provider: bingSearch, query: "renewable energy innovations" }),
    extractContent()
    // Other steps...
  ]
});
```

--------------------------------

### Install @plust/datasleuth Package

Source: https://github.com/plustorg/datasleuth/blob/main/README.md

This command installs the @plust/datasleuth package using npm. Ensure you have Node.js and npm installed on your system.

```bash
npm install @plust/datasleuth
```

--------------------------------

### Isolate Research Steps for Testing (TypeScript)

Source: https://github.com/plustorg/datasleuth/blob/main/docs/troubleshooting.md

This example demonstrates how to isolate and test individual steps within a `research` pipeline. By explicitly providing a `steps` array containing only the desired step, developers can debug specific functionalities.

```typescript
// Test just the search step
const result = await research({
  query: "Your query",
  outputSchema: schema,
  steps: [searchWeb({ provider: googleSearch })]
});
```

--------------------------------

### AI Orchestration for Market Research

Source: https://context7.com/plustorg/datasleuth/llms.txt

This example showcases how to configure and use the `research` function with AI orchestration for comprehensive market research. It defines custom prompts, tools, and exit criteria for an AI agent.

```APIDOC
## AI Orchestration Example

### Description
This example demonstrates how to leverage AI agents to dynamically select and execute research tools for market analysis. It includes setting up a search provider, defining a custom prompt for market research, and configuring various tools like web search, content extraction, analysis, fact-checking, and summarization.

### Request Example
```typescript
import {
  research,
  orchestrate,
  searchWeb,
  extractContent,
  analyze,
  factCheck,
  summarize
} from '@plust/datasleuth';
import { z } from 'zod';
import { openai } from '@ai-sdk/openai';
import { google } from '@plust/search-sdk';

const outputSchema = z.object({
  marketOverview: z.string(),
  technologies: z.array(z.object({
    name: z.string(),
    maturityLevel: z.enum(['research', 'emerging', 'growth', 'mature']),
    marketPotential: z.number().min(1).max(10),
    keyPlayers: z.array(z.string())
  })),
  investmentOpportunities: z.array(z.string()),
  risks: z.array(z.string()),
  confidenceScore: z.number().min(0).max(1)
});

const searchProvider = google.configure({
  apiKey: process.env.GOOGLE_API_KEY
});

// Custom orchestration prompt for market research
const marketResearchPrompt = `
You are conducting comprehensive market research on emerging technologies.
Your goal is to:
1. Gather information from multiple sources
2. Analyze technology maturity and market potential
3. Identify key players and investment opportunities
4. Assess risks and challenges
5. Validate findings with fact-checking

Choose tools strategically to build a complete market analysis.
`;

const results = await research({
  query: 'Green hydrogen production and storage technologies',
  outputSchema,
  steps: [
    orchestrate({
      model: openai('gpt-4o'),
      searchProvider,
      customPrompt: marketResearchPrompt,
      maxIterations: 20,
      continueOnError: true,
      includeInResults: true,

      // Custom tools available to the agent
      tools: {
        searchWeb: searchWeb({ provider: searchProvider, maxResults: 15 }),
        extractContent: extractContent({ maxUrls: 12 }),
        analyze: analyze({ llm: openai('gpt-4o'), focus: 'market-analysis' }),
        factCheck: factCheck({ llm: openai('gpt-4o'), threshold: 0.8 }),
        summarize: summarize({ llm: openai('gpt-4o'), format: 'structured' })
      },

      // Exit when we have high confidence and sufficient data
      exitCriteria: (state) => {
        const hasEnoughData =
          state.data.searchResults?.length >= 20 &&
          state.data.extractedContent?.length >= 10 &&
          state.data.analysis !== undefined;

        const hasHighConfidence =
          state.metadata.confidenceScore >= 0.85;

        return hasEnoughData && hasHighConfidence;
      },

      retry: {
        maxRetries: 3,
        baseDelay: 2000
      }
    })
  ]
});

console.log('Market Overview:', results.marketOverview);
console.log(`Analyzed ${results.technologies.length} technologies`);
console.log(`Identified ${results.investmentOpportunities.length} opportunities`);
console.log(`Confidence score: ${results.confidenceScore}`);
```

### Response
#### Success Response (200)
- **marketOverview** (string) - A summary of the market analysis.
- **technologies** (array) - An array of analyzed technologies with their details.
  - **name** (string) - The name of the technology.
  - **maturityLevel** (enum) - The maturity level of the technology ('research', 'emerging', 'growth', 'mature').
  - **marketPotential** (number) - The market potential score (1-10).
  - **keyPlayers** (array) - A list of key players in the technology's market.
- **investmentOpportunities** (array) - A list of identified investment opportunities.
- **risks** (array) - A list of identified risks and challenges.
- **confidenceScore** (number) - A score indicating the confidence in the analysis (0-1).

#### Response Example
```json
{
  "marketOverview": "Green hydrogen production and storage is a rapidly growing sector driven by the need for sustainable energy solutions...",
  "technologies": [
    {
      "name": "Electrolyzer Technology",
      "maturityLevel": "growth",
      "marketPotential": 9,
      "keyPlayers": ["Plug Power", "Cummins", "Siemens Energy"]
    },
    {
      "name": "Advanced Battery Storage",
      "maturityLevel": "mature",
      "marketPotential": 7,
      "keyPlayers": ["Tesla", "LG Energy Solution", "Panasonic"]
    }
  ],
  "investmentOpportunities": [
    "Investing in green hydrogen production facilities.",
    "Developing advanced electrolyzer components."
  ],
  "risks": [
    "High initial production costs.",
    "Grid integration challenges."
  ],
  "confidenceScore": 0.88
}
```
```

--------------------------------

### Clone Repository and Install Dependencies (Bash)

Source: https://github.com/plustorg/datasleuth/blob/main/CONTRIBUTING.md

Commands to clone the datasleuth repository from GitHub and install project dependencies using npm. Ensures the local development environment is set up correctly.

```bash
git clone https://github.com/YOUR_USERNAME/datasleuth.git
cd datasleuth
npm install
```

--------------------------------

### Load Environment Variables with dotenv

Source: https://github.com/plustorg/datasleuth/blob/main/docs/getting-started.md

TypeScript code snippet demonstrating how to load environment variables from a `.env` file using the `dotenv` package. This is crucial for securely managing API keys and other configuration settings required by @plust/datasleuth and its associated SDKs.

```typescript
import dotenv from 'dotenv';
dotenv.config();

// Now process.env.OPENAI_API_KEY, etc. are available
```
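
A missing key often surfaces later as an opaque authentication failure, so it can help to check for required variables right after loading them. The sketch below is a minimal, hypothetical helper (`findMissingEnv` is not part of `dotenv` or @plust/datasleuth). It takes the environment as a plain object so it can be exercised without touching `process.env`.

```typescript
// Hypothetical helper for illustration; not part of dotenv or @plust/datasleuth.
type Env = Record<string, string | undefined>;

// Returns the names of required variables that are unset or blank.
function findMissingEnv(env: Env, required: string[]): string[] {
  return required.filter((key) => {
    const value = env[key];
    return value === undefined || value.trim() === '';
  });
}

// Example with an in-memory environment; in an application, pass process.env.
const exampleEnv: Env = { OPENAI_API_KEY: 'sk-example', GOOGLE_API_KEY: '' };
const missingKeys = findMissingEnv(exampleEnv, [
  'OPENAI_API_KEY',
  'GOOGLE_API_KEY',
  'GOOGLE_CX',
]);
console.log(missingKeys); // [ 'GOOGLE_API_KEY', 'GOOGLE_CX' ]
```

Failing fast on a non-empty result, before calling `research`, keeps configuration problems separate from runtime errors.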

--------------------------------

### Basic Research Example

Source: https://github.com/plustorg/datasleuth/blob/main/README.md

This TypeScript snippet illustrates how to perform basic research using the @plust/datasleuth library. It sets up a Zod schema for structuring the output and integrates with OpenAI through the Vercel AI SDK.

```typescript
import {
  research
} from '@plust/datasleuth';
import { z } from 'zod';
import { openai } from '@ai-sdk/openai';

// Define your output schema
const outputSchema = z.object({
  summary: z.string(),
  keyFindings: z.array(z.string()),
  sources: z.array(z.string().url()),
});

// Execute research with default pipeline
const results = await research({
  query: 'Latest advancements in quantum computing',
  outputSchema,
  defaultLLM: openai('gpt-4o'),
});
```

--------------------------------

### Set Maximum Content Length (TypeScript)

Source: https://github.com/plustorg/datasleuth/blob/main/docs/troubleshooting.md

This example passes a `maxContentLength` option to the `extractContent` step. Capping the amount of content processed at once helps keep memory usage under control.

```typescript
extractContent({ maxContentLength: 50000 })
```
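
To make the effect of such a cap concrete, here is a minimal sketch of length-limited content handling. It is an illustration only: `capContent` is a hypothetical helper, not the library's implementation, and it assumes the limit is measured in characters, which the source does not state.

```typescript
// Hypothetical illustration; not how extractContent is actually implemented.
// Assumes the limit counts characters (UTF-16 code units).
function capContent(content: string, maxContentLength: number): string {
  if (maxContentLength < 0) {
    throw new RangeError('maxContentLength must be non-negative');
  }
  return content.length > maxContentLength
    ? content.slice(0, maxContentLength)
    : content;
}

const longPage = 'x'.repeat(120000);
const cappedPage = capContent(longPage, 50000);
console.log(cappedPage.length); // 50000
```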

--------------------------------

### Integrate Multiple LLMs with Datasleuth and Vercel AI SDK

Source: https://github.com/plustorg/datasleuth/blob/main/README.md

This example showcases how to integrate various Large Language Models (LLMs) using the Vercel AI SDK with Datasleuth for research. It allows specifying different LLM providers like OpenAI and Anthropic for distinct research steps such as planning, analysis, fact-checking, and summarization. Dependencies include '@plust/datasleuth', '@ai-sdk/openai', '@ai-sdk/anthropic', and 'zod'.

```typescript
import {
  research,
  plan,
  analyze,
  factCheck,
  summarize,
} from '@plust/datasleuth';
import { z } from 'zod';
import { openai } from '@ai-sdk/openai';
import { anthropic } from '@ai-sdk/anthropic';

// Define your output schema
const outputSchema = z.object({
  summary: z.string(),
  analysis: z.object({
    insights: z.array(z.string()),
  }),
  factChecks: z.array(
    z.object({
      statement: z.string(),
      isValid: z.boolean(),
    })
  ),
});

// Use different LLM providers for different steps
const results = await research({
  query: 'Advancements in gene editing technologies',
  outputSchema,
  steps: [
    // Use OpenAI for research planning
    plan({
      llm: openai('gpt-4o'),
      temperature: 0.4,
    }),

    // Use Anthropic for specialized analysis
    analyze({
      llm: anthropic('claude-3-opus-20240229'),
      focus: 'ethical-considerations',
      depth: 'comprehensive',
    }),

    // Use OpenAI for fact checking
    factCheck({
      llm: openai('gpt-4o'),
      threshold: 0.8,
      includeEvidence: true,
    }),

    // Use Anthropic for final summarization
    summarize({
      llm: anthropic('claude-3-sonnet-20240229'),
      format: 'structured',
      maxLength: 2000,
    }),
  ],
});
```

--------------------------------

### Generate Research Plans with AI using DataSleuth

Source: https://context7.com/plustorg/datasleuth/llms.txt

This example shows how to generate structured research plans with AI-defined objectives and search queries using the DataSleuth library. It utilizes the 'plan' step to define a custom prompt for the AI, focusing on specific research areas. Dependencies include '@plust/datasleuth', 'zod', '@ai-sdk/openai', and '@plust/search-sdk'.

```typescript
import { research, plan, searchWeb, extractContent } from '@plust/datasleuth';
import { z } from 'zod';
import { openai } from '@ai-sdk/openai';
import { google } from '@plust/search-sdk';

const customPlanningPrompt = `
You are a strategic research planning assistant specializing in technology trends.
Create a comprehensive research plan focusing on:
1. Market size and growth projections
2. Key technological innovations
3. Major industry players and their strategies
4. Regulatory landscape
5. Future outlook and predictions
`;

const outputSchema = z.object({
  findings: z.array(z.string()),
  sources: z.array(z.string().url())
});

const results = await research({
  query: 'Artificial intelligence in healthcare diagnostics',
  outputSchema,
  steps: [
    plan({
      llm: openai('gpt-4o'),
      customPrompt: customPlanningPrompt,
      temperature: 0.3,
      includeInResults: true,
      retry: {
        maxRetries: 3,
        baseDelay: 1000
      }
    }),
    searchWeb({ useQueriesFromPlan: true }),
    extractContent()
  ],
  defaultSearchProvider: google.configure({
    apiKey: process.env.GOOGLE_API_KEY
  })
});

// Access the generated plan
console.log('Research Objectives:', results.researchPlan.objectives);
console.log('Search Queries:', results.researchPlan.searchQueries);
console.log('Data Strategy:', results.researchPlan.dataGatheringStrategy);

```

--------------------------------

### Execute Parallel Research Tracks with Datasleuth

Source: https://github.com/plustorg/datasleuth/blob/main/README.md

This example demonstrates how to run multiple research tracks concurrently using Datasleuth's `parallel` and `track` functionalities. It allows configuring different search providers (e.g., Google, Bing) and LLM analyses for each track. The results can be merged using a custom strategy like 'weighted' with specified conflict resolution. Dependencies include '@plust/datasleuth', '@plust/search-sdk', '@ai-sdk/openai', and 'zod'.

```typescript
import {
  research,
  track,
  parallel,
  searchWeb,
  extractContent,
  analyze,
  summarize,
  ResultMerger,
} from '@plust/datasleuth';
import { z } from 'zod';
import { google, bing } from '@plust/search-sdk';
import { openai } from '@ai-sdk/openai';

// Configure search providers
const googleSearch = google.configure({ apiKey: process.env.GOOGLE_API_KEY });
const bingSearch = bing.configure({ apiKey: process.env.BING_API_KEY });

// Define your output schema
const outputSchema = z.object({
  summary: z.string(),
  findings: z.array(
    z.object({
      topic: z.string(),
      details: z.string(),
      confidence: z.number(),
    })
  ),
  sources: z.array(z.string().url()),
});

// Execute parallel research tracks
const results = await research({
  query: 'Quantum computing applications in healthcare',
  outputSchema,
  steps: [
    parallel({
      tracks: [
        track({
          name: 'academic',
          steps: [
            searchWeb({
              provider: googleSearch,
              query: 'quantum computing healthcare scholarly articles',
            }),
            extractContent(),
            analyze({
              llm: openai('gpt-4o'),
              focus: 'academic-research',
            }),
          ],
        }),
        track({
          name: 'commercial',
          steps: [
            searchWeb({
              provider: bingSearch,
              query: 'quantum computing healthcare startups companies',
            }),
            extractContent(),
            analyze({
              llm: openai('gpt-4o'),
              focus: 'commercial-applications',
            }),
          ],
        }),
      ],
      mergeFunction: ResultMerger.createMergeFunction({
        strategy: 'weighted',
        weights: { academic: 1.5, commercial: 1.0 },
        conflictResolution: 'mostConfident',
      }),
    }),
    summarize({ maxLength: 1000 }),
  ],
});
```
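
The source does not describe how the `'weighted'` strategy and `'mostConfident'` conflict resolution combine results, so the following is only one plausible reading, sketched under stated assumptions: when two tracks report the same topic, keep the finding whose confidence multiplied by its track weight is highest. `mergeWeighted` and the types below are hypothetical and are not the `ResultMerger` API.

```typescript
// Hypothetical sketch; ResultMerger's real behavior may differ.
interface Finding {
  topic: string;
  details: string;
  confidence: number;
}

function mergeWeighted(
  tracks: Record<string, Finding[]>,
  weights: Record<string, number>
): Finding[] {
  const best: Record<string, { finding: Finding; score: number }> = {};
  for (const trackName of Object.keys(tracks)) {
    const weight = weights[trackName] ?? 1; // unweighted tracks count as 1
    for (const finding of tracks[trackName]) {
      const score = finding.confidence * weight;
      const current = best[finding.topic];
      if (current === undefined || score > current.score) {
        best[finding.topic] = { finding, score };
      }
    }
  }
  return Object.keys(best).map((topic) => best[topic].finding);
}

const merged = mergeWeighted(
  {
    academic: [{ topic: 'imaging', details: 'A', confidence: 0.6 }],
    commercial: [
      { topic: 'imaging', details: 'B', confidence: 0.8 },
      { topic: 'drug discovery', details: 'C', confidence: 0.7 },
    ],
  },
  { academic: 1.5, commercial: 1.0 }
);
console.log(merged.map((f) => f.details)); // [ 'A', 'C' ]
```

In this example the academic finding wins the `imaging` conflict because its weighted score (about 0.9) exceeds the commercial one (0.8), even though its raw confidence is lower.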

--------------------------------

### Provide Default LLM for Research in TypeScript

Source: https://github.com/plustorg/datasleuth/blob/main/docs/troubleshooting.md

Shows how to pass a default language model (LLM) to the `research` function through the `defaultLLM` option, using the `@ai-sdk/openai` package in TypeScript. This is useful when the same LLM should be applied consistently across the pipeline.

```typescript
import { research } from '@plust/datasleuth';
import { openai } from '@ai-sdk/openai';

research({
  query: "Your query",
  defaultLLM: openai('gpt-4o'),
  outputSchema: schema
});
```

--------------------------------

### Basic Research Pipeline with Zod and OpenAI

Source: https://github.com/plustorg/datasleuth/blob/main/README.md

This TypeScript example demonstrates a basic research pipeline using @plust/datasleuth. It defines an output schema with Zod for data validation and utilizes the OpenAI LLM via the Vercel AI SDK for processing the research query.

```typescript
import {
  research
} from '@plust/datasleuth';
import { z } from 'zod';
import { openai } from '@ai-sdk/openai';

// Define the structure of your research results
const outputSchema = z.object({
  summary: z.string(),
  keyFindings: z.array(z.string()),
  sources: z.array(z.string().url()),
});

// Execute research
const results = await research({
  query: 'Latest advancements in quantum computing',
  outputSchema,
  defaultLLM: openai('gpt-4o'),
});

console.log(results);
```

--------------------------------

### Factory Function Pattern for Steps (TypeScript)

Source: https://github.com/plustorg/datasleuth/blob/main/CONTRIBUTING.md

Example of implementing the factory function pattern to create reusable steps within the datasleuth research pipeline. This pattern helps in abstracting step creation and configuration.

```typescript
export function myStep(options: MyStepOptions = {}): ReturnType<typeof createStep> {
  return createStep('MyStep', executeMyStep, options);
}
```
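
To see the pattern end to end, the sketch below supplies stand-in definitions for the pieces that the snippet above references. `createStep`, `SketchState`, and `MyStepOptions` here are simplified, hypothetical versions written for illustration; the real signatures in the datasleuth codebase may differ.

```typescript
// Simplified stand-ins for illustration; the real datasleuth types may differ.
interface SketchState {
  query: string;
  data: Record<string, unknown>;
}

interface MyStepOptions {
  prefix?: string;
}

type StepExecutor<O> = (state: SketchState, options: O) => Promise<SketchState>;

interface SketchStep {
  name: string;
  execute: (state: SketchState) => Promise<SketchState>;
}

// Hypothetical createStep: names the step and binds its options to the executor.
function createStep<O>(name: string, executor: StepExecutor<O>, options: O): SketchStep {
  return { name, execute: (state) => executor(state, options) };
}

async function executeMyStep(state: SketchState, options: MyStepOptions): Promise<SketchState> {
  const { prefix = 'note' } = options;
  return { ...state, data: { ...state.data, myStep: `${prefix}: ${state.query}` } };
}

// The factory: callers configure the step without knowing how it is wired up.
function myStep(options: MyStepOptions = {}): SketchStep {
  return createStep('MyStep', executeMyStep, options);
}

const configuredStep = myStep({ prefix: 'seen' });
configuredStep.execute({ query: 'quantum computing', data: {} }).then((next) => {
  console.log(configuredStep.name, next.data.myStep); // MyStep seen: quantum computing
});
```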

--------------------------------

### Immutable State Transformation Example (TypeScript)

Source: https://github.com/plustorg/datasleuth/blob/main/CONTRIBUTING.md

Demonstrates the principle of immutable state transformation, a key coding pattern in datasleuth. Instead of modifying existing state, new state objects are created with the updated values.

```typescript
async function executeStep(state: ResearchState, options: StepOptions): Promise<ResearchState> {
  // Return a new state object; the incoming `state` is never mutated
  return {
    ...state,
    data: {
      ...state.data,
      newData: processedResult
    }
  };
}
```
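
The benefit of this pattern is easiest to see by checking that the input state is untouched after a step runs. The sketch below uses a simplified, hypothetical state shape and an illustrative `addWordCount` step; neither is taken from the datasleuth codebase.

```typescript
// Hypothetical, simplified state shape for illustration only.
interface PipelineStateSketch {
  query: string;
  data: Record<string, unknown>;
}

// Builds a new state with an extra field instead of mutating the input.
async function addWordCount(state: PipelineStateSketch): Promise<PipelineStateSketch> {
  const wordCount = state.query.split(/\s+/).filter(Boolean).length;
  return { ...state, data: { ...state.data, wordCount } };
}

const before: PipelineStateSketch = {
  query: 'renewable energy trends',
  data: { source: 'user' },
};

addWordCount(before).then((after) => {
  console.log(before.data); // { source: 'user' }
  console.log(after.data); // { source: 'user', wordCount: 3 }
});
```

Because each step returns a fresh object, earlier states remain valid snapshots, which makes it easier to inspect or replay a pipeline when debugging.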

--------------------------------

### Handle DataSleuth Research Errors

Source: https://github.com/plustorg/datasleuth/blob/main/README.md

This example demonstrates how to handle DataSleuth research errors such as ConfigurationError, ValidationError, and LLMError. It catches `BaseResearchError` in a try-catch block and logs the error's message, details, and suggestions.

```typescript
import { research, BaseResearchError } from '@plust/datasleuth';
import { z } from 'zod';

try {
  const results = await research({
    query: 'Quantum computing applications',
    outputSchema: z.object({
      /*...*/
    }),
  });
} catch (error) {
  if (error instanceof BaseResearchError) {
    console.error(`Research error: ${error.message}`);
    console.error(`Details: ${JSON.stringify(error.details)}`);
    console.error(`Suggestions: ${error.suggestions.join('\n')}`);
  } else {
    console.error(`Unexpected error: ${error}`);
  }
}
```
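
When distinguishing error types, the order of `instanceof` checks matters if the specific errors extend a common base class, as the name `BaseResearchError` suggests: test the most specific classes first and the base class last. The sketch below demonstrates this with local stand-in classes. They are assumptions made for illustration, not the library's actual classes, whose constructors and fields may differ.

```typescript
// Local stand-ins for illustration only; not imported from @plust/datasleuth.
class BaseResearchErrorSketch extends Error {
  constructor(message: string) {
    super(message);
    this.name = new.target.name;
    // Keeps instanceof working when compiling to older JavaScript targets.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}
class ConfigurationErrorSketch extends BaseResearchErrorSketch {}
class SearchErrorSketch extends BaseResearchErrorSketch {}

function describeError(error: unknown): string {
  if (error instanceof ConfigurationErrorSketch) {
    return `configuration: ${error.message}`;
  }
  if (error instanceof SearchErrorSketch) {
    return `search: ${error.message}`;
  }
  // The base-class check comes last so it does not shadow the subclasses.
  if (error instanceof BaseResearchErrorSketch) {
    return `research: ${error.message}`;
  }
  return `unexpected: ${String(error)}`;
}

console.log(describeError(new ConfigurationErrorSketch('missing API key')));
// configuration: missing API key
console.log(describeError(new BaseResearchErrorSketch('pipeline failed')));
// research: pipeline failed
console.log(describeError('boom'));
// unexpected: boom
```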

--------------------------------

### Pipeline Configuration and Error Handling in DataSleuth

Source: https://context7.com/plustorg/datasleuth/llms.txt

This TypeScript snippet illustrates how to configure research pipelines in DataSleuth, including setting up retries, timeouts, and advanced error handling strategies. It showcases the use of various error types like ConfigurationError, ValidationError, LLMError, and SearchError for graceful failure management. The example demonstrates a try-catch block to handle potential exceptions during the research process and log informative messages based on the error type.

```typescript
import {
  research,
  plan,
  searchWeb,
  extractContent
} from '@plust/datasleuth';
import {
  BaseResearchError,
  ConfigurationError,
  ValidationError,
  LLMError,
  SearchError
} from '@plust/datasleuth';
import { z } from 'zod';
import { openai } from '@ai-sdk/openai';
import { google } from '@plust/search-sdk';

const outputSchema = z.object({
  summary: z.string(),
  findings: z.array(z.string()),
  metadata: z.object({
    totalSources: z.number(),
    processingTime: z.number()
  })
});

try {
  const results = await research({
    query: 'Blockchain applications in supply chain management',
    outputSchema,
    steps: [
      plan({
        llm: openai('gpt-4o'),
        retry: { maxRetries: 3, baseDelay: 1000 }
      }),
      searchWeb({
        maxRetries: 5,
        requireResults: true
      }),
      extractContent()
    ],
    config: {
      // Continue executing steps even if one fails
      errorHandling: 'continue',
      continueOnError: true,

      // Retry configuration
      maxRetries: 3,
      retryDelay: 2000,
      backoffFactor: 2,

      // Timeout for entire pipeline
      timeout: 240000, // 4 minutes

      // Logging level
      logLevel: 'debug'
    },
    defaultSearchProvider: google.configure({
      apiKey: process.env.GOOGLE_API_KEY
    })
  });

  console.log('Research completed successfully');
  console.log(`Processed in ${results.metadata.processingTime}ms`);

} catch (error) {
  if (error instanceof ConfigurationError) {
    console.error('Configuration issue:', error.message);
    console.error('Suggestions:', error.suggestions);
  } else if (error instanceof ValidationError) {
    console.error('Validation failed:', error.message);
    console.error('Details:', error.details);
  } else if (error instanceof LLMError) {
    console.error('LLM error:', error.message);
    if (error.retry) {
      console.log('Error is retryable, implementing exponential backoff...');
    }
  } else if (error instanceof SearchError) {
    console.error('Search failed:', error.message);
    console.error('Step:', error.step);
  } else if (error instanceof BaseResearchError) {
    console.error('Research error:', error.getFormattedMessage());
    console.error('Code:', error.code);
  } else {
    console.error('Unexpected error:', error);
  }
}

```
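
The retry settings above (`maxRetries: 3`, `retryDelay: 2000`, `backoffFactor: 2`) suggest exponential backoff. The source does not spell out how the library combines them, so the sketch below shows one common interpretation, in which the delay before retry n is `retryDelay * backoffFactor^(n - 1)`. `backoffDelays` is a hypothetical helper, not a datasleuth export.

```typescript
// Hypothetical helper; one common reading of the retry settings, not the library's code.
function backoffDelays(maxRetries: number, retryDelay: number, backoffFactor: number): number[] {
  const delays: number[] = [];
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    delays.push(retryDelay * Math.pow(backoffFactor, attempt));
  }
  return delays;
}

const retrySchedule = backoffDelays(3, 2000, 2);
console.log(retrySchedule); // [ 2000, 4000, 8000 ]
console.log(retrySchedule.reduce((sum, d) => sum + d, 0)); // 14000
```

Under this interpretation, three retries add up to 14 seconds of waiting, which fits comfortably inside the 240000 ms pipeline timeout used in the example.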

--------------------------------

### Basic Jest Test Structure (TypeScript)

Source: https://github.com/plustorg/datasleuth/blob/main/CONTRIBUTING.md

A fundamental example of how to structure tests using the Jest testing framework. It includes common test cases like successful execution, handling empty input, and error scenarios.

```typescript
describe('myFunction', () => {
  it('should process valid input correctly', () => {
    // Test implementation
  });
  
  it('should handle empty input gracefully', () => {
    // Test implementation
  });
  
  it('should throw appropriate error for invalid input', () => {
    // Test implementation
  });
});
```

--------------------------------

### Specify LLM per Step in TypeScript

Source: https://github.com/plustorg/datasleuth/blob/main/docs/troubleshooting.md

Shows how to assign a specific Language Model (LLM) to individual steps within a research process using TypeScript. This allows for granular control over LLM usage for different parts of a workflow.

```typescript
steps: [
  plan({ llm: openai('gpt-4o') }),
  // other steps...
]
```

--------------------------------

### Execute AI Research with Zod Validation

Source: https://context7.com/plustorg/datasleuth/llms.txt

Demonstrates how to perform AI-powered research using @plust/datasleuth. It shows defining an output schema with Zod, configuring a search provider (e.g., Google Search), and executing the research query. The results are then logged, showcasing the structured output. Includes basic error handling setup.

```typescript
import { research, BaseResearchError } from '@plust/datasleuth';
import { z } from 'zod';
import { openai } from '@ai-sdk/openai';
import { google } from '@plust/search-sdk';

// Define output structure with Zod schema
const outputSchema = z.object({
  summary: z.string(),
  keyFindings: z.array(z.string()),
  threats: z.array(z.string()),
  opportunities: z.array(z.string()),
  sources: z.array(z.object({
    url: z.string().url(),
    title: z.string(),
    reliability: z.number().min(0).max(1)
  }))
});

// Configure search provider
const searchProvider = google.configure({
  apiKey: process.env.GOOGLE_API_KEY,
  cx: process.env.GOOGLE_CX
});

// Execute research with default pipeline
const results = await research({
  query: 'Latest advancements in quantum computing',
  outputSchema,
  defaultLLM: openai('gpt-4o'),
  defaultSearchProvider: searchProvider
});

console.log(results.summary);
console.log(`Found ${results.keyFindings.length} key findings`);
console.log(`Analyzed ${results.sources.length} sources`);

// Error handling
try {
  const results = await research({
    query: 'Renewable energy storage technologies',
    outputSchema,
    defaultLLM: openai('gpt-4o'),
    defaultSearchProvider: searchProvider,
    config: {
      errorHandling: 'continue',
      timeout: 120000,
      maxRetries: 3
    }
  });
} catch (error) {
  if (error instanceof BaseResearchError) {
    console.error(`Research failed: ${error.message}`);
    console.error(`Suggestions: ${error.suggestions.join(', ')}`);
  }
}
```

--------------------------------

### Handle ConfigurationError in TypeScript

Source: https://github.com/plustorg/datasleuth/blob/main/docs/troubleshooting.md

This snippet demonstrates how to catch a ConfigurationError in TypeScript and log its details. It assumes `research` can throw this error, and reads the error's `message`, `details`, and `suggestions` properties for debugging.

```typescript
import { research, ConfigurationError } from '@plust/datasleuth';

try {
  const results = await research({ /*...*/ });
} catch (error) {
  if (error instanceof ConfigurationError) {
    console.error(`Configuration error: ${error.message}`);
    console.error(`Details: ${JSON.stringify(error.details)}`);
    console.error(`Fix suggestions: ${error.suggestions.join('\n')}`);
  }
}
```

--------------------------------

### Custom Validation with Zod `.refine()` (TypeScript)

Source: https://github.com/plustorg/datasleuth/blob/main/docs/troubleshooting.md

This example illustrates how to add custom validation logic to a Zod schema using the `.refine()` method. It's particularly useful for validating nested objects or arrays, such as ensuring an array is not empty.

```typescript
const schema = z.object({
  items: z.array(z.string()).refine(
    (items) => items.length > 0,
    { message: "Items array cannot be empty" }
  )
});
```

--------------------------------

### Pipeline Step: searchWeb

Source: https://github.com/plustorg/datasleuth/blob/main/README.md

Searches the web using configured search providers. This is a fundamental step for gathering information from the internet.

```APIDOC
## searchWeb(options)

### Description
Searches the web using configured search providers.

### Method
`searchWeb`

### Parameters
#### Parameters
- **provider** (SearchProvider) - Required - Configured search provider.
- **maxResults** (number) - Optional - Maximum results to return.
- **language** (string) - Optional - Language code (e.g., 'en').
- **region** (string) - Optional - Region code (e.g., 'US').
- **safeSearch** ('off' | 'moderate' | 'strict') - Optional - Safe search setting.
- **useQueriesFromPlan** (boolean) - Optional - Use queries from research plan.

### Response
#### Success Response (200)
- **ResearchStep** - The web search results.
```

--------------------------------

### Execute Research with Agent Orchestration in TypeScript

Source: https://github.com/plustorg/datasleuth/blob/main/README.md

Demonstrates how to configure search providers (Google, SerpApi) and utilize the `research` function with `orchestrate` to dynamically execute research steps. It includes defining a detailed output schema and custom prompt for the AI agent. Dependencies include `@plust/datasleuth`, `@plust/search-sdk`, `zod`, and `@ai-sdk/openai`.

```typescript
import {
  research,
  orchestrate,
  searchWeb,
  extractContent,
  analyze,
  transform,
} from '@plust/datasleuth';
import { z } from 'zod';
import { google, serpapi } from '@plust/search-sdk';
import { openai } from '@ai-sdk/openai';

// Configure search providers
const webSearch = google.configure({ apiKey: process.env.GOOGLE_API_KEY });
const academicSearch = serpapi.configure({
  apiKey: process.env.SERPAPI_KEY,
  engine: 'google_scholar',
});

// Execute research with orchestration
const results = await research({
  query: 'Emerging technologies in renewable energy storage',
  outputSchema: z.object({
    marketOverview: z.string(),
    technologies: z.array(
      z.object({
        name: z.string(),
        maturityLevel: z.enum(['research', 'emerging', 'growth', 'mature']),
        costEfficiency: z.number().min(1).max(10),
        scalabilityPotential: z.number().min(1).max(10),
        keyPlayers: z.array(z.string()),
      })
    ),
    forecast: z.object({
      shortTerm: z.string(),
      mediumTerm: z.string(),
      longTerm: z.string(),
    }),
    sources: z.array(
      z.object({
        url: z.string().url(),
        type: z.enum(['academic', 'news', 'company', 'government']),
        relevance: z.number().min(0).max(1),
      })
    ),
  }),
  steps: [
    orchestrate({
      llm: openai('gpt-4o'),
      tools: {
        searchWeb: searchWeb({ provider: webSearch }),
        searchAcademic: searchWeb({ provider: academicSearch }),
        extractContent: extractContent(),
        analyze: analyze(),
        // Add your custom tools here
      },
      customPrompt: `
        You are conducting market research on emerging renewable energy storage technologies.
        Your goal is to build a comprehensive market overview with technical assessment.
      `,
      maxIterations: 15,
      exitCriteria: (state) =>
        state.metadata.confidenceScore > 0.85 &&
        (state.data.dataPoints?.length ?? 0) > 20,
    }),
  ],
});
```
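The `exitCriteria` callback in the example above is an ordinary predicate over pipeline state, so it can be written and checked on its own. The sketch below assumes a state shape with `metadata.confidenceScore` and an optional `data.dataPoints` array, mirroring the fields the example reads. The `SketchState` type and the `hasEnoughEvidence` name are hypothetical and are not datasleuth's own types. Missing values are treated as "not done yet".

```typescript
// Hypothetical state shape, inferred from the fields the example reads.
// This is not the library's real type.
interface SketchState {
  metadata: { confidenceScore?: number };
  data: { dataPoints?: unknown[] };
}

// Returns true only when both thresholds are exceeded. Missing values fall
// back to 0 so the comparison never involves `undefined`.
function hasEnoughEvidence(
  state: SketchState,
  minConfidence = 0.85,
  minDataPoints = 20
): boolean {
  const confidence = state.metadata.confidenceScore ?? 0;
  const points = state.data.dataPoints?.length ?? 0;
  return confidence > minConfidence && points > minDataPoints;
}

// Usage: high confidence but no data points yet, so the loop should not exit.
const early: SketchState = { metadata: { confidenceScore: 0.9 }, data: {} };
console.log(hasEnoughEvidence(early)); // false
```

A predicate like this could then be passed as `exitCriteria: (state) => hasEnoughEvidence(state)`, provided the real state exposes those fields.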

--------------------------------

### Options Pattern for Function Configuration (TypeScript)

Source: https://github.com/plustorg/datasleuth/blob/main/CONTRIBUTING.md

Illustrates the options pattern, where functions accept an optional configuration object with default values. This provides flexibility in customizing function behavior without overly long parameter lists.

```typescript
interface MyOptions {
  param1?: string;
  param2?: number;
}

function myFunction(options: MyOptions = {}) {
  const { param1 = 'default', param2 = 42 } = options;
  // Implementation
}
```
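A short usage sketch of the pattern follows. The `greet` function and the `GreetOptions` fields are hypothetical names chosen for illustration and are not part of datasleuth. It shows that callers can omit the options object entirely or override a single field while the remaining defaults still apply.

```typescript
interface GreetOptions {
  greeting?: string;
  punctuation?: string;
}

function greet(name: string, options: GreetOptions = {}): string {
  // Destructuring defaults apply per field, so partial objects are fine.
  const { greeting = 'Hello', punctuation = '!' } = options;
  return `${greeting}, ${name}${punctuation}`;
}

console.log(greet('Ada'));                     // Hello, Ada!
console.log(greet('Ada', { greeting: 'Hi' })); // Hi, Ada!
```

Destructuring defaults only replace `undefined`, so an explicit `null` passed for a field would not fall back to the default.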

--------------------------------

### Configure Verbose Logging in Research Function (TypeScript)

Source: https://github.com/plustorg/datasleuth/blob/main/docs/troubleshooting.md

This snippet shows how to enable verbose logging for the `research` function by setting `logLevel` to 'debug' in its `config` object. Debug-level output provides more detail when diagnosing pipeline problems.

```typescript
research({
  query: "Your query",
  outputSchema: schema,
  config: {
    logLevel: 'debug' // 'error', 'warn', 'info', 'debug', 'trace'
  }
});
```
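As a rough illustration of what a level threshold usually means, the sketch below assumes the five levels are ordered from least to most verbose, in the order they appear in the comment above. That ordering is the common logging convention, not something confirmed for datasleuth, and `shouldLog` is a hypothetical helper rather than a library function.

```typescript
// Levels in the order listed in the comment above, least to most verbose.
// The ordering is an assumption based on common logging conventions.
const LEVELS = ['error', 'warn', 'info', 'debug', 'trace'] as const;
type LogLevel = (typeof LEVELS)[number];

// A message is emitted when its level is at or below the configured verbosity.
function shouldLog(configured: LogLevel, message: LogLevel): boolean {
  return LEVELS.indexOf(message) <= LEVELS.indexOf(configured);
}

console.log(shouldLog('debug', 'info'));  // true
console.log(shouldLog('debug', 'trace')); // false
```

Under that assumption, `logLevel: 'debug'` would surface error, warn, info, and debug messages while suppressing trace output.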

--------------------------------

### Build Custom Research Workflows with DataSleuth Pipelines

Source: https://context7.com/plustorg/datasleuth/llms.txt

This code demonstrates building custom research workflows with specific pipeline steps using the DataSleuth library. It shows how to chain multiple research actions like planning, searching, extracting, fact-checking, analyzing, and summarizing. Dependencies include '@plust/datasleuth', 'zod', '@ai-sdk/openai', '@ai-sdk/anthropic', and '@plust/search-sdk'.

```typescript
import {
  research,
  plan,
  searchWeb,
  extractContent,
  factCheck,
  analyze,
  summarize,
  evaluate,
  repeatUntil
} from '@plust/datasleuth';
import { z } from 'zod';
import { openai } from '@ai-sdk/openai';
import { anthropic } from '@ai-sdk/anthropic';
import { google } from '@plust/search-sdk';

const outputSchema = z.object({
  executiveSummary: z.string(),
  marketAnalysis: z.object({
    size: z.string(),
    growth: z.string(),
    trends: z.array(z.string())
  }),
  competitiveLandscape: z.array(z.object({
    company: z.string(),
    marketShare: z.string(),
    strengths: z.array(z.string())
  })),
  recommendations: z.array(z.string()),
  dataQuality: z.number().min(0).max(1)
});

const searchProvider = google.configure({
  apiKey: process.env.GOOGLE_API_KEY
});

const results = await research({
  query: 'Electric vehicle battery market analysis',
  outputSchema,
  steps: [
    // Step 1: Generate research plan
    plan({
      llm: openai('gpt-4o'),
      temperature: 0.4,
      includeInResults: false
    }),

    // Step 2: Initial web search
    searchWeb({
      provider: searchProvider,
      maxResults: 15,
      useQueriesFromPlan: true
    }),

    // Step 3: Extract content from top results
    extractContent({
      maxUrls: 10,
      maxContentLength: 8000,
      selectors: 'article, .content, main'
    }),

    // Step 4: Repeat search until we have enough sources
    repeatUntil(
      evaluate({
        criteriaFn: (state) => state.data.searchResults.length >= 20
      }),
      [
        searchWeb({ provider: searchProvider }),
        extractContent()
      ],
      { maxIterations: 3 }
    ),

    // Step 5: Fact-check extracted information
    factCheck({
      llm: openai('gpt-4o'),
      threshold: 0.75,
      includeEvidence: true
    }),

    // Step 6: Perform deep analysis
    analyze({
      llm: anthropic('claude-3-opus-20240229'),
      focus: 'market-dynamics',
      depth: 'comprehensive'
    }),

    // Step 7: Synthesize findings
    summarize({
      llm: anthropic('claude-3-sonnet-20240229'),
      format: 'structured',
      maxLength: 3000,
      includeCitations: true
    })
  ],
  config: {
    errorHandling: 'continue',
    timeout: 180000
  }
});

console.log(results.executiveSummary);
console.log(`Market analysis confidence: ${results.dataQuality}`);
```
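The control flow behind Step 4 can be sketched without the library. The version below is a simplified, synchronous stand-in for `repeatUntil`: it assumes the condition is checked before each round and that `maxIterations` caps the number of rounds. Both points are assumptions about the semantics, and `repeatUntilSketch`, `LoopState`, and `LoopStep` are hypothetical names. Real pipeline steps are asynchronous.

```typescript
// Hypothetical, simplified types for illustration only.
type LoopState = { results: string[] };
type LoopStep = (state: LoopState) => LoopState;

// Runs `steps` in order, round after round, until `done(state)` is true
// or the iteration budget is spent.
function repeatUntilSketch(
  done: (state: LoopState) => boolean,
  steps: LoopStep[],
  initial: LoopState,
  maxIterations = 3
): LoopState {
  let state = initial;
  for (let i = 0; i < maxIterations && !done(state); i++) {
    for (const step of steps) {
      state = step(state);
    }
  }
  return state;
}

// A fake search step that adds 8 results per round.
const fakeSearch: LoopStep = (s) => ({
  results: s.results.concat(
    Array.from({ length: 8 }, (_, i) => `result-${s.results.length + i}`)
  ),
});

const finalState = repeatUntilSketch(
  (s) => s.results.length >= 20,
  [fakeSearch],
  { results: [] }
);
console.log(finalState.results.length); // 24
```

The iteration cap matters: with `maxIterations` set to 2, the same call would stop at 16 results even though the condition was never met.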

--------------------------------

### Configure Custom Research Pipeline with Datasleuth

Source: https://github.com/plustorg/datasleuth/blob/main/README.md

This snippet demonstrates how to configure a custom research pipeline using Datasleuth. It involves defining specific steps like planning, web searching, content extraction, and evaluation, with options for error handling and timeouts. It requires the '@plust/datasleuth', '@plust/search-sdk', and '@ai-sdk/openai' packages, along with 'zod' for schema definition.

```typescript
import {
  research,
  plan,
  searchWeb,
  extractContent,
  evaluate,
  repeatUntil,
} from '@plust/datasleuth';
import { z } from 'zod';
import { google } from '@plust/search-sdk';
import { openai } from '@ai-sdk/openai';

// Configure a search provider
const googleSearch = google.configure({
  apiKey: process.env.GOOGLE_API_KEY,
  cx: process.env.GOOGLE_CX,
});

// Define complex output schema
const outputSchema = z.object({
  summary: z.string(),
  threats: z.array(z.string()),
  opportunities: z.array(z.string()),
  timeline: z.array(
    z.object({
      year: z.number(),
      event: z.string(),
    })
  ),
  sources: z.array(
    z.object({
      url: z.string().url(),
      reliability: z.number().min(0).max(1),
    })
  ),
});

// Execute research with custom pipeline steps
const results = await research({
  query: 'Impact of climate change on agriculture',
  outputSchema,
  steps: [
    plan({ llm: openai('gpt-4o') }),
    searchWeb({ provider: googleSearch, maxResults: 10 }),
    extractContent({ selectors: 'article, .content, main' }),
    repeatUntil(evaluate({ criteriaFn: (data) => data.sources.length > 15 }), [
      searchWeb({ provider: googleSearch }),
      extractContent(),
    ]),
  ],
  config: {
    errorHandling: 'continue',
    timeout: 60000, // 1 minute
  },
});
```

--------------------------------

### Process Large Content in Chunks (TypeScript)

Source: https://github.com/plustorg/datasleuth/blob/main/docs/troubleshooting.md

This TypeScript code demonstrates a strategy for processing large content by breaking it down into smaller chunks of a specified size. This approach helps mitigate memory issues when dealing with extensive documents.

```typescript
// Process content in chunks of 10 * 1024 characters (roughly 10KB for ASCII text)
const chunkSize = 10 * 1024;
for (let i = 0; i < content.length; i += chunkSize) {
  const chunk = content.substring(i, i + chunkSize);
  // Process chunk...
}
```
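The loop above can be wrapped in a small reusable helper. This is a minimal sketch, and `chunkString` is a hypothetical name rather than a datasleuth export. Note that JavaScript string indices count UTF-16 code units, not bytes, so a chunk boundary can fall inside a surrogate pair (for example, in the middle of an emoji).

```typescript
// Splits `content` into pieces of at most `chunkSize` UTF-16 code units.
function chunkString(content: string, chunkSize: number): string[] {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new RangeError('chunkSize must be a positive integer');
  }
  const chunks: string[] = [];
  for (let i = 0; i < content.length; i += chunkSize) {
    chunks.push(content.slice(i, i + chunkSize));
  }
  return chunks;
}

console.log(chunkString('abcdefg', 3)); // [ 'abc', 'def', 'g' ]
```

The guard on `chunkSize` prevents an infinite loop, which the bare `for` loop would enter if the step were 0.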

--------------------------------

### Pipeline Step: plan

Source: https://github.com/plustorg/datasleuth/blob/main/README.md

Creates a research plan using LLMs. This step can be used to generate a structured plan before executing other research steps.

```APIDOC
## plan(options?)

### Description
Creates a research plan using LLMs.

### Method
`plan`

### Parameters
#### Parameters
- **llm** (LanguageModel) - Optional - LLM model to use (falls back to defaultLLM).
- **customPrompt** (string) - Optional - Custom system prompt.
- **temperature** (number) - Optional - LLM temperature (0.0-1.0).
- **includeInResults** (boolean) - Optional - Whether to include plan in results.

### Response
#### Success Response (200)
- **ResearchStep** - The generated research plan.
```

--------------------------------

### Pipeline Step: summarize

Source: https://github.com/plustorg/datasleuth/blob/main/README.md

Synthesizes information into concise summaries. This step is useful for condensing large amounts of data into digestible formats.

```APIDOC
## summarize(options?)

### Description
Synthesizes information into concise summaries.

### Method
`summarize`

### Parameters
#### Parameters
- **llm** (LanguageModel) - Optional - LLM model to use.
- **maxLength** (number) - Optional - Maximum summary length.
- **format** ('paragraph' | 'bullet' | 'structured') - Optional - Summary format.
- **includeInResults** (boolean) - Optional - Whether to include summary in results.

### Response
#### Success Response (200)
- **ResearchStep** - The summary of the information.
```

--------------------------------

### Configure Research Function to Continue on Error (TypeScript)

Source: https://github.com/plustorg/datasleuth/blob/main/docs/troubleshooting.md

This code snippet shows how to configure the `research` function to continue executing even when errors occur. By setting `errorHandling` to 'continue' and `continueOnError` to `true`, the pipeline can proceed past failed steps.

```typescript
research({
  query: "Your query",
  outputSchema: schema,
  config: {
    errorHandling: 'continue',
    continueOnError: true
  }
});
```
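To make the behavior concrete, here is a generic sketch of what "continue on error" means for a sequence of steps. It is not datasleuth's implementation: `runSteps` and `SketchStep` are hypothetical, the steps are synchronous for brevity, and the `'stop'` mode name is invented here for contrast (the source only shows `'continue'`).

```typescript
type SketchStep = { name: string; run: () => void };

// Runs every step in order. In 'continue' mode a failure is recorded and the
// loop moves on; in 'stop' mode (a name assumed for this sketch) the first
// failure is rethrown.
function runSteps(
  steps: SketchStep[],
  mode: 'stop' | 'continue'
): { completed: string[]; failed: string[] } {
  const completed: string[] = [];
  const failed: string[] = [];
  for (const step of steps) {
    try {
      step.run();
      completed.push(step.name);
    } catch (error) {
      if (mode === 'stop') throw error;
      failed.push(step.name);
    }
  }
  return { completed, failed };
}

const demoSteps: SketchStep[] = [
  { name: 'search', run: () => {} },
  { name: 'extract', run: () => { throw new Error('network error'); } },
  { name: 'summarize', run: () => {} },
];
console.log(JSON.stringify(runSteps(demoSteps, 'continue')));
// {"completed":["search","summarize"],"failed":["extract"]}
```

The trade-off is that later steps may run on incomplete data, so it is worth inspecting which steps failed before trusting the final result.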

--------------------------------

### TypeScript Interface for the searchWeb Pipeline Step

Source: https://github.com/plustorg/datasleuth/blob/main/README.md

Defines the TypeScript interface for the `searchWeb` pipeline step, used for searching the web with configured providers. Parameters include `provider`, `maxResults`, `language`, `region`, `safeSearch`, and `useQueriesFromPlan`.

```typescript
searchWeb({
  provider: SearchProvider;   // Configured search provider
  maxResults?: number;        // Maximum results to return
  language?: string;          // Language code (e.g., 'en')
  region?: string;            // Region code (e.g., 'US')
  safeSearch?: 'off' | 'moderate' | 'strict';
  useQueriesFromPlan?: boolean; // Use queries from research plan
}): ResearchStep
```

--------------------------------

### Add Custom Evaluation Step to Inspect Pipeline State (TypeScript)

Source: https://github.com/plustorg/datasleuth/blob/main/docs/troubleshooting.md

This TypeScript code illustrates how to add a custom evaluation step within a research pipeline. The `evaluate` function logs the current state of the pipeline, allowing for inspection of intermediate data during debugging.

```typescript
steps: [
  // other steps...
  evaluate({
    criteriaFn: (state) => {
      console.log('Current state:', JSON.stringify(state.data, null, 2));
      return true; // Always continue
    }
  })
]
```
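Dumping the full state with `JSON.stringify` can produce very long output once search results and extracted content accumulate. One option is to log a compact summary instead. The helper below is a hypothetical sketch, not part of datasleuth: it replaces arrays with their length and truncates long strings. It does not handle circular references.

```typescript
// Replaces arrays with their length and long strings with a prefix so the
// logged snapshot stays readable. Nested objects are summarized recursively.
function summarizeForLog(value: unknown, maxString = 80): unknown {
  if (Array.isArray(value)) return `Array(${value.length})`;
  if (typeof value === 'string') {
    return value.length > maxString ? `${value.slice(0, maxString)}...` : value;
  }
  if (value !== null && typeof value === 'object') {
    const source = value as Record<string, unknown>;
    const out: Record<string, unknown> = {};
    for (const key of Object.keys(source)) {
      out[key] = summarizeForLog(source[key], maxString);
    }
    return out;
  }
  return value;
}

const sample = { searchResults: [1, 2, 3], plan: 'x'.repeat(200), meta: { ok: true } };
console.log(JSON.stringify(summarizeForLog(sample, 10)));
// {"searchResults":"Array(3)","plan":"xxxxxxxxxx...","meta":{"ok":true}}
```

Inside the `criteriaFn` above, the logging line could then become `console.log(JSON.stringify(summarizeForLog(state.data), null, 2))`.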

--------------------------------

### TypeScript Interface for the plan Pipeline Step

Source: https://github.com/plustorg/datasleuth/blob/main/README.md

Defines the TypeScript interface for the `plan` pipeline step, which uses LLMs to create a research plan. It accepts optional parameters like `llm`, `customPrompt`, `temperature`, and `includeInResults`.

```typescript
plan({
  llm?: LanguageModel;        // LLM model to use (falls back to defaultLLM)
  customPrompt?: string;      // Custom system prompt
  temperature?: number;       // LLM temperature (0.0-1.0)
  includeInResults?: boolean; // Whether to include plan in results
}): ResearchStep
```

--------------------------------

### Handle Missing Required Fields in Zod Schema (TypeScript)

Source: https://github.com/plustorg/datasleuth/blob/main/docs/troubleshooting.md

This snippet shows how to mark fields as optional or give them default values in a Zod schema, so that validation does not fail when the research pipeline omits a field from its output. Use `.optional()` when a missing value is acceptable and `.default()` when downstream code expects the field to always be present.

```typescript
const schema = z.object({
  summary: z.string(),
  findings: z.array(z.string()).optional(), // Make optional
  // or provide default value:
  sources: z.array(z.string().url()).default([])
});
```

--------------------------------

### Pipeline Step: extractContent

Source: https://github.com/plustorg/datasleuth/blob/main/README.md

Extracts content from web pages. This step is useful for parsing and retrieving specific information from URLs.

```APIDOC
## extractContent(options?)

### Description
Extracts content from web pages.

### Method
`extractContent`

### Parameters
#### Parameters
- **selectors** (string) - Optional - CSS selectors for content.
- **maxUrls** (number) - Optional - Maximum URLs to process.
- **maxContentLength** (number) - Optional - Maximum content length per URL.
- **includeInResults** (boolean) - Optional - Whether to include content in results.

### Response
#### Success Response (200)
- **ResearchStep** - The extracted content.
```

--------------------------------

### Transform Data Types in Zod Schema (TypeScript)

Source: https://github.com/plustorg/datasleuth/blob/main/docs/troubleshooting.md

This code shows how to use `z.preprocess` to transform input data types, such as converting string numbers to actual numbers, within a Zod schema. It also illustrates using `z.union` for fields that can accept multiple data types.

```typescript
// Transform string numbers to actual numbers
const schema = z.object({
  value: z.preprocess(
    (val) => typeof val === 'string' ? Number(val) : val,
    z.number()
  )
});

// Use z.union() for fields that might have multiple types:
const schemaWithUnion = z.object({
  date: z.union([z.string(), z.date()])
});
```
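The callback passed to `z.preprocess` is a plain function, so its edge cases can be checked without a schema. The sketch below extracts the same conversion and adds a stricter variant. Both function names are hypothetical. The point to notice is that `Number('')` is `0`, so an empty string would pass `z.number()`, whereas a non-numeric string becomes `NaN`, which `z.number()` rejects.

```typescript
// Same conversion as the preprocess callback above.
const toNumberIfString = (val: unknown): unknown =>
  typeof val === 'string' ? Number(val) : val;

console.log(toNumberIfString('42'));  // 42
console.log(toNumberIfString('abc')); // NaN
console.log(toNumberIfString(''));    // 0
console.log(toNumberIfString(7));     // 7

// Stricter variant: blank strings become undefined instead of 0, so a
// required z.number() field would report them as missing.
const toNumberStrict = (val: unknown): unknown => {
  if (typeof val !== 'string') return val;
  const trimmed = val.trim();
  return trimmed === '' ? undefined : Number(trimmed);
};

console.log(toNumberStrict(' 3.5 ')); // 3.5
console.log(toNumberStrict(''));      // undefined
```

Either function can be passed as the first argument to `z.preprocess`, depending on whether a blank string should count as zero or as a missing value.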

--------------------------------

### Web Search with Multiple Providers using Datasleuth

Source: https://context7.com/plustorg/datasleuth/llms.txt

Demonstrates configuring and using multiple search providers (Google, Bing, SerpApi) within a single research operation, with a Zod schema defining the structured output. Each `searchWeb` step targets a different engine or content type (general web, academic, news) to broaden coverage before content extraction.

```typescript
import { research, searchWeb, extractContent } from '@plust/datasleuth';
import { google, bing, serpapi } from '@plust/search-sdk';
import { z } from 'zod';

// Configure multiple search providers
const googleSearch = google.configure({
  apiKey: process.env.GOOGLE_API_KEY,
  cx: process.env.GOOGLE_CX
});

const bingSearch = bing.configure({
  apiKey: process.env.BING_API_KEY
});

const academicSearch = serpapi.configure({
  apiKey: process.env.SERPAPI_KEY,
  engine: 'google_scholar'
});

const outputSchema = z.object({
  findings: z.array(z.string()),
  sources: z.array(z.object({
    url: z.string().url(),
    title: z.string(),
    type: z.enum(['web', 'academic', 'news'])
  }))
});

// Use different providers for different search types
const results = await research({
  query: 'Machine learning applications in drug discovery',
  outputSchema,
  steps: [
    // General web search
    searchWeb({
      provider: googleSearch,
      maxResults: 20,
      language: 'en',
      region: 'US',
      safeSearch: 'moderate',
      includeInResults: false
    }),

    // Academic paper search
    searchWeb({
      provider: academicSearch,
      query: 'machine learning drug discovery peer reviewed',
      maxResults: 10
    }),

    // News search
    searchWeb({
      provider: bingSearch,
      query: 'recent breakthroughs ML pharmaceutical research',
      maxResults: 10
    }),

    extractContent({
      maxUrls: 15,
      maxContentLength: 10000
    })
  ]
});

console.log(`Found ${results.sources.length} total sources`);
```
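The example above reads several API keys from `process.env`, where every value is typed `string | undefined`. With three providers it can help to fail fast when a key is missing, before any provider is configured. The helper below is a hypothetical sketch (`requireEnv` is not part of datasleuth or `@plust/search-sdk`). It takes the environment as a parameter so it can be exercised without Node.js globals.

```typescript
// Returns the requested variables, or throws an error listing every missing one.
function requireEnv(
  env: Record<string, string | undefined>,
  names: string[]
): Record<string, string> {
  const missing = names.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }
  const out: Record<string, string> = {};
  for (const name of names) {
    out[name] = env[name] as string;
  }
  return out;
}

// Usage in Node.js, with variable names taken from the example above:
// const keys = requireEnv(process.env, [
//   'GOOGLE_API_KEY', 'GOOGLE_CX', 'BING_API_KEY', 'SERPAPI_KEY',
// ]);
```

Reporting all missing names at once avoids fixing one variable per run.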