### Install @plust/datasleuth with npm or yarn

Source: https://github.com/plustorg/datasleuth/blob/main/docs/getting-started.md

Instructions for installing the @plust/datasleuth package using either npm or yarn package managers. This is the first step to using the library.

```bash
npm install @plust/datasleuth
```

```bash
yarn add @plust/datasleuth
```

--------------------------------

### Quick Start Research Example with @plust/datasleuth

Source: https://github.com/plustorg/datasleuth/blob/main/docs/getting-started.md

A minimal TypeScript example demonstrating how to perform a research query using @plust/datasleuth. It shows schema definition, search provider configuration (Google), and execution of the research function with default settings. It includes error handling for the research process.

```typescript
import { research } from '@plust/datasleuth';
import { z } from 'zod';
import { openai } from '@ai-sdk/openai';
import { google } from '@plust/search-sdk';

// Define your output schema
const outputSchema = z.object({
  summary: z.string(),
  keyFindings: z.array(z.string()),
  sources: z.array(z.string().url()),
});

// Configure Google search provider
const searchProvider = google.configure({
  apiKey: process.env.GOOGLE_API_KEY,
  cx: process.env.GOOGLE_CX, // Your search engine ID
});

// Execute research with the default pipeline
async function runResearch() {
  try {
    const results = await research({
      query: 'Latest advancements in quantum computing',
      outputSchema,
      defaultLLM: openai('gpt-4o'),
      defaultSearchProvider: searchProvider,
    });

    console.log('Research Summary:', results.summary);
    console.log('Key Findings:', results.keyFindings);
    console.log('Sources:', results.sources);
  } catch (error) {
    console.error('Research failed:', error);
  }
}

runResearch();
```

--------------------------------

### Configure Default Search Provider for Research

Source: https://github.com/plustorg/datasleuth/blob/main/docs/getting-started.md

TypeScript example showing how to configure and
use a default search provider (Google) within the @plust/datasleuth research function. This simplifies research by applying the same provider to all search steps unless overridden.

```typescript
import { research } from '@plust/datasleuth';
import { z } from 'zod';
import { google } from '@plust/search-sdk';
import { openai } from '@ai-sdk/openai';

// Configure Google search
const searchProvider = google.configure({
  apiKey: process.env.GOOGLE_API_KEY,
  cx: process.env.GOOGLE_CX
});

// Execute research with default steps
const results = await research({
  query: "Renewable energy trends 2025",
  outputSchema: z.object({ /* ... */ }), // Placeholder for actual schema
  defaultLLM: openai('gpt-4o'),
  defaultSearchProvider: searchProvider
});
```

--------------------------------

### Specify Search Providers in Individual Research Steps

Source: https://github.com/plustorg/datasleuth/blob/main/docs/getting-started.md

TypeScript example demonstrating how to specify different search providers (Google and Bing) for individual `searchWeb` steps within a @plust/datasleuth research pipeline. This offers granular control over search sources for different stages of research.

```typescript
import { research, searchWeb, extractContent } from '@plust/datasleuth';
import { z } from 'zod';
import { google, bing } from '@plust/search-sdk';
import { openai } from '@ai-sdk/openai';

// Configure search providers
const googleSearch = google.configure({
  apiKey: process.env.GOOGLE_API_KEY,
  cx: process.env.GOOGLE_CX
});
const bingSearch = bing.configure({
  apiKey: process.env.BING_API_KEY
});

// Execute research with custom steps
const results = await research({
  query: "Renewable energy trends 2025",
  outputSchema: z.object({ /* ... */ }), // Placeholder for actual schema
  defaultLLM: openai('gpt-4o'),
  steps: [
    searchWeb({ provider: googleSearch, maxResults: 10 }),
    extractContent(),
    searchWeb({ provider: bingSearch, query: "renewable energy innovations" }),
    extractContent()
    // Other steps...
  ]
});
```

--------------------------------

### Install @plust/datasleuth Package

Source: https://github.com/plustorg/datasleuth/blob/main/README.md

This command installs the @plust/datasleuth package using npm. Ensure you have Node.js and npm installed on your system.

```bash
npm install @plust/datasleuth
```

--------------------------------

### Isolate Research Steps for Testing (TypeScript)

Source: https://github.com/plustorg/datasleuth/blob/main/docs/troubleshooting.md

This example demonstrates how to isolate and test individual steps within a `research` pipeline. By explicitly providing a `steps` array containing only the desired step, developers can debug specific functionalities.

```typescript
// Test just the search step
const result = await research({
  query: "Your query",
  outputSchema: schema,
  steps: [searchWeb({ provider: googleSearch })]
});
```

--------------------------------

### AI Orchestration for Market Research

Source: https://context7.com/plustorg/datasleuth/llms.txt

This example showcases how to configure and use the `research` function with AI orchestration for comprehensive market research. It defines custom prompts, tools, and exit criteria for an AI agent.

```APIDOC
## AI Orchestration Example

### Description
This example demonstrates how to leverage AI agents to dynamically select and execute research tools for market analysis. It includes setting up a search provider, defining a custom prompt for market research, and configuring various tools like web search, content extraction, analysis, fact-checking, and summarization.
### Method
N/A (This is a client-side code example)

### Endpoint
N/A (This is a client-side code example)

### Parameters
#### Path Parameters
N/A

#### Query Parameters
N/A

#### Request Body
N/A

### Request Example
```typescript
import {
  research,
  orchestrate,
  searchWeb,
  extractContent,
  analyze,
  factCheck,
  summarize
} from '@plust/datasleuth';
import { z } from 'zod';
import { openai } from '@ai-sdk/openai';
import { google } from '@plust/search-sdk';

const outputSchema = z.object({
  marketOverview: z.string(),
  technologies: z.array(z.object({
    name: z.string(),
    maturityLevel: z.enum(['research', 'emerging', 'growth', 'mature']),
    marketPotential: z.number().min(1).max(10),
    keyPlayers: z.array(z.string())
  })),
  investmentOpportunities: z.array(z.string()),
  risks: z.array(z.string()),
  confidenceScore: z.number().min(0).max(1)
});

const searchProvider = google.configure({
  apiKey: process.env.GOOGLE_API_KEY
});

// Custom orchestration prompt for market research
const marketResearchPrompt = `
You are conducting comprehensive market research on emerging technologies.
Your goal is to:
1. Gather information from multiple sources
2. Analyze technology maturity and market potential
3. Identify key players and investment opportunities
4. Assess risks and challenges
5. Validate findings with fact-checking

Choose tools strategically to build a complete market analysis.
`;

const results = await research({
  query: 'Green hydrogen production and storage technologies',
  outputSchema,
  steps: [
    orchestrate({
      model: openai('gpt-4o'),
      searchProvider,
      customPrompt: marketResearchPrompt,
      maxIterations: 20,
      continueOnError: true,
      includeInResults: true,
      // Custom tools available to the agent
      tools: {
        searchWeb: searchWeb({ provider: searchProvider, maxResults: 15 }),
        extractContent: extractContent({ maxUrls: 12 }),
        analyze: analyze({ llm: openai('gpt-4o'), focus: 'market-analysis' }),
        factCheck: factCheck({ llm: openai('gpt-4o'), threshold: 0.8 }),
        summarize: summarize({ llm: openai('gpt-4o'), format: 'structured' })
      },
      // Exit when we have high confidence and sufficient data
      exitCriteria: (state) => {
        const hasEnoughData =
          state.data.searchResults?.length >= 20 &&
          state.data.extractedContent?.length >= 10 &&
          state.data.analysis !== undefined;
        const hasHighConfidence = state.metadata.confidenceScore >= 0.85;
        return hasEnoughData && hasHighConfidence;
      },
      retry: { maxRetries: 3, baseDelay: 2000 }
    })
  ]
});

console.log('Market Overview:', results.marketOverview);
console.log(`Analyzed ${results.technologies.length} technologies`);
console.log(`Identified ${results.investmentOpportunities.length} opportunities`);
console.log(`Confidence score: ${results.confidenceScore}`);
```

### Response
#### Success Response (200)
- **marketOverview** (string) - A summary of the market analysis.
- **technologies** (array) - An array of analyzed technologies with their details.
  - **name** (string) - The name of the technology.
  - **maturityLevel** (enum) - The maturity level of the technology ('research', 'emerging', 'growth', 'mature').
  - **marketPotential** (number) - The market potential score (1-10).
  - **keyPlayers** (array) - A list of key players in the technology's market.
- **investmentOpportunities** (array) - A list of identified investment opportunities.
- **risks** (array) - A list of identified risks and challenges.
- **confidenceScore** (number) - A score indicating the confidence in the analysis (0-1).

#### Response Example
```json
{
  "marketOverview": "Green hydrogen production and storage is a rapidly growing sector driven by the need for sustainable energy solutions...",
  "technologies": [
    {
      "name": "Electrolyzer Technology",
      "maturityLevel": "growth",
      "marketPotential": 9,
      "keyPlayers": ["Plug Power", "Cummins", "Siemens Energy"]
    },
    {
      "name": "Advanced Battery Storage",
      "maturityLevel": "mature",
      "marketPotential": 7,
      "keyPlayers": ["Tesla", "LG Energy Solution", "Panasonic"]
    }
  ],
  "investmentOpportunities": [
    "Investing in green hydrogen production facilities.",
    "Developing advanced electrolyzer components."
  ],
  "risks": [
    "High initial production costs.",
    "Grid integration challenges."
  ],
  "confidenceScore": 0.88
}
```
```

--------------------------------

### Clone Repository and Install Dependencies (Bash)

Source: https://github.com/plustorg/datasleuth/blob/main/CONTRIBUTING.md

Commands to clone the datasleuth repository from GitHub and install project dependencies using npm. This sets up the local development environment.

```bash
git clone https://github.com/YOUR_USERNAME/datasleuth.git
cd datasleuth
npm install
```

--------------------------------

### Load Environment Variables with dotenv

Source: https://github.com/plustorg/datasleuth/blob/main/docs/getting-started.md

TypeScript code snippet demonstrating how to load environment variables from a `.env` file using the `dotenv` package. This keeps API keys and other configuration settings required by @plust/datasleuth and its associated SDKs out of source code.

```typescript
import dotenv from 'dotenv';

dotenv.config();

// Now process.env.OPENAI_API_KEY, etc. are available
```

--------------------------------

### Basic Research Example

Source: https://github.com/plustorg/datasleuth/blob/main/README.md

This TypeScript snippet illustrates how to perform basic research using the @plust/datasleuth library. It sets up a Zod schema for structuring the output and integrates with OpenAI through the Vercel AI SDK.

```typescript
import { research } from '@plust/datasleuth';
import { z } from 'zod';
import { openai } from '@ai-sdk/openai';

// Define your output schema
const outputSchema = z.object({
  summary: z.string(),
  keyFindings: z.array(z.string()),
  sources: z.array(z.string().url()),
});

// Execute research with default pipeline
const results = await research({
  query: 'Latest advancements in quantum computing',
  outputSchema,
  defaultLLM: openai('gpt-4o'),
});
```

--------------------------------

### Set Maximum Content Length (TypeScript)

Source: https://github.com/plustorg/datasleuth/blob/main/docs/troubleshooting.md

This example passes a `maxContentLength` option to `extractContent`. Limiting the amount of content processed at once helps manage memory usage.

```typescript
extractContent({ maxContentLength: 50000 })
```

--------------------------------

### Integrate Multiple LLMs with Datasleuth and Vercel AI SDK

Source: https://github.com/plustorg/datasleuth/blob/main/README.md

This example showcases how to integrate various Large Language Models (LLMs) using the Vercel AI SDK with Datasleuth for research. It allows specifying different LLM providers like OpenAI and Anthropic for distinct research steps such as planning, analysis, fact-checking, and summarization. Dependencies include '@plust/datasleuth', '@ai-sdk/openai', '@ai-sdk/anthropic', and 'zod'.
```typescript
import {
  research,
  plan,
  analyze,
  factCheck,
  summarize,
} from '@plust/datasleuth';
import { z } from 'zod';
import { openai } from '@ai-sdk/openai';
import { anthropic } from '@ai-sdk/anthropic';

// Define your output schema
const outputSchema = z.object({
  summary: z.string(),
  analysis: z.object({
    insights: z.array(z.string()),
  }),
  factChecks: z.array(
    z.object({
      statement: z.string(),
      isValid: z.boolean(),
    })
  ),
});

// Use different LLM providers for different steps
const results = await research({
  query: 'Advancements in gene editing technologies',
  outputSchema,
  steps: [
    // Use OpenAI for research planning
    plan({
      llm: openai('gpt-4o'),
      temperature: 0.4,
    }),
    // Use Anthropic for specialized analysis
    analyze({
      llm: anthropic('claude-3-opus-20240229'),
      focus: 'ethical-considerations',
      depth: 'comprehensive',
    }),
    // Use OpenAI for fact checking
    factCheck({
      llm: openai('gpt-4o'),
      threshold: 0.8,
      includeEvidence: true,
    }),
    // Use Anthropic for final summarization
    summarize({
      llm: anthropic('claude-3-sonnet-20240229'),
      format: 'structured',
      maxLength: 2000,
    }),
  ],
});
```

--------------------------------

### Generate Research Plans with AI using DataSleuth

Source: https://context7.com/plustorg/datasleuth/llms.txt

This example shows how to generate structured research plans with AI-defined objectives and search queries using the DataSleuth library. It utilizes the 'plan' step to define a custom prompt for the AI, focusing on specific research areas. Dependencies include '@plust/datasleuth', 'zod', '@ai-sdk/openai', and '@plust/search-sdk'.

```typescript
import { research, plan, searchWeb, extractContent } from '@plust/datasleuth';
import { z } from 'zod';
import { openai } from '@ai-sdk/openai';
import { google } from '@plust/search-sdk';

const customPlanningPrompt = `
You are a strategic research planning assistant specializing in technology trends.
Create a comprehensive research plan focusing on:
1. Market size and growth projections
2. Key technological innovations
3. Major industry players and their strategies
4. Regulatory landscape
5. Future outlook and predictions
`;

const outputSchema = z.object({
  findings: z.array(z.string()),
  sources: z.array(z.string().url())
});

const results = await research({
  query: 'Artificial intelligence in healthcare diagnostics',
  outputSchema,
  steps: [
    plan({
      llm: openai('gpt-4o'),
      customPrompt: customPlanningPrompt,
      temperature: 0.3,
      includeInResults: true,
      retry: { maxRetries: 3, baseDelay: 1000 }
    }),
    searchWeb({ useQueriesFromPlan: true }),
    extractContent()
  ],
  defaultSearchProvider: google.configure({
    apiKey: process.env.GOOGLE_API_KEY
  })
});

// Access the generated plan
console.log('Research Objectives:', results.researchPlan.objectives);
console.log('Search Queries:', results.researchPlan.searchQueries);
console.log('Data Strategy:', results.researchPlan.dataGatheringStrategy);
```

--------------------------------

### Execute Parallel Research Tracks with Datasleuth

Source: https://github.com/plustorg/datasleuth/blob/main/README.md

This example demonstrates how to run multiple research tracks concurrently using Datasleuth's `parallel` and `track` functionalities. It allows configuring different search providers (e.g., Google, Bing) and LLM analyses for each track. The results can be merged using a custom strategy like 'weighted' with specified conflict resolution. Dependencies include '@plust/datasleuth', '@plust/search-sdk', '@ai-sdk/openai', and 'zod'.
```typescript
import {
  research,
  track,
  parallel,
  searchWeb,
  extractContent,
  analyze,
  summarize,
  ResultMerger,
} from '@plust/datasleuth';
import { z } from 'zod';
import { google, bing } from '@plust/search-sdk';
import { openai } from '@ai-sdk/openai';

// Configure search providers
const googleSearch = google.configure({ apiKey: process.env.GOOGLE_API_KEY });
const bingSearch = bing.configure({ apiKey: process.env.BING_API_KEY });

// Define your output schema
const outputSchema = z.object({
  summary: z.string(),
  findings: z.array(
    z.object({
      topic: z.string(),
      details: z.string(),
      confidence: z.number(),
    })
  ),
  sources: z.array(z.string().url()),
});

// Execute parallel research tracks
const results = await research({
  query: 'Quantum computing applications in healthcare',
  outputSchema,
  steps: [
    parallel({
      tracks: [
        track({
          name: 'academic',
          steps: [
            searchWeb({
              provider: googleSearch,
              query: 'quantum computing healthcare scholarly articles',
            }),
            extractContent(),
            analyze({
              llm: openai('gpt-4o'),
              focus: 'academic-research',
            }),
          ],
        }),
        track({
          name: 'commercial',
          steps: [
            searchWeb({
              provider: bingSearch,
              query: 'quantum computing healthcare startups companies',
            }),
            extractContent(),
            analyze({
              llm: openai('gpt-4o'),
              focus: 'commercial-applications',
            }),
          ],
        }),
      ],
      mergeFunction: ResultMerger.createMergeFunction({
        strategy: 'weighted',
        weights: { academic: 1.5, commercial: 1.0 },
        conflictResolution: 'mostConfident',
      }),
    }),
    summarize({ maxLength: 1000 }),
  ],
});
```

--------------------------------

### Provide Default LLM for Research in TypeScript

Source: https://github.com/plustorg/datasleuth/blob/main/docs/troubleshooting.md

Illustrates how to initialize the `research` function with a default Language Model (LLM) using the `@ai-sdk/openai` package in TypeScript. This is useful when a specific LLM needs to be consistently applied.
```typescript
import { research } from '@plust/datasleuth';
import { openai } from '@ai-sdk/openai';

research({
  query: "Your query",
  defaultLLM: openai('gpt-4o'),
  outputSchema: schema
});
```

--------------------------------

### Basic Research Pipeline with Zod and OpenAI

Source: https://github.com/plustorg/datasleuth/blob/main/README.md

This TypeScript example demonstrates a basic research pipeline using @plust/datasleuth. It defines an output schema with Zod for data validation and utilizes the OpenAI LLM via the Vercel AI SDK for processing the research query.

```typescript
import { research } from '@plust/datasleuth';
import { z } from 'zod';
import { openai } from '@ai-sdk/openai';

// Define the structure of your research results
const outputSchema = z.object({
  summary: z.string(),
  keyFindings: z.array(z.string()),
  sources: z.array(z.string().url()),
});

// Execute research
const results = await research({
  query: 'Latest advancements in quantum computing',
  outputSchema,
  defaultLLM: openai('gpt-4o'),
});

console.log(results);
```

--------------------------------

### Factory Function Pattern for Steps (TypeScript)

Source: https://github.com/plustorg/datasleuth/blob/main/CONTRIBUTING.md

Example of implementing the factory function pattern to create reusable steps within the datasleuth research pipeline. This pattern helps in abstracting step creation and configuration.

```typescript
export function myStep(options: MyStepOptions = {}): ReturnType<typeof createStep> {
  return createStep('MyStep', executeMyStep, options);
}
```

--------------------------------

### Immutable State Transformation Example (TypeScript)

Source: https://github.com/plustorg/datasleuth/blob/main/CONTRIBUTING.md

Demonstrates the principle of immutable state transformation, a key coding pattern in datasleuth. Instead of modifying existing state, new state objects are created with the updated values.
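
As a self-contained illustration of the same principle, the sketch below uses a hypothetical `SketchState` type and `addFinding` helper. Both names are assumptions introduced here and are not part of the library. The helper builds a new object with spread syntax and leaves its input untouched.

```typescript
// Hypothetical state shape, for illustration only; not the library's ResearchState.
interface SketchState {
  readonly query: string;
  readonly data: { readonly findings: readonly string[] };
}

// Returns a new state object; the input and its nested data are never mutated.
function addFinding(state: SketchState, finding: string): SketchState {
  return {
    ...state,
    data: {
      ...state.data,
      findings: [...state.data.findings, finding],
    },
  };
}

const before: SketchState = { query: 'example', data: { findings: [] } };
const after = addFinding(before, 'first finding');

console.log(before.data.findings.length); // 0
console.log(after.data.findings.length); // 1
```

The repository's own example follows.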
```typescript
async function executeStep(state: ResearchState, options: StepOptions): Promise<ResearchState> {
  // processedResult stands for whatever this step computes from state and options
  return {
    ...state,
    data: {
      ...state.data,
      newData: processedResult
    }
  };
}
```

--------------------------------

### Handle DataSleuth Research Errors

Source: https://github.com/plustorg/datasleuth/blob/main/README.md

This example demonstrates how to handle DataSleuth research errors. The specific error types, such as ConfigurationError, ValidationError, and LLMError, derive from BaseResearchError, so a single try-catch block that checks for BaseResearchError can log the message, details, and suggestions for any of them.

```typescript
import { research, BaseResearchError } from '@plust/datasleuth';
import { z } from 'zod';

try {
  const results = await research({
    query: 'Quantum computing applications',
    outputSchema: z.object({ /*...*/ }),
  });
} catch (error) {
  if (error instanceof BaseResearchError) {
    console.error(`Research error: ${error.message}`);
    console.error(`Details: ${JSON.stringify(error.details)}`);
    console.error(`Suggestions: ${error.suggestions.join('\n')}`);
  } else {
    console.error(`Unexpected error: ${error}`);
  }
}
```

--------------------------------

### Pipeline Configuration and Error Handling in DataSleuth

Source: https://context7.com/plustorg/datasleuth/llms.txt

This TypeScript snippet illustrates how to configure research pipelines in DataSleuth, including setting up retries, timeouts, and advanced error handling strategies. It showcases the use of various error types like ConfigurationError, ValidationError, LLMError, and SearchError for graceful failure management. The example demonstrates a try-catch block to handle potential exceptions during the research process and log informative messages based on the error type.
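
The source does not explain how the retry settings in the snippet below (`maxRetries`, `retryDelay`, `backoffFactor`) combine. One common reading is exponential backoff, where each wait is the previous one multiplied by the factor. The sketch below computes such a schedule. The `backoffSchedule` helper and its formula are assumptions for illustration, not datasleuth's documented behavior.

```typescript
// Hypothetical helper: delay before retry n (0-based) = retryDelay * backoffFactor ** n.
// The formula is an assumption; the datasleuth docs quoted here do not specify it.
function backoffSchedule(
  maxRetries: number,
  retryDelay: number,
  backoffFactor: number
): number[] {
  const delays: number[] = [];
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    delays.push(retryDelay * Math.pow(backoffFactor, attempt));
  }
  return delays;
}

// Values copied from the config in the example: maxRetries 3, retryDelay 2000, backoffFactor 2.
const schedule = backoffSchedule(3, 2000, 2);
console.log(schedule); // [ 2000, 4000, 8000 ]
```

Under this reading the three retries wait 14 seconds in total, well inside the 240000 ms pipeline timeout used in the example.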
```typescript
import { research, plan, searchWeb, extractContent } from '@plust/datasleuth';
import {
  BaseResearchError,
  ConfigurationError,
  ValidationError,
  LLMError,
  SearchError
} from '@plust/datasleuth';
import { z } from 'zod';
import { openai } from '@ai-sdk/openai';
import { google } from '@plust/search-sdk';

const outputSchema = z.object({
  summary: z.string(),
  findings: z.array(z.string()),
  metadata: z.object({
    totalSources: z.number(),
    processingTime: z.number()
  })
});

try {
  const results = await research({
    query: 'Blockchain applications in supply chain management',
    outputSchema,
    steps: [
      plan({
        llm: openai('gpt-4o'),
        retry: { maxRetries: 3, baseDelay: 1000 }
      }),
      searchWeb({ maxRetries: 5, requireResults: true }),
      extractContent()
    ],
    config: {
      // Continue executing steps even if one fails
      errorHandling: 'continue',
      continueOnError: true,
      // Retry configuration
      maxRetries: 3,
      retryDelay: 2000,
      backoffFactor: 2,
      // Timeout for entire pipeline
      timeout: 240000, // 4 minutes
      // Logging level
      logLevel: 'debug'
    },
    defaultSearchProvider: google.configure({
      apiKey: process.env.GOOGLE_API_KEY
    })
  });

  console.log('Research completed successfully');
  console.log(`Processed in ${results.metadata.processingTime}ms`);
} catch (error) {
  // Check the specific error types first and the base class last
  if (error instanceof ConfigurationError) {
    console.error('Configuration issue:', error.message);
    console.error('Suggestions:', error.suggestions);
  } else if (error instanceof ValidationError) {
    console.error('Validation failed:', error.message);
    console.error('Details:', error.details);
  } else if (error instanceof LLMError) {
    console.error('LLM error:', error.message);
    if (error.retry) {
      console.log('Error is retryable, implementing exponential backoff...');
    }
  } else if (error instanceof SearchError) {
    console.error('Search failed:', error.message);
    console.error('Step:', error.step);
  } else if (error instanceof BaseResearchError) {
    console.error('Research error:', error.getFormattedMessage());
    console.error('Code:', error.code);
  } else {
    console.error('Unexpected error:', error);
  }
}
```

--------------------------------

### Basic Jest Test Structure (TypeScript)

Source: https://github.com/plustorg/datasleuth/blob/main/CONTRIBUTING.md

A fundamental example of how to structure tests using the Jest testing framework. It includes common test cases like successful execution, handling empty input, and error scenarios.

```typescript
describe('myFunction', () => {
  it('should process valid input correctly', () => {
    // Test implementation
  });

  it('should handle empty input gracefully', () => {
    // Test implementation
  });

  it('should throw appropriate error for invalid input', () => {
    // Test implementation
  });
});
```

--------------------------------

### Specify LLM per Step in TypeScript

Source: https://github.com/plustorg/datasleuth/blob/main/docs/troubleshooting.md

Shows how to assign a specific Language Model (LLM) to individual steps within a research process using TypeScript. This allows for granular control over LLM usage for different parts of a workflow.

```typescript
steps: [
  plan({ llm: openai('gpt-4o') }),
  // other steps...
]
```

--------------------------------

### Execute AI Research with Zod Validation

Source: https://context7.com/plustorg/datasleuth/llms.txt

Demonstrates how to perform AI-powered research using @plust/datasleuth. It shows defining an output schema with Zod, configuring a search provider (e.g., Google Search), and executing the research query. The results are then logged, showcasing the structured output. Includes basic error handling setup.
```typescript
import { research, BaseResearchError } from '@plust/datasleuth';
import { z } from 'zod';
import { openai } from '@ai-sdk/openai';
import { google } from '@plust/search-sdk';

// Define output structure with Zod schema
const outputSchema = z.object({
  summary: z.string(),
  keyFindings: z.array(z.string()),
  threats: z.array(z.string()),
  opportunities: z.array(z.string()),
  sources: z.array(z.object({
    url: z.string().url(),
    title: z.string(),
    reliability: z.number().min(0).max(1)
  }))
});

// Configure search provider
const searchProvider = google.configure({
  apiKey: process.env.GOOGLE_API_KEY,
  cx: process.env.GOOGLE_CX
});

// Execute research with default pipeline
const results = await research({
  query: 'Latest advancements in quantum computing',
  outputSchema,
  defaultLLM: openai('gpt-4o'),
  defaultSearchProvider: searchProvider
});

console.log(results.summary);
console.log(`Found ${results.keyFindings.length} key findings`);
console.log(`Analyzed ${results.sources.length} sources`);

// Error handling
try {
  const results = await research({
    query: 'Renewable energy storage technologies',
    outputSchema,
    defaultLLM: openai('gpt-4o'),
    defaultSearchProvider: searchProvider,
    config: {
      errorHandling: 'continue',
      timeout: 120000,
      maxRetries: 3
    }
  });
} catch (error) {
  if (error instanceof BaseResearchError) {
    console.error(`Research failed: ${error.message}`);
    console.error(`Suggestions: ${error.suggestions.join(', ')}`);
  }
}
```

--------------------------------

### Handle ConfigurationError in TypeScript

Source: https://github.com/plustorg/datasleuth/blob/main/docs/troubleshooting.md

This snippet demonstrates how to catch and log specific details from a ConfigurationError in TypeScript. It assumes the `research` function can throw this error and reads the error's `message`, `details`, and `suggestions` properties for debugging.
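
To show the `instanceof` narrowing without installing the library, the sketch below defines stand-in classes, `SketchBaseError` and `SketchConfigurationError`, and a `describeError` helper. All three names are hypothetical and only mirror the `message` and `suggestions` properties mentioned above. The subclass is checked before the base class, because a subclass instance also passes the base-class check.

```typescript
// Stand-in classes for illustration; the real error classes are exported by the library.
class SketchBaseError extends Error {
  readonly details: Record<string, unknown>;
  readonly suggestions: string[];

  constructor(
    message: string,
    details: Record<string, unknown> = {},
    suggestions: string[] = []
  ) {
    super(message);
    this.name = new.target.name;
    this.details = details;
    this.suggestions = suggestions;
  }
}

class SketchConfigurationError extends SketchBaseError {}

// Narrow an unknown caught value, most specific class first.
function describeError(error: unknown): string {
  if (error instanceof SketchConfigurationError) {
    return `Configuration error: ${error.message} (try: ${error.suggestions.join('; ')})`;
  }
  if (error instanceof SketchBaseError) {
    return `Research error: ${error.message}`;
  }
  return `Unexpected error: ${String(error)}`;
}

const described = describeError(
  new SketchConfigurationError('missing outputSchema', {}, ['pass a Zod schema'])
);
console.log(described);
```

The snippet from the troubleshooting guide follows.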
```typescript
import { research, ConfigurationError } from '@plust/datasleuth';

try {
  const results = await research({ /*...*/ });
} catch (error) {
  if (error instanceof ConfigurationError) {
    console.error(`Configuration error: ${error.message}`);
    console.error(`Details: ${JSON.stringify(error.details)}`);
    console.error(`Fix suggestions: ${error.suggestions.join('\n')}`);
  }
}
```

--------------------------------

### Custom Validation with Zod `.refine()` (TypeScript)

Source: https://github.com/plustorg/datasleuth/blob/main/docs/troubleshooting.md

This example illustrates how to add custom validation logic to a Zod schema using the `.refine()` method. It's particularly useful for validating nested objects or arrays, such as ensuring an array is not empty.

```typescript
const schema = z.object({
  items: z.array(z.string()).refine(
    (items) => items.length > 0,
    { message: "Items array cannot be empty" }
  )
});
```

--------------------------------

### Pipeline Step: searchWeb

Source: https://github.com/plustorg/datasleuth/blob/main/README.md

Searches the web using configured search providers. This is a fundamental step for gathering information from the internet.

```APIDOC
## searchWeb(options)

### Description
Searches the web using configured search providers.

### Method
`searchWeb`

### Parameters
#### Parameters
- **provider** (SearchProvider) - Required - Configured search provider.
- **maxResults** (number) - Optional - Maximum results to return.
- **language** (string) - Optional - Language code (e.g., 'en').
- **region** (string) - Optional - Region code (e.g., 'US').
- **safeSearch** ('off' | 'moderate' | 'strict') - Optional - Safe search setting.
- **useQueriesFromPlan** (boolean) - Optional - Use queries from research plan.

### Response
#### Success Response (200)
- **ResearchStep** - The web search results.
```

--------------------------------

### Execute Research with Agent Orchestration in TypeScript

Source: https://github.com/plustorg/datasleuth/blob/main/README.md

Demonstrates how to configure search providers (Google, SerpApi) and utilize the `research` function with `orchestrate` to dynamically execute research steps. It includes defining a detailed output schema and custom prompt for the AI agent. Dependencies include `@plust/datasleuth`, `@plust/search-sdk`, `zod`, and `@ai-sdk/openai`.

```typescript
import {
  research,
  orchestrate,
  searchWeb,
  extractContent,
  analyze,
  transform,
} from '@plust/datasleuth';
import { z } from 'zod';
import { google, serpapi } from '@plust/search-sdk';
import { openai } from '@ai-sdk/openai';

// Configure search providers
const webSearch = google.configure({ apiKey: process.env.GOOGLE_API_KEY });
const academicSearch = serpapi.configure({
  apiKey: process.env.SERPAPI_KEY,
  engine: 'google_scholar',
});

// Execute research with orchestration
const results = await research({
  query: 'Emerging technologies in renewable energy storage',
  outputSchema: z.object({
    marketOverview: z.string(),
    technologies: z.array(
      z.object({
        name: z.string(),
        maturityLevel: z.enum(['research', 'emerging', 'growth', 'mature']),
        costEfficiency: z.number().min(1).max(10),
        scalabilityPotential: z.number().min(1).max(10),
        keyPlayers: z.array(z.string()),
      })
    ),
    forecast: z.object({
      shortTerm: z.string(),
      mediumTerm: z.string(),
      longTerm: z.string(),
    }),
    sources: z.array(
      z.object({
        url: z.string().url(),
        type: z.enum(['academic', 'news', 'company', 'government']),
        relevance: z.number().min(0).max(1),
      })
    ),
  }),
  steps: [
    orchestrate({
      llm: openai('gpt-4o'),
      tools: {
        searchWeb: searchWeb({ provider: webSearch }),
        searchAcademic: searchWeb({ provider: academicSearch }),
        extractContent: extractContent(),
        analyze: analyze(),
        // Add your custom tools here
      },
      customPrompt: `
        You are conducting market research on emerging renewable energy storage technologies.
        Your goal is to build a comprehensive market overview with technical assessment.
      `,
      maxIterations: 15,
      exitCriteria: (state) =>
        state.metadata.confidenceScore > 0.85 &&
        state.data.dataPoints?.length > 20,
    }),
  ],
});
```

--------------------------------

### Options Pattern for Function Configuration (TypeScript)

Source: https://github.com/plustorg/datasleuth/blob/main/CONTRIBUTING.md

Illustrates the options pattern, where functions accept an optional configuration object with default values. This provides flexibility in customizing function behavior without overly long parameter lists.

```typescript
interface MyOptions {
  param1?: string;
  param2?: number;
}

function myFunction(options: MyOptions = {}) {
  const { param1 = 'default', param2 = 42 } = options;
  // Implementation
}
```

--------------------------------

### Configure Verbose Logging in Research Function (TypeScript)

Source: https://github.com/plustorg/datasleuth/blob/main/docs/troubleshooting.md

This snippet shows how to enable verbose logging for a `research` function by setting the `logLevel` to 'debug' within its configuration. This is essential for gathering more detailed information during debugging.

```typescript
research({
  query: "Your query",
  outputSchema: schema,
  config: {
    logLevel: 'debug' // 'error', 'warn', 'info', 'debug', 'trace'
  }
});
```

--------------------------------

### Build Custom Research Workflows with DataSleuth Pipelines

Source: https://context7.com/plustorg/datasleuth/llms.txt

This code demonstrates building custom research workflows with specific pipeline steps using the DataSleuth library. It shows how to chain multiple research actions like planning, searching, extracting, fact-checking, analyzing, and summarizing. Dependencies include '@plust/datasleuth', 'zod', '@ai-sdk/openai', '@ai-sdk/anthropic', and '@plust/search-sdk'.
```typescript
import {
  research,
  plan,
  searchWeb,
  extractContent,
  factCheck,
  analyze,
  summarize,
  evaluate,
  repeatUntil
} from '@plust/datasleuth';
import { z } from 'zod';
import { openai } from '@ai-sdk/openai';
import { anthropic } from '@ai-sdk/anthropic';
import { google } from '@plust/search-sdk';

const outputSchema = z.object({
  executiveSummary: z.string(),
  marketAnalysis: z.object({
    size: z.string(),
    growth: z.string(),
    trends: z.array(z.string())
  }),
  competitiveLandscape: z.array(z.object({
    company: z.string(),
    marketShare: z.string(),
    strengths: z.array(z.string())
  })),
  recommendations: z.array(z.string()),
  dataQuality: z.number().min(0).max(1)
});

const searchProvider = google.configure({
  apiKey: process.env.GOOGLE_API_KEY
});

const results = await research({
  query: 'Electric vehicle battery market analysis',
  outputSchema,
  steps: [
    // Step 1: Generate research plan
    plan({
      llm: openai('gpt-4o'),
      temperature: 0.4,
      includeInResults: false
    }),

    // Step 2: Initial web search
    searchWeb({
      provider: searchProvider,
      maxResults: 15,
      useQueriesFromPlan: true
    }),

    // Step 3: Extract content from top results
    extractContent({
      maxUrls: 10,
      maxContentLength: 8000,
      selectors: 'article, .content, main'
    }),

    // Step 4: Repeat search until we have enough sources
    repeatUntil(
      evaluate({
        criteriaFn: (state) => state.data.searchResults.length >= 20
      }),
      [
        searchWeb({ provider: searchProvider }),
        extractContent()
      ],
      { maxIterations: 3 }
    ),

    // Step 5: Fact-check extracted information
    factCheck({
      llm: openai('gpt-4o'),
      threshold: 0.75,
      includeEvidence: true
    }),

    // Step 6: Perform deep analysis
    analyze({
      llm: anthropic('claude-3-opus-20240229'),
      focus: 'market-dynamics',
      depth: 'comprehensive'
    }),

    // Step 7: Synthesize findings
    summarize({
      llm: anthropic('claude-3-sonnet-20240229'),
      format: 'structured',
      maxLength: 3000,
      includeCitations: true
    })
  ],
  config: {
    errorHandling: 'continue',
    timeout: 180000
  }
});

console.log(results.executiveSummary);
console.log(`Market analysis confidence:
${results.dataQuality}`);
```

--------------------------------

### Configure Custom Research Pipeline with Datasleuth

Source: https://github.com/plustorg/datasleuth/blob/main/README.md

This snippet demonstrates how to configure a custom research pipeline using Datasleuth. It defines explicit steps for planning, web searching, content extraction, and evaluation, with options for error handling and timeouts. It requires the '@plust/datasleuth', '@plust/search-sdk', and '@ai-sdk/openai' packages, along with 'zod' for schema definition.

```typescript
import {
  research,
  plan,
  searchWeb,
  extractContent,
  evaluate,
  repeatUntil,
} from '@plust/datasleuth';
import { z } from 'zod';
import { google } from '@plust/search-sdk';
import { openai } from '@ai-sdk/openai';

// Configure a search provider
const googleSearch = google.configure({
  apiKey: process.env.GOOGLE_API_KEY,
  cx: process.env.GOOGLE_CX,
});

// Define complex output schema
const outputSchema = z.object({
  summary: z.string(),
  threats: z.array(z.string()),
  opportunities: z.array(z.string()),
  timeline: z.array(
    z.object({
      year: z.number(),
      event: z.string(),
    })
  ),
  sources: z.array(
    z.object({
      url: z.string().url(),
      reliability: z.number().min(0).max(1),
    })
  ),
});

// Execute research with custom pipeline steps
const results = await research({
  query: 'Impact of climate change on agriculture',
  outputSchema,
  steps: [
    plan({ llm: openai('gpt-4o') }),
    searchWeb({ provider: googleSearch, maxResults: 10 }),
    // `selectors` matches the option name documented for extractContent
    extractContent({ selectors: 'article, .content, main' }),
    repeatUntil(evaluate({ criteriaFn: (data) => data.sources.length > 15 }), [
      searchWeb({ provider: googleSearch }),
      extractContent(),
    ]),
  ],
  config: {
    errorHandling: 'continue',
    timeout: 60000, // 1 minute
  },
});
```

--------------------------------

### Process Large Content in Chunks (TypeScript)

Source: https://github.com/plustorg/datasleuth/blob/main/docs/troubleshooting.md

This TypeScript code demonstrates a strategy for processing large content by
breaking it down into smaller chunks of a specified size. This approach helps mitigate memory issues when dealing with extensive documents.

```typescript
// `content` is assumed to be the large string to process (e.g. extracted page text)
// Process content in chunks of 10 * 1024 characters (roughly 10KB for ASCII text)
const chunkSize = 10 * 1024;
for (let i = 0; i < content.length; i += chunkSize) {
  const chunk = content.substring(i, i + chunkSize);
  // Process chunk...
}
```

--------------------------------

### Pipeline Step: plan

Source: https://github.com/plustorg/datasleuth/blob/main/README.md

Creates a research plan using LLMs. This step can be used to generate a structured plan before executing other research steps.

```APIDOC
## plan(options?)

### Description
Creates a research plan using LLMs.

### Method
`plan`

### Parameters
#### Parameters
- **llm** (LanguageModel) - Optional - LLM model to use (falls back to defaultLLM).
- **customPrompt** (string) - Optional - Custom system prompt.
- **temperature** (number) - Optional - LLM temperature (0.0-1.0).
- **includeInResults** (boolean) - Optional - Whether to include plan in results.

### Response
#### Success Response (200)
- **ResearchStep** - The generated research plan.
```

--------------------------------

### Pipeline Step: summarize

Source: https://github.com/plustorg/datasleuth/blob/main/README.md

Synthesizes information into concise summaries. This step is useful for condensing large amounts of data into digestible formats.

```APIDOC
## summarize(options?)

### Description
Synthesizes information into concise summaries.

### Method
`summarize`

### Parameters
#### Parameters
- **llm** (LanguageModel) - Optional - LLM model to use.
- **maxLength** (number) - Optional - Maximum summary length.
- **format** ('paragraph' | 'bullet' | 'structured') - Optional - Summary format.
- **includeInResults** (boolean) - Optional - Whether to include summary in results.

### Response
#### Success Response (200)
- **ResearchStep** - The summary of the information.
```

--------------------------------

### Configure Research Function to Continue on Error (TypeScript)

Source: https://github.com/plustorg/datasleuth/blob/main/docs/troubleshooting.md

This code snippet shows how to configure the `research` function to continue executing even when errors occur. By setting `errorHandling` to 'continue' and `continueOnError` to `true`, the pipeline can proceed past failed steps.

```typescript
research({
  query: "Your query",
  outputSchema: schema,
  config: {
    errorHandling: 'continue',
    continueOnError: true
  }
});
```

--------------------------------

### TypeScript Interface for the searchWeb Pipeline Step

Source: https://github.com/plustorg/datasleuth/blob/main/README.md

Defines the TypeScript interface for the `searchWeb` pipeline step, used for searching the web with configured providers. Parameters include `provider`, `maxResults`, `language`, `region`, `safeSearch`, and `useQueriesFromPlan`.

```typescript
searchWeb({
  provider: SearchProvider; // Configured search provider
  maxResults?: number; // Maximum results to return
  language?: string; // Language code (e.g., 'en')
  region?: string; // Region code (e.g., 'US')
  safeSearch?: 'off' | 'moderate' | 'strict';
  useQueriesFromPlan?: boolean; // Use queries from research plan
}): ResearchStep
```

--------------------------------

### Add Custom Evaluation Step to Inspect Pipeline State (TypeScript)

Source: https://github.com/plustorg/datasleuth/blob/main/docs/troubleshooting.md

This TypeScript code illustrates how to add a custom evaluation step within a research pipeline. The `evaluate` function logs the current state of the pipeline, allowing for inspection of intermediate data during debugging.

```typescript
steps: [
  // other steps...
  evaluate({
    criteriaFn: (state) => {
      console.log('Current state:', JSON.stringify(state.data, null, 2));
      return true; // Always continue
    }
  })
]
```

--------------------------------

### TypeScript Interface for the plan Pipeline Step

Source: https://github.com/plustorg/datasleuth/blob/main/README.md

Defines the TypeScript interface for the `plan` pipeline step, which uses LLMs to create a research plan. It accepts optional parameters like `llm`, `customPrompt`, `temperature`, and `includeInResults`.

```typescript
plan({
  llm?: LanguageModel; // LLM model to use (falls back to defaultLLM)
  customPrompt?: string; // Custom system prompt
  temperature?: number; // LLM temperature (0.0-1.0)
  includeInResults?: boolean; // Whether to include plan in results
}): ResearchStep
```

--------------------------------

### Handle Missing Required Fields in Zod Schema (TypeScript)

Source: https://github.com/plustorg/datasleuth/blob/main/docs/troubleshooting.md

This snippet demonstrates how to make fields optional or provide default values in a Zod schema to handle missing required fields during validation. This keeps validation from failing when the research pipeline does not produce every field.

```typescript
import { z } from 'zod';

const schema = z.object({
  summary: z.string(),
  findings: z.array(z.string()).optional(), // Make optional
  // or provide default value:
  sources: z.array(z.string().url()).default([])
});
```

--------------------------------

### Pipeline Step: extractContent

Source: https://github.com/plustorg/datasleuth/blob/main/README.md

Extracts content from web pages. This step is useful for parsing and retrieving specific information from URLs.

```APIDOC
## extractContent(options?)

### Description
Extracts content from web pages.

### Method
`extractContent`

### Parameters
#### Parameters
- **selectors** (string) - Optional - CSS selectors for content.
- **maxUrls** (number) - Optional - Maximum URLs to process.
- **maxContentLength** (number) - Optional - Maximum content length per URL.
- **includeInResults** (boolean) - Optional - Whether to include content in results.

### Response
#### Success Response (200)
- **ResearchStep** - The extracted content.
```

--------------------------------

### Transform Data Types in Zod Schema (TypeScript)

Source: https://github.com/plustorg/datasleuth/blob/main/docs/troubleshooting.md

This code shows how to use `z.preprocess` to transform input data types, such as converting string numbers to actual numbers, within a Zod schema. It also illustrates using `z.union` for fields that can accept multiple data types.

```typescript
import { z } from 'zod';

// Transform string numbers to actual numbers
const schema = z.object({
  value: z.preprocess(
    (val) => typeof val === 'string' ? Number(val) : val,
    z.number()
  )
});

// Use z.union() for fields that might have multiple types:
const schemaWithUnion = z.object({
  date: z.union([z.string(), z.date()])
});
```

--------------------------------

### Web Search with Multiple Providers using Datasleuth

Source: https://context7.com/plustorg/datasleuth/llms.txt

Demonstrates configuring and using multiple search providers (Google, Bing, SerpAPI) within a single research operation. It shows how to define a custom output schema with Zod for structured results. This approach allows targeted searches across different engines and search types (web, academic, news) for broader information gathering.
```typescript
import { research, searchWeb, extractContent } from '@plust/datasleuth';
import { google, bing, serpapi } from '@plust/search-sdk';
import { z } from 'zod';

// Configure multiple search providers
const googleSearch = google.configure({
  apiKey: process.env.GOOGLE_API_KEY,
  cx: process.env.GOOGLE_CX
});

const bingSearch = bing.configure({
  apiKey: process.env.BING_API_KEY
});

const academicSearch = serpapi.configure({
  apiKey: process.env.SERPAPI_KEY,
  engine: 'google_scholar'
});

const outputSchema = z.object({
  findings: z.array(z.string()),
  sources: z.array(z.object({
    url: z.string().url(),
    title: z.string(),
    type: z.enum(['web', 'academic', 'news'])
  }))
});

// Use different providers for different search types
const results = await research({
  query: 'Machine learning applications in drug discovery',
  outputSchema,
  steps: [
    // General web search
    searchWeb({
      provider: googleSearch,
      maxResults: 20,
      language: 'en',
      region: 'US',
      safeSearch: 'moderate',
      includeInResults: false
    }),

    // Academic paper search
    searchWeb({
      provider: academicSearch,
      query: 'machine learning drug discovery peer reviewed',
      maxResults: 10
    }),

    // News search
    searchWeb({
      provider: bingSearch,
      query: 'recent breakthroughs ML pharmaceutical research',
      maxResults: 10
    }),

    extractContent({
      maxUrls: 15,
      maxContentLength: 10000
    })
  ]
});

console.log(`Found ${results.sources.length} total sources`);
```
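
--------------------------------

### Sketch: Deduplicate Sources Collected from Multiple Providers (TypeScript)

When several providers are queried on the same topic, the same page can plausibly appear more than once in the combined `sources` array. Whether @plust/datasleuth deduplicates results internally is not stated above, so the following is only a minimal sketch of a post-processing step. `Source`, `normalizeUrl`, and `dedupeSources` are hypothetical names introduced for illustration and are not part of the library; the `Source` shape simply mirrors the `sources` entries of the `outputSchema` in the multi-provider example.

```typescript
// Hypothetical helper types and functions; NOT part of @plust/datasleuth.
// The shape mirrors the `sources` entries of the multi-provider outputSchema.
interface Source {
  url: string;
  title: string;
  type: 'web' | 'academic' | 'news';
}

// Reduce a URL to a comparison key: drop the fragment, then trailing slashes.
// This is deliberately simple and does not handle query-string ordering.
function normalizeUrl(raw: string): string {
  return raw.replace(/#.*$/, '').replace(/\/+$/, '');
}

// Keep the first occurrence of each normalized URL, preserving input order.
function dedupeSources(sources: Source[]): Source[] {
  const seen = new Set<string>();
  const unique: Source[] = [];
  for (const source of sources) {
    const key = normalizeUrl(source.url);
    if (!seen.has(key)) {
      seen.add(key);
      unique.push(source);
    }
  }
  return unique;
}
```

Under these assumptions, the helper could be applied to the result of the example above, for instance `dedupeSources(results.sources)`, before the source count is reported. Keeping the first occurrence means the provider listed earliest in `steps` wins when two providers return the same page.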