### Install LangStruct

Source: https://github.com/langstruct-ai/langstruct/blob/main/docs/src/content/docs/getting-started.mdx

Installs the LangStruct library along with example dependencies. This command is used to set up the necessary tools for using LangStruct.

```bash
pip install "langstruct[examples]"
```

--------------------------------

### Install and Run vLLM Server

Source: https://github.com/langstruct-ai/langstruct/blob/main/docs/src/content/docs/installation.mdx

Installs the vLLM library and starts a local OpenAI-compatible API server for serving local models.

```bash
# Install vLLM
pip install vllm

# Start vLLM server
python -m vllm.entrypoints.openai.api_server \
  --model microsoft/DialoGPT-medium \
  --port 8000
```

--------------------------------

### LangStruct with Multiple Examples (Python)

Source: https://github.com/langstruct-ai/langstruct/blob/main/docs/src/content/docs/getting-started.mdx

Demonstrates how to provide multiple examples to LangStruct for better type inference. This allows the extractor to understand a wider range of data structures and types.

```python
# Better type inference from multiple examples
examples = [
    {"name": "Alice", "age": 25, "skills": ["Python"]},
    {"name": "Bob", "age": 35, "skills": ["JavaScript", "React"]}
]

extractor = LangStruct(examples=examples)
# Infers: name=str, age=int, skills=List[str]
```

--------------------------------

### LangStruct Source Tracking Example (Python)

Source: https://github.com/langstruct-ai/langstruct/blob/main/docs/src/content/docs/getting-started.mdx

A minimal example demonstrating how to extract source tracking information from text using LangStruct. This helps in understanding where specific extracted entities originated within the source text.

```python
result = extractor.extract(text)
```

--------------------------------

### Install LangStruct with Optional Extras

Source: https://github.com/langstruct-ai/langstruct/blob/main/docs/src/content/docs/installation.mdx

Installs LangStruct with specific optional dependencies. Each command installs a different set of extras like 'viz', 'examples', 'parallel', 'dev', or 'all' for comprehensive features.

```bash
pip install "langstruct[viz]"
pip install "langstruct[examples]"
pip install "langstruct[parallel]"
pip install "langstruct[dev]"
pip install "langstruct[all]"
```

--------------------------------

### Development Installation for LangStruct

Source: https://github.com/langstruct-ai/langstruct/blob/main/docs/src/content/docs/installation.mdx

Guides for setting up the LangStruct project for development. Includes cloning the repository, installing dependencies in development mode using `uv` or `pip`, and running tests and linting.

```bash
# Clone repository
git clone https://github.com/langstruct-ai/langstruct.git
cd langstruct

# Install in development mode
uv sync --dev

# Or with pip
pip install -e ".[dev,test]"

# Run tests
pytest

# Run linting
ruff check .
mypy src/
```

--------------------------------

### Basic Data Extraction with LangStruct (Python)

Source: https://github.com/langstruct-ai/langstruct/blob/main/docs/src/content/docs/getting-started.mdx

Demonstrates how to perform basic data extraction using LangStruct. It involves defining a schema by example, extracting entities from text, and tracking character-level source information. Refinement can be enabled for higher accuracy.

```python
from langstruct import LangStruct

# Define schema by example
extractor = LangStruct(example={
    "company": "Apple Inc.",
    "revenue": 125.3,
    "quarter": "Q3 2024"
})

# Extract from text
text = "Apple reported $125.3B revenue in Q3 2024..."
result = extractor.extract(text)

print(result.entities)
# {'company': 'Apple Inc.', 'revenue': 125.3, 'quarter': 'Q3 2024'}

print(result.sources)  # Character-level source tracking
# {'company': [CharSpan(0, 5, 'Apple')], ...}

# Boost accuracy with refinement
refined_result = extractor.extract(text, refine=True)
print(f"Confidence: {refined_result.confidence:.1%}")  # Higher confidence
```

--------------------------------

### Save and Load LangStruct Extractors

Source: https://github.com/langstruct-ai/langstruct/blob/main/docs/src/content/docs/getting-started.mdx

Demonstrates how to persist the state of an extractor to disk and then load it back for later use. This is crucial for deployment and resuming operations without re-initializing the extractor.

```python
# Save an extractor (preserves all state)
extractor.save("./my_extractor")

# Load anywhere (API keys must be available)
loaded_extractor = LangStruct.load("./my_extractor")

# Works exactly like the original
result = loaded_extractor.extract("New text")
```

--------------------------------

### LangStruct Model Configuration (Python)

Source: https://github.com/langstruct-ai/langstruct/blob/main/docs/src/content/docs/getting-started.mdx

Shows how to configure the language model used by LangStruct. It covers auto-detection, specifying a particular model, and using local models via Ollama.

```python
# Default: Auto-detects available models
extractor = LangStruct(example=schema)

# Specific model
extractor = LangStruct(
    example=schema,
    model="gpt-5-mini"  # Example latest OpenAI model
)

# Local with Ollama
extractor = LangStruct(
    example=schema,
    model="ollama/llama3.2"
)
```

--------------------------------

### Install LangStruct and Set API Keys

Source: https://github.com/langstruct-ai/langstruct/blob/main/docs/src/content/docs/quickstart.mdx

Installs the langstruct Python package and sets environment variables for API keys required by various language models like OpenAI, Gemini, and Claude.

```bash
pip install langstruct

# Set up any API key:
export OPENAI_API_KEY="sk-your-key"      # OpenAI
export GOOGLE_API_KEY="your-key"         # Gemini
export ANTHROPIC_API_KEY="sk-ant-key"    # Claude
```

--------------------------------

### Verify LangStruct Installation and Basic Functionality

Source: https://github.com/langstruct-ai/langstruct/blob/main/docs/src/content/docs/installation.mdx

A Python script to verify the LangStruct installation by checking the version and performing a basic extraction task using a defined schema.

```python
import langstruct

# Check version
print(f"LangStruct version: {langstruct.__version__}")

# Test basic functionality
from pydantic import BaseModel, Field
from langstruct import LangStruct

class TestSchema(BaseModel):
    message: str = Field(description="A simple message")

# This will test your API connection (uses your default model)
extractor = LangStruct(schema=TestSchema)
result = extractor.extract("Hello, LangStruct!")
print(f"Success! Extracted: {result.entities}")
```

--------------------------------

### Start Local Documentation Server

Source: https://github.com/langstruct-ai/langstruct/blob/main/CONTRIBUTING.md

Starts a local development server to preview the documentation site as it's being built. Changes are often reflected live.

```bash
cd docs
pnpm install
pnpm dev
```

--------------------------------

### Install LangStruct with uv

Source: https://github.com/langstruct-ai/langstruct/blob/main/docs/src/content/docs/installation.mdx

Installs the LangStruct package using the 'uv' package manager, which is recommended for faster installation.

```bash
uv add langstruct
```

--------------------------------

### Install LangStruct Package

Source: https://github.com/langstruct-ai/langstruct/blob/main/docs/src/content/docs/installation.mdx

Instructions for installing the LangStruct package using pip. It covers both user-level installation and the recommended approach using a virtual environment.

```bash
# Use user installation if needed
pip install --user langstruct

# Or use virtual environment (recommended)
python -m venv langstruct-env
source langstruct-env/bin/activate  # On Windows: langstruct-env\Scripts\activate
pip install langstruct
```

--------------------------------

### Complete RAG Pipeline with LangStruct (Python)

Source: https://github.com/langstruct-ai/langstruct/blob/main/docs/src/content/docs/getting-started.mdx

Illustrates a complete Retrieval Augmented Generation (RAG) pipeline using LangStruct and ChromaDB. It covers indexing documents with extracted metadata and querying the vector database using parsed natural language queries.

```python
from uuid import uuid4

from langstruct import LangStruct
from chromadb import Client

# 1. Single instance for both operations
ls = LangStruct(example={
    "company": "Apple",
    "revenue": 100.0,
    "quarter": "Q3"
})
vector_db = Client().create_collection("docs")

# 2. Index documents with metadata
def index_document(text):
    metadata = ls.extract(text).entities
    # Chroma's Collection.add requires ids and takes the texts as `documents`
    vector_db.add(ids=[str(uuid4())], documents=[text], metadatas=[metadata])

# 3. Query with natural language
def search(query):
    parsed = ls.query(query)
    return vector_db.query(
        query_texts=parsed.semantic_terms,
        where=parsed.structured_filters,
        n_results=5
    )

# Usage
index_document("Apple reported $125B in Q3 2024...")
results = search("Q3 tech companies over $100B")
# Returns only Apple, not other Q3 mentions
```

--------------------------------

### LangStruct with Custom Pydantic Schemas (Python)

Source: https://github.com/langstruct-ai/langstruct/blob/main/docs/src/content/docs/getting-started.mdx

Explains how to define custom data schemas using Pydantic models for LangStruct. This provides more control over data validation and structure during extraction.

```python
from pydantic import BaseModel, Field
from typing import List, Optional

from langstruct import LangStruct

class CompanySchema(BaseModel):
    name: str
    revenue: float = Field(gt=0, description="Revenue in billions")
    employees: Optional[int] = None
    products: List[str] = []

extractor = LangStruct(schema=CompanySchema)
```

--------------------------------

### Install and Configure Ollama for Local Models

Source: https://github.com/langstruct-ai/langstruct/blob/main/docs/src/content/docs/installation.mdx

Installs Ollama, pulls a model (e.g., llama2), and sets the Ollama base URL as an environment variable for LangStruct to use local models.

```bash
# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh

# Pull a model
ollama pull llama2

# Use in LangStruct
export OLLAMA_BASE_URL="http://localhost:11434"
```

--------------------------------

### Install Dependencies

Source: https://github.com/langstruct-ai/langstruct/blob/main/CONTRIBUTING.md

Installs project dependencies using either uv (recommended) or pip. The `.[dev]` extra ensures development-specific packages are installed.

```bash
# With uv (recommended)
uv sync --extra dev

# Or with pip
pip install -e ".[dev]"
```

--------------------------------

### Set Google Gemini API Key

Source: https://github.com/langstruct-ai/langstruct/blob/main/docs/src/content/docs/installation.mdx

Configures the Google API key as an environment variable for accessing Google Gemini models.

```bash
export GOOGLE_API_KEY="your-google-api-key"
```

--------------------------------

### LangStruct Batch Processing and Retries (Python)

Source: https://github.com/langstruct-ai/langstruct/blob/main/docs/src/content/docs/getting-started.mdx

Details how to process multiple documents efficiently using LangStruct's batch capabilities. It includes options for controlling concurrency, showing progress, handling rate limits, and retrying failed extractions.

```python
# Process multiple documents efficiently
documents = [doc1, doc2, doc3, ...]
results = extractor.extract(documents, max_workers=8, show_progress=True)

for result in results:
    print(f"Confidence: {result.confidence:.1%}")
    print(f"Entities: {result.entities}")

# Batch processing with refinement for higher accuracy
results = extractor.extract(
    documents,
    refine=True,
    max_workers=5,
    rate_limit=60,      # calls per minute
    retry_failed=True   # raise on failures (False to skip with warnings)
)
# Note: 2-5x higher cost but significantly better accuracy
```

--------------------------------

### Connect to OpenAI API

Source: https://github.com/langstruct-ai/langstruct/blob/main/docs/src/content/docs/installation.mdx

This Python snippet demonstrates how to establish a connection to the OpenAI API using the `openai` library. It assumes the `OPENAI_API_KEY` environment variable is set.

```python
import openai

client = openai.OpenAI()  # Uses OPENAI_API_KEY
response = client.models.list()
print("OpenAI connection successful")
```

--------------------------------

### Complete LangStruct Example: Extract Metadata and Parse Queries

Source: https://github.com/langstruct-ai/langstruct/blob/main/docs/src/content/docs/quickstart.mdx

Demonstrates a full usage of LangStruct for extracting entities from a document and parsing a natural language query into structured filters. This example also includes comments on how to integrate with a RAG system.

```python
from langstruct import LangStruct

# 1. Single instance for both operations
ls = LangStruct(example={
    "company": "Apple",
    "revenue": 100.0,
    "quarter": "Q3"
})

# 2. Extract metadata from documents
doc = "Apple reported $125B revenue in Q3 2024"
metadata = ls.extract(doc).entities
print(f"Extracted: {metadata}")

# 3. Parse queries into filters
query = "Q3 tech companies over $100B"
filters = ls.query(query)
print(f"Filters: {filters.structured_filters}")

# 4. Use with your RAG system
# vector_db.add(doc, metadata=metadata)
# results = vector_db.search(
#     query=filters.semantic_terms,
#     where=filters.structured_filters
# )
```

--------------------------------

### Configure LangStruct with .env File

Source: https://github.com/langstruct-ai/langstruct/blob/main/docs/src/content/docs/installation.mdx

Sets up LangStruct configuration by creating a .env file in the project root, including API keys and default model settings.

```env
# Add your preferred provider's API key
GOOGLE_API_KEY=your-google-api-key
OPENAI_API_KEY=sk-your-key-here
ANTHROPIC_API_KEY=sk-ant-your-key-here
LANGSTRUCT_DEFAULT_MODEL=your-preferred-model
```

--------------------------------

### Install LangStruct and Set API Keys (Bash)

Source: https://github.com/langstruct-ai/langstruct/blob/main/docs/src/content/docs/index.mdx

Provides instructions for installing the LangStruct library using pip and setting up necessary API keys for various LLM providers. It covers OpenAI, Google Gemini, and Anthropic, as well as the option to use local models with Ollama.

```bash
pip install langstruct

# Set up any API key (choose one):
export OPENAI_API_KEY="sk-your-key"      # OpenAI
export GOOGLE_API_KEY="your-key"         # Google Gemini
export ANTHROPIC_API_KEY="sk-ant-key"    # Claude models

# Or use local models with Ollama (no API key needed)
```

--------------------------------

### Extract Structured Data with LangStruct (Python)

Source: https://github.com/langstruct-ai/langstruct/blob/main/docs/src/content/docs/quickstart.mdx

Demonstrates how to perform data extraction using LangStruct by defining a schema through an example. It shows how to extract entities and their source spans from a given text.

```python
from langstruct import LangStruct

# Define schema by example
extractor = LangStruct(example={
    "company": "Apple Inc.",
    "revenue": 125.3,
    "quarter": "Q3 2024"
})

# Extract from text
text = "Apple reported $125.3B revenue in Q3 2024, beating estimates."
result = extractor.extract(text)

print(result.entities)
# {'company': 'Apple Inc.', 'revenue': 125.3, 'quarter': 'Q3 2024'}

print(result.sources['revenue'])
# [CharSpan(15, 22, '$125.3B')]
```

--------------------------------

### Set Up Pre-commit Hooks

Source: https://github.com/langstruct-ai/langstruct/blob/main/CONTRIBUTING.md

Installs pre-commit hooks to automate code formatting, linting, and other checks before committing. This helps maintain code quality.

```bash
uv run pre-commit install
```

--------------------------------

### Reinstall LangStruct with All Dependencies

Source: https://github.com/langstruct-ai/langstruct/blob/main/docs/src/content/docs/installation.mdx

Uninstalls the current LangStruct package and then reinstalls it with all optional dependencies to resolve potential import errors.

```bash
# If you get import errors, reinstall with dependencies
pip uninstall langstruct
pip install "langstruct[all]"
```

--------------------------------

### Test Google Gemini API Key

Source: https://github.com/langstruct-ai/langstruct/blob/main/docs/src/content/docs/installation.mdx

A Python snippet to test the configured Google API key by attempting to list available models using the google.genai client.

```python
# Test your Google API key
from google import genai

client = genai.Client()  # Uses GOOGLE_API_KEY
response = client.models.list()
print("Google Gemini connection successful")
```

--------------------------------

### LangStruct Initialization and Optimization with MIPROv2

Source: https://github.com/langstruct-ai/langstruct/blob/main/docs/src/content/docs/why-dspy.mdx

This Python code demonstrates initializing a LangStruct extractor with a schema example and then using its 'optimize' method, powered by MIPROv2. It shows how to provide training texts and expected results to automatically tune the extraction prompts and examples.

```python
from langstruct import LangStruct

# 1. Create extractor with your schema
extractor = LangStruct(example={
    "company": "Apple",
    "revenue": 100.0,
    "quarter": "Q3"
})

# 2. Let MIPROv2 optimize prompts and examples automatically
extractor.optimize(
    texts=["Apple reported $125B in Q3...", "Meta earned $40B..."],
    expected_results=[
        {"company": "Apple", "revenue": 125.0, "quarter": "Q3"},
        {"company": "Meta", "revenue": 40.0, "quarter": "Q3"}
    ]
)

# 3. Now it's optimized for your specific data!
result = extractor.extract("Microsoft announced $65B revenue for Q4")
```

--------------------------------

### Shell command for running LangStruct example

Source: https://github.com/langstruct-ai/langstruct/blob/main/docs/src/content/docs/examples/gepa.mdx

This command sets the Google API key as an environment variable and then runs the specified Python example script using 'uv'. This is typically used to set up and execute the optimization process.

```bash
export GOOGLE_API_KEY="YOUR_KEY"
uv run python examples/07b_optimization_gepa.py
```

--------------------------------

### Initialize LangStruct with Different Model Configurations

Source: https://github.com/langstruct-ai/langstruct/blob/main/docs/src/content/docs/quickstart.mdx

Shows how to initialize LangStruct, either by auto-detecting the model from the environment or by explicitly specifying a model, including options for cloud-based and local models.

```python
from langstruct import LangStruct

# No model needed - it auto-detects from your environment!
extractor = LangStruct(example=schema)

# Or specify model explicitly
extractor = LangStruct(
    example=schema,
    model="gemini/gemini-2.5-flash-lite"  # Fast & cheap
)

# Local models
extractor = LangStruct(
    example=schema,
    model="ollama/llama3.2"  # No API needed
)
```

--------------------------------

### Track Data Sources in LangStruct

Source: https://github.com/langstruct-ai/langstruct/blob/main/docs/src/content/docs/getting-started.mdx

Iterates through the sources of a result to identify and print the field, text, and span of each data source. This helps in understanding where specific information originated.

```python
for field, spans in result.sources.items():
    for span in spans:
        print(f"{field}: '{text[span.start:span.end]}' at {span.start}-{span.end}")
```

--------------------------------

### Production Deployment Workflow for LangStruct

Source: https://github.com/langstruct-ai/langstruct/blob/main/docs/src/content/docs/persistence.mdx

Illustrates the typical workflow for production deployment, including training and saving an extractor during development, and then loading and using it in a production service or API. This ensures a smooth transition from development to deployment.

```python
# Development: Train and save
extractor = LangStruct(schema=MySchema)
extractor.optimize(training_data, expected_results)
extractor.save("./production_extractor")

# Production: Load and use
def load_extractor():
    return LangStruct.load("./production_extractor")

# Use in API or service
extractor = load_extractor()
result = extractor.extract(incoming_text)
```

--------------------------------

### Manual Documentation Deployment (Bash)

Source: https://github.com/langstruct-ai/langstruct/blob/main/docs/README.md

Provides steps for manually building and deploying the LangStruct documentation site. This involves navigating to the `docs/` directory, building the production site, and then deploying the generated `dist/` directory.

```bash
cd docs
pnpm build
# Deploy dist/ directory to your hosting provider
```

--------------------------------

### Export and Visualize LangStruct Results

Source: https://github.com/langstruct-ai/langstruct/blob/main/docs/src/content/docs/getting-started.mdx

Provides methods for saving individual results to JSON, exporting batch results in various formats (CSV, JSON, Excel, Parquet), and performing a JSONL round-trip for annotation and visualization.

```python
# Save individual result
result.save_json("output.json")

# Export batch results
extractor.export_batch(results, "output.csv")  # csv/json/excel/parquet

# JSONL round-trip
extractor.save_annotated_documents(results, "extractions.jsonl")
loaded = extractor.load_annotated_documents("extractions.jsonl")
extractor.visualize(loaded, "results.html")
```

--------------------------------

### Install LangStruct with pip

Source: https://github.com/langstruct-ai/langstruct/blob/main/docs/src/content/docs/installation.mdx

Installs the LangStruct package using the standard 'pip' package manager.

```bash
pip install langstruct
```

--------------------------------

### Query Parsing for RAG with LangStruct (Python)

Source: https://github.com/langstruct-ai/langstruct/blob/main/docs/src/content/docs/getting-started.mdx

Shows how to use LangStruct to parse natural language queries for Retrieval Augmented Generation (RAG). It extracts semantic terms and structured filters from a query, which can then be used to query a vector database.

```python
from langstruct import LangStruct

# Same instance for both extraction and parsing
ls = LangStruct(example={
    "company": "Apple Inc.",
    "revenue": 125.3,
    "quarter": "Q3 2024"
})

# Parse natural language query
query = "Q3 2024 tech companies over $100B discussing AI"
result = ls.query(query)

print(result.semantic_terms)
# ['tech companies', 'AI', 'artificial intelligence']

print(result.structured_filters)
# {'quarter': 'Q3 2024', 'revenue': {'$gte': 100.0}}
```

--------------------------------

### DSPy Optimization in LangStruct (Python)

Source: https://github.com/langstruct-ai/langstruct/blob/main/docs/src/content/docs/quickstart.mdx

Explains how LangStruct utilizes DSPy 3.0 for automatic prompt optimization, eliminating the need for manual prompt engineering. It contrasts the traditional approach with LangStruct's self-optimizing method and shows how to initiate optimization.

```python
# LangStruct uses DSPy 3.0 for automatic optimization
# No manual prompt engineering needed!

# Traditional approach (manual prompts):
prompt = "Extract company, revenue, quarter from: {text}"
# Requires iterative tuning, breaks with new data

# LangStruct approach (self-optimizing):
extractor = LangStruct(example=schema)
# Automatically optimizes prompts using MIPROv2
# Improves with your data, no manual tuning

# See optimization in action
extractor.optimize(
    texts=["training texts..."],
    expected_results=[{"expected outputs..."}]  # Optional - uses confidence if omitted
)
```

--------------------------------

### Build Documentation Locally

Source: https://github.com/langstruct-ai/langstruct/blob/main/CONTRIBUTING.md

Builds the static documentation site for LangStruct. This command is run from the 'docs' directory.

```bash
cd docs
pnpm install
pnpm build
```

--------------------------------

### Process Multiple Documents with LangStruct (Python)

Source: https://github.com/langstruct-ai/langstruct/blob/main/docs/src/content/docs/quickstart.mdx

Demonstrates batch processing of multiple documents for data extraction using LangStruct. It includes parameters for controlling concurrency (`max_workers`), progress display (`show_progress`), and rate limiting (`rate_limit`).

```python
# Batch processing
documents = [
    "Apple Q3: $125.3B revenue",
    "Microsoft Q3: $62.9B revenue",
    "Google Q3: $88.2B revenue"
]

results = extractor.extract(
    documents,
    max_workers=8,
    show_progress=True,
    rate_limit=60
)

for result in results:
    print(f"{result.entities['company']}: ${result.entities['revenue']}B")
    print(f"Confidence: {result.confidence:.1%}\n")
```

--------------------------------

### Set OpenAI API Key

Source: https://github.com/langstruct-ai/langstruct/blob/main/docs/src/content/docs/installation.mdx

Configures the OpenAI API key as an environment variable for accessing OpenAI models.

```bash
export OPENAI_API_KEY="your-openai-api-key"
```

--------------------------------

### Set Anthropic Claude API Key

Source: https://github.com/langstruct-ai/langstruct/blob/main/docs/src/content/docs/installation.mdx

Configures the Anthropic API key as an environment variable for accessing Anthropic Claude models.

```bash
export ANTHROPIC_API_KEY="your-anthropic-api-key"
```

--------------------------------

### Source Tracking and Visualization with LangStruct (Python)

Source: https://github.com/langstruct-ai/langstruct/blob/main/docs/src/content/docs/quickstart.mdx

Details how to track and visualize extracted data sources at a character-level precision. It covers saving visualizations to HTML, using JSONL for dataset round-trips, and debugging validation warnings.

```python
result = extractor.extract(text)

# Character-level precision
for field, spans in result.sources.items():
    for span in spans:
        print(f"{field}: '{text[span.start:span.end]}' at {span.start}-{span.end}")

# Interactive visualization
from langstruct import HTMLVisualizer
viz = HTMLVisualizer()
viz.save_visualization(text, result, "output.html")

# JSONL round-trip for datasets
results = extractor.extract(documents, validate=False)
extractor.save_annotated_documents(results, "extractions.jsonl")
loaded = extractor.load_annotated_documents("extractions.jsonl")
extractor.visualize(loaded, "results.html")

# Debug mode for detailed validation feedback
result = extractor.extract(text, debug=True)
# Shows detailed validation warnings and suggestions when issues are detected
```

--------------------------------

### Set Azure OpenAI Credentials

Source: https://github.com/langstruct-ai/langstruct/blob/main/docs/src/content/docs/installation.mdx

Configures the Azure OpenAI endpoint, API key, and API version as environment variables for accessing Azure OpenAI services.

```bash
export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com/"
export AZURE_OPENAI_API_KEY="your-azure-api-key"
export AZURE_OPENAI_API_VERSION="2024-02-01"
```

--------------------------------

### RAG Integration Example with LangStruct

Source: https://context7.com/langstruct-ai/langstruct/llms.txt

Provides a foundational example for integrating LangStruct with a vector database (Chroma) for enhanced RAG capabilities. It sets up an extractor with a defined schema for financial documents, initializes embeddings and text splitters, and demonstrates the initial setup for processing documents and preparing them for retrieval.

```python
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain.schema import Document
from langstruct import LangStruct

# Define schema for financial documents
extractor = LangStruct(example={
    "company": "Contoso Corp",
    "quarter": "Q2 2024",
    "revenue_numeric": 61.9,
    "risks": ["Macro", "Competition"]
})

embeddings = OpenAIEmbeddings()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=200)
```

--------------------------------

### Quick Experiment Extractor Initialization - Python

Source: https://github.com/langstruct-ai/langstruct/blob/main/docs/src/content/docs/optimization/index.mdx

Provides a concise way to initialize a LangStruct extractor for quick experiments, skipping the optimization step entirely.

```python
extractor = LangStruct(example={"name": "John", "age": 30})
```

--------------------------------

### Configure LangStruct Environment Variables

Source: https://github.com/langstruct-ai/langstruct/blob/main/docs/src/content/docs/installation.mdx

Sets various environment variables for LangStruct, including API keys for different providers, default model, cache directory, and log level.

```bash
# Choose your preferred provider
export GOOGLE_API_KEY="your-google-api-key"   # Google Gemini
export OPENAI_API_KEY="sk-..."                # OpenAI
export OPENAI_ORG_ID="org-..."                # Optional
export ANTHROPIC_API_KEY="sk-ant-..."         # Anthropic Claude

# LangStruct configuration
export LANGSTRUCT_DEFAULT_MODEL="your-preferred-model"
export LANGSTRUCT_CACHE_DIR="~/.langstruct"
export LANGSTRUCT_LOG_LEVEL="INFO"
```

--------------------------------

### Model Switching with Traditional Libraries (Python)

Source: https://github.com/langstruct-ai/langstruct/blob/main/docs/src/content/docs/why-langstruct.mdx

Demonstrates the difficulties of switching LLM providers (OpenAI to Claude to Llama) using traditional libraries. It highlights the need for extensive re-tuning and rewriting prompts and examples for each new model, leading to significant time loss.

```python
# Month 1: Carefully tune prompts for OpenAI
extractor = LangExtract(...)
# Spend days crafting examples and prompt engineering

# Month 6: Switch to Claude - everything breaks!
# ❌ Prompts don't work the same way
# ❌ Few-shot examples need rewriting
# ❌ Back to manual tuning for weeks

# Month 12: Move to local Llama - start over again!
# ❌ Different prompt format requirements
# ❌ Re-engineer everything from scratch
```

--------------------------------

### Running Documentation Commands (Bash)

Source: https://github.com/langstruct-ai/langstruct/blob/main/docs/README.md

Lists essential commands for managing the LangStruct documentation project, including installation, local development, building for production, and previewing. These commands are executed from the `docs/` directory.

```bash
pnpm install
pnpm dev
pnpm build
pnpm preview
pnpm astro ...
pnpm astro -- --help
```

--------------------------------

### LangExtract Data Extraction (Python)

Source: https://github.com/langstruct-ai/langstruct/blob/main/docs/src/content/docs/why-langstruct.mdx

Illustrates the usage of LangExtract for data extraction. This approach requires manual prompt engineering and few-shot examples to define the extraction schema and guide the model. It also provides character-level provenance tracking.

```python
from langextract import LangExtract

# Manual prompt engineering required
extractor = LangExtract(
    model="gemini-1.5-flash",
    schema={
        "company": "string",
        "revenue": "number",
        "quarter": "string"
    },
    examples=[
        {"text": "...", "output": {...}},
        {"text": "...", "output": {...}}
    ]
)

result = extractor.extract(text)
print(result.extractions[0].provenance)  # Character tracking
```
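Character-level provenance, as mentioned for this snippet, comes down to a pair of offsets that slice the source text back to the matched string. The following is a minimal, library-independent sketch of that idea. `Span` and `locate` are hypothetical names introduced here for illustration; they are not part of LangExtract or LangStruct, and a real library may align spans differently (for example with fuzzy matching rather than exact search).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Span:
    """Half-open character range [start, end) into a source text (hypothetical type)."""
    start: int
    end: int
    text: str


def locate(source: str, value: str) -> Optional[Span]:
    """Return the first exact occurrence of `value` in `source`, or None if absent."""
    start = source.find(value)
    if start == -1:
        return None
    return Span(start, start + len(value), value)


source = "Apple reported $125.3B revenue in Q3 2024, beating estimates."
span = locate(source, "$125.3B")

# Slicing the source with the span offsets recovers the matched text
print(span)
print(source[span.start:span.end])
```

Because the range is half-open, `source[span.start:span.end]` always reproduces the stored text, which is the same slicing pattern used to print spans elsewhere in this document.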