### Install ContextGem from Source

Source: https://contextgem.dev/installation

Clones the ContextGem repository from GitHub and installs it in editable mode, suitable for local development or custom builds.

```Shell
git clone https://github.com/shcherbak-ai/contextgem.git
cd contextgem
pip install -e .
```

--------------------------------

### Verify ContextGem installation and version

Source: https://contextgem.dev/_sources/installation

After installation, run this command to confirm that ContextGem is correctly installed and to display its version number.

```bash
python -c "import contextgem; print(contextgem.__version__)"
```
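The same check can be made without importing the package, which is useful when an import might fail for unrelated reasons. A minimal sketch using the standard library's `importlib.metadata`; the helper name `installed_version` is hypothetical, and the code assumes the distribution is published under the same name as the import, `contextgem`.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "contextgem") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found if found is not None else "contextgem is not installed")
```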

--------------------------------

### Set up ContextGem for development using Poetry

Source: https://contextgem.dev/_sources/installation

ContextGem uses Poetry for development. Setup consists of installing Poetry, installing the project dependencies including development extras, and activating the virtual environment.

```bash
# Install poetry if you don't have it
pip install poetry

# Install dependencies including development extras
poetry install --with dev

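# Activate the virtual environment
# (on Poetry 2.x, `poetry shell` requires the poetry-plugin-shell plugin;
# `poetry env activate` prints the activation command instead)
poetry shell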
```

--------------------------------

### Install ContextGem from PyPI

Source: https://contextgem.dev/installation

Installs or upgrades the ContextGem library using the Python package installer, pip, directly from the Python Package Index (PyPI).

```Shell
pip install -U contextgem
```

--------------------------------

### Verify ContextGem Installation

Source: https://contextgem.dev/installation

Executes a Python command to import the ContextGem library and print its version, confirming that the installation was successful and the library is accessible.

```Python
import contextgem; print(contextgem.__version__)
```

--------------------------------

### Create JsonObjectExample for LLM Guidance

Source: https://contextgem.dev/api/examples

Demonstrates how to create an instance of `JsonObjectExample` using `contextgem` classes. This example shows the basic initialization for guiding LLM extraction tasks.

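```Python
from contextgem import JsonObjectConcept, JsonObjectExample

# Create a JSON object example (content must be a non-empty, JSON-serializable dict)
json_example = JsonObjectExample(
    content={"name": "John Doe", "skills": ["Python", "Machine Learning"]},
)
```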

--------------------------------

### Python Example: Creating and Attaching String Examples to a StringConcept

Source: https://contextgem.dev/api/examples

Demonstrates how to create `StringExample` instances with specific content and attach them to a `StringConcept` object. This illustrates how examples guide LLM extraction by providing concrete illustrations of expected information for a concept.

```python
from contextgem import StringConcept, StringExample

# Create string examples
string_examples = [
    StringExample(content="X (Client)"),
    StringExample(content="Y (Supplier)"),
]

# Attach string examples to a StringConcept
string_concept = StringConcept(
    name="Contract party name and role",
    description="The name and role of the contract party",
    examples=string_examples  # Attach the example to the concept (optional)
)
```

--------------------------------

### Python Example: Extracting Concepts from Documents

Source: https://contextgem.dev/llms/llm_extraction_methods

Shows the imports needed to extract concepts directly from documents with ContextGem. Only the opening of the example is included in this snippet.

```Python
# ContextGem: Extracting Concepts Directly from Documents

import os

from contextgem import Document, DocumentLLM, NumericalConcept, StringConcept
```

--------------------------------

### Extraction Pipeline Example (Instructor)

Source: https://contextgem.dev/_sources/vs_other_frameworks

Shows an extraction pipeline using Instructor, a library focused on structured outputs with Pydantic. This example highlights its strength in structured data extraction but also the need for manual work in building complex pipelines, including comprehensive prompt engineering, Pydantic model definition, custom assembly of components, manual reference mapping, and additional setup for concurrency and cost tracking.

```python
# See file: ../../dev/usage_examples/vs_other_frameworks/advanced/instructor.py
```

--------------------------------

### Initialize Document and Import Classes for StringConcept with Examples in Python

Source: https://contextgem.dev/concepts/string_concept

Illustrates the initial setup for using `StringConcept` with examples, including importing required ContextGem classes and creating a `Document` object from a sample contract text. This forms the basis for defining and applying string concepts.

```Python
# ContextGem: StringConcept Extraction with Examples

import os

from contextgem import Document, DocumentLLM, StringConcept, StringExample

# Create a Document object from text
contract_text = """
SERVICE AGREEMENT
This Service Agreement (the "Agreement") is entered as of January 15, 2025 by and between:
XYZ Innovations Inc., a Delaware corporation with offices at 123 Tech Avenue, San Francisco, CA
("Provider"), and
Omega Enterprises LLC, a New York limited liability company with offices at 456 Business Plaza,
New York, NY ("Customer").
"""
doc = Document(raw_text=contract_text)
```

--------------------------------

### API: StringExample Class Definition

Source: https://contextgem.dev/_modules/contextgem/public/examples

Defines the StringExample class, a Pydantic model for representing string-based examples used to guide LLM extraction tasks. It contains a 'content' field for the example text, which must be a non-empty string. This class can be attached to a StringConcept.

```python
class StringExample(_Example):
    """
    Represents a string example that can be provided by users for certain extraction tasks.

    :ivar content: A non-empty string that holds the text content of the example.
    :type content: NonEmptyStr

    Note:
        Examples are optional and can be used to guide LLM extraction tasks. They serve as reference
        points for the model to understand the expected format and content of extracted information.
        StringExample can be attached to a :class:`~contextgem.public.concepts.StringConcept`.
    """

    content: NonEmptyStr
```

--------------------------------

### Extraction Pipeline Example (LangChain)

Source: https://contextgem.dev/_sources/vs_other_frameworks

Illustrates an extraction pipeline using LangChain, a flexible framework for LLM applications. While powerful, this example highlights the development overhead for complex extraction workflows, including manual prompt engineering, Pydantic model definition, complex chain configuration, custom reference mapping, and additional setup for concurrency and cost tracking.

```python
# See file: ../../dev/usage_examples/vs_other_frameworks/advanced/langchain.py
```

--------------------------------

### API Documentation: StringExample Class Definition

Source: https://contextgem.dev/api/examples

Documents the `StringExample` class, which represents a string-based example for LLM extraction tasks. It details its inheritance, variables, parameters, and notes on its usage, including its `content` property.

```APIDOC
class contextgem.public.examples.StringExample(**data)
  Bases: _Example
  Variables:
    content: A non-empty string that holds the text content of the example.
  Parameters:
    custom_data (dict)
    content (NonEmptyStr)
  Note:
    Examples are optional and can be used to guide LLM extraction tasks. They serve as reference points for the model to understand the expected format and content of extracted information. StringExample can be attached to a StringConcept.
  Properties:
    content: NonEmptyStr
```

--------------------------------

### API Documentation for JsonObjectExample Class

Source: https://contextgem.dev/api/examples

Detailed API documentation for the `contextgem.public.examples.JsonObjectExample` class, which represents a JSON object example for LLM extraction tasks.

```APIDOC
class contextgem.public.examples.JsonObjectExample(**data)
  Bases: _Example
  Description: Represents a JSON object example that can be provided by users for certain extraction tasks.
  Variables:
    content: A JSON-serializable dict with the minimum length of 1 that holds the content of the example.
  Parameters:
    custom_data (dict)
    content (dict[str, Any])
  Note: Examples are optional and can be used to guide LLM extraction tasks. They serve as reference points for the model to understand the expected format and content of extracted information. JsonObjectExample can be attached to a JsonObjectConcept.
```

--------------------------------

### API Documentation: StringExample.clone() Method

Source: https://contextgem.dev/api/examples

Documents the `clone` method of the `StringExample` class, which creates and returns a deep copy of the current instance. This method is useful for duplicating example objects.

```APIDOC
contextgem.public.examples.StringExample.clone()
  Description: Creates and returns a deep copy of the current instance.
  Returns: A deep copy of the current instance.
  Return type: typing.Self
```
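To illustrate what a deep copy implies for nested data, here is a sketch with a stand-in class. `ExampleStandIn` is hypothetical and relies on `copy.deepcopy`; that is an assumption about the behavior described above, not ContextGem's actual implementation.

```python
import copy
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ExampleStandIn:
    """Hypothetical stand-in for an example object; not a ContextGem class."""

    content: str
    custom_data: dict[str, Any] = field(default_factory=dict)

    def clone(self) -> "ExampleStandIn":
        # A deep copy duplicates nested containers as well as the top-level object.
        return copy.deepcopy(self)


original = ExampleStandIn(content="X (Client)", custom_data={"tags": ["party"]})
duplicate = original.clone()

# Mutating the clone's nested data leaves the original untouched.
duplicate.custom_data["tags"].append("edited")
print(original.custom_data["tags"])   # ['party']
print(duplicate.custom_data["tags"])  # ['party', 'edited']
```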

--------------------------------

### ContextGem DocumentLLMGroup Workflow Example

Source: https://contextgem.dev/_sources/how_it_works

An example demonstrating the configuration of a `DocumentLLMGroup` with three distinct LLMs (LLM 1, LLM 2, LLM 3), each assigned a specific role (extractor_text, reasoner_text, extractor_vision), model, task, and optional fallback LLM, illustrating a practical multi-LLM extraction setup.

```APIDOC
LLM Group Workflow Example:
  LLM 1:
    Role: extractor_text
    Model: gpt-4o-mini
    Task: Extract payment terms from a contract
    Fallback LLM (optional): gpt-3.5-turbo
  LLM 2:
    Role: reasoner_text
    Model: gpt-4o
    Task: Detect anomalies in the payment terms
    Fallback LLM (optional): claude-3-5-sonnet
  LLM 3:
    Role: extractor_vision
    Model: gpt-4o-mini
    Task: Extract invoice amounts
    Fallback LLM (optional): gpt-4o
```
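The workflow above can be written down as plain data to see how a task maps to a role. This is only a sketch: `LLMSpec` and `pick_llm` are hypothetical names and are not part of ContextGem, which configures groups through `DocumentLLMGroup`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LLMSpec:
    """Hypothetical record for one entry of the workflow example."""

    role: str
    model: str
    task: str
    fallback_model: Optional[str] = None


GROUP = [
    LLMSpec("extractor_text", "gpt-4o-mini", "Extract payment terms from a contract", "gpt-3.5-turbo"),
    LLMSpec("reasoner_text", "gpt-4o", "Detect anomalies in the payment terms", "claude-3-5-sonnet"),
    LLMSpec("extractor_vision", "gpt-4o-mini", "Extract invoice amounts", "gpt-4o"),
]


def pick_llm(role: str) -> LLMSpec:
    """Return the spec whose role matches; each role appears once in this example."""
    for spec in GROUP:
        if spec.role == role:
            return spec
    raise KeyError(f"no LLM configured for role {role!r}")


spec = pick_llm("reasoner_text")
print(f"{spec.role}: {spec.model} (fallback: {spec.fallback_model})")
```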

--------------------------------

### NumericalConcept Extraction with References and Justifications Setup

Source: https://contextgem.dev/concepts/numerical_concept

Initial imports for a `NumericalConcept` extraction example that enables justifications and references. Only the imports from the `contextgem` library are shown.

```python
import os

from contextgem import Document, DocumentLLM, NumericalConcept
```

--------------------------------

### Extraction Pipeline Example (ContextGem)

Source: https://contextgem.dev/_sources/vs_other_frameworks

Demonstrates ContextGem's simplified, declarative syntax for defining multi-LLM extraction pipelines. It highlights features like automated token counting, cost calculation, built-in concurrency, easy example definition, and unified result aggregation, reducing development overhead for complex workflows.

```python
# See file: ../../dev/usage_examples/docs/advanced/advanced_multiple_docs_pipeline.py
```

--------------------------------

### API Reference: Concept examples attribute

Source: https://contextgem.dev/genindex

Index entries for the `examples` attribute of `JsonObjectConcept` and `StringConcept`. The attribute holds the optional example objects (`JsonObjectExample` or `StringExample`) attached to a concept to guide LLM extraction.

```APIDOC
contextgem.public.concepts.JsonObjectConcept.examples (attribute)
contextgem.public.concepts.StringConcept.examples (attribute)
```

--------------------------------

### Extracting Concepts from Documents using Vision Capabilities with ContextGem

Source: https://contextgem.dev/_sources/quickstart

This Python example illustrates ContextGem's vision capabilities for extracting structured data from documents with complex layouts or images, such as scanned contracts. The image is attached to a `Document`, the concept is routed to a vision-capable LLM through the `extractor_vision` role, and extraction runs through `DocumentLLM`. The `Image` class and the `image_to_base64` helper follow the ContextGem quickstart; the file path and concept definition are placeholders.

```python
import os

from contextgem import Document, DocumentLLM, Image, StringConcept, image_to_base64

# Load the scanned contract as a base64-encoded image
# ('path/to/scanned_contract.png' is a placeholder path)
image_path = "path/to/scanned_contract.png"
doc_image = Image(mime_type="image/png", base64_data=image_to_base64(image_path))

# A document can hold images only; it may contain several
doc = Document(images=[doc_image])

# Define what to extract and route it to the vision LLM
doc.add_concepts(
    [
        StringConcept(
            name="Party names",
            description="Names of the contract parties",
            llm_role="extractor_vision",
        ),
    ]
)

# Create a vision-capable LLM
llm = DocumentLLM(
    model="openai/gpt-4o-mini",
    api_key=os.environ.get("CONTEXTGEM_OPENAI_API_KEY"),
    role="extractor_vision",
)

# Run extraction and print the extracted items
extracted_concepts = llm.extract_concepts_from_document(doc)
print("Extracted Vision Concepts (Contract):", extracted_concepts[0].extracted_items)
```

--------------------------------

### Initialize ContextGem for JsonObjectConcept with Examples

Source: https://contextgem.dev/concepts/json_object_concept

Partial code snippet showing initial imports for using `JsonObjectConcept` and `JsonObjectExample` within ContextGem, typically for providing examples to improve extraction accuracy for complex schemas.

```python
# ContextGem: JsonObjectConcept Extraction with Examples

import os
from pprint import pprint

from contextgem import Document, DocumentLLM, JsonObjectConcept, JsonObjectExample
```

--------------------------------

### Sphinx Automodule Directive for ContextGem Examples API

Source: https://contextgem.dev/_sources/api/examples

This Sphinx `automodule` directive is used to automatically generate comprehensive API documentation for the `contextgem.public.examples` Python module. It includes all public and undocumented members, shows inheritance relationships, and excludes specific Pydantic model configuration attributes (`model_config`, `model_post_init`) to keep the documentation focused on core API functionality.

```APIDOC
.. automodule:: contextgem.public.examples
   :members:
   :undoc-members:
   :show-inheritance:
   :inherited-members:
   :exclude-members: model_config, model_post_init
```

--------------------------------

### Example of Optimizing LLM Extraction for Cost in Python

Source: https://contextgem.dev/optimizations/optimization_cost

This Python example demonstrates how to initialize a DocumentLLM instance with custom pricing details using `LLMPricing` to enable cost tracking. It shows how to retrieve and print the usage and cost details after performing extractions, allowing developers to monitor token consumption and overall expenses.

```Python
# Example of optimizing extraction for cost

import os

from contextgem import DocumentLLM, LLMPricing

llm = DocumentLLM(
    model="openai/gpt-4o-mini",
    api_key=os.environ.get("CONTEXTGEM_OPENAI_API_KEY"),
    pricing_details=LLMPricing(
        input_per_1m_tokens=0.150,
        output_per_1m_tokens=0.600,
    ),  # add pricing details to track costs
)

# ... use the LLM for extraction ...

# ... monitor usage and cost ...
usage = llm.get_usage()  # get the usage details, including tokens and calls' details.
cost = llm.get_cost()  # get the cost details, including input, output, and total costs.
print(usage)
print(cost)
```
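The per-million-token prices translate into cost with simple arithmetic. A worked sketch, assuming cost is linear in token counts; `estimate_cost` is a hypothetical helper rather than a ContextGem function, and the token counts are made up for illustration.

```python
from decimal import Decimal

MILLION = Decimal(1_000_000)


def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    input_per_1m_tokens: Decimal,
    output_per_1m_tokens: Decimal,
) -> dict[str, Decimal]:
    """Estimate cost assuming prices are quoted per one million tokens."""
    input_cost = Decimal(input_tokens) / MILLION * input_per_1m_tokens
    output_cost = Decimal(output_tokens) / MILLION * output_per_1m_tokens
    return {"input": input_cost, "output": output_cost, "total": input_cost + output_cost}


# Illustrative token counts with the prices from the snippet above.
costs = estimate_cost(200_000, 50_000, Decimal("0.150"), Decimal("0.600"))
print(costs["total"])
```

With these numbers, 200,000 input tokens cost 0.2 × 0.150 = 0.03 and 50,000 output tokens cost 0.05 × 0.600 = 0.03, for a total of 0.06.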

--------------------------------

### LangChain: Initial Setup for Anomaly Extraction

Source: https://contextgem.dev/vs_other_frameworks

This Python snippet provides the initial imports and class definitions for implementing anomaly extraction using LangChain. It highlights the need for manual setup, including defining Pydantic models for structured output and importing various components for prompt engineering, output parsing, and runnable chains, which contrasts with ContextGem's more integrated approach.

```python
# LangChain implementation for extracting anomalies from a document, with source references and justifications

import os
from textwrap import dedent
from typing import Optional

from langchain.output_parsers import PydanticOutputParser
from langchain.prompts import PromptTemplate
from langchain_core.runnables import RunnableLambda, RunnablePassthrough
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field
```

--------------------------------

### Extraction Pipeline Example (LlamaIndex)

Source: https://contextgem.dev/_sources/vs_other_frameworks

Presents an extraction pipeline built with LlamaIndex, a robust data framework for LLM applications. This example demonstrates its capabilities while pointing out the manual effort required for complex extraction, such as crafting prompts, defining Pydantic models, configuring pipeline components, and custom solutions for fine-grained reference tracking, concurrency, and cost tracking.

```python
# See file: ../../dev/usage_examples/vs_other_frameworks/advanced/llama_index.py
```

--------------------------------

### ContextGem Advanced Extraction Pipeline Example

Source: https://contextgem.dev/vs_other_frameworks

This Python example demonstrates an advanced extraction workflow using ContextGem. It shows how to analyze multiple documents concurrently within a single pipeline, leveraging different LLMs, and includes built-in cost tracking. ContextGem handles boilerplate code automatically, simplifying complex LLM extraction tasks.

```Python
# Advanced Usage Example - analyzing multiple documents with a single pipeline,
# with different LLMs, concurrency and cost tracking

import os

from contextgem import (
    Aspect,
    DateConcept,
    Document,
    DocumentLLM,
    DocumentLLMGroup,
    DocumentPipeline,
    JsonObjectConcept,
    JsonObjectExample,
    LLMPricing,
    NumericalConcept,
    RatingConcept,
    RatingScale,
    StringConcept,
    StringExample,
)

# Construct documents

# Document 1 - Consultancy Agreement (shortened for brevity)
doc1 = Document(
    raw_text=(
        "Consultancy Agreement\n"
        "This agreement between Company A (Supplier) and Company B (Customer)...\n"
        "The term of the agreement is 1 year from the Effective Date...\n"
        "The Supplier shall provide consultancy services as described in Annex 2...\n"
        "The Customer shall pay the Supplier within 30 calendar days of receiving an invoice...\n"
        "All intellectual property created during the provision of services shall belong to the Customer...\n"
        "This agreement is governed by the laws of Norway...\n"
        "Annex 1: Data processing agreement...\n"
        "Annex 2: Statement of Work...\n"
        "Annex 3: Service Level Agreement...\n"
    ),
)
```

--------------------------------

### Define JSON Structure and Concept with ContextGem

Source: https://contextgem.dev/api/examples

This snippet demonstrates how to create a JsonObjectExample with sample data, define a Python class (PersonInfo) to represent the expected JSON structure, and then use JsonObjectConcept to associate the structure with the example for validation and conceptualization.

```python
json_example = JsonObjectExample(
    content={
        "name": "John Doe",
        "education": "Bachelor's degree in Computer Science",
        "skills": ["Python", "Machine Learning", "Data Analysis"],
        "hobbies": ["Reading", "Traveling", "Gaming"]
    }
)

# Define a structure for JSON object concept
class PersonInfo:
    name: str
    education: str
    skills: list[str]
    hobbies: list[str]

# Also works as a dict with type hints, e.g.
# PersonInfo = {
#     "name": str,
#     "education": str,
#     "skills": list[str],
#     "hobbies": list[str],
# }

# Attach JSON example to a JsonObjectConcept
json_concept = JsonObjectConcept(
    name="Candidate info",
    description="Structured information about a job candidate",
    structure=PersonInfo,  # Define the expected structure
    examples=[json_example]  # Attach the example to the concept (optional)
)
```
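The class form and the commented dict form describe the same fields. A standard-library sketch that shows the correspondence and compares the example's keys against it; `missing_or_extra_keys` is a hypothetical helper and does not reflect how ContextGem validates structures internally.

```python
from typing import get_type_hints


class PersonInfo:
    name: str
    education: str
    skills: list[str]
    hobbies: list[str]


# Resolve the class annotations into a {field: type} mapping.
hints = get_type_hints(PersonInfo)

example_content = {
    "name": "John Doe",
    "education": "Bachelor's degree in Computer Science",
    "skills": ["Python", "Machine Learning", "Data Analysis"],
    "hobbies": ["Reading", "Traveling", "Gaming"],
}


def missing_or_extra_keys(content: dict, structure: dict) -> tuple[set, set]:
    """Return (missing, extra) key sets of content relative to structure."""
    return set(structure) - set(content), set(content) - set(structure)


# The resolved hints equal the dict form shown in the comment above.
print(hints == {"name": str, "education": str, "skills": list[str], "hobbies": list[str]})
print(missing_or_extra_keys(example_content, hints))
```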

--------------------------------

### Python Example for Processing Documents with Concurrency and Cost Tracking

Source: https://contextgem.dev/vs_other_frameworks

This Python example processes two contract documents with a `process_document` function, with concurrency enabled. It creates a `CostTracker` to monitor API costs, prints the analysis results for each document with `print_document_results`, and then prints a cost summary per model. The three helpers are defined elsewhere in the source example and are not shown in this snippet.

```python
# Example usage
# Sample contract texts (shortened for brevity)
doc1_text = (
    "Consultancy Agreement\n"
    "This agreement between Company A (Supplier) and Company B (Customer)...\n"
    "The term of the agreement is 1 year from the Effective Date...\n"
    "The Supplier shall provide consultancy services as described in Annex 2...\n"
    "The Customer shall pay the Supplier within 30 calendar days of receiving an invoice...\n"
    "All intellectual property created during the provision of services shall belong to the Customer...\n"
    "This agreement is governed by the laws of Norway...\n"
    "Annex 1: Data processing agreement...\n"
    "Annex 2: Statement of Work...\n"
    "Annex 3: Service Level Agreement...\n"
)

doc2_text = (
    "Service Level Agreement\n"
    "This agreement between TechCorp (Provider) and GlobalInc (Client)...\n"
    "The agreement shall commence on January 1, 2023 and continue for 2 years...\n"
    "The Provider shall deliver IT support services as outlined in Schedule A...\n"
    "The Client shall make monthly payments of $5,000 within 15 days of invoice receipt...\n"
    "The Provider guarantees [99.9%] uptime for all critical systems...\n"
    "Either party may terminate with 60 days written notice...\n"
    "This agreement is governed by the laws of California...\n"
    "Schedule A: Service Descriptions...\n"
    "Schedule B: Response Time Requirements...\n"
)

# Create cost tracker
cost_tracker = CostTracker()

# Process documents
print("Processing document 1 with concurrency...")
doc1_results = process_document(doc1_text, cost_tracker, use_concurrency=True)

print("Processing document 2 with concurrency...")
doc2_results = process_document(doc2_text, cost_tracker, use_concurrency=True)

# Print results
print_document_results("Document 1 (Consultancy Agreement)", doc1_results)
print_document_results("Document 2 (Service Level Agreement)", doc2_results)

# Print cost information
print("\nProcessing costs:")
costs = cost_tracker.get_costs()
for model, model_data in costs["model_costs"].items():
    print(f"\n{model}:")
    print(f"  Input cost: ${model_data['input_cost']:.4f}")
    print(f"  Output cost: ${model_data['output_cost']:.4f}")
    print(f"  Total cost: ${model_data['total_cost']:.4f}")
print(f"\nTotal across all models: ${costs['total_cost']:.4f}")
```
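A rough sketch of a tracker that yields the dictionary shape the printing loop above expects: `model_costs` keyed by model name, plus an overall `total_cost`. It assumes per-million-token pricing. Apart from the `CostTracker` and `get_costs` names, everything here (the `track` method, the constructor argument, the prices) is hypothetical, and the helper in the source example may differ.

```python
from collections import defaultdict
from typing import Optional

# Illustrative prices: model -> (input, output) per one million tokens.
DEFAULT_PRICES = {"gpt-4o-mini": (0.150, 0.600)}


class CostTracker:
    """Hypothetical minimal tracker, not the helper from the source example."""

    def __init__(self, prices_per_1m: Optional[dict] = None):
        self._prices = prices_per_1m or DEFAULT_PRICES
        self._tokens = defaultdict(lambda: [0, 0])  # model -> [input, output]

    def track(self, model: str, input_tokens: int, output_tokens: int) -> None:
        """Accumulate token counts for one LLM call."""
        self._tokens[model][0] += input_tokens
        self._tokens[model][1] += output_tokens

    def get_costs(self) -> dict:
        """Return per-model costs and the total across all models."""
        model_costs = {}
        for model, (tokens_in, tokens_out) in self._tokens.items():
            price_in, price_out = self._prices[model]
            input_cost = tokens_in / 1_000_000 * price_in
            output_cost = tokens_out / 1_000_000 * price_out
            model_costs[model] = {
                "input_cost": input_cost,
                "output_cost": output_cost,
                "total_cost": input_cost + output_cost,
            }
        total = sum(entry["total_cost"] for entry in model_costs.values())
        return {"model_costs": model_costs, "total_cost": total}


tracker = CostTracker()
tracker.track("gpt-4o-mini", input_tokens=2_000_000, output_tokens=1_000_000)
costs = tracker.get_costs()
print(f"Total across all models: ${costs['total_cost']:.4f}")
```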

--------------------------------

### ContextGem: Advanced Multi-Document Extraction with LLM Pipelines

Source: https://contextgem.dev/advanced_usage

This advanced Python example demonstrates how to configure and use a `DocumentPipeline` for efficient data extraction from multiple documents. It highlights the use of `DocumentLLMGroup` for managing different LLMs, `LLMPricing` for cost tracking, and various concept types for defining extraction targets, enabling scalable and concurrent processing.

```python
# Advanced Usage Example - analyzing multiple documents with a single pipeline,
# with different LLMs, concurrency and cost tracking

import os

from contextgem import (
    Aspect,
    DateConcept,
    Document,
    DocumentLLM,
    DocumentLLMGroup,
    DocumentPipeline,
    JsonObjectConcept,
    JsonObjectExample,
    LLMPricing,
    NumericalConcept,
    RatingConcept,
    RatingScale,
    StringConcept,
    StringExample,
)

# Construct documents

```

--------------------------------

### API Documentation: StringExample.from_disk() Class Method

Source: https://contextgem.dev/api/examples

Documents the `from_disk` class method for `StringExample`, which loads an instance from a JSON file stored on disk. It specifies parameters, return type, and potential exceptions during file loading and deserialization.

```APIDOC
classmethod contextgem.public.examples.StringExample.from_disk(file_path)
  Description: Loads an instance of the class from a JSON file stored on disk. This method reads the JSON content from the specified file path and deserializes it into an instance of the class using the from_json method.
  Parameters:
    file_path (str): Path to the JSON file to load (must end with ‘.json’).
  Returns: An instance of the class populated with the data from the file.
  Return type: Self
  Raises:
    ValueError: If the file path doesn’t end with ‘.json’.
    OSError: If there’s an error reading the file.
    RuntimeError: If deserialization fails.
```

--------------------------------

### API Documentation for StringExample Class Methods

Source: https://contextgem.dev/api/examples

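API documentation for the serialization and deserialization methods listed on the examples API page (`from_json`, `to_dict`, `to_disk`, `to_json`), plus the `unique_id` property and the `custom_data` attribute. The extracted text does not name the class; it is likely `StringExample`.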

```APIDOC
classmethod from_json(json_string)
  Description: Creates an instance of the class from a JSON string representation. This method deserializes the provided JSON string into a dictionary and uses the from_dict method to construct the class instance. It validates that the class name in the serialized data matches the current class.
  Parameters:
    json_string (str): JSON string containing the serialized object data.
  Returns: A new instance of the class with restored state. (Type: Self)
  Raises:
    TypeError: If the class name in the serialized data doesn’t match.
```

```APIDOC
to_dict()
  Description: Transforms the current object into a dictionary representation. Converts the object to a dictionary that includes:
    - All public attributes
    - Special handling for specific public and private attributes
    When an LLM or LLM group is serialized, its API credentials and usage/cost stats are removed.
  Returns: A dictionary representation of the current object with all necessary data for serialization (Type: dict[str, Any])
```

```APIDOC
to_disk(file_path)
  Description: Saves the serialized instance to a JSON file at the specified path. This method converts the instance to a dictionary representation using to_dict(), then writes it to disk as a formatted JSON file with UTF-8 encoding.
  Parameters:
    file_path (str): Path where the JSON file should be saved (must end with ‘.json’).
  Returns: None
  Raises:
    ValueError: If the file path doesn’t end with ‘.json’.
    IOError: If there’s an error during the file writing process.
```

```APIDOC
to_json()
  Description: Converts the object to its JSON string representation. Serializes the object into a JSON-formatted string using the dictionary representation provided by the to_dict() method.
  Returns: A JSON string representation of the object. (Type: str)
```

```APIDOC
property unique_id (str)
  Description: Returns the ULID of the instance.
```

```APIDOC
custom_data (dict)
```
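The documented contract for disk persistence (a `.json` suffix check, then serialize or deserialize through the JSON methods) can be sketched with a stand-in class. `StandInExample` is hypothetical and simplified; it is not ContextGem's implementation and omits the class-name validation that `from_json` is documented to perform.

```python
import json
import tempfile
from pathlib import Path


class StandInExample:
    """Hypothetical stand-in mirroring the documented method names."""

    def __init__(self, content: str):
        self.content = content

    def to_dict(self) -> dict:
        return {"content": self.content}

    def to_json(self) -> str:
        return json.dumps(self.to_dict())

    def to_disk(self, file_path: str) -> None:
        if not file_path.endswith(".json"):
            raise ValueError("file path must end with '.json'")
        Path(file_path).write_text(self.to_json(), encoding="utf-8")

    @classmethod
    def from_json(cls, json_string: str) -> "StandInExample":
        return cls(**json.loads(json_string))

    @classmethod
    def from_disk(cls, file_path: str) -> "StandInExample":
        if not file_path.endswith(".json"):
            raise ValueError("file path must end with '.json'")
        return cls.from_json(Path(file_path).read_text(encoding="utf-8"))


# Round trip through a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = str(Path(tmp_dir) / "example.json")
    StandInExample("X (Client)").to_disk(path)
    restored = StandInExample.from_disk(path)

print(restored.content)  # X (Client)
```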

--------------------------------

### Initialize OpenAI Clients with Instructor Integration

Source: https://contextgem.dev/vs_other_frameworks

These utility functions provide synchronous and asynchronous methods to initialize OpenAI API clients. They integrate with the `instructor` library to enhance model capabilities, automatically retrieving the API key from environment variables if not provided.

```Python
import os

import instructor
from openai import AsyncOpenAI, OpenAI


def get_client(api_key=None):
    """Get an OpenAI client with instructor integrated"""
    api_key = api_key or os.environ.get("CONTEXTGEM_OPENAI_API_KEY", "")
    client = OpenAI(api_key=api_key)
    return instructor.from_openai(client)

async def get_async_client(api_key=None):
    """Get an AsyncOpenAI client with instructor integrated"""
    api_key = api_key or os.environ.get("CONTEXTGEM_OPENAI_API_KEY", "")
    client = AsyncOpenAI(api_key=api_key)
    return instructor.from_openai(client)
```

--------------------------------

### Example of optimizing extraction for accuracy in ContextGem

Source: https://contextgem.dev/optimizations/optimization_accuracy

This Python example demonstrates how to configure a ContextGem Document for improved extraction accuracy. It shows how to specify a larger SAT segmentation model, enable SAT-based paragraph segmentation, and define a StringConcept with justifications and examples to guide the LLM's extraction process.

```Python
# Example of optimizing extraction for accuracy

import os

from contextgem import Document, DocumentLLM, StringConcept, StringExample

# Define document
doc = Document(
    raw_text="Non-Disclosure Agreement...",
    sat_model_id="sat-6l-sm",  # default is "sat-3l-sm"
    paragraph_segmentation_mode="sat",  # default is "newlines"
    # sentence segmentation mode is always "sat", as other approaches proved to be less accurate
)

# Define document concepts
doc.concepts = [
    StringConcept(
        name="Title",  # A very simple concept, just an example for testing purposes
        description="Title of the document",
        add_justifications=True,  # enable justifications
        justification_depth="brief",  # default
        examples=[
            StringExample(
                content="Supplier Agreement",
            )
        ],
    ),
    # ... add other concepts ...
]

# ... attach other aspects/concepts to the document ...
```

--------------------------------

### Instructor Library Overview and Development Overhead

Source: https://contextgem.dev/vs_other_frameworks

This section provides an overview of the Instructor library, highlighting its focus on structured outputs from LLMs with Pydantic typing. It also details the development overheads associated with using Instructor, such as manual prompt engineering, model definition, pipeline assembly, and cost tracking setup.

```APIDOC
Instructor

Instructor is a powerful library focused on structured outputs from LLMs with strong typing support through Pydantic. It excels at extracting structured data with validation, but requires additional work to build complex extraction pipelines.

Development overhead:

* ⮚ Manual prompt engineering: Crafting comprehensive prompts that guide the LLM effectively
* ⚙ Manual model definition: Developers must define Pydantic validation models for structured output
* ⚔ Manual pipeline assembly: Requires custom code to connect extraction components involving multiple LLMs
* 🔍 Manual reference mapping: Must implement custom logic to track source references
* 📊 Embedding examples in prompts: Examples must be manually incorporated into prompts
* 🔄 Complex concurrency setup: Implementing concurrent processing requires additional setup with asyncio
* 💰 Cost tracking setup: Requires custom logic for cost tracking for each LLM
```

--------------------------------

### Anomaly Extraction with LlamaIndex RAG Setup

Source: https://contextgem.dev/_sources/vs_other_frameworks

This example illustrates anomaly extraction using LlamaIndex configured with a Retrieval-Augmented Generation (RAG) setup. While powerful for knowledge-intensive applications and complex document interactions, this approach requires more manual configuration and specialized setup for structured extraction tasks compared to streamlined alternatives.

```python
# Source not available: the content of
# ../../dev/usage_examples/vs_other_frameworks/basic/llama_index_rag.py
# was not included in the original page.
```

--------------------------------

### JsonObjectExample Class API Reference

Source: https://contextgem.dev/api/examples

Comprehensive API documentation for the JsonObjectExample class, detailing its constructor and methods for cloning, serialization, and deserialization from various formats.

```APIDOC
Class: JsonObjectExample

Constructor:
  __init__(**kwargs)
    Description: Create a new model by parsing and validating input data from keyword arguments.
    Raises: ValidationError if the input data cannot be validated to form a valid model.
    Notes: self is explicitly positional-only to allow self as a field name.

Properties:
  content: dict[str, Any]

Method: clone()
  Description: Creates and returns a deep copy of the current instance.
  Returns: A deep copy of the current instance.
  Return Type: typing.Self

Method: from_dict(obj_dict: dict[str, Any]) (classmethod)
  Description: Reconstructs an instance of the class from a dictionary representation. This method deserializes a dictionary containing the object’s attributes and values into a new instance of the class. It handles complex nested structures like aspects, concepts, and extracted items, properly reconstructing each component.
  Parameters:
    obj_dict (dict[str, Any]): Dictionary containing the serialized object data.
  Returns: A new instance of the class with restored attributes.
  Return Type: Self

Method: from_disk(file_path: str) (classmethod)
  Description: Loads an instance of the class from a JSON file stored on disk. This method reads the JSON content from the specified file path and deserializes it into an instance of the class using the from_json method.
  Parameters:
    file_path (str): Path to the JSON file to load (must end with ‘.json’).
  Returns: An instance of the class populated with the data from the file.
  Return Type: Self
  Raises:
    ValueError: If the file path doesn’t end with ‘.json’.
    OSError: If there’s an error reading the file.
    RuntimeError: If deserialization fails.

Method: from_json(json_string: str) (classmethod)
  Description: Creates an instance of the class from a JSON string representation. This method deserializes the provided JSON string into a dictionary and uses the from_dict method to construct the class instance. It validates that the class name in the serialized data matches the current class.
  Parameters:
    json_string (str): JSON string containing the serialized object data.
  Returns: A new instance of the class with restored state.
  Return Type: Self
  Raises:
    TypeError: If the class name in the serialized data doesn’t match.

Method: to_dict()
  Description: Transforms the current object into a dictionary representation. Converts the object to a dictionary that includes all public attributes.
```

--------------------------------

### Define String Concept for Party Names and Roles

Source: https://contextgem.dev/api/concepts

This Python example demonstrates how to define a `StringConcept` using the `contextgem` library. It creates a concept named 'Party names and roles' to identify contractual parties and their roles, including an example to guide the extraction format.

```python
from contextgem import StringConcept, StringExample

# Define a string concept for identifying contract party names
# and their roles in the contract
party_names_and_roles_concept = StringConcept(
    name="Party names and roles",
    description=(
        "Names of all parties entering into the agreement "
        "and their contractual roles"
    ),
    examples=[
        StringExample(
            content="X (Client)",  # guidance regarding format
        )
    ],
)
```

--------------------------------

### Initialize Document Objects with Raw Text

Source: https://contextgem.dev/advanced_usage

Demonstrates how to create `Document` instances, populating them with raw text content from legal agreements like Consultancy Agreements and Service Level Agreements. This sets up the initial data for processing within the document pipeline.

```Python
from contextgem import Document

doc1 = Document(
    raw_text=(
        "Consultancy Agreement\n"
        "This agreement between Company A (Supplier) and Company B (Customer)...\n"
        "The term of the agreement is 1 year from the Effective Date...\n"
        "The Supplier shall provide consultancy services as described in Annex 2...\n"
        "The Customer shall pay the Supplier within 30 calendar days of receiving an invoice...\n"
        "All intellectual property created during the provision of services shall belong to the Customer...\n"
        "This agreement is governed by the laws of Norway...\n"
        "Annex 1: Data processing agreement...\n"
        "Annex 2: Statement of Work...\n"
        "Annex 3: Service Level Agreement...\n"
    ),
)

doc2 = Document(
    raw_text=(
        "Service Level Agreement\n"
        "This agreement between TechCorp (Provider) and GlobalInc (Client)...\n"
        "The agreement shall commence on January 1, 2023 and continue for 2 years...\n"
        "The Provider shall deliver IT support services as outlined in Schedule A...\n"
        "The Client shall make monthly payments of $5,000 within 15 days of invoice receipt...\n"
        "The Provider guarantees [99.9%] uptime for all critical systems...\n"
        "Either party may terminate with 60 days written notice...\n"
        "This agreement is governed by the laws of California...\n"
        "Schedule A: Service Descriptions...\n"
        "Schedule B: Response Time Requirements...\n"
    ),
)
```

--------------------------------

### Extracting Product Rating with RatingConcept (Partial)

Source: https://contextgem.dev/concepts/rating_concept

This example demonstrates the initial setup for using RatingConcept to extract a product rating, showing the necessary imports from the ContextGem library.

```Python
# ContextGem: RatingConcept Extraction

import os

from contextgem import Document, DocumentLLM, RatingConcept, RatingScale
```

--------------------------------

### Define String Concepts for Termination Details

Source: https://contextgem.dev/advanced_usage

Defines three `StringConcept` objects: 'Termination for Cause', 'Notice Period', and 'Severance Package'. Each concept includes a description, optional examples to guide the LLM, and settings to add references at the sentence level, enabling precise extraction of specific clauses related to employment termination.

```python
from contextgem import StringConcept, StringExample

termination_for_cause = StringConcept(
    name="Termination for Cause",
    description="Conditions under which the company can terminate the employee for cause.",
    examples=[  # optional, examples help the LLM to understand the concept better
        StringExample(content="Employee may be terminated for misconduct"),
        StringExample(content="Termination for breach of contract"),
    ],
    add_references=True,
    reference_depth="sentences",
)
notice_period = StringConcept(
    name="Notice Period",
    description="Required notification period before employment termination.",
    add_references=True,
    reference_depth="sentences",
)
severance_terms = StringConcept(
    name="Severance Package",
    description="Compensation and benefits provided upon termination.",
    add_references=True,
    reference_depth="sentences",
)
```

--------------------------------

### API: JsonObjectExample Class Definition

Source: https://contextgem.dev/_modules/contextgem/public/examples

Defines the JsonObjectExample class, a Pydantic model for representing JSON object examples used to guide LLM extraction tasks. It includes a 'content' field for the JSON-serializable dictionary (minimum length 1) and a validator to ensure its serializability. This class can be attached to a JsonObjectConcept.

```python
class JsonObjectExample(_Example):
    """
    Represents a JSON object example that can be provided by users for certain extraction tasks.

    :ivar content: A JSON-serializable dict with the minimum length of 1 that holds
        the content of the example.
    :type content: dict[str, Any]

    Note:
        Examples are optional and can be used to guide LLM extraction tasks. They serve as reference
        points for the model to understand the expected format and content of extracted information.
        JsonObjectExample can be attached to a :class:`~contextgem.public.concepts.JsonObjectConcept`.
    """

    content: dict[str, Any] = Field(default_factory=dict, min_length=1)

    @field_validator("content")
    @classmethod
    def _validate_content_serializable(cls, value: dict[str, Any]) -> dict[str, Any]:
        """
        Validates that the `content` field is serializable to JSON.

        :param value: The value of the `content` field to validate.
        :type value: dict[str, Any]
        :return: The validated `content` value.
        :rtype: dict[str, Any]
        :raises ValueError: If the `content` value is not serializable.
        """
        if not _is_json_serializable(value):
            raise ValueError("`content` must be JSON serializable.")
        return value
```
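
The validator above relies on a helper, `_is_json_serializable`, whose body is not shown here. As a rough sketch only, such a check could be written by attempting to encode the value with the standard `json` module. The function name `is_json_serializable` below is hypothetical and is not the library's implementation.

```python
import json
from typing import Any


def is_json_serializable(value: Any) -> bool:
    """Return True if `value` can be encoded with the standard json module."""
    try:
        json.dumps(value)
    except (TypeError, ValueError):
        # TypeError: unsupported type (e.g. a set or a custom object)
        # ValueError: circular reference
        return False
    return True


print(is_json_serializable({"name": "Supplier", "term_years": 1}))
print(is_json_serializable({"tags": {"a", "b"}}))
```

The first call prints `True` and the second prints `False`, because a `set` has no JSON representation.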

--------------------------------

### LlamaIndex: Structured Anomaly Extraction Program

Source: https://contextgem.dev/_sources/vs_other_frameworks

This LlamaIndex example demonstrates structured data extraction, specifically for anomalies, outside of a RAG setup. It necessitates manual definition of Pydantic models, explicit prompt construction, and the use of an output parser. While powerful for data indexing, it requires more manual setup for direct structured extraction tasks.

```python
from llama_index.core.program import LLMTextCompletionProgram
from llama_index.core.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field
from llama_index.llms.openai import OpenAI

# 1. Define Pydantic model for structured output
class Anomaly(BaseModel):
    type: str = Field(description="Type of anomaly (e.g., symptom, observation)")
    description: str = Field(description="Detailed description of the anomaly")
    location: str = Field(description="Approximate location or context in the document")

class Anomalies(BaseModel):
    anomalies: list[Anomaly] = Field(description="List of anomalies found in the document")

# 2. Initialize the LLM
llm = OpenAI(model="gpt-3.5-turbo")

# 3. Initialize output parser with the Pydantic model
parser = PydanticOutputParser(output_cls=Anomalies)

# 4. Define the prompt template string
prompt_template_str = """
Extract all anomalies from the following document.
{format_instructions}
Document: {document_text}
"""

# 5. Create the LLMTextCompletionProgram
program = LLMTextCompletionProgram.from_defaults(
    output_parser=parser,
    prompt_template_str=prompt_template_str,
    llm=llm,
    verbose=True,
)

# 6. Define the document text
document_text = "The patient presented with unusual symptoms: high fever, persistent cough, and severe fatigue. No rash was observed."

# 7. Execute the program to extract anomalies
result = program(document_text=document_text)
print(result)
```

--------------------------------

### JsonObjectExample Class API Reference

Source: https://contextgem.dev/api/examples

Provides a comprehensive reference for the JsonObjectExample class, detailing its methods and properties for object serialization, file I/O, and unique identification.

```APIDOC
JsonObjectExample Class:
  to_dict()
    Description: Transforms the current object into a dictionary representation, with special handling for specific public and private attributes. When an LLM or LLM group is serialized, its API credentials and usage/cost stats are removed.
    Returns: A dictionary representation of the current object with all necessary data for serialization.
    Return Type: dict[str, Any]

  to_disk(file_path: str)
    Description: Saves the serialized instance to a JSON file at the specified path. This method converts the instance to a dictionary representation using to_dict(), then writes it to disk as a formatted JSON file with UTF-8 encoding.
    Parameters:
      file_path (str): Path where the JSON file should be saved (must end with ‘.json’).
    Returns: None
    Return Type: None
    Raises:
      ValueError: If the file path doesn’t end with ‘.json’.
      IOError: If there’s an error during the file writing process.

  to_json()
    Description: Converts the object to its JSON string representation. Serializes the object into a JSON-formatted string using the dictionary representation provided by the to_dict() method.
    Returns: A JSON string representation of the object.
    Return Type: str

  Property: unique_id
    Type: str
    Description: Returns the ULID of the instance.

  Property: custom_data
    Type: dict
    Description: Custom data associated with the instance.
```
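
The file I/O contract described for `to_disk()` and `from_disk()` (a path ending in `.json`, UTF-8 encoded formatted JSON, `ValueError` on a wrong extension) can be illustrated with the standard library alone. This is a minimal sketch of that contract, not ContextGem's code; `save_json` and `load_json` are hypothetical helper names.

```python
import json
import tempfile
from pathlib import Path
from typing import Any


def save_json(data: dict[str, Any], file_path: str) -> None:
    """Write `data` as formatted UTF-8 JSON; the path must end with '.json'."""
    if not file_path.endswith(".json"):
        raise ValueError("file_path must end with '.json'")
    Path(file_path).write_text(json.dumps(data, indent=2), encoding="utf-8")


def load_json(file_path: str) -> dict[str, Any]:
    """Read a UTF-8 JSON file back into a dictionary."""
    if not file_path.endswith(".json"):
        raise ValueError("file_path must end with '.json'")
    return json.loads(Path(file_path).read_text(encoding="utf-8"))


# Round trip through a temporary directory
with tempfile.TemporaryDirectory() as tmp_dir:
    path = str(Path(tmp_dir) / "example.json")
    original = {"content": {"party": "Company A", "role": "Supplier"}}
    save_json(original, path)
    restored = load_json(path)

print(restored == original)  # True
```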

--------------------------------

### Optimizing ContextGem Extraction for Speed with Concurrency and Fallback LLM

Source: https://contextgem.dev/optimizations/optimization_speed

This Python example demonstrates how to configure a ContextGem DocumentLLM for speed optimization. It shows the setup of an AsyncLimiter for concurrent processing, the inclusion of a fallback LLM to handle rate limits, and the use of the `extract_all` method with concurrency enabled. This setup helps manage API call rates and ensures robustness for faster extractions.

```Python
# Example of optimizing extraction for speed

import os

from aiolimiter import AsyncLimiter

from contextgem import Document, DocumentLLM

# Define document
document = Document(
    raw_text="document_text",
    # aspects=[Aspect(...), ...],
    # concepts=[Concept(...), ...],
)

# Define LLM with a fallback model
llm = DocumentLLM(
    model="openai/gpt-4o-mini",
    api_key=os.environ.get("CONTEXTGEM_OPENAI_API_KEY"),
    async_limiter=AsyncLimiter(
        10, 5
    ),  # e.g. 10 acquisitions per 5-second period; adjust to your LLM API setup
    fallback_llm=DocumentLLM(
        model="openai/gpt-3.5-turbo",
        api_key=os.environ.get("CONTEXTGEM_OPENAI_API_KEY"),
        is_fallback=True,
        async_limiter=AsyncLimiter(
            20, 5
        ),  # e.g. 20 acquisitions per 5-second period; adjust to your LLM API setup
    ),
)

# Use the LLM for extraction with concurrency enabled
llm.extract_all(document, use_concurrency=True)

# ... use the extracted data ...
```
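
To show the general pattern behind this configuration, the sketch below combines concurrent calls with a fallback on a rate-limit error, using only `asyncio`. It is not how ContextGem implements it: the model functions are stubs, `RateLimitError` is a hypothetical exception, and `asyncio.Semaphore` caps the number of simultaneous calls rather than the rate per time period that `AsyncLimiter` enforces.

```python
import asyncio


class RateLimitError(Exception):
    """Hypothetical error standing in for a provider's rate-limit response."""


async def primary_model(prompt: str) -> str:
    # Stub: pretend the primary model is always rate limited
    raise RateLimitError("primary model is rate limited")


async def fallback_model(prompt: str) -> str:
    # Stub: pretend the fallback model always answers
    return f"fallback answer to: {prompt}"


async def call_with_fallback(prompt: str) -> str:
    """Try the primary model first; switch to the fallback on a rate-limit error."""
    try:
        return await primary_model(prompt)
    except RateLimitError:
        return await fallback_model(prompt)


async def main() -> list[str]:
    # Allow at most two calls in flight at once, then run all prompts concurrently
    semaphore = asyncio.Semaphore(2)

    async def limited(prompt: str) -> str:
        async with semaphore:
            return await call_with_fallback(prompt)

    return await asyncio.gather(*(limited(p) for p in ["a", "b", "c"]))


results = asyncio.run(main())
print(results)
```

`asyncio.gather` returns results in the order the prompts were submitted, regardless of which call finishes first.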

--------------------------------

### Interacting with LLMs via ContextGem's Lightweight Chat Interface

Source: https://contextgem.dev/_sources/quickstart

This Python snippet showcases ContextGem's unified interface for natural language interaction with Large Language Models. It demonstrates both text-based and vision-based chat functionalities, highlighting its built-in fallback support for robust LLM communication.

```python
import os

from contextgem import DocumentLLM

# NOTE: this sketch assumes the chat interface is exposed as `DocumentLLM.chat()`;
# verify method names and parameters against the ContextGem API reference.
llm = DocumentLLM(
    model="openai/gpt-4o",
    api_key=os.environ.get("CONTEXTGEM_OPENAI_API_KEY"),
    # Built-in fallback support: the fallback LLM is used if the primary one fails
    fallback_llm=DocumentLLM(
        model="openai/gpt-3.5-turbo",
        api_key=os.environ.get("CONTEXTGEM_OPENAI_API_KEY"),
        is_fallback=True,
    ),
)

# Example: Simple text-based chat
response_text = llm.chat("What is the capital of France?")
print("Text Chat Response:", response_text)

# Example: Vision-based chat (e.g., asking about an image)
# Assumed to take images through an `images` argument (a list of contextgem
# `Image` objects) when the model supports vision; left commented out as unverified.
# response_vision = llm.chat("Describe this image.", images=[...])
```

--------------------------------

### Setting Up LLM Cost Tracking in ContextGem

Source: https://contextgem.dev/llms/llm_config

Shows how to configure pricing details for a DocumentLLM using LLMPricing to track input and output token costs, and how to retrieve the total cost.

```python
from contextgem import DocumentLLM, LLMPricing

llm = DocumentLLM(
    model="openai/gpt-4o-mini",
    api_key="<your-openai-api-key>",
    pricing_details=LLMPricing(
        input_per_1m_tokens=0.150,  # Cost per 1M input tokens
        output_per_1m_tokens=0.600,  # Cost per 1M output tokens
    ),
)

# Perform some extraction tasks

# Later, you can check the cost
cost_info = llm.get_cost()
```
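
The per-1M-token prices above translate into cost by simple proportion. The sketch below shows that arithmetic with hypothetical token counts; `estimate_cost` is an illustrative helper, not part of ContextGem, and the library's own cost calculation may differ in rounding or reporting.

```python
def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    input_per_1m_tokens: float,
    output_per_1m_tokens: float,
) -> float:
    """Estimate cost from token counts and per-1M-token prices."""
    input_cost = input_tokens / 1_000_000 * input_per_1m_tokens
    output_cost = output_tokens / 1_000_000 * output_per_1m_tokens
    return input_cost + output_cost


# Hypothetical usage figures: 200k input tokens and 50k output tokens
cost = estimate_cost(
    200_000,
    50_000,
    input_per_1m_tokens=0.150,
    output_per_1m_tokens=0.600,
)
print(f"{cost:.4f}")  # 0.0600
```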

--------------------------------

### Python: Defining Document and Concepts for Extraction with ContextGem

Source: https://contextgem.dev/advanced_usage

This Python example demonstrates how to define and prepare document-level concepts for extraction using the `contextgem` library. It initializes a `Document` object with sample text and defines various concept types such as `BooleanConcept`, `DateConcept`, and `StringConcept`, specifying their names, descriptions, and optional properties like `singular_occurrence` and `add_references`. This setup is a prerequisite for extracting structured information from documents.

```Python
# Advanced Usage Example - Extracting aspects and concepts from a document, with references,
# using concurrency

import os

from aiolimiter import AsyncLimiter

from contextgem import (
    Aspect,
    BooleanConcept,
    DateConcept,
    Document,
    DocumentLLM,
    JsonObjectConcept,
    StringConcept,
)

# Example privacy policy document (shortened for brevity)
doc = Document(
    raw_text=(
        "Privacy Policy\n\n"
        "Last Updated: March 15, 2024\n\n"
        "1. Data Collection\n"
        "We collect various types of information from our users, including:\n"
        "- Personal information (name, email address, phone number)\n"
        "- Device information (IP address, browser type, operating system)\n"
        "- Usage data (pages visited, time spent on site)\n"
        "- Location data (with your consent)\n\n"
        "2. Data Usage\n"
        "We use your information to:\n"
        "- Provide and improve our services\n"
        "- Send you marketing communications (if you opt-in)\n"
        "- Analyze website performance\n"
        "- Comply with legal obligations\n\n"
        "3. Data Sharing\n"
        "We may share your information with:\n"
        "- Service providers (for processing payments and analytics)\n"
        "- Law enforcement (when legally required)\n"
        "- Business partners (with your explicit consent)\n\n"
        "4. Data Retention\n"
        "We retain personal data for 24 months after your last interaction with our services. "
        "Analytics data is kept for 36 months.\n\n"
        "5. User Rights\n"
        "You have the right to:\n"
        "- Access your personal data\n"
        "- Request data deletion\n"
        "- Opt-out of marketing communications\n"
        "- Lodge a complaint with supervisory authorities\n\n"
        "6. Contact Information\n"
        "For privacy-related inquiries, contact our Data Protection Officer at privacy@example.com\n"
    ),
)

# Define all document-level concepts in a single declaration
document_concepts = [
    BooleanConcept(
        name="Is Privacy Policy",
        description="Verify if this document is a privacy policy",
        singular_occurrence=True,  # explicitly enforce singular extracted item (optional)
    ),
    DateConcept(
        name="Last Updated Date",
        description="The date when the privacy policy was last updated",
        singular_occurrence=True,  # explicitly enforce singular extracted item (optional)
    ),
    StringConcept(
        name="Contact Information",
        description="Contact details for privacy-related inquiries",
        add_references=True,
        reference_depth="sentences",
    ),
]
```