### Installation

Source: https://github.com/ravenpack/python-api/blob/master/README.md

Install the RavenPack API Python client using pip.

```bash
pip install ravenpackapi
```

--------------------------------

### Manage RavenPack Datasets

Source: https://context7.com/ravenpack/python-api/llms.txt

Provides examples for retrieving, listing, updating, and deleting datasets using the RPApi client. Demonstrates how to get a dataset by ID, list private or public datasets, filter datasets by tags, modify dataset properties, and save or delete changes.

```python
from ravenpackapi import RPApi

api = RPApi(api_key="YOUR_API_KEY")

# Get an existing dataset by ID
ds = api.get_dataset(dataset_id="us30")  # US 30 is a public dataset
print(f"Dataset: {ds.name}, Frequency: {ds.frequency}")
# Output: Dataset: US 30, Frequency: granular

# List all private datasets
datasets = api.list_datasets(scope="private")
for dataset in datasets:
    print(f"{dataset.id}: {dataset.name}")

# List public datasets
public_datasets = api.list_datasets(scope="public")

# List datasets by tag
tagged_datasets = api.list_datasets(tags="production")

# Update a dataset
ds = api.get_dataset("YOUR_DATASET_ID")
ds.name = "Updated Dataset Name"
ds.filters["relevance"] = {"$gte": 95}
ds.save()

# Delete a dataset
ds.delete()
```

--------------------------------

### GET /realtime

Source: https://context7.com/ravenpack/python-api/llms.txt

Establishes a persistent connection to receive real-time news analytics as they are published.

```APIDOC
## GET /realtime

### Description
Subscribes to a real-time stream of news analytics for a specific dataset.

### Method
GET

### Endpoint
/realtime

### Response
#### Success Response (200)
- **stream** (event-stream) - Continuous stream of JSON objects containing news analytics.
```

--------------------------------

### Feature: Product Aware Instance for Edge

Source: https://github.com/ravenpack/python-api/blob/master/CHANGELOG.md

Enables accessing the 'edge' product by simply instantiating the API with the `product='edge'` argument. This simplifies the setup for using Edge-specific features.

```python
api = RPApi(product="edge")
```

--------------------------------

### Create RavenPack Dataset

Source: https://context7.com/ravenpack/python-api/llms.txt

Shows how to create a new dataset using the Dataset class and the RPApi client. Covers specifying dataset name, description, filters, fields, frequency, and tags for RavenPack Analytics. Also includes an example for creating a dataset for the RavenPack Edge product.

```python
from ravenpackapi import RPApi, Dataset

api = RPApi(api_key="YOUR_API_KEY")

# Create a dataset with filters for RavenPack Analytics
ds = api.create_dataset(
    Dataset(
        name="High Relevance US Companies",
        description="News analytics for high-relevance US company mentions",
        filters={
            "relevance": {"$gte": 90},
            "country_code": "US",
            "entity_type": "COMP"
        },
        fields=["timestamp_utc", "rp_entity_id", "entity_name",
                "event_relevance", "event_sentiment_score"],
        frequency="granular",
        tags=["production", "us-equities"]
    )
)

print(f"Created dataset: {ds.id}")
# Output: Created dataset: abc123-def456-ghi789

# For RavenPack Edge product
api_edge = RPApi(api_key="YOUR_API_KEY", product="edge")
ds_edge = api_edge.create_dataset(
    Dataset(
        name="Edge Dataset",
        filters={"entity_relevance": {"$gte": 90}},
        product="edge"
    )
)
```

--------------------------------

### Get Entity Mapping (Python)

Source: https://github.com/ravenpack/python-api/blob/master/README.md

Demonstrates how to use the `get_entity_mapping` method to find RP_ENTITY_ID for a given universe of entities. The universe can include entity names, tickers, or detailed dictionaries with various identifiers. The result provides matched and unmatched entities.

```python
universe = [
    "RavenPack",
    {'ticker': 'AAPL'},
    {
        "client_id": "12345-A",
        "date": "2017-01-01",
        "name": "Amazon Inc.",
        "entity_type": "COMP",
        "isin": "US0231351067",
        "cusip": "023135106",
        "sedol": "B58WM62",
        "listing": "XNAS:AMZN"
    },
]

mapping = api.get_entity_mapping(universe)

print(f"Matched entities: {len(mapping.matched)}")
for m in mapping.matched:
    print(f"- {m.name}")

print(f"Unmatched entities: {len(mapping.unmatched)}")
```

--------------------------------

### Fix: Create Edge Dataset with Product Attribute

Source: https://github.com/ravenpack/python-api/blob/master/CHANGELOG.md

Resolves an issue where creating an Edge dataset failed if the product attribute was not explicitly set in the Dataset object, even when specified in the RPApi. This example shows the correct way to initialize RPApi and create a dataset.

```python
from ravenpackapi import RPApi, Dataset

api = RPApi(api_key="YOUR_API_KEY", product="edge")

ds = api.create_dataset(
    Dataset(
        name="New Dataset",
        filters={"entity_relevance": {"$gte": 90}},
    )
)
```

--------------------------------

### Get Dataset Definition (Python)

Source: https://github.com/ravenpack/python-api/blob/master/README.md

Illustrates how to retrieve the definition of a pre-existing dataset using its ID. It highlights that dataset IDs differ between RPA and EDGE (e.g., 'us30' for RPA vs. 'us30-edge' for EDGE).

```python
# For RPA, use 'us30'
# For EDGE, use 'us30-edge'
dataset_id = 'us30-edge' # or 'us30'

ds = api.get_dataset(dataset_id=dataset_id)
print(f"Dataset definition retrieved: {ds}")
```
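When a script has to run against both products, the product-specific names can be kept in one place. The sketch below is a hypothetical helper, not part of `ravenpackapi`: the keys `"rpa"` and `"edge"` are just labels chosen for this example, and the values are the `us30` dataset IDs and relevance filter names that appear in this document's examples.

```python
# Values taken from the examples in this document; extend as needed.
PRODUCT_SETTINGS = {
    "rpa": {"us30_dataset_id": "us30", "relevance_field": "relevance"},
    "edge": {"us30_dataset_id": "us30-edge", "relevance_field": "entity_relevance"},
}


def settings_for(product):
    """Return the product-specific names, or raise for an unknown product."""
    try:
        return PRODUCT_SETTINGS[product.lower()]
    except KeyError:
        raise ValueError(f"unknown product: {product!r}") from None


print(settings_for("edge")["us30_dataset_id"])
```

The returned ID could then be passed to `api.get_dataset(dataset_id=...)`.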

--------------------------------

### Feature: Store Entity Mapping Data in Memory

Source: https://github.com/ravenpack/python-api/blob/master/CHANGELOG.md

Introduces a flag to store entity mapping data in memory for the 'edge' product, which can improve performance but should be used with caution. This example demonstrates how to enable this feature and iterate through entities.

```python
eref = api.get_entity_type_reference(entity_type, "full", file_date)
eref.store_in_memory = True
for entity in eref:
    print(entity)
```

--------------------------------

### Download Data via JSON Endpoint (Python)

Source: https://github.com/ravenpack/python-api/blob/master/README.md

Demonstrates downloading data synchronously using the `json` method of a Dataset object. This method is optimized for small requests. It specifies start and end dates and iterates through the returned records. Limitations apply to the number of records and entities for granular and indicator datasets.

```python
start_date = '2018-01-05 18:00:00'
end_date = '2018-01-05 18:01:00'

data = ds.json(start_date=start_date, end_date=end_date)

for record in data:
    print(record)
```

--------------------------------

### Get Entity Reference Information (Python)

Source: https://github.com/ravenpack/python-api/blob/master/README.md

Shows how to retrieve detailed information for an entity using its RP_ENTITY_ID with the `get_entity_reference` method. This includes historical names, valid tickers, and other reference data associated with the entity.

```python
ALPHABET_RP_ENTITY_ID = '4A6F00'

references = api.get_entity_reference(ALPHABET_RP_ENTITY_ID)

print("Entity Names over History:")
for name in references.names:
    print(f"- {name.value} (Valid from: {name.start} to {name.end})")

print("Tickers Valid Today:")
for ticker in references.tickers:
    if ticker.is_valid():
        print(f"- {ticker}")
```

--------------------------------

### GET /jobs

Source: https://context7.com/ravenpack/python-api/llms.txt

Manages asynchronous jobs, allowing users to check status, wait for completion, or cancel pending requests.

```APIDOC
## GET /jobs/{token}

### Description
Retrieves the current status of an asynchronous job and provides access to the result URL once completed.

### Method
GET

### Endpoint
/jobs/{token}

### Parameters
#### Path Parameters
- **token** (string) - Required - The unique job token.

### Response
#### Success Response (200)
- **status** (string) - Current status (enqueued, processing, completed, error).
- **url** (string) - Download URL for the result file.
```
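The Python client wraps this endpoint (for example through `job.wait_for_completion`), but the polling logic itself is simple. The following is a minimal sketch, assuming a caller-supplied `get_status` callable that returns one of the status strings listed above; `poll_until_done` and its parameters are hypothetical names, and the HTTP call itself is deliberately left out.

```python
import time

TERMINAL_STATUSES = {"completed", "error"}


def poll_until_done(get_status, timeout_seconds=300, interval_seconds=5,
                    sleep=time.sleep, clock=time.monotonic):
    """Call get_status() until it returns a terminal status or time runs out.

    get_status is any zero-argument callable returning a status string;
    how it talks to the API is left to the caller.
    """
    deadline = clock() + timeout_seconds
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"job still '{status}' after {timeout_seconds}s")
        sleep(interval_seconds)
```

Injecting `sleep` and `clock` keeps the loop testable without real waiting.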

--------------------------------

### Getting Dataset Information

Source: https://github.com/ravenpack/python-api/blob/master/README.md

Retrieve the definition of a pre-existing dataset using its ID.

```python
from ravenpackapi import RPApi

api = RPApi(api_key="YOUR_API_KEY")

# Example dataset IDs
rpa_dataset_id = 'us30'      # For RPA
edge_dataset_id = 'us30-edge' # For EDGE

# Get dataset description
selected_dataset_id = rpa_dataset_id # or edge_dataset_id
ds = api.get_dataset(dataset_id=selected_dataset_id)
print(f"Dataset details: {ds}")
```

--------------------------------

### Upload File to RavenPack API

Source: https://github.com/ravenpack/python-api/blob/master/README.md

Uploads a file to the RavenPack Annotations platform for analysis. The function `api.upload.file()` initiates the upload process. Further options are detailed in the user guide.

```python
f = api.upload.file("_orig.doc")
```

--------------------------------

### Feature: Entity-Type-Reference with Past Dates and Deltas

Source: https://github.com/ravenpack/python-api/blob/master/CHANGELOG.md

Adds support for retrieving entity references from past dates and allows specifying `reference_type='delta'` to get only daily changes for Edge. This enhances historical data analysis capabilities.

```python
api.get_entity_type_reference(entity_type, reference_type="delta")
```

--------------------------------

### Initialize RavenPack API Client (Python)

Source: https://github.com/ravenpack/python-api/blob/master/README.md

Demonstrates how to instantiate the RPApi client, either by setting the RP_API_KEY environment variable or by passing the API key directly. It also shows how to configure the client for RavenPack EDGE.

```python
from ravenpackapi import RPApi

# Using API key directly
api = RPApi(api_key="YOUR_API_KEY")

# For RavenPack EDGE
api_edge = RPApi(api_key="YOUR_API_KEY", product="edge")
```

--------------------------------

### Initialize RavenPack API Client

Source: https://context7.com/ravenpack/python-api/llms.txt

Demonstrates how to initialize the RPApi client, including authentication methods (API key or environment variable), specifying the product (RPA or Edge), and configuring common request parameters like proxies, timeouts, and SSL verification. Also shows how to check the API connection status.

```python
from ravenpackapi import RPApi

# Initialize with explicit API key
api = RPApi(api_key="YOUR_API_KEY")

# Or use environment variable RP_API_KEY
api = RPApi()

# For RavenPack Edge product
api = RPApi(api_key="YOUR_API_KEY", product="edge")

# Configure request parameters (proxy, timeouts, SSL verification)
api.common_request_params.update({
    "proxies": {"https": "http://your_internal_proxy:9999"},
    "timeout": (20, 120),  # connect, read timeouts
    "verify": False  # disable certificate verification
})

# Check API connection status
status = api.get_status()
print(status)
# Output: {'datasets': 50, 'remaining_datafile_bytes': 100000000000}
```

--------------------------------

### Creating a New Dataset

Source: https://github.com/ravenpack/python-api/blob/master/README.md

Demonstrates how to create a new dataset using the `create_dataset` method with a `Dataset` instance.

```python
from ravenpackapi import Dataset, RPApi

api = RPApi(api_key="YOUR_API_KEY")

# Define relevance filter (use 'entity_relevance' for EDGE)
entity_relevance = "relevance" # or "entity_relevance" for EDGE

dataset = Dataset(
    name="My New Dataset",
    filters={
        entity_relevance: {
            "$gte": 90
        }
    },
)

created_dataset = api.create_dataset(dataset)
print(f"Dataset created: {created_dataset}")
```

--------------------------------

### Organize Upload Folders

Source: https://context7.com/ravenpack/python-api/llms.txt

Shows how to list, create, update, and delete folders for batch processing of uploaded documents.

```python
from ravenpackapi import RPApi

api = RPApi(api_key="YOUR_API_KEY")

# List uploads completed in January 2024
for f in api.upload.list(start_date="2024-01-01", end_date="2024-01-31", status="COMPLETED"):
    print(f"{f.file_id}: {f.file_name} - {f.status}")

# Create a folder and upload a file into it
folder = api.upload.folder_create(folder_name="Q1 Reports", starred=True)
uploaded = api.upload.file("report.pdf", folder=folder)

# List folders (a separate loop variable keeps `folder` pointing at the new folder)
for existing in api.upload.list_folders():
    print(f"{existing.folder_id}: {existing.folder_name}")

# Rename the new folder, check the quota, then delete the folder
folder.folder_name = "2024 Q1 Reports"
folder.save()
quota = api.upload.quota()
print(f"Used: {quota['used_bytes']}, Limit: {quota['limit_bytes']}")
folder.delete()
```

--------------------------------

### Create and Request Datafile with Timezone - Python

Source: https://github.com/ravenpack/python-api/blob/master/CHANGELOG.md

Demonstrates how to create a custom dataset, save it, and request a datafile with specific date ranges, compression, and timezone settings. This functionality requires an active API connection and dataset configuration.

```python
custom_dataset = Dataset(
    api=api,
    name="Us30 indicators",
    filters=us30.filters,
    fields=new_fields,
    frequency='daily'
)
custom_dataset.save()
print(custom_dataset)
job = custom_dataset.request_datafile(
    start_date='2017-01-01 19:30',
    end_date='2017-01-02 19:30',
    compressed=True,
    time_zone='Europe/London',
)
```

--------------------------------

### Create a New Dataset with Filters (Python)

Source: https://github.com/ravenpack/python-api/blob/master/README.md

Shows how to create a new dataset using the `create_dataset` method. It involves defining a Dataset object with a name and a filter, such as relevance greater than or equal to 90. The appropriate filter key ('relevance' for RPA, 'entity_relevance' for EDGE) should be used.

```python
from ravenpackapi import Dataset

# For RPA, use 'relevance'
# For EDGE, use 'entity_relevance'
entity_relevance_filter = "entity_relevance" # or "relevance"

dataset_name = "New Dataset"

data_filter = {
    entity_relevance_filter: {
        "$gte": 90
    }
}

ds = api.create_dataset(Dataset(name=dataset_name, filters=data_filter))
print(f"Dataset created: {ds}")
```
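To make the meaning of such a filter concrete, the sketch below evaluates a filter dict against a local record. It is only an illustration of the "greater than or equal to" semantics described above, not the server's implementation; `matches` is a hypothetical helper that supports plain equality and `$gte` and rejects any other operator.

```python
def matches(record, filters):
    """Check a record (dict) against a filter dict.

    Supports plain equality and the "$gte" operator only.
    """
    for field, condition in filters.items():
        value = record.get(field)
        if isinstance(condition, dict):
            unsupported = set(condition) - {"$gte"}
            if unsupported:
                raise ValueError(f"unsupported operators: {sorted(unsupported)}")
            if "$gte" in condition and (value is None or value < condition["$gte"]):
                return False
        elif value != condition:
            return False
    return True


data_filter = {"relevance": {"$gte": 90}}
print(matches({"relevance": 95}, data_filter))
print(matches({"relevance": 80}, data_filter))
```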

--------------------------------

### Count dataset records using Python

Source: https://github.com/ravenpack/python-api/blob/master/CHANGELOG.md

Demonstrates how to retrieve a dataset and count the number of records within a specific time range.

```python
ds = api.get_dataset('us30')
data_count = ds.count(
    start_date='2018-01-05 18:00:00',
    end_date='2018-01-05 18:01:00',
)
# {'count': 11, 'stories': 10, 'entities': 6}
```

--------------------------------

### Initialize RavenPack API Client

Source: https://github.com/ravenpack/python-api/blob/master/README.rst

Instantiates the RPApi object using an API key. This is the entry point for all subsequent API interactions.

```python
from ravenpackapi import RPApi

api = RPApi(api_key="YOUR_API_KEY")
```

--------------------------------

### Process data in chunks using time intervals

Source: https://github.com/ravenpack/python-api/blob/master/CHANGELOG.md

Illustrates how to split a large date range into smaller intervals (e.g., daily) to download data files in chunks.

```python
from ravenpackapi.util import (
    SPLIT_YEARLY,
    SPLIT_MONTHLY,
    SPLIT_WEEKLY,
    SPLIT_DAILY,
    time_intervals
)

# Illustrative values; 'ds' is a Dataset obtained from api.get_dataset()
start_date = '2018-01-01'
end_date = '2018-01-10'
GET_COMPRESSED = True

split = SPLIT_DAILY
for range_start, range_end in time_intervals(start_date, end_date, split=split):
    job = ds.request_datafile(
        start_date=range_start,
        end_date=range_end,
        compressed=GET_COMPRESSED,
    )
```
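The idea behind the split can be shown with the standard library alone. This is a minimal sketch of daily chunking, assuming half-open `[start, end)` ranges; `daily_intervals` is a hypothetical stand-in and not the library's `time_intervals` implementation, whose exact boundary handling may differ.

```python
from datetime import datetime, timedelta


def daily_intervals(start, end):
    """Yield consecutive (range_start, range_end) pairs covering [start, end).

    The last pair is shortened so that it never runs past `end`.
    """
    cursor = start
    while cursor < end:
        nxt = min(cursor + timedelta(days=1), end)
        yield cursor, nxt
        cursor = nxt


chunks = list(daily_intervals(datetime(2018, 1, 1), datetime(2018, 1, 3, 12)))
for range_start, range_end in chunks:
    print(range_start, "->", range_end)
```

Each pair would then drive one datafile request, which keeps individual jobs small.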

--------------------------------

### Request Datafile for Large Downloads (Python)

Source: https://github.com/ravenpack/python-api/blob/master/README.md

Shows how to request a data file asynchronously for larger data downloads using the `request_datafile` method. This returns a job object that prepares the file on the server. The downloaded file can then be saved using `save_to_file`.

```python
start_date = '2018-01-05 18:00:00'
end_date = '2018-01-05 18:01:00'

job = ds.request_datafile(start_date=start_date, end_date=end_date)

# The job takes time to complete; you can monitor its status.
# Once it is ready, save the result to a file:
output_filename = 'output.csv'
job.save_to_file(filename=output_filename)
print(f"Datafile saved to {output_filename}")
```

--------------------------------

### Download Data Asynchronously

Source: https://github.com/ravenpack/python-api/blob/master/README.rst

Requests a large datafile to be prepared on the server. Returns a job object that can be saved to a file once complete.

```python
job = ds.request_datafile(
    start_date='2018-01-05 18:00:00',
    end_date='2018-01-05 18:01:00',
)

# Pass the target path directly; opening 'output.csv' for reading first
# would fail if the file does not exist yet
job.save_to_file(filename='output.csv')
```

--------------------------------

### Download Bulk Entity Type Reference

Source: https://context7.com/ravenpack/python-api/llms.txt

Downloads a full reference file for all entities of a specific type. Supports streaming to file, historical snapshots for Edge users, and loading from local files.

```python
from ravenpackapi import RPApi

api = RPApi(api_key="YOUR_API_KEY")

eref = api.get_entity_type_reference(entity_type="COMP", reference_type="full")
eref.write_to_file("companies_reference.csv")

api_edge = RPApi(api_key="YOUR_API_KEY", product="edge")
eref_historical = api_edge.get_entity_type_reference(entity_type="COMP", reference_type="full", date="2023-06-01")
```

--------------------------------

### Create a New Dataset

Source: https://github.com/ravenpack/python-api/blob/master/README.rst

Creates a new dataset definition on the server with specific filtering criteria. Requires a Dataset object instance.

```python
from ravenpackapi import Dataset

ds = api.create_dataset(
    Dataset(
        name="New Dataset",
        filters={
            "relevance": {
                "$gte": 90
            }
        },
    )
)
print("Dataset created", ds)
```

--------------------------------

### Downloading Data - Datafile Endpoint

Source: https://github.com/ravenpack/python-api/blob/master/README.md

Use the `request_datafile` method for asynchronous preparation and retrieval of large data chunks.

This method prepares a datafile asynchronously on the server.

```python
# Assuming 'ds' is a Dataset object obtained from api.get_dataset()

job = ds.request_datafile(
    start_date='2018-01-05 18:00:00',
    end_date='2018-01-05 18:01:00',
)
print(f"Datafile request submitted: {job}")

# The job takes time to complete. Once it is ready, save the result to a file,
# for example 'output.csv':
# job.save_to_file(filename='output.csv')
```

--------------------------------

### Download Data via JSON Endpoint

Source: https://context7.com/ravenpack/python-api/llms.txt

Illustrates how to download data synchronously from a dataset using the JSON endpoint. This method is suitable for smaller requests and shows how to specify date ranges and time zones, and then iterate through the returned analytics records.

```python
from ravenpackapi import RPApi

api = RPApi(api_key="YOUR_API_KEY")

# Get dataset and query data synchronously
ds = api.get_dataset(dataset_id="us30")
data = ds.json(
    start_date="2024-01-05 18:00:00",
    end_date="2024-01-05 18:30:00",
    time_zone="America/New_York"
)

# Iterate through analytics records
for record in data:
    print(f"{record.timestamp_utc} | {record.entity_name} | "
          f"Sentiment: {record.event_sentiment_score}")
# Output: 2024-01-05 18:05:23 | Apple Inc. | Sentiment: 0.85
```
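Once records are available, they can be summarized locally. The sketch below averages the sentiment score per entity; it assumes records expose `entity_name` and `event_sentiment_score` as attributes (as in the loop above) and that scores are numeric or missing. `mean_sentiment_by_entity` is a hypothetical helper, and the sample records are stand-ins rather than real API output.

```python
from collections import defaultdict
from types import SimpleNamespace


def mean_sentiment_by_entity(records):
    """Average event_sentiment_score per entity_name, skipping missing scores."""
    totals = defaultdict(lambda: [0.0, 0])  # entity -> [sum, count]
    for record in records:
        score = getattr(record, "event_sentiment_score", None)
        if score is None:
            continue
        bucket = totals[record.entity_name]
        bucket[0] += float(score)
        bucket[1] += 1
    return {name: total / count for name, (total, count) in totals.items()}


# Stand-in records; real ones would come from ds.json(...)
sample = [
    SimpleNamespace(entity_name="Apple Inc.", event_sentiment_score=0.5),
    SimpleNamespace(entity_name="Apple Inc.", event_sentiment_score=1.0),
    SimpleNamespace(entity_name="Chevron Corp.", event_sentiment_score=None),
]
print(mean_sentiment_by_entity(sample))
```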

--------------------------------

### Download Bulk Datafiles

Source: https://context7.com/ravenpack/python-api/llms.txt

Handles large asynchronous data requests by queuing jobs on RavenPack servers. This snippet demonstrates saving results to a file, iterating through rows without disk storage, and chunking large date ranges.

```python
from ravenpackapi.util import SPLIT_WEEKLY, time_intervals

# Request a compressed CSV datafile and save it ('ds' comes from api.get_dataset())
job = ds.request_datafile(start_date="2024-01-01", end_date="2024-01-31", output_format="csv", compressed=True)
job.save_to_file(filename="analytics_jan2024.zip")

# Iterate through the rows without writing them to disk
for row in job.iterate_results(include_headers=False):
    print(f"{row[0]}: {row[2]}")

# Split a large date range into weekly chunks
for range_start, range_end in time_intervals("2024-01-01", "2024-03-31", split=SPLIT_WEEKLY):
    job = ds.request_datafile(start_date=range_start, end_date=range_end, compressed=True)
    if job:
        job.save_to_file(filename=f"data-{range_start.strftime('%Y-%m-%d')}.zip")
```

--------------------------------

### POST /datafile

Source: https://context7.com/ravenpack/python-api/llms.txt

Requests a bulk datafile for historical data analysis, supporting asynchronous processing and various output formats.

```APIDOC
## POST /datafile

### Description
Queues a job on RavenPack servers to generate a bulk datafile. This is an asynchronous operation returning a job token.

### Method
POST

### Endpoint
/datafile

### Parameters
#### Request Body
- **start_date** (string) - Required - Start date for the data range.
- **end_date** (string) - Required - End date for the data range.
- **output_format** (string) - Optional - Format of the output (e.g., 'csv').
- **compressed** (boolean) - Optional - Whether to return a compressed file.

### Request Example
{
  "start_date": "2024-01-01",
  "end_date": "2024-01-31",
  "output_format": "csv",
  "compressed": true
}

### Response
#### Success Response (200)
- **job_token** (string) - Unique identifier for the asynchronous job.
```
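A request body like the one above can be assembled programmatically. This is a minimal sketch using only the field names from the parameter list; `build_datafile_request` is a hypothetical helper, and how the body is sent (URL, headers, authentication) is not covered here.

```python
import json


def build_datafile_request(start_date, end_date, output_format=None, compressed=None):
    """Assemble the request body, omitting optional fields left unset."""
    body = {"start_date": start_date, "end_date": end_date}
    if output_format is not None:
        body["output_format"] = output_format
    if compressed is not None:
        body["compressed"] = bool(compressed)
    return body


payload = build_datafile_request("2024-01-01", "2024-01-31", "csv", True)
print(json.dumps(payload, indent=2))
```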

--------------------------------

### Run Acceptance Tests (Bash)

Source: https://github.com/ravenpack/python-api/blob/master/ravenpackapi/tests/README.md

Executes the acceptance tests for the RavenPack Python API. These tests interact with the actual API and require a valid API key.

```bash
RP_API_KEY="XXXX" pytest ravenpackapi/tests/acceptance
```

--------------------------------

### Manage Uploaded Documents

Source: https://context7.com/ravenpack/python-api/llms.txt

Demonstrates how to save analytics, extract text, manage metadata, and delete uploaded files using the RavenPack API.

```python
# 'uploaded_file' is the object returned by api.upload.file(...)

# Save analytics to disk, or load them for processing
uploaded_file.save_analytics("analytics.json", output_format="application/json")
uploaded_file.save_analytics("analytics.csv", output_format="text/csv")
analytics = uploaded_file.get_analytics(output_format="application/json")
for record in analytics:
    print(f"Entity: {record['entity_name']}, Sentiment: {record['sentiment']}")

# Save the annotated output, the extracted text, and the original file
uploaded_file.save_annotated("annotated.json", output_format="application/json")
uploaded_file.save_text_extraction("extracted.json", output_format="application/json")
uploaded_file.save_original("original_backup.pdf")

# Read and update metadata
metadata = uploaded_file.get_metadata()
print(f"File: {metadata['file_name']}, Size: {metadata['raw_size']}")
uploaded_file.set_metadata(tags=["quarterly-report", "2024-Q1"], starred=True)

# Delete the uploaded file
uploaded_file.delete()
```

--------------------------------

### Handle API and Connection Errors in Python

Source: https://context7.com/ravenpack/python-api/llms.txt

Demonstrates how to gracefully handle various exceptions that may occur when interacting with the RavenPack API, including general API errors, connection issues, and specific job-related failures like timeouts or processing errors. This ensures robust application behavior.

```python
from ravenpackapi import RPApi, ApiConnectionError
from ravenpackapi.exceptions import (
    APIException,
    DataFileTimeout,
    JobNotProcessing
)

api = RPApi(api_key="YOUR_API_KEY")

try:
    # API operations
    ds = api.get_dataset("invalid-id")
    data = ds.json(start_date="2024-01-01", end_date="2024-01-02")

except APIException as e:
    # General API errors (400, 401, 403, 404, 500, etc.)
    print(f"API Error: {e}")
    print(f"Response: {e.response.text}")

except ApiConnectionError as e:
    # Network connectivity issues
    print(f"Connection failed: {e}")

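try:
    # Use a valid dataset here: the failed lookup above never assigns `ds`
    ds = api.get_dataset("us30")
    job = ds.request_datafile(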
        start_date="2024-01-01",
        end_date="2024-12-31"
    )
    job.wait_for_completion(timeout_seconds=300)

except DataFileTimeout:
    print("Job timed out - try a smaller date range")

except JobNotProcessing as e:
    print(f"Job failed with status: {e.status}")
```
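Transient connection failures are often worth retrying rather than just reporting. The sketch below shows one possible retry wrapper with exponential backoff; `call_with_retries` is a hypothetical helper, not part of `ravenpackapi`, and which exception types are safe to retry is an assumption the caller has to make.

```python
import time


def call_with_retries(operation, retry_on, attempts=3, base_delay=1.0,
                      sleep=time.sleep):
    """Run operation(), retrying on the given exception types.

    Waits base_delay, then 2x, 4x, ... between attempts and re-raises the
    last error once the attempts are exhausted.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except retry_on:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

With the client from the example above, a call could look like `call_with_retries(lambda: api.get_dataset("us30"), ApiConnectionError)`.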

--------------------------------

### Map Entities to RavenPack IDs

Source: https://context7.com/ravenpack/python-api/llms.txt

Demonstrates how to map a list of entities using various identifiers like tickers, ISINs, or names to RavenPack Entity IDs. It includes handling matched entities and processing errors for non-matching inputs.

```python
from ravenpackapi import RPApi

api = RPApi(api_key="YOUR_API_KEY")

universe = [
    "RavenPack",
    {"ticker": "AAPL"},
    {"ticker": "JPM", "name": "JPMorgan"},
    {"listing": "XNYS:DVN"},
    {
        "client_id": "my-id-12345",
        "date": "2024-01-01",
        "name": "Amazon Inc.",
        "entity_type": "COMP",
        "isin": "US0231351067",
        "cusip": "023135106",
        "sedol": "B58WM62",
        "listing": "XNAS:AMZN"
    },
    {"isin": "US88339J1051", "name": "TRADE DESK INC/THE -CLASS A"}
]

mapping = api.get_entity_mapping(universe)

print(f"Matched: {len(mapping.matched)}/{len(universe)}")
for match in mapping.matched:
    print(f"  {match.id}: {match.name} ({match.type}) - Score: {match.score}")

for error in mapping.errors:
    print(f"Error matching: {error.request}")
```
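In practice the universe usually comes from a spreadsheet or database export rather than a hand-written list. The following sketch turns raw rows into mapping requests, keeping only the identifier keys used in the example above and dropping blank values; `build_universe` and `IDENTIFIER_KEYS` are hypothetical names introduced for illustration.

```python
IDENTIFIER_KEYS = ("client_id", "date", "name", "entity_type",
                   "isin", "cusip", "sedol", "listing", "ticker")


def build_universe(rows):
    """Turn raw rows (dicts) into mapping requests, dropping blank or unknown fields."""
    universe = []
    for row in rows:
        entry = {}
        for key in IDENTIFIER_KEYS:
            value = str(row.get(key) or "").strip()
            if value:
                entry[key] = value
        if entry:  # skip rows with no usable identifier
            universe.append(entry)
    return universe


rows = [
    {"ticker": "AAPL", "isin": ""},
    {"name": "Amazon Inc.", "listing": "XNAS:AMZN", "notes": "ignored"},
    {"isin": None},
]
universe = build_universe(rows)
print(universe)
```

The resulting list has the same shape as the `universe` passed to `api.get_entity_mapping` above.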

--------------------------------

### Fix: Download Taxonomy Reference Files

Source: https://github.com/ravenpack/python-api/blob/master/CHANGELOG.md

Fixes a crash when attempting to retrieve and save 'occupations-taxonomy' and 'jobs-taxonomy' reference files. This code demonstrates how to use the `get_entity_type_reference` method and write the output to a CSV file.

```python
from ravenpackapi import RPApi
api = RPApi(product="edge")

occupations = api.get_entity_type_reference("occupations-taxonomy")
occupations.write_to_file("occupations-reference.csv")

jobs = api.get_entity_type_reference("jobs-taxonomy")
jobs.write_to_file("jobs-reference.csv")
```

--------------------------------

### Configure Common Request Parameters in Python

Source: https://github.com/ravenpack/python-api/blob/master/README.rst

This snippet demonstrates how to update the `common_request_params` attribute of the RPApi object to set global request parameters. It shows how to configure proxy settings and disable SSL certificate verification for all subsequent API requests made using the wrapper. This is useful for environments with internal proxies or when dealing with self-signed certificates.

```python
from ravenpackapi import RPApi

api = RPApi()
api.common_request_params.update(
    dict(
        proxies={'https': 'http://your_internal_proxy:9999'},
        verify=False,
    )
)

# use the api to do requests
```

--------------------------------

### Download Data Synchronously

Source: https://github.com/ravenpack/python-api/blob/master/README.rst

Downloads small chunks of data synchronously using the JSON endpoint. Best suited for limited record sets.

```python
data = ds.json(
    start_date='2018-01-05 18:00:00',
    end_date='2018-01-05 18:01:00',
)

for record in data:
    print(record)
```

--------------------------------

### Downloading Data - JSON Endpoint

Source: https://github.com/ravenpack/python-api/blob/master/README.md

Use the JSON endpoint for synchronous data retrieval, optimized for smaller requests.

*Note: Limited to 10,000 records for granular datasets and 500 entities/1 year for indicator datasets.*

```python
# Assuming 'ds' is a Dataset object obtained from api.get_dataset()

data = ds.json(
    start_date='2018-01-05 18:00:00',
    end_date='2018-01-05 18:01:00',
)

for record in data:
    print(record)
```
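Because of the record limit, it can help to check the size of a request before picking an endpoint. The sketch below decides between the two based on a count result; it assumes a dict with a `count` key, like the `{'count': 11, 'stories': 10, 'entities': 6}` output shown for `ds.count` elsewhere in this document, and treats the 10,000 limit as inclusive, which is also an assumption. `choose_endpoint` is a hypothetical helper.

```python
GRANULAR_JSON_LIMIT = 10_000  # record cap quoted for granular datasets


def choose_endpoint(count_result, limit=GRANULAR_JSON_LIMIT):
    """Return "json" when the record count fits the cap, else "datafile"."""
    return "json" if count_result["count"] <= limit else "datafile"


print(choose_endpoint({"count": 11, "stories": 10, "entities": 6}))
print(choose_endpoint({"count": 250_000}))
```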

--------------------------------

### Run Unit Tests (Bash)

Source: https://github.com/ravenpack/python-api/blob/master/ravenpackapi/tests/README.md

Executes the unit tests for the RavenPack Python API. These tests are isolated and do not require an active API key.

```bash
RP_API_KEY="" pytest ravenpackapi/tests/unit
```

--------------------------------

### Download Historical Flat Files

Source: https://context7.com/ravenpack/python-api/llms.txt

Retrieves pre-generated historical analytics data files, supporting both direct downloads and streaming for custom processing.

```python
from ravenpackapi import RPApi

api = RPApi(api_key="YOUR_API_KEY")

# List the available flat files
files = api.get_flatfile_list("companies")
for f in files["files"]:
    print(f"{f['id']}: {f['date']}")

# Download a flat file directly to disk
api.save_flatfile(flatfile_type="companies", flatfile_id="companies-2024-01.zip", output_filename="./data/companies-2024-01.zip", overwrite=True)

# Or stream the response for custom processing
response = api.get_flatfile("companies", "companies-2024-01.zip")
with open("output.zip", "wb") as out:
    for chunk in response.iter_content(chunk_size=8192):
        out.write(chunk)
```
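The streaming branch can be factored into a small reusable function that also verifies what was written. This is a minimal sketch, assuming any iterable of byte chunks (such as the one produced by `iter_content` above); `write_chunks` is a hypothetical helper and the checksum is an optional extra, not something the API requires.

```python
import hashlib


def write_chunks(chunks, path):
    """Write an iterable of byte chunks to path; return (byte count, sha256 hex)."""
    digest = hashlib.sha256()
    total = 0
    with open(path, "wb") as out:
        for chunk in chunks:
            if not chunk:  # skip empty chunks
                continue
            out.write(chunk)
            digest.update(chunk)
            total += len(chunk)
    return total, digest.hexdigest()
```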

--------------------------------

### Configure Low-Level API Requests

Source: https://github.com/ravenpack/python-api/blob/master/README.md

Configures common request parameters for the RavenPack API wrapper, which utilizes the 'requests' library. This includes setting proxies, timeouts, and disabling SSL certificate verification.

```python
from ravenpackapi import RPApi

api = RPApi()  # reads the key from the RP_API_KEY environment variable
api.common_request_params.update(
    dict(
        proxies={'https': 'http://your_internal_proxy:9999'},
        timeout=(20, 120), # connect, read timeouts
        verify=False, # disable certificate verification
    )
)

# use the api to do requests
```

--------------------------------

### Manage Asynchronous Jobs

Source: https://context7.com/ravenpack/python-api/llms.txt

Provides methods to monitor, refresh, and cancel background datafile requests. It includes error handling for timeouts and job failures, as well as listing historical jobs.

```python
from ravenpackapi.exceptions import DataFileTimeout

# 'ds' is a Dataset obtained from api.get_dataset()
job = ds.request_datafile(start_date="2024-01-01", end_date="2024-01-31")
job.get_status()  # query the current job status
try:
    job.wait_for_completion(timeout_seconds=600)
    print(f"Job completed: {job.url}")
except DataFileTimeout:
    print("Job timed out")

# Cancel a pending job
pending_job = ds.request_datafile(start_date="2024-01-01", end_date="2024-12-31")
pending_job.cancel()
```

--------------------------------

### Access Key Events API

Source: https://context7.com/ravenpack/python-api/llms.txt

Provides methods to list and download insider transaction archives and earnings date files.

```python
from ravenpackapi import RPApi
api = RPApi(api_key="YOUR_API_KEY")
yearly_files = api.insider_trasactions.list_yearly_archives()
api.insider_trasactions.download_file(file_id="insider-transactions-2024-01-15.csv", filename="insider_jan15.csv")
earnings_yearly = api.earnings_dates.list_yearly_archives()
api.earnings_dates.download_file(file_id="earnings-dates-2024.zip", filename="earnings_2024.zip")
```

--------------------------------

### API Key Configuration

Source: https://github.com/ravenpack/python-api/blob/master/README.md

Configure your RavenPack API key either by setting the RP_API_KEY environment variable or directly in your code.

#### Using Environment Variable
Set the `RP_API_KEY` environment variable before running your script.

#### Setting API Key in Code
```python
from ravenpackapi import RPApi

# For standard RavenPack API
api = RPApi(api_key="YOUR_API_KEY")

# For RavenPack EDGE
api_edge = RPApi(api_key="YOUR_API_KEY", product="edge")
```
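A script that should accept either option can resolve the key before creating the client. The sketch below prefers an explicitly supplied key and falls back to `RP_API_KEY`; `resolve_api_key` is a hypothetical helper, not part of `ravenpackapi`, and it only looks up the key without contacting the API.

```python
import os


def resolve_api_key(explicit_key=None, environ=None):
    """Return the key to use: the explicit argument first, then RP_API_KEY."""
    environ = os.environ if environ is None else environ
    key = explicit_key or environ.get("RP_API_KEY")
    if not key:
        raise RuntimeError("No API key: pass one explicitly or set RP_API_KEY")
    return key
```

The result could then be passed on as `RPApi(api_key=resolve_api_key())`, which fails early with a clear message when no key is configured.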

--------------------------------

### Handling API and Job Exceptions

Source: https://context7.com/ravenpack/python-api/llms.txt

This section demonstrates how to catch and handle specific exceptions such as APIException, ApiConnectionError, DataFileTimeout, and JobNotProcessing when interacting with the RavenPack API.

```APIDOC
## Exception Handling in RavenPack API

### Description
The RavenPack Python library uses custom exception classes to distinguish between API-level errors, network connectivity issues, and specific job processing failures. Catching them separately lets an application respond differently to each failure mode.

### Exception Types
- **APIException** - Raised for standard HTTP errors (4xx, 5xx).
- **ApiConnectionError** - Raised when network issues prevent communication with the API.
- **DataFileTimeout** - Raised when a bulk data job exceeds the specified timeout.
- **JobNotProcessing** - Raised when a requested job fails to enter a processing state.

### Implementation Example
```python
from ravenpackapi import RPApi, ApiConnectionError
from ravenpackapi.exceptions import APIException, DataFileTimeout, JobNotProcessing

api = RPApi(api_key="YOUR_API_KEY")

try:
    ds = api.get_dataset("invalid-id")
    data = ds.json(start_date="2024-01-01", end_date="2024-01-02")
except APIException as e:
    print(f"API Error: {e}")
except ApiConnectionError as e:
    print(f"Connection failed: {e}")

try:
    # Use a dataset that exists here; the ID above was deliberately invalid
    ds = api.get_dataset("YOUR_DATASET_ID")
    job = ds.request_datafile(start_date="2024-01-01", end_date="2024-12-31")
    job.wait_for_completion(timeout_seconds=300)
except DataFileTimeout:
    print("Job timed out")
except JobNotProcessing as e:
    print(f"Job failed: {e.status}")
```
```

--------------------------------

### Retrieve Dataset Definition

Source: https://github.com/ravenpack/python-api/blob/master/README.rst

Fetches the definition of an existing dataset from the RavenPack server using its unique identifier.

```python
ds = api.get_dataset(dataset_id='us30')
```

--------------------------------

### POST /json

Source: https://context7.com/ravenpack/python-api/llms.txt

Performs an ad-hoc JSON query on RavenPack data without requiring a pre-saved dataset configuration.

```APIDOC
## POST /json

### Description
Executes a direct query against the RavenPack analytics database to retrieve specific fields based on date ranges and filters.

### Method
POST

### Endpoint
/json

### Parameters
#### Request Body
- **start_date** (string) - Required - ISO format start date.
- **end_date** (string) - Required - ISO format end date.
- **filters** (object) - Optional - Dictionary of filter criteria (e.g., relevance, entity_type).
- **fields** (array) - Optional - List of specific fields to return.
- **frequency** (string) - Optional - Data granularity (e.g., 'granular').

### Request Example
{
  "start_date": "2024-01-01",
  "end_date": "2024-01-02",
  "filters": {"relevance": {"$gte": 90}, "entity_type": "COMP"},
  "fields": ["timestamp_utc", "entity_name", "headline"],
  "frequency": "granular"
}

### Response
#### Success Response (200)
- **results** (array) - List of records matching the query criteria.
```
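
The request body described above can be assembled and checked locally before it is sent. The following is a sketch under stated assumptions: `build_json_query` is a hypothetical helper, and the date-order check is a local sanity check rather than a documented server rule. It validates the dates and omits optional keys that were not supplied.

```python
import json
from datetime import datetime


def build_json_query(start_date, end_date, filters=None, fields=None, frequency=None):
    """Assemble a request body, validating dates and omitting unset optional keys."""
    # datetime.fromisoformat raises ValueError on malformed input
    start = datetime.fromisoformat(start_date)
    end = datetime.fromisoformat(end_date)
    if end <= start:
        raise ValueError("end_date must be after start_date")
    body = {"start_date": start_date, "end_date": end_date}
    optional = {"filters": filters, "fields": fields, "frequency": frequency}
    body.update({key: value for key, value in optional.items() if value is not None})
    return body


payload = build_json_query(
    "2024-01-01",
    "2024-01-02",
    filters={"relevance": {"$gte": 90}, "entity_type": "COMP"},
    fields=["timestamp_utc", "entity_name", "headline"],
)
print(json.dumps(payload, indent=2))
```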

--------------------------------

### Feature: Text Analytics Upload via Source URL

Source: https://github.com/ravenpack/python-api/blob/master/CHANGELOG.md

Introduces the capability to upload text analytics data using a source URL, offering a more flexible way to ingest external data.

```python
api.upload.file(source_url="http://example.com/data.txt", upload_mode="RPJSON")
```

--------------------------------

### Upload Documents for NLP Analysis

Source: https://context7.com/ravenpack/python-api/llms.txt

Uploads files or URLs to the RavenPack platform for NLP analysis. The process includes setting document properties and waiting for the completion of the extraction task.

```python
from ravenpackapi import RPApi

api = RPApi(api_key="YOUR_API_KEY")

uploaded_file = api.upload.file("document.pdf", properties={"primary_entity": "Apple Inc."})
uploaded_file.wait_for_completion()
print(f"Status: {uploaded_file.status}")
```

--------------------------------

### Cancel an API job using Python

Source: https://github.com/ravenpack/python-api/blob/master/CHANGELOG.md

Shows how to cancel a data request job if it is currently in the ENQUEUED state.

```python
job = ds.request_datafile(...)
job.cancel()
```

--------------------------------

### Feature: Insider Transactions and Earnings Dates API

Source: https://github.com/ravenpack/python-api/blob/master/CHANGELOG.md

Provides API support for 'Insider-transactions' and 'Earnings-Dates', enabling users to list available files and download them programmatically for automated processing.

```python
api.list_files(dataset_name="insider-transactions")
api.download_file(dataset_name="earnings-dates", file_name="some_file.csv")
```

--------------------------------

### Retrieve Entity Reference Information

Source: https://github.com/ravenpack/python-api/blob/master/README.rst

Fetches detailed historical and current information for a specific entity using its RP_ENTITY_ID.

```python
ALPHABET_RP_ENTITY_ID = '4A6F00'
references = api.get_entity_reference(ALPHABET_RP_ENTITY_ID)

for name in references.names:
    print(name.value, name.start, name.end)

for ticker in references.tickers:
    if ticker.is_valid():
        print(ticker)
```

--------------------------------

### Perform Ad-hoc JSON Queries

Source: https://context7.com/ravenpack/python-api/llms.txt

Executes direct queries against RavenPack datasets without requiring a pre-saved configuration. It supports filtering, field selection, and frequency settings, returning an iterable of records.

```python
from ravenpackapi import RPApi

api = RPApi(api_key="YOUR_API_KEY")

results = api.json(
    start_date="2024-01-01",
    end_date="2024-01-02",
    filters={"relevance": {"$gte": 90}, "entity_type": "COMP"},
    fields=["timestamp_utc", "entity_name", "headline"],
    frequency="granular"
)
for record in results:
    print(record["headline"])

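# Count records for a saved dataset over a time window ("us30" is a public dataset)
ds = api.get_dataset(dataset_id="us30")
count_info = ds.count(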
    start_date="2024-01-05 18:00:00",
    end_date="2024-01-05 18:01:00"
)
print(count_info)
```

--------------------------------

### Streaming Real-time Data

Source: https://github.com/ravenpack/python-api/blob/master/README.md

Subscribe to a real-time data stream for a dataset to receive analytics records as they are published.

```APIDOC
## Streaming Real-time Data

It is possible to subscribe to a real-time stream for a dataset.

*It is suggested to handle possible disconnection with a retry policy.*

*A detailed real-time streaming example can be found in the library's examples folder.*

```python
from ravenpackapi import RPApi

api = RPApi(api_key="YOUR_API_KEY")
ds = api.get_dataset(dataset_id="YOUR_DATASET_ID")

# request_realtime() yields records as they are published
for record in ds.request_realtime():
    print(record)
    # Process the record, e.g., record.timestamp_utc is a datetime object
```
```

--------------------------------

### Entity Mapping

Source: https://github.com/ravenpack/python-api/blob/master/README.md

Map a universe of entities (names, tickers, etc.) to their corresponding RavenPack Entity IDs (RP_ENTITY_ID).

```APIDOC
## Entity Mapping

```python
from ravenpackapi import RPApi

api = RPApi(api_key="YOUR_API_KEY")

universe = [
    "RavenPack",
    {'ticker': 'AAPL'},
    {
        "client_id": "12345-A",
        "date": "2017-01-01",
        "name": "Amazon Inc.",
        "entity_type": "COMP",
        "isin": "US0231351067",
        "cusip": "023135106",
        "sedol": "B58WM62",
        "listing": "XNAS:AMZN"
    },
]

mapping = api.get_entity_mapping(universe)

print(f"Matched entities: {len(mapping.matched)}")
for m in mapping.matched:
    print(f"- {m.name} (RP ID: {m.rp_entity_id})")

print(f"Unmatched entities: {len(mapping.unmatched)}")
for u in mapping.unmatched:
    print(f"- {u.query}")
```
```

--------------------------------

### Fix: Lazy Loading Product Bug

Source: https://github.com/ravenpack/python-api/blob/master/CHANGELOG.md

Addresses a bug in lazy loading where the incorrect product ('RPA') was sometimes sent when saving a dataset without modifications. This snippet shows the scenario where the issue could occur.

```python
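from ravenpackapi import RPApi

# No api_key argument: the key is expected in the RP_API_KEY environment variable
api = RPApi(product="edge")
ds = api.get_dataset("SOME_DATASET_ID")
ds.save()  # saving without modifications is the case the fix addresses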
```

--------------------------------

### Retrieve Document URL

Source: https://context7.com/ravenpack/python-api/llms.txt

Resolves a RavenPack Story ID to its original source URL.

```python
from ravenpackapi import RPApi
api = RPApi(api_key="YOUR_API_KEY")
rp_story_id = "ABC123DEF456"
url = api.get_document_url(rp_story_id)
print(f"Original article: {url}")
```

--------------------------------

### Retrieve Entity Reference Data

Source: https://context7.com/ravenpack/python-api/llms.txt

Retrieves comprehensive historical and current reference data for a specific entity using its RavenPack Entity ID, including name history, tickers, and other identifiers.

```python
from ravenpackapi import RPApi

api = RPApi(api_key="YOUR_API_KEY")

ALPHABET_RP_ENTITY_ID = "4A6F00"
ref = api.get_entity_reference(ALPHABET_RP_ENTITY_ID)

print(f"Entity: {ref.name}")
for name in ref.names:
    print(f"  {name.value}: {name.start} to {name.end}")

for ticker in ref.tickers:
    if ticker.is_valid():
        print(f"  {ticker.value}")
```

--------------------------------

### Entity Reference

Source: https://github.com/ravenpack/python-api/blob/master/README.md

Retrieve detailed information for a specific entity using its RP_ENTITY_ID.

```APIDOC
## Entity Reference

```python
from ravenpackapi import RPApi

api = RPApi(api_key="YOUR_API_KEY")

ALPHABET_RP_ENTITY_ID = '4A6F00' # Example RP_ENTITY_ID for Alphabet Inc.

references = api.get_entity_reference(ALPHABET_RP_ENTITY_ID)

print("Entity Names:")
for name in references.names:
    print(f"- {name.value} (Valid from: {name.start}, to: {name.end})")

print("Tickers:")
for ticker in references.tickers:
    if ticker.is_valid():
        print(f"- {ticker.value} (Exchange: {ticker.exchange})")
```
```

--------------------------------

### Stream Real-Time Analytics

Source: https://context7.com/ravenpack/python-api/llms.txt

Subscribes to a real-time feed for a specific dataset. Includes logic for incremental backoff reconnection to ensure continuous data flow despite network interruptions.

```python
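import logging
import time

from ravenpackapi import RPApi, ApiConnectionError

logger = logging.getLogger(__name__)
ds = RPApi(api_key="YOUR_API_KEY").get_dataset("YOUR_DATASET_ID")

# incremental_backoff is assumed to be a generator of wait times in seconds; its definition is not shown here
wait_time = incremental_backoff()
while True:
    try:
        for record in ds.request_realtime():
            print(f"Headline: {record.headline}, Sentiment: {record.event_sentiment_score}")
    except ApiConnectionError as e:
        logger.error(f"Connection error: {e}. Reconnecting...")
        time.sleep(next(wait_time))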
```
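
The snippet above calls `incremental_backoff()` without showing where it comes from. As a rough illustration, a generator with that name could look like the sketch below. This is a stand-in written for this page, not the library's own helper, and the `start`, `factor`, and `maximum` parameters are assumptions.

```python
import itertools


def incremental_backoff(start=1.0, factor=2.0, maximum=60.0):
    """Yield wait times in seconds that grow by `factor` and stop growing at `maximum`."""
    wait = start
    while True:
        yield wait
        wait = min(wait * factor, maximum)


waits = list(itertools.islice(incremental_backoff(), 8))
print(waits)  # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0, 60.0]
```

Capping the wait keeps reconnection attempts from drifting arbitrarily far apart during a long outage.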

--------------------------------

### Compatibility: RavenPack Edge Enhancements

Source: https://github.com/ravenpack/python-api/blob/master/CHANGELOG.md

Enhances compatibility with RavenPack Edge by loosening validation for entity-types and RT fields to accommodate dynamic EDETs and new fields specific to Edge.


--------------------------------

### Save Annotated Document in JSON Format

Source: https://github.com/ravenpack/python-api/blob/master/README.md

Obtains normalized content in JSON format, along with annotations for entities, events, and analytics. The `save_annotated` method allows specifying the output format, defaulting to JSON.

```python
f.save_annotated("_annotated_document.json", output_format='application/json')
```

--------------------------------

### Feature: Text Analytics Retry on /metadata Endpoint

Source: https://github.com/ravenpack/python-api/blob/master/CHANGELOG.md

Implements retry logic for the '/metadata' endpoint in the Text Analytics API to handle cases where requests might be too early, improving robustness.

```python
api.get_text_analytics_metadata(entity_id="some_id")
```
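
Retrying when a request arrives too early (HTTP status 425, "Too Early") generally means repeating the call a bounded number of times with a short pause in between. The sketch below shows that pattern against a stand-in function. It does not reproduce the library's internal retry logic, and `call_with_retry` and its parameters are hypothetical.

```python
import time


def call_with_retry(request, retry_statuses=(425,), attempts=5, delay=0.01):
    """Call request() until its status is not in retry_statuses or attempts run out."""
    for attempt in range(1, attempts + 1):
        status, body = request()
        if status not in retry_statuses:
            return status, body
        if attempt < attempts:
            time.sleep(delay * attempt)  # wait a little longer after each early response
    return status, body


# Stand-in endpoint: answers 425 twice, then 200.
responses = iter([(425, None), (425, None), (200, {"ready": True})])
result = call_with_retry(lambda: next(responses))
print(result)  # (200, {'ready': True})
```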

--------------------------------

### Feature: Support for PRDT and Retry on 425 Status

Source: https://github.com/ravenpack/python-api/blob/master/CHANGELOG.md

Adds support for 'PRDT' (product-type) in the entity_reference endpoint and includes retry logic for the 425 status code in certain text-analytics API calls.

```python
api.get_entity_type_reference(entity_type, product_type="PRDT")
```

--------------------------------

### Save Analytics from Processed Files

Source: https://github.com/ravenpack/python-api/blob/master/README.md

Saves the analytics for processed files in either JSON-Lines or CSV format. This is done using the `save_analytics` method on the uploaded file object.

```python
f.save_analytics("_analytics.json")
```
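
A JSON-Lines file holds one JSON object per line, so it can be read back with the standard library alone. The sketch below parses such a stream line by line; `read_json_lines` is a hypothetical helper, and the sample field names are made up for illustration rather than taken from real analytics output.

```python
import io
import json


def read_json_lines(stream):
    """Yield one parsed object per non-empty line of a JSON-Lines stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


# Hypothetical sample; real analytics files have their own fields.
sample = io.StringIO(
    '{"entity_name": "A", "relevance": 100}\n'
    "\n"
    '{"entity_name": "B", "relevance": 90}\n'
)
records = list(read_json_lines(sample))
print(len(records))  # 2
```

For a file on disk, the same function can be given an open file object, for example `open("_analytics.json", encoding="utf-8")`.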

--------------------------------

### Feature: Text Analytics /text-extraction Endpoint

Source: https://github.com/ravenpack/python-api/blob/master/CHANGELOG.md

Adds support for the '/text-extraction' endpoint within the Text Analytics API, allowing for direct text extraction functionalities.

```python
api.extract_text(text="This is a sample text.")
```

--------------------------------

### Map Entities to RP_ENTITY_ID

Source: https://github.com/ravenpack/python-api/blob/master/README.rst

Resolves a list of entity identifiers (tickers, names, ISINs) to RavenPack's internal RP_ENTITY_ID format.

```python
universe = [
    "RavenPack",
    {'ticker': 'AAPL'},
    {
        "client_id": "12345-A",
        "date": "2017-01-01",
        "name": "Amazon Inc.",
        "entity_type": "COMP",
        "isin": "US0231351067",
        "cusip": "023135106",
        "sedol": "B58WM62",
        "listing": "XNAS:AMZN"
    },
]
mapping = api.get_entity_mapping(universe)
```
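
A universe is often kept in a spreadsheet or CSV file rather than written by hand. The sketch below turns CSV rows into the dictionary form used above, dropping empty cells. `universe_from_csv` is a hypothetical helper, and `KNOWN_KEYS` only lists the keys that appear in the example above; whether the service accepts other keys is not shown here.

```python
import csv
import io

# Keys taken from the universe example; others may exist but are not assumed here.
KNOWN_KEYS = ("client_id", "date", "name", "entity_type", "isin", "cusip", "sedol", "listing", "ticker")


def universe_from_csv(stream):
    """Turn CSV rows into universe dicts, keeping only known, non-empty columns."""
    universe = []
    for row in csv.DictReader(stream):
        entry = {k: row[k].strip() for k in KNOWN_KEYS if row.get(k, "") and row[k].strip()}
        if entry:
            universe.append(entry)
    return universe


sample = io.StringIO("client_id,ticker,isin\n1,AAPL,\n2,,US0231351067\n")
universe = universe_from_csv(sample)
print(universe)
# [{'client_id': '1', 'ticker': 'AAPL'}, {'client_id': '2', 'isin': 'US0231351067'}]
```

The resulting list could then be passed to `api.get_entity_mapping(universe)` as in the snippet above.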

--------------------------------

### Extract Normalized Documents

Source: https://github.com/ravenpack/python-api/blob/master/README.md

Retrieves normalized document content in JSON format, including text categorization, tables, and metadata. The `save_text_extraction` method is used for this purpose.

```python
f.save_text_extraction("_text_extraction.json")
```
