### Installation Source: https://github.com/ravenpack/python-api/blob/master/README.md Install the RavenPack API Python client using pip. ```APIDOC ## Installation ```bash pip install ravenpackapi ``` ``` -------------------------------- ### Manage RavenPack Datasets Source: https://context7.com/ravenpack/python-api/llms.txt Provides examples for retrieving, listing, updating, and deleting datasets using the RPApi client. Demonstrates how to get a dataset by ID, list private or public datasets, filter datasets by tags, modify dataset properties, and save or delete changes. ```python from ravenpackapi import RPApi api = RPApi(api_key="YOUR_API_KEY") # Get an existing dataset by ID ds = api.get_dataset(dataset_id="us30") # US 30 is a public dataset print(f"Dataset: {ds.name}, Frequency: {ds.frequency}") # Output: Dataset: US 30, Frequency: granular # List all private datasets datasets = api.list_datasets(scope="private") for dataset in datasets: print(f"{dataset.id}: {dataset.name}") # List public datasets public_datasets = api.list_datasets(scope="public") # List datasets by tag tagged_datasets = api.list_datasets(tags="production") # Update a dataset ds = api.get_dataset("YOUR_DATASET_ID") ds.name = "Updated Dataset Name" ds.filters["relevance"] = {"$gte": 95} ds.save() # Delete a dataset ds.delete() ``` -------------------------------- ### GET /realtime Source: https://context7.com/ravenpack/python-api/llms.txt Establishes a persistent connection to receive real-time news analytics as they are published. ```APIDOC ## GET /realtime ### Description Subscribes to a real-time stream of news analytics for a specific dataset. ### Method GET ### Endpoint /realtime ### Response #### Success Response (200) - **stream** (event-stream) - Continuous stream of JSON objects containing news analytics. 
``` -------------------------------- ### Feature: Product Aware Instance for Edge Source: https://github.com/ravenpack/python-api/blob/master/CHANGELOG.md Enables accessing the 'edge' product by simply instantiating the API with the `product='edge'` argument. This simplifies the setup for using Edge-specific features. ```python api = RPApi(product="edge") ``` -------------------------------- ### Create RavenPack Dataset Source: https://context7.com/ravenpack/python-api/llms.txt Shows how to create a new dataset using the Dataset class and the RPApi client. Covers specifying dataset name, description, filters, fields, frequency, and tags for RavenPack Analytics. Also includes an example for creating a dataset for the RavenPack Edge product. ```python from ravenpackapi import RPApi, Dataset api = RPApi(api_key="YOUR_API_KEY") # Create a dataset with filters for RavenPack Analytics ds = api.create_dataset( Dataset( name="High Relevance US Companies", description="News analytics for high-relevance US company mentions", filters={ "relevance": {"$gte": 90}, "country_code": "US", "entity_type": "COMP" }, fields=["timestamp_utc", "rp_entity_id", "entity_name", "event_relevance", "event_sentiment_score"], frequency="granular", tags=["production", "us-equities"] ) ) print(f"Created dataset: {ds.id}") # Output: Created dataset: abc123-def456-ghi789 # For RavenPack Edge product api_edge = RPApi(api_key="YOUR_API_KEY", product="edge") ds_edge = api_edge.create_dataset( Dataset( name="Edge Dataset", filters={"entity_relevance": {"$gte": 90}}, product="edge" ) ) ``` -------------------------------- ### Get Entity Mapping (Python) Source: https://github.com/ravenpack/python-api/blob/master/README.md Demonstrates how to use the `get_entity_mapping` method to find RP_ENTITY_ID for a given universe of entities. The universe can include entity names, tickers, or detailed dictionaries with various identifiers. The result provides matched and unmatched entities. 
```python universe = [ "RavenPack", {'ticker': 'AAPL'}, { "client_id": "12345-A", "date": "2017-01-01", "name": "Amazon Inc.", "entity_type": "COMP", "isin": "US0231351067", "cusip": "023135106", "sedol": "B58WM62", "listing": "XNAS:AMZN" }, ] mapping = api.get_entity_mapping(universe) print(f"Matched entities: {len(mapping.matched)}") for m in mapping.matched: print(f"- {m.name}") print(f"Unmatched entities: {len(mapping.unmatched)}") ``` -------------------------------- ### Fix: Create Edge Dataset with Product Attribute Source: https://github.com/ravenpack/python-api/blob/master/CHANGELOG.md Resolves an issue where creating an Edge dataset failed if the product attribute was not explicitly set in the Dataset object, even when specified in the RPApi. This example shows the correct way to initialize RPApi and create a dataset. ```python from ravenpackapi import RPApi, Dataset api = RPApi(api_key="YOUR_API_KEY", product="edge") ds = api.create_dataset( Dataset( name="New Dataset", filters={"entity_relevance": {"$gte": 90}}, ) ) ``` -------------------------------- ### Get Dataset Definition (Python) Source: https://github.com/ravenpack/python-api/blob/master/README.md Illustrates how to retrieve the definition of a pre-existing dataset using its ID. It highlights that dataset IDs differ between RPA and EDGE (e.g., 'us30' for RPA vs. 'us30-edge' for EDGE). ```python # For RPA, use 'us30' # For EDGE, use 'us30-edge' dataset_id = 'us30-edge' # or 'us30' ds = api.get_dataset(dataset_id=dataset_id) print(f"Dataset definition retrieved: {ds}") ``` -------------------------------- ### Feature: Store Entity Mapping Data in Memory Source: https://github.com/ravenpack/python-api/blob/master/CHANGELOG.md Introduces a flag to store entity mapping data in memory for the 'edge' product, which can improve performance but should be used with caution. This example demonstrates how to enable this feature and iterate through entities. 
```python eref = api.get_entity_type_reference(entity_type, "full", file_date) eref.store_in_memory = True for entity in eref: print(entity) ``` -------------------------------- ### Download Data via JSON Endpoint (Python) Source: https://github.com/ravenpack/python-api/blob/master/README.md Demonstrates downloading data synchronously using the `json` method of a Dataset object. This method is optimized for small requests. It specifies start and end dates and iterates through the returned records. Limitations apply to the number of records and entities for granular and indicator datasets. ```python start_date = '2018-01-05 18:00:00' end_date = '2018-01-05 18:01:00' data = ds.json(start_date=start_date, end_date=end_date) for record in data: print(record) ``` -------------------------------- ### Get Entity Reference Information (Python) Source: https://github.com/ravenpack/python-api/blob/master/README.md Shows how to retrieve detailed information for an entity using its RP_ENTITY_ID with the `get_entity_reference` method. This includes historical names, valid tickers, and other reference data associated with the entity. ```python ALPHABET_RP_ENTITY_ID = '4A6F00' references = api.get_entity_reference(ALPHABET_RP_ENTITY_ID) print("Entity Names over History:") for name in references.names: print(f"- {name.value} (Valid from: {name.start} to {name.end})") print("Tickers Valid Today:") for ticker in references.tickers: if ticker.is_valid(): print(f"- {ticker}") ``` -------------------------------- ### GET /jobs Source: https://context7.com/ravenpack/python-api/llms.txt Manages asynchronous jobs, allowing users to check status, wait for completion, or cancel pending requests. ```APIDOC ## GET /jobs/{token} ### Description Retrieves the current status of an asynchronous job and provides access to the result URL once completed. ### Method GET ### Endpoint /jobs/{token} ### Parameters #### Path Parameters - **token** (string) - Required - The unique job token. 
### Response #### Success Response (200) - **status** (string) - Current status (enqueued, processing, completed, error). - **url** (string) - Download URL for the result file. ``` -------------------------------- ### Getting Dataset Information Source: https://github.com/ravenpack/python-api/blob/master/README.md Retrieve the definition of a pre-existing dataset using its ID. ```APIDOC ## Getting Dataset Information ```python from ravenpackapi import RPApi api = RPApi(api_key="YOUR_API_KEY") # Example dataset IDs rpa_dataset_id = 'us30' # For RPA edge_dataset_id = 'us30-edge' # For EDGE # Get dataset description selected_dataset_id = rpa_dataset_id # or edge_dataset_id ds = api.get_dataset(dataset_id=selected_dataset_id) print(f"Dataset details: {ds}") ``` ``` -------------------------------- ### Upload File to RavenPack API Source: https://github.com/ravenpack/python-api/blob/master/README.md Uploads a file to the RavenPack Annotations platform for analysis. The function `api.upload.file()` initiates the upload process. Further options are detailed in the user guide. ```python f = api.upload.file("_orig.doc") ``` -------------------------------- ### Feature: Entity-Type-Reference with Past Dates and Deltas Source: https://github.com/ravenpack/python-api/blob/master/CHANGELOG.md Adds support for retrieving entity references from past dates and allows specifying `reference_type='delta'` to get only daily changes for Edge. This enhances historical data analysis capabilities. ```python api.get_entity_type_reference(entity_type, reference_type="delta") ``` -------------------------------- ### Initialize RavenPack API Client (Python) Source: https://github.com/ravenpack/python-api/blob/master/README.md Demonstrates how to instantiate the RPApi client, either by setting the RP_API_KEY environment variable or by passing the API key directly. It also shows how to configure the client for RavenPack EDGE. 
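The snippet that follows passes the key explicitly. As a minimal sketch of the environment-variable route, the hypothetical helper below (`resolve_api_key` is not part of `ravenpackapi`) shows one way to prefer an explicit key and fall back to `RP_API_KEY`. The library's own lookup may differ.

```python
import os


def resolve_api_key(explicit_key=None, environ=None):
    """Return the explicit key if given, else the RP_API_KEY value, else raise."""
    # Hypothetical helper for illustration; not a ravenpackapi function.
    environ = os.environ if environ is None else environ
    key = explicit_key or environ.get("RP_API_KEY")
    if not key:
        raise ValueError("No API key: pass one explicitly or set RP_API_KEY")
    return key


# Usage with a stand-in environment mapping (no real key involved)
print(resolve_api_key(environ={"RP_API_KEY": "key-from-env"}))
print(resolve_api_key("key-from-code", environ={"RP_API_KEY": "key-from-env"}))
```

Passing the environment as a parameter keeps the helper easy to test without touching the real process environment.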
```python from ravenpackapi import RPApi # Using API key directly api = RPApi(api_key="YOUR_API_KEY") # For RavenPack EDGE api_edge = RPApi(api_key="YOUR_API_KEY", product="edge") ``` -------------------------------- ### Initialize RavenPack API Client Source: https://context7.com/ravenpack/python-api/llms.txt Demonstrates how to initialize the RPApi client, including authentication methods (API key or environment variable), specifying the product (RPA or Edge), and configuring common request parameters like proxies, timeouts, and SSL verification. Also shows how to check the API connection status. ```python from ravenpackapi import RPApi # Initialize with explicit API key api = RPApi(api_key="YOUR_API_KEY") # Or use environment variable RP_API_KEY api = RPApi() # For RavenPack Edge product api = RPApi(api_key="YOUR_API_KEY", product="edge") # Configure request parameters (proxy, timeouts, SSL verification) api.common_request_params.update({ "proxies": {"https": "http://your_internal_proxy:9999"}, "timeout": (20, 120), # connect, read timeouts "verify": False # disable certificate verification }) # Check API connection status status = api.get_status() print(status) # Output: {'datasets': 50, 'remaining_datafile_bytes': 100000000000} ``` -------------------------------- ### Creating a New Dataset Source: https://github.com/ravenpack/python-api/blob/master/README.md Demonstrates how to create a new dataset using the `create_dataset` method with a `Dataset` instance. 
```APIDOC ## Creating a New Dataset ```python from ravenpackapi import Dataset, RPApi api = RPApi(api_key="YOUR_API_KEY") # Define relevance filter (use 'entity_relevance' for EDGE) entity_relevance = "relevance" # or "entity_relevance" for EDGE dataset = Dataset( name="My New Dataset", filters={ entity_relevance: { "$gte": 90 } }, ) created_dataset = api.create_dataset(dataset) print(f"Dataset created: {created_dataset}") ``` ``` -------------------------------- ### Organize Upload Folders Source: https://context7.com/ravenpack/python-api/llms.txt Shows how to list, create, update, and delete folders for batch processing of uploaded documents. ```python from ravenpackapi import RPApi api = RPApi(api_key="YOUR_API_KEY") for f in api.upload.list(start_date="2024-01-01", end_date="2024-01-31", status="COMPLETED"): print(f"{f.file_id}: {f.file_name} - {f.status}") folder = api.upload.folder_create(folder_name="Q1 Reports", starred=True) uploaded = api.upload.file("report.pdf", folder=folder) for folder in api.upload.list_folders(): print(f"{folder.folder_id}: {folder.folder_name}") folder.folder_name = "2024 Q1 Reports" folder.save() quota = api.upload.quota() print(f"Used: {quota['used_bytes']}, Limit: {quota['limit_bytes']}") folder.delete() ``` -------------------------------- ### Create and Request Datafile with Timezone - Python Source: https://github.com/ravenpack/python-api/blob/master/CHANGELOG.md Demonstrates how to create a custom dataset, save it, and request a datafile with specific date ranges, compression, and timezone settings. This functionality requires an active API connection and dataset configuration. 
```python
# Assumes `api`, `us30` (an existing dataset) and `new_fields` are already defined.
custom_dataset = Dataset(
    api=api,
    name="Us30 indicators",
    filters=us30.filters,
    fields=new_fields,
    frequency='daily'
)
custom_dataset.save()
print(custom_dataset)

job = custom_dataset.request_datafile(
    start_date='2017-01-01 19:30',
    end_date='2017-01-02 19:30',
    compressed=True,
    time_zone='Europe/London',
)
```

--------------------------------

### Create a New Dataset with Filters (Python)

Source: https://github.com/ravenpack/python-api/blob/master/README.md

Shows how to create a new dataset using the `create_dataset` method. Define a `Dataset` object with a name and a filter, such as relevance greater than or equal to 90. Use the filter key that matches the product: 'relevance' for RPA, 'entity_relevance' for EDGE.

```python
from ravenpackapi import Dataset

# For RPA, use 'relevance'
# For EDGE, use 'entity_relevance'
entity_relevance_filter = "entity_relevance"  # or "relevance"

dataset_name = "New Dataset"
data_filter = {
    entity_relevance_filter: {
        "$gte": 90
    }
}

ds = api.create_dataset(Dataset(name=dataset_name, filters=data_filter))
print(f"Dataset created: {ds}")
```

--------------------------------

### Count dataset records using Python

Source: https://github.com/ravenpack/python-api/blob/master/CHANGELOG.md

Demonstrates how to retrieve a dataset and count the number of records within a specific time range.

```python
ds = api.get_dataset('us30')
data_count = ds.count(
    start_date='2018-01-05 18:00:00',
    end_date='2018-01-05 18:01:00',
)
# {'count': 11, 'stories': 10, 'entities': 6}
```

--------------------------------

### Initialize RavenPack API Client

Source: https://github.com/ravenpack/python-api/blob/master/README.rst

Instantiates the RPApi object using an API key. This is the entry point for all subsequent API interactions.
```python
from ravenpackapi import RPApi

api = RPApi(api_key="YOUR_API_KEY")
```

--------------------------------

### Process data in chunks using time intervals

Source: https://github.com/ravenpack/python-api/blob/master/CHANGELOG.md

Illustrates how to split a large date range into smaller intervals (e.g., daily) and request one data file per interval.

```python
from ravenpackapi.util import (
    SPLIT_YEARLY, SPLIT_MONTHLY, SPLIT_WEEKLY, SPLIT_DAILY,
    time_intervals
)

# Assumes `ds`, `start_date`, `end_date` and `GET_COMPRESSED` are defined elsewhere.
split = SPLIT_DAILY
for range_start, range_end in time_intervals(start_date, end_date, split=split):
    job = ds.request_datafile(
        start_date=range_start,
        end_date=range_end,
        compressed=GET_COMPRESSED,
    )
```

--------------------------------

### Request Datafile for Large Downloads (Python)

Source: https://github.com/ravenpack/python-api/blob/master/README.md

Shows how to request a data file asynchronously for larger data downloads using the `request_datafile` method. This returns a job object that prepares the file on the server. Once the job is ready, the file can be saved using `save_to_file`.

```python
start_date = '2018-01-05 18:00:00'
end_date = '2018-01-05 18:01:00'

job = ds.request_datafile(start_date=start_date, end_date=end_date)

# The job will take time to complete. You can monitor its status.
# Once ready, save it to a file:
output_filename = 'output.csv'
job.save_to_file(filename=output_filename)

print(f"Datafile requested and saved to {output_filename}")
```

--------------------------------

### Download Data Asynchronously

Source: https://github.com/ravenpack/python-api/blob/master/README.rst

Requests a large datafile to be prepared on the server. Returns a job object that can be saved to a file once complete.
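Because the datafile is prepared asynchronously, client code usually polls until the job leaves its pending states. The sketch below illustrates that loop against a stand-in `FakeJob` class. `FakeJob`, `poll_until_done` and the status strings (taken from the `/jobs/{token}` description in this document) are illustrative assumptions, not the library's implementation; the library exposes `wait_for_completion` for this purpose.

```python
import time


class FakeJob:
    """Stand-in for a server-side job; replays a scripted sequence of statuses."""

    def __init__(self, statuses):
        self._statuses = iter(statuses)
        self.status = None

    def refresh(self):
        # Advance to the next scripted status; keep the last one once exhausted
        self.status = next(self._statuses, self.status)
        return self.status


def poll_until_done(job, interval=0.0, max_polls=10):
    """Poll until the job is 'completed'; raise on 'error' or too many polls."""
    for _ in range(max_polls):
        status = job.refresh()
        if status == "completed":
            return status
        if status == "error":
            raise RuntimeError("job failed on the server")
        time.sleep(interval)
    raise TimeoutError("job did not complete within the polling budget")


job = FakeJob(["enqueued", "processing", "processing", "completed"])
print(poll_until_done(job))
```

Bounding the number of polls keeps a stuck job from blocking the caller indefinitely, which mirrors the timeout argument used elsewhere in this document.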
```python
job = ds.request_datafile(
    start_date='2018-01-05 18:00:00',
    end_date='2018-01-05 18:01:00',
)

# Write the finished datafile to disk
job.save_to_file(filename='output.csv')
```

--------------------------------

### Download Bulk Entity Type Reference

Source: https://context7.com/ravenpack/python-api/llms.txt

Downloads a full reference file for all entities of a specific type. Supports streaming to file, historical snapshots for Edge users, and loading from local files.

```python
from ravenpackapi import RPApi

api = RPApi(api_key="YOUR_API_KEY")

eref = api.get_entity_type_reference(entity_type="COMP", reference_type="full")
eref.write_to_file("companies_reference.csv")

api_edge = RPApi(api_key="YOUR_API_KEY", product="edge")
eref_historical = api_edge.get_entity_type_reference(
    entity_type="COMP", reference_type="full", date="2023-06-01"
)
```

--------------------------------

### Create a New Dataset

Source: https://github.com/ravenpack/python-api/blob/master/README.rst

Creates a new dataset definition on the server with specific filtering criteria. Requires a Dataset object instance.

```python
from ravenpackapi import Dataset

ds = api.create_dataset(
    Dataset(
        name="New Dataset",
        filters={
            "relevance": {
                "$gte": 90
            }
        },
    )
)
print("Dataset created", ds)
```

--------------------------------

### Downloading Data - Datafile Endpoint

Source: https://github.com/ravenpack/python-api/blob/master/README.md

Use the `request_datafile` method for asynchronous preparation and retrieval of large data chunks.

```APIDOC
## Downloading Data - Datafile Endpoint

This method prepares a datafile asynchronously on the server.

```python
# Assuming 'ds' is a Dataset object obtained from api.get_dataset()
job = ds.request_datafile(
    start_date='2018-01-05 18:00:00',
    end_date='2018-01-05 18:01:00',
)

# The job will take time to complete. You can then save it to a file.
# For example, saving to 'output.csv':
# job.save_to_file(filename='output.csv')

print(f"Datafile request submitted. Job ID: {job.job_id}")
# To retrieve the data, you would typically poll the job status and then download.
```
```

--------------------------------

### Download Data via JSON Endpoint

Source: https://context7.com/ravenpack/python-api/llms.txt

Illustrates how to download data synchronously from a dataset using the JSON endpoint. This method is suitable for smaller requests. The example specifies a date range and time zone, then iterates through the returned analytics records.

```python
from ravenpackapi import RPApi

api = RPApi(api_key="YOUR_API_KEY")

# Get dataset and query data synchronously
ds = api.get_dataset(dataset_id="us30")
data = ds.json(
    start_date="2024-01-05 18:00:00",
    end_date="2024-01-05 18:30:00",
    time_zone="America/New_York"
)

# Iterate through analytics records
for record in data:
    print(f"{record.timestamp_utc} | {record.entity_name} | "
          f"Sentiment: {record.event_sentiment_score}")
# Output: 2024-01-05 18:05:23 | Apple Inc. | Sentiment: 0.85
```

--------------------------------

### Download Bulk Datafiles

Source: https://context7.com/ravenpack/python-api/llms.txt

Handles large asynchronous data requests by queuing jobs on RavenPack servers. This snippet demonstrates saving results to a file, iterating through rows without disk storage, and chunking large date ranges.
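To make the chunking step concrete, here is a standalone sketch of weekly splitting written with the standard library only. `weekly_chunks` is a hypothetical helper; the library ships its own `time_intervals` utility, whose boundary handling may differ from this version.

```python
from datetime import datetime, timedelta


def weekly_chunks(start, end, days=7):
    """Yield (chunk_start, chunk_end) pairs covering [start, end) in order."""
    # Hypothetical helper for illustration; not the library's time_intervals.
    step = timedelta(days=days)
    cursor = start
    while cursor < end:
        # Clamp the final chunk so it never runs past the requested end
        chunk_end = min(cursor + step, end)
        yield cursor, chunk_end
        cursor = chunk_end


chunks = list(weekly_chunks(datetime(2024, 1, 1), datetime(2024, 1, 31)))
for chunk_start, chunk_end in chunks:
    print(chunk_start.date(), "->", chunk_end.date())
```

Each chunk starts where the previous one ended, so issuing one datafile request per pair covers the whole range without gaps or overlaps.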
```python
from ravenpackapi.util import SPLIT_WEEKLY, time_intervals

# Assumes `ds` is a Dataset object obtained from api.get_dataset()
job = ds.request_datafile(
    start_date="2024-01-01",
    end_date="2024-01-31",
    output_format="csv",
    compressed=True,
)
job.save_to_file(filename="analytics_jan2024.zip")

# Iterate through rows without writing to disk
for row in job.iterate_results(include_headers=False):
    print(f"{row[0]}: {row[2]}")

# Chunk a large date range into weekly requests
for range_start, range_end in time_intervals("2024-01-01", "2024-03-31", split=SPLIT_WEEKLY):
    job = ds.request_datafile(start_date=range_start, end_date=range_end, compressed=True)
    if job:
        job.save_to_file(filename=f"data-{range_start.strftime('%Y-%m-%d')}.zip")
```

--------------------------------

### POST /datafile

Source: https://context7.com/ravenpack/python-api/llms.txt

Requests a bulk datafile for historical data analysis, supporting asynchronous processing and various output formats.

```APIDOC
## POST /datafile

### Description
Queues a job on RavenPack servers to generate a bulk datafile. This is an asynchronous operation returning a job token.

### Method
POST

### Endpoint
/datafile

### Parameters
#### Request Body
- **start_date** (string) - Required - Start date for the data range.
- **end_date** (string) - Required - End date for the data range.
- **output_format** (string) - Optional - Format of the output (e.g., 'csv').
- **compressed** (boolean) - Optional - Whether to return a compressed file.

### Request Example
{
  "start_date": "2024-01-01",
  "end_date": "2024-01-31",
  "output_format": "csv",
  "compressed": true
}

### Response
#### Success Response (200)
- **job_token** (string) - Unique identifier for the asynchronous job.
```

--------------------------------

### Run Acceptance Tests (Bash)

Source: https://github.com/ravenpack/python-api/blob/master/ravenpackapi/tests/README.md

Executes the acceptance tests for the RavenPack Python API. These tests interact with the actual API and require a valid API key.
```bash RP_API_KEY="XXXX" pytest ravenpackapi/tests/acceptance ``` -------------------------------- ### Manage Uploaded Documents Source: https://context7.com/ravenpack/python-api/llms.txt Demonstrates how to save analytics, extract text, manage metadata, and delete uploaded files using the RavenPack API. ```python uploaded_file.save_analytics("analytics.json", output_format="application/json") uploaded_file.save_analytics("analytics.csv", output_format="text/csv") analytics = uploaded_file.get_analytics(output_format="application/json") for record in analytics: print(f"Entity: {record['entity_name']}, Sentiment: {record['sentiment']}") uploaded_file.save_annotated("annotated.json", output_format="application/json") uploaded_file.save_text_extraction("extracted.json", output_format="application/json") uploaded_file.save_original("original_backup.pdf") metadata = uploaded_file.get_metadata() print(f"File: {metadata['file_name']}, Size: {metadata['raw_size']}") uploaded_file.set_metadata(tags=["quarterly-report", "2024-Q1"], starred=True) uploaded_file.delete() ``` -------------------------------- ### Handle API and Connection Errors in Python Source: https://context7.com/ravenpack/python-api/llms.txt Demonstrates how to gracefully handle various exceptions that may occur when interacting with the RavenPack API, including general API errors, connection issues, and specific job-related failures like timeouts or processing errors. This ensures robust application behavior. ```python from ravenpackapi import RPApi, ApiConnectionError from ravenpackapi.exceptions import ( APIException, DataFileTimeout, JobNotProcessing ) api = RPApi(api_key="YOUR_API_KEY") try: # API operations ds = api.get_dataset("invalid-id") data = ds.json(start_date="2024-01-01", end_date="2024-01-02") except APIException as e: # General API errors (400, 401, 403, 404, 500, etc.) 
print(f"API Error: {e}") print(f"Response: {e.response.text}") except ApiConnectionError as e: # Network connectivity issues print(f"Connection failed: {e}") try: job = ds.request_datafile( start_date="2024-01-01", end_date="2024-12-31" ) job.wait_for_completion(timeout_seconds=300) except DataFileTimeout: print("Job timed out - try a smaller date range") except JobNotProcessing as e: print(f"Job failed with status: {e.status}") ``` -------------------------------- ### Map Entities to RavenPack IDs Source: https://context7.com/ravenpack/python-api/llms.txt Demonstrates how to map a list of entities using various identifiers like tickers, ISINs, or names to RavenPack Entity IDs. It includes handling matched entities and processing errors for non-matching inputs. ```python from ravenpackapi import RPApi api = RPApi(api_key="YOUR_API_KEY") universe = [ "RavenPack", {"ticker": "AAPL"}, {"ticker": "JPM", "name": "JPMorgan"}, {"listing": "XNYS:DVN"}, { "client_id": "my-id-12345", "date": "2024-01-01", "name": "Amazon Inc.", "entity_type": "COMP", "isin": "US0231351067", "cusip": "023135106", "sedol": "B58WM62", "listing": "XNAS:AMZN" }, {"isin": "US88339J1051", "name": "TRADE DESK INC/THE -CLASS A"} ] mapping = api.get_entity_mapping(universe) print(f"Matched: {len(mapping.matched)}/{len(universe)}") for match in mapping.matched: print(f" {match.id}: {match.name} ({match.type}) - Score: {match.score}") for error in mapping.errors: print(f"Error matching: {error.request}") ``` -------------------------------- ### Fix: Download Taxonomy Reference Files Source: https://github.com/ravenpack/python-api/blob/master/CHANGELOG.md Fixes a crash when attempting to retrieve and save 'occupations-taxonomy' and 'jobs-taxonomy' reference files. This code demonstrates how to use the `get_entity_type_reference` method and write the output to a CSV file. 
```python
from ravenpackapi import RPApi

api = RPApi(product="edge")

occupations = api.get_entity_type_reference("occupations-taxonomy")
occupations.write_to_file("occupations-reference.csv")

jobs = api.get_entity_type_reference("jobs-taxonomy")
jobs.write_to_file("jobs-reference.csv")
```

--------------------------------

### Configure Common Request Parameters in Python

Source: https://github.com/ravenpack/python-api/blob/master/README.rst

This snippet updates the `common_request_params` attribute of the RPApi object to set global request parameters. It configures proxy settings and disables SSL certificate verification for all subsequent API requests made through the wrapper. This is useful for environments with internal proxies or self-signed certificates.

```python
from ravenpackapi import RPApi

api = RPApi()
api.common_request_params.update(
    dict(
        proxies={'https': 'http://your_internal_proxy:9999'},
        verify=False,
    )
)

# use the api to do requests
```

--------------------------------

### Download Data Synchronously

Source: https://github.com/ravenpack/python-api/blob/master/README.rst

Downloads small chunks of data synchronously using the JSON endpoint. Best suited for limited record sets.

```python
data = ds.json(
    start_date='2018-01-05 18:00:00',
    end_date='2018-01-05 18:01:00',
)

for record in data:
    print(record)
```

--------------------------------

### Downloading Data - JSON Endpoint

Source: https://github.com/ravenpack/python-api/blob/master/README.md

Use the JSON endpoint for synchronous data retrieval, optimized for smaller requests.
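Since the JSON endpoint is meant for small requests, one pattern is to check the record count first and fall back to a datafile request for larger ranges. The sketch below assumes the count result has the dictionary shape shown in the CHANGELOG example (`{'count': ..., 'stories': ..., 'entities': ...}`) and uses the 10,000-record limit for granular datasets quoted in the README note. `choose_endpoint` is a hypothetical helper, not a library function, and treating exactly 10,000 records as acceptable is an assumption.

```python
JSON_RECORD_LIMIT = 10_000  # granular-dataset limit quoted in the README note


def choose_endpoint(count_info, limit=JSON_RECORD_LIMIT):
    """Return 'json' when the record count fits the limit, else 'datafile'."""
    # Hypothetical helper; `count_info` mirrors the dict shape of ds.count()
    return "json" if count_info["count"] <= limit else "datafile"


print(choose_endpoint({"count": 11, "stories": 10, "entities": 6}))
print(choose_endpoint({"count": 250_000, "stories": 90_000, "entities": 4_000}))
```

In real code the returned label would decide between calling `ds.json(...)` and `ds.request_datafile(...)` for the same date range.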
```APIDOC ## Downloading Data - JSON Endpoint *Note: Limited to 10,000 records for granular datasets and 500 entities/1 year for indicator datasets.* ```python # Assuming 'ds' is a Dataset object obtained from api.get_dataset() data = ds.json( start_date='2018-01-05 18:00:00', end_date='2018-01-05 18:01:00', ) for record in data: print(record) ``` ``` -------------------------------- ### Run Unit Tests (Bash) Source: https://github.com/ravenpack/python-api/blob/master/ravenpackapi/tests/README.md Executes the unit tests for the RavenPack Python API. These tests are isolated and do not require an active API key. ```bash RP_API_KEY="" pytest ravenpackapi/tests/unit ``` -------------------------------- ### Download Historical Flat Files Source: https://context7.com/ravenpack/python-api/llms.txt Retrieves pre-generated historical analytics data files, supporting both direct downloads and streaming for custom processing. ```python from ravenpackapi import RPApi api = RPApi(api_key="YOUR_API_KEY") files = api.get_flatfile_list("companies") for f in files["files"]: print(f"{f['id']}: {f['date']}") api.save_flatfile(flatfile_type="companies", flatfile_id="companies-2024-01.zip", output_filename="./data/companies-2024-01.zip", overwrite=True) response = api.get_flatfile("companies", "companies-2024-01.zip") with open("output.zip", "wb") as f: for chunk in response.iter_content(chunk_size=8192): f.write(chunk) ``` -------------------------------- ### Configure Low-Level API Requests Source: https://github.com/ravenpack/python-api/blob/master/README.md Configures common request parameters for the RavenPack API wrapper, which utilizes the 'requests' library. This includes setting proxies, timeouts, and disabling SSL certificate verification. 
```python
from ravenpackapi import RPApi

api = RPApi()
api.common_request_params.update(
    dict(
        proxies={'https': 'http://your_internal_proxy:9999'},
        timeout=(20, 120),  # connect, read timeouts
        verify=False,  # disable certificate verification
    )
)

# use the api to do requests
```

--------------------------------

### Manage Asynchronous Jobs

Source: https://context7.com/ravenpack/python-api/llms.txt

Provides methods to monitor, refresh, and cancel background datafile requests. It includes error handling for timeouts and job failures, as well as listing historical jobs.

```python
from ravenpackapi.exceptions import DataFileTimeout

# Assumes `ds` is a Dataset object obtained from api.get_dataset()
job = ds.request_datafile(start_date="2024-01-01", end_date="2024-01-31")
job.get_status()

try:
    job.wait_for_completion(timeout_seconds=600)
    print(f"Job completed: {job.url}")
except DataFileTimeout:
    print("Job timed out")

# Cancel a request that is still pending
pending_job = ds.request_datafile(start_date="2024-01-01", end_date="2024-12-31")
pending_job.cancel()
```

--------------------------------

### Access Key Events API

Source: https://context7.com/ravenpack/python-api/llms.txt

Provides methods to list and download insider transaction archives and earnings date files.

```python
from ravenpackapi import RPApi

api = RPApi(api_key="YOUR_API_KEY")

yearly_files = api.insider_trasactions.list_yearly_archives()
api.insider_trasactions.download_file(file_id="insider-transactions-2024-01-15.csv", filename="insider_jan15.csv")

earnings_yearly = api.earnings_dates.list_yearly_archives()
api.earnings_dates.download_file(file_id="earnings-dates-2024.zip", filename="earnings_2024.zip")
```

--------------------------------

### API Key Configuration

Source: https://github.com/ravenpack/python-api/blob/master/README.md

Configure your RavenPack API key either by setting the RP_API_KEY environment variable or directly in your code.

```APIDOC
## API Key Configuration

### Using Environment Variable
Set the `RP_API_KEY` environment variable before running your script.
### Setting API Key in Code ```python from ravenpackapi import RPApi # For standard RavenPack API api = RPApi(api_key="YOUR_API_KEY") # For RavenPack EDGE api_edge = RPApi(api_key="YOUR_API_KEY", product="edge") ``` ``` -------------------------------- ### Handling API and Job Exceptions Source: https://context7.com/ravenpack/python-api/llms.txt This section demonstrates how to catch and handle specific exceptions such as APIException, ApiConnectionError, DataFileTimeout, and JobNotProcessing when interacting with the RavenPack API. ```APIDOC ## Exception Handling in RavenPack API ### Description The RavenPack Python library uses custom exception classes to distinguish between API-level errors, network connectivity issues, and specific job processing failures. Proper implementation of these blocks ensures robust application behavior. ### Exception Types - **APIException** - Raised for standard HTTP errors (4xx, 5xx). - **ApiConnectionError** - Raised when network issues prevent communication with the API. - **DataFileTimeout** - Raised when a bulk data job exceeds the specified timeout. - **JobNotProcessing** - Raised when a requested job fails to enter a processing state. 
### Implementation Example ```python from ravenpackapi import RPApi, ApiConnectionError from ravenpackapi.exceptions import APIException, DataFileTimeout, JobNotProcessing api = RPApi(api_key="YOUR_API_KEY") try: ds = api.get_dataset("invalid-id") data = ds.json(start_date="2024-01-01", end_date="2024-01-02") except APIException as e: print(f"API Error: {e}") except ApiConnectionError as e: print(f"Connection failed: {e}") try: job = ds.request_datafile(start_date="2024-01-01", end_date="2024-12-31") job.wait_for_completion(timeout_seconds=300) except DataFileTimeout: print("Job timed out") except JobNotProcessing as e: print(f"Job failed: {e.status}") ``` ``` -------------------------------- ### Retrieve Dataset Definition Source: https://github.com/ravenpack/python-api/blob/master/README.rst Fetches the definition of an existing dataset from the RavenPack server using its unique identifier. ```python ds = api.get_dataset(dataset_id='us30') ``` -------------------------------- ### POST /json Source: https://context7.com/ravenpack/python-api/llms.txt Performs an ad-hoc JSON query on RavenPack data without requiring a pre-saved dataset configuration. ```APIDOC ## POST /json ### Description Executes a direct query against the RavenPack analytics database to retrieve specific fields based on date ranges and filters. ### Method POST ### Endpoint /json ### Parameters #### Request Body - **start_date** (string) - Required - ISO format start date. - **end_date** (string) - Required - ISO format end date. - **filters** (object) - Optional - Dictionary of filter criteria (e.g., relevance, entity_type). - **fields** (array) - Optional - List of specific fields to return. - **frequency** (string) - Optional - Data granularity (e.g., 'granular'). 
### Request Example
{
  "start_date": "2024-01-01",
  "end_date": "2024-01-02",
  "filters": {"relevance": {"$gte": 90}, "entity_type": "COMP"},
  "fields": ["timestamp_utc", "entity_name", "headline"],
  "frequency": "granular"
}

### Response
#### Success Response (200)
- **results** (array) - List of records matching the query criteria.
```

--------------------------------

### Feature: Text Analytics Upload via Source URL

Source: https://github.com/ravenpack/python-api/blob/master/CHANGELOG.md

Introduces the capability to upload text analytics data using a source URL, offering a more flexible way to ingest external data.

```python
api.upload.file(source_url="http://example.com/data.txt", upload_mode="RPJSON")
```

--------------------------------

### Upload Documents for NLP Analysis

Source: https://context7.com/ravenpack/python-api/llms.txt

Uploads files or URLs to the RavenPack platform for NLP analysis. The process includes setting document properties and waiting for the completion of the extraction task.

```python
from ravenpackapi import RPApi

api = RPApi(api_key="YOUR_API_KEY")

uploaded_file = api.upload.file("document.pdf", properties={"primary_entity": "Apple Inc."})
uploaded_file.wait_for_completion()
print(f"Status: {uploaded_file.status}")
```

--------------------------------

### Cancel an API job using Python

Source: https://github.com/ravenpack/python-api/blob/master/CHANGELOG.md

Shows how to cancel a data request job if it is currently in the ENQUEUED state.

```python
job = ds.request_datafile(...)
job.cancel()
```

--------------------------------

### Feature: Insider Transactions and Earnings Dates API

Source: https://github.com/ravenpack/python-api/blob/master/CHANGELOG.md

Provides API support for 'Insider-transactions' and 'Earnings-Dates', enabling users to list available files and download them programmatically for automated processing.
```python
api.list_files(dataset_name="insider-transactions")
api.download_file(dataset_name="earnings-dates", file_name="some_file.csv")
```

--------------------------------

### Retrieve Entity Reference Information

Source: https://github.com/ravenpack/python-api/blob/master/README.rst

Fetches detailed historical and current information for a specific entity using its RP_ENTITY_ID.

```python
ALPHABET_RP_ENTITY_ID = '4A6F00'
references = api.get_entity_reference(ALPHABET_RP_ENTITY_ID)

for name in references.names:
    print(name.value, name.start, name.end)

for ticker in references.tickers:
    if ticker.is_valid():
        print(ticker)
```

--------------------------------

### Perform Ad-hoc JSON Queries

Source: https://context7.com/ravenpack/python-api/llms.txt

Executes direct queries against RavenPack data without requiring a pre-saved configuration. It supports filtering, field selection, and frequency settings, returning an iterable of records. The record count for a time window, by contrast, is requested from a saved dataset.

```python
from ravenpackapi import RPApi

api = RPApi(api_key="YOUR_API_KEY")

results = api.json(
    start_date="2024-01-01",
    end_date="2024-01-02",
    filters={"relevance": {"$gte": 90}, "entity_type": "COMP"},
    fields=["timestamp_utc", "entity_name", "headline"],
    frequency="granular"
)
for record in results:
    print(record["headline"])

# count() is called on a saved dataset, so fetch one first
ds = api.get_dataset("YOUR_DATASET_ID")
count_info = ds.count(
    start_date="2024-01-05 18:00:00",
    end_date="2024-01-05 18:01:00"
)
print(count_info)
```

--------------------------------

### Streaming Real-time Data

Source: https://github.com/ravenpack/python-api/blob/master/README.md

Subscribe to a real-time data stream for a dataset to receive analytics records as they are published.

```APIDOC
## Streaming Real-time Data

It is possible to subscribe to a real-time stream for a dataset.
*It is suggested to handle possible disconnection with a retry policy.*

*A detailed real-time streaming example can be found in the library's examples folder.*

```python
# Minimal sketch of consuming a real-time stream for a dataset
from ravenpackapi import RPApi

api = RPApi(api_key="YOUR_API_KEY")
ds = api.get_dataset(dataset_id="your_dataset_id")

for record in ds.request_realtime():
    print(record)
    # Process the record, e.g., record.timestamp_utc is a datetime object
```
```

--------------------------------

### Entity Mapping

Source: https://github.com/ravenpack/python-api/blob/master/README.md

Map a universe of entities (names, tickers, etc.) to their corresponding RavenPack Entity IDs (RP_ENTITY_ID).

```APIDOC
## Entity Mapping

```python
from ravenpackapi import RPApi

api = RPApi(api_key="YOUR_API_KEY")

universe = [
    "RavenPack",
    {'ticker': 'AAPL'},
    {
        "client_id": "12345-A",
        "date": "2017-01-01",
        "name": "Amazon Inc.",
        "entity_type": "COMP",
        "isin": "US0231351067",
        "cusip": "023135106",
        "sedol": "B58WM62",
        "listing": "XNAS:AMZN"
    },
]

mapping = api.get_entity_mapping(universe)

print(f"Matched entities: {len(mapping.matched)}")
for m in mapping.matched:
    print(f"- {m.name} (RP ID: {m.rp_entity_id})")

print(f"Unmatched entities: {len(mapping.unmatched)}")
for u in mapping.unmatched:
    print(f"- {u.query}")
```
```

--------------------------------

### Fix: Lazy Loading Product Bug

Source: https://github.com/ravenpack/python-api/blob/master/CHANGELOG.md

Addresses a bug in lazy loading where the incorrect product ('RPA') was sometimes sent when saving a dataset without modifications. This snippet shows the scenario where the issue could occur.
```python
from ravenpackapi import RPApi

api = RPApi(product="edge")
ds = api.get_dataset("SOME_DATASET_ID")
ds.save()
```

--------------------------------

### Retrieve Document URL

Source: https://context7.com/ravenpack/python-api/llms.txt

Resolves a RavenPack Story ID to its original source URL.

```python
from ravenpackapi import RPApi

api = RPApi(api_key="YOUR_API_KEY")

rp_story_id = "ABC123DEF456"
url = api.get_document_url(rp_story_id)
print(f"Original article: {url}")
```

--------------------------------

### Retrieve Entity Reference Data

Source: https://context7.com/ravenpack/python-api/llms.txt

Retrieves comprehensive historical and current reference data for a specific entity using its RavenPack Entity ID, including name history, tickers, and other identifiers.

```python
from ravenpackapi import RPApi

api = RPApi(api_key="YOUR_API_KEY")

ALPHABET_RP_ENTITY_ID = "4A6F00"
ref = api.get_entity_reference(ALPHABET_RP_ENTITY_ID)

print(f"Entity: {ref.name}")
for name in ref.names:
    print(f"  {name.value}: {name.start} to {name.end}")

for ticker in ref.tickers:
    if ticker.is_valid():
        print(f"  {ticker.value}")
```

--------------------------------

### Entity Reference

Source: https://github.com/ravenpack/python-api/blob/master/README.md

Retrieve detailed information for a specific entity using its RP_ENTITY_ID.

```APIDOC
## Entity Reference

```python
from ravenpackapi import RPApi

api = RPApi(api_key="YOUR_API_KEY")

ALPHABET_RP_ENTITY_ID = '4A6F00'  # Example RP_ENTITY_ID for Alphabet Inc.
references = api.get_entity_reference(ALPHABET_RP_ENTITY_ID)

print("Entity Names:")
for name in references.names:
    print(f"- {name.value} (Valid from: {name.start}, to: {name.end})")

print("Tickers:")
for ticker in references.tickers:
    if ticker.is_valid():
        print(f"- {ticker.value} (Exchange: {ticker.exchange})")
```
```

--------------------------------

### Stream Real-Time Analytics

Source: https://context7.com/ravenpack/python-api/llms.txt

Subscribes to a real-time feed for a specific dataset. Includes logic for incremental backoff reconnection to ensure continuous data flow despite network interruptions.

```python
import logging
import time

from ravenpackapi import RPApi, ApiConnectionError

logger = logging.getLogger(__name__)
api = RPApi(api_key="YOUR_API_KEY")
ds = api.get_dataset("YOUR_DATASET_ID")

# incremental_backoff() is assumed to be a user-defined generator that yields
# wait times in seconds; its definition is not part of this snippet.
wait_time = incremental_backoff()
while True:
    try:
        for record in ds.request_realtime():
            print(f"Headline: {record.headline}, Sentiment: {record.event_sentiment_score}")
    except ApiConnectionError as e:
        logger.error(f"Connection error: {e}. Reconnecting...")
        time.sleep(next(wait_time))
```

--------------------------------

### Compatibility: RavenPack Edge Enhancements

Source: https://github.com/ravenpack/python-api/blob/master/CHANGELOG.md

Enhances compatibility with RavenPack Edge by loosening validation for entity-types and RT fields to accommodate dynamic EDETs and new fields specific to Edge.

```python
# No specific code example, but implies changes in API handling of Edge data.
```

--------------------------------

### Save Annotated Document in JSON Format

Source: https://github.com/ravenpack/python-api/blob/master/README.md

Obtains normalized content in JSON format, along with annotations for entities, events, and analytics. The `save_annotated` method allows specifying the output format, defaulting to JSON.
```python
f.save_annotated("_annotated_document.json", output_format='application/json')
```

--------------------------------

### Feature: Text Analytics Retry on /metadata Endpoint

Source: https://github.com/ravenpack/python-api/blob/master/CHANGELOG.md

Implements retry logic for the '/metadata' endpoint in the Text Analytics API to handle cases where requests might be too early, improving robustness.

```python
api.get_text_analytics_metadata(entity_id="some_id")
```

--------------------------------

### Feature: Support for PRDT and Retry on 425 Status

Source: https://github.com/ravenpack/python-api/blob/master/CHANGELOG.md

Adds support for 'PRDT' (product-type) in the entity_reference endpoint and includes retry logic for the 425 status code in certain text-analytics API calls.

```python
api.get_entity_type_reference(entity_type, product_type="PRDT")
```

--------------------------------

### Save Analytics from Processed Files

Source: https://github.com/ravenpack/python-api/blob/master/README.md

Saves the analytics for processed files in either JSON-Lines or CSV format. This is done using the `save_analytics` method on the uploaded file object.

```python
f.save_analytics("_analytics.json")
```

--------------------------------

### Feature: Text Analytics /text-extraction Endpoint

Source: https://github.com/ravenpack/python-api/blob/master/CHANGELOG.md

Adds support for the '/text-extraction' endpoint within the Text Analytics API, allowing for direct text extraction functionalities.

```python
api.extract_text(text="This is a sample text.")
```

--------------------------------

### Map Entities to RP_ENTITY_ID

Source: https://github.com/ravenpack/python-api/blob/master/README.rst

Resolves a list of entity identifiers (tickers, names, ISINs) to RavenPack's internal RP_ENTITY_ID format.
```python
universe = [
    "RavenPack",
    {'ticker': 'AAPL'},
    {
        "client_id": "12345-A",
        "date": "2017-01-01",
        "name": "Amazon Inc.",
        "entity_type": "COMP",
        "isin": "US0231351067",
        "cusip": "023135106",
        "sedol": "B58WM62",
        "listing": "XNAS:AMZN"
    },
]
mapping = api.get_entity_mapping(universe)
```

--------------------------------

### Extract Normalized Documents

Source: https://github.com/ravenpack/python-api/blob/master/README.md

Retrieves normalized document content in JSON format, including text categorization, tables, and metadata. The `save_text_extraction` method is used for this purpose.

```python
f.save_text_extraction("_text_extraction.json")
```
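
--------------------------------

### Reconnection Backoff Helper (Illustrative Sketch)

The Stream Real-Time Analytics snippet calls `incremental_backoff()` without defining it. The following is a minimal sketch of what such a helper might look like, assuming it is a plain generator of wait times in seconds. The parameter names and default values (`initial`, `factor`, `maximum`, `jitter`) are illustrative assumptions, not part of `ravenpackapi` and not values recommended by RavenPack.

```python
import itertools
import random


def incremental_backoff(initial=1.0, factor=2.0, maximum=60.0, jitter=0.1):
    """Yield an endless sequence of wait times (seconds) that grows up to a cap.

    Hypothetical helper: all defaults here are assumptions for illustration.
    """
    wait = initial
    while True:
        # Add a small random spread so many clients do not reconnect in lockstep.
        yield wait * (1.0 + random.uniform(0.0, jitter))
        # Grow the base wait geometrically, never exceeding the cap.
        wait = min(wait * factor, maximum)


# With jitter disabled the sequence is deterministic and levels off at the cap.
waits = list(itertools.islice(incremental_backoff(jitter=0.0), 8))
print(waits)
```

Because the generator never terminates, a reconnect loop can keep calling `next(wait_time)` for as long as it runs. Creating a fresh generator after a successful reconnection would reset the delay to its initial value, which may or may not be desirable depending on how often disconnections occur.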