### Install Unstructured Client

Source: https://context7.com/unstructured-io/unstructured-python-client/llms.txt

Commands to install the SDK using common Python package managers.

```bash
pip install unstructured-client
```

```bash
uv add unstructured-client
```

```bash
poetry add unstructured-client
```

--------------------------------

### Install SDK with pip

Source: https://github.com/unstructured-io/unstructured-python-client/blob/main/README.md

Use the standard pip package installer to add the SDK.

```bash
pip install unstructured-client
```

--------------------------------

### Install SDK with uv

Source: https://github.com/unstructured-io/unstructured-python-client/blob/main/README.md

Use uv to add the unstructured-client package to your project.

```bash
uv add unstructured-client
```

--------------------------------

### Install SDK with Poetry

Source: https://github.com/unstructured-io/unstructured-python-client/blob/main/README.md

Use Poetry to manage the SDK dependency in your project.

```bash
poetry add unstructured-client
```

--------------------------------

### GET /api/v1/sources/

Source: https://github.com/unstructured-io/unstructured-python-client/blob/main/docs/sdks/sources/README.md

Retrieves a list of all configured source connectors.

```APIDOC
## GET /api/v1/sources/

### Description
Retrieves a list of all configured source connectors.

### Method
GET

### Endpoint
/api/v1/sources/

### Parameters
#### SDK Method Parameters
- **request** (operations.ListSourcesRequest) - Required - The request object to use for the request.
- **retries** (Optional[utils.RetryConfig]) - Optional - Configuration to override the default retry behavior of the client.

### Response
- **operations.ListSourcesResponse** - The response object containing the list of sources.
```

--------------------------------

### GET /api/v1/templates/

Source: https://github.com/unstructured-io/unstructured-python-client/blob/main/docs/sdks/templates/README.md

Retrieves a list of all available templates along with their metadata.

```APIDOC
## GET /api/v1/templates/

### Description
Retrieve a list of available templates with their metadata.

### Method
GET

### Endpoint
/api/v1/templates/

### Parameters
#### Path Parameters
None

#### Query Parameters
None

#### Request Body
None

### Request Example
```python
from unstructured_client import UnstructuredClient

with UnstructuredClient() as uc_client:
    res = uc_client.templates.list_templates(request={})

    assert res.response_list_templates is not None

    # Handle response
    print(res.response_list_templates)
```

### Response
#### Success Response (200)
- **response_list_templates** (object) - A list of available templates with their metadata.

#### Response Example
```json
{
  "response_list_templates": [
    {
      "id": "string",
      "name": "string"
    }
  ]
}
```

### Errors
- **errors.HTTPValidationError** (422, application/json)
- **errors.SDKError** (4XX, 5XX, */*)
```

--------------------------------

### Asynchronous Document Partitioning

Source: https://context7.com/unstructured-io/unstructured-python-client/llms.txt

This example shows how to partition a document asynchronously with the `partition_async` method. The call must be awaited inside a coroutine; here the coroutine is driven by `asyncio.run`.
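When you have several documents rather than one, a common asyncio pattern is to fan the calls out with `asyncio.gather` and cap how many are in flight with an `asyncio.Semaphore`. The sketch below is stdlib-only so it runs offline: `partition_many` and `fake_partition` are hypothetical names, and `fake_partition` stands in for a coroutine that, in real code, would wrap `client.general.partition_async`.

```python
import asyncio
from typing import Awaitable, Callable, List


async def partition_many(
    paths: List[str],
    partition_one: Callable[[str], Awaitable[list]],
    max_concurrency: int = 4,
) -> List[list]:
    """Run partition_one over every path, at most max_concurrency at a time.

    Results come back in the same order as `paths`.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(path: str) -> list:
        # Only max_concurrency coroutines can hold the semaphore at once.
        async with semaphore:
            return await partition_one(path)

    return await asyncio.gather(*(guarded(p) for p in paths))


# Hypothetical stand-in for a real partition_async call, so the sketch runs offline.
async def fake_partition(path: str) -> list:
    await asyncio.sleep(0.01)  # simulate network latency
    return [{"type": "Title", "text": f"elements of {path}"}]


if __name__ == "__main__":
    results = asyncio.run(
        partition_many(["a.pdf", "b.pdf", "c.pdf"], fake_partition, max_concurrency=2)
    )
    print(len(results))
```

Bounding concurrency this way keeps a large batch from opening one request per file at the same moment; a sensible limit depends on your plan's rate limits, which this document does not specify.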
```python
import asyncio

from unstructured_client import UnstructuredClient
from unstructured_client.models import operations, shared


async def process_document():
    async with UnstructuredClient(api_key_auth="your-api-key") as client:
        with open("document.pdf", "rb") as f:
            content = f.read()

        response = await client.general.partition_async(
            request=operations.PartitionRequest(
                partition_parameters=shared.PartitionParameters(
                    files=shared.Files(
                        content=content,
                        file_name="document.pdf",
                    ),
                    strategy="fast",
                )
            )
        )
        return response.elements


# Run the async function
elements = asyncio.run(process_document())
```

--------------------------------

### GET /api/v1/sources

Source: https://github.com/unstructured-io/unstructured-python-client/blob/main/docs/sdks/sources/README.md

Retrieve a list of available source connectors.

```APIDOC
## GET /api/v1/sources

### Description
Retrieve a list of available source connectors.

### Method
GET

### Endpoint
/api/v1/sources

### Response
#### Success Response (200)
- **sources** (array) - A list of available source connectors.

### Errors
- **422** - HTTPValidationError
- **4XX, 5XX** - SDKError
```

--------------------------------

### Initialize UnstructuredClient

Source: https://context7.com/unstructured-io/unstructured-python-client/llms.txt

Set up the client with API key authentication, custom retry strategies, and timeout settings. Using the client as a context manager is recommended for proper resource management.
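The `BackoffStrategy` values in the initialization example are millisecond durations. To see roughly how they interact, the sketch below computes a retry-delay schedule under a plain exponential-backoff assumption: delay n is `initial_interval * exponent**n`, capped at `max_interval`, and retries stop once the cumulative sleep would exceed `max_elapsed_time`. `backoff_schedule_ms` is a hypothetical helper, not the SDK's retry code, which may add random jitter and also counts the time spent in the requests themselves.

```python
from typing import List


def backoff_schedule_ms(
    initial_interval: int,
    max_interval: int,
    exponent: float,
    max_elapsed_time: int,
) -> List[float]:
    """Return the sleep (ms) before each retry, ignoring request time and jitter.

    Assumes delay_n = min(initial_interval * exponent**n, max_interval) and that
    retrying stops once the cumulative sleep would exceed max_elapsed_time.
    """
    delays: List[float] = []
    elapsed = 0.0
    n = 0
    while True:
        delay = min(initial_interval * exponent**n, max_interval)
        if elapsed + delay > max_elapsed_time:
            break
        delays.append(delay)
        elapsed += delay
        n += 1
    return delays


# Same numbers as the advanced initialization example.
schedule = backoff_schedule_ms(1000, 50000, 1.5, 300000)
print(len(schedule), schedule[:4])
```

Under these assumptions the configured strategy allows 13 sleeps: they grow from 1 s by a factor of 1.5, hit the 50 s cap on the eleventh retry, and stop before the 5-minute budget is exhausted.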
```python
import os

from unstructured_client import UnstructuredClient
from unstructured_client.utils import BackoffStrategy, RetryConfig

# Basic initialization
client = UnstructuredClient(
    api_key_auth=os.environ.get("UNSTRUCTURED_API_KEY"),
)

# Advanced initialization with retry config and timeout
client = UnstructuredClient(
    api_key_auth=os.environ.get("UNSTRUCTURED_API_KEY"),
    retry_config=RetryConfig(
        strategy="backoff",
        backoff=BackoffStrategy(
            initial_interval=1000,    # 1 second
            max_interval=50000,       # 50 seconds
            exponent=1.5,
            max_elapsed_time=300000,  # 5 minutes
        ),
        retry_connection_errors=True,
    ),
    timeout_ms=120000,  # 2-minute timeout
)

# Using the client as a context manager (recommended for resource management)
with UnstructuredClient(api_key_auth="your-api-key") as client:
    # Use the client here
    pass
```

--------------------------------

### List and Get Templates with Unstructured Client

Source: https://context7.com/unstructured-io/unstructured-python-client/llms.txt

Shows how to list all available templates and retrieve details for a specific template using the Unstructured client.

```python
from unstructured_client import UnstructuredClient

with UnstructuredClient(api_key_auth="your-api-key") as client:
    # List all available templates
    templates = client.templates.list_templates(request={})
    for template in templates.response_list_templates:
        print(f"Template: {template.name} (ID: {template.id})")
        print(f"  Description: {template.description}")

    # Get details (including the DAG) for a specific template
    template = client.templates.get_template(
        request={"template_id": "template-uuid-here"}
    )
    print(f"Template detail: {template.template_detail}")
```

--------------------------------

### Manage Source Connectors (S3 Example)

Source: https://context7.com/unstructured-io/unstructured-python-client/llms.txt

This snippet shows how to manage source connectors: creating an S3 connector with authentication details, listing all connectors, checking a connection, and deleting a connector.
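The S3 example in this section hard-codes AWS's documented placeholder credentials; in real code you would normally read them from the environment. The hypothetical helper below builds the same `create_source_connector` payload as a plain dict, the dict request style used by the SDK's own `create_source` example. The `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` variable names are an assumption.

```python
import os
from typing import Any, Dict, Mapping, Optional


def build_s3_source_request(
    name: str,
    remote_url: str,
    env: Optional[Mapping[str, str]] = None,
) -> Dict[str, Any]:
    """Build a create_source request body for an S3 source connector.

    Credentials come from the environment instead of being hard-coded.
    The AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY names are an assumption.
    """
    env = os.environ if env is None else env
    missing = [k for k in ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY") if not env.get(k)]
    if missing:
        raise ValueError(f"missing environment variables: {', '.join(missing)}")
    return {
        "create_source_connector": {
            "name": name,
            "type": "s3",
            "config": {
                "remote_url": remote_url,
                "key": env["AWS_ACCESS_KEY_ID"],
                "secret": env["AWS_SECRET_ACCESS_KEY"],
            },
        }
    }
```

You could then call `client.sources.create_source(request=build_s3_source_request("my-s3-source", "s3://my-bucket/documents/"))`, assuming the dict form validates the same way as the typed `operations.CreateSourceRequest` used in the example.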
```python
from unstructured_client import UnstructuredClient
from unstructured_client.models import operations, shared

with UnstructuredClient(api_key_auth="your-api-key") as client:
    # Create an S3 source connector
    response = client.sources.create_source(
        request=operations.CreateSourceRequest(
            create_source_connector=shared.CreateSourceConnector(
                name="my-s3-source",
                type="s3",
                config={
                    "remote_url": "s3://my-bucket/documents/",
                    "key": "AKIAIOSFODNN7EXAMPLE",
                    "secret": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
                },
            )
        )
    )
    source_id = response.source_connector_information.id
    print(f"Created source: {source_id}")

    # List all source connectors
    sources = client.sources.list_sources(request={})
    for source in sources.response_list_sources:
        print(f"Source: {source.name} (ID: {source.id})")

    # Test the source connection
    check = client.sources.create_connection_check_sources(
        request={"source_id": source_id}
    )
    print(f"Connection status: {check.dag_node_connection_check.status}")

    # Delete the source connector
    client.sources.delete_source(request={"source_id": source_id})
```

--------------------------------

### Configure PDF Partitioning Parameters

Source: https://github.com/unstructured-io/unstructured-python-client/blob/main/README.md

Examples for configuring PDF processing, including concurrency levels, page ranges, and error handling modes.
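Before sending `split_pdf_page_range`, it can help to check the range against the PDF locally. The hypothetical helper below assumes the range is 1-based and inclusive at both ends, so `[10, 15]` selects six pages; confirm that convention against the SDK documentation for your version.

```python
from typing import List, Sequence


def pages_in_range(page_range: Sequence[int], page_count: int) -> List[int]:
    """Return the 1-based page numbers covered by a [start, end] range.

    Assumes split_pdf_page_range is 1-based and inclusive at both ends,
    and rejects ranges that fall outside the document.
    """
    if len(page_range) != 2:
        raise ValueError("page range must be [start, end]")
    start, end = page_range
    if not 1 <= start <= end <= page_count:
        raise ValueError(f"range {start}-{end} is invalid for a {page_count}-page PDF")
    return list(range(start, end + 1))


print(pages_in_range([10, 15], page_count=20))
```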
```python
req = operations.PartitionRequest(
    partition_parameters=shared.PartitionParameters(
        files=files,
        strategy="fast",
        languages=["eng"],
        split_pdf_concurrency_level=8,
    )
)
```

```python
req = operations.PartitionRequest(
    partition_parameters=shared.PartitionParameters(
        files=files,
        strategy="fast",
        languages=["eng"],
        split_pdf_page_range=[10, 15],
    )
)
```

```python
req = operations.PartitionRequest(
    partition_parameters=shared.PartitionParameters(
        files=files,
        strategy="fast",
        languages=["eng"],
        split_pdf_allow_failed=True,
    )
)
```

--------------------------------

### GET /api/v1/destinations/

Source: https://github.com/unstructured-io/unstructured-python-client/blob/main/docs/sdks/destinations/README.md

Retrieve a list of available destination connectors.

```APIDOC
## GET /api/v1/destinations/

### Description
Retrieve a list of available destination connectors.

### Method
GET

### Endpoint
/api/v1/destinations/

### Parameters
#### SDK Method Parameters
- **request** (operations.ListDestinationsRequest) - Required - The request object to use for the request.
- **retries** (Optional[utils.RetryConfig]) - Optional - Configuration to override the default retry behavior of the client.
### Response
#### Success Response (200)
- **response_list_destinations** (operations.ListDestinationsResponse) - The list of available destination connectors.

### Request Example
```python
from unstructured_client import UnstructuredClient

with UnstructuredClient() as uc_client:
    res = uc_client.destinations.list_destinations(request={})

    assert res.response_list_destinations is not None

    # Handle response
    print(res.response_list_destinations)
```

### Response Example
```json
{
  "example": "response body"
}
```

### Errors
- **errors.HTTPValidationError** (422) - application/json
- **errors.SDKError** (4XX, 5XX) - */*
```

--------------------------------

### Get Job Information with Python

Source: https://github.com/unstructured-io/unstructured-python-client/blob/main/docs/sdks/jobs/README.md

Retrieves detailed information for a specific job using its unique job ID.

```python
from unstructured_client import UnstructuredClient

with UnstructuredClient() as uc_client:
    res = uc_client.jobs.get_job(request={
        "job_id": "d95a05b3-3446-4f3d-806c-904b6a7ba40a",
    })

    assert res.job_information is not None

    # Handle response
    print(res.job_information)
```

--------------------------------

### Get Source Connector Information

Source: https://github.com/unstructured-io/unstructured-python-client/blob/main/docs/sdks/sources/README.md

Retrieve detailed information for a specific source connector using its ID. Requires an initialized UnstructuredClient.
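Source, destination, job, and workflow IDs in these examples are UUID strings. If IDs come from user input or config files, a cheap local check can catch typos before a request is sent. `is_valid_connector_id` is a hypothetical helper; it assumes connector IDs are always canonical hyphenated UUIDs, which matches the examples here but is not stated as a guarantee by the API docs.

```python
import uuid


def is_valid_connector_id(value: str) -> bool:
    """Return True if value is a canonical, hyphenated UUID string."""
    try:
        # uuid.UUID also accepts braces, "urn:uuid:" prefixes, and unhyphenated
        # hex; comparing to the canonical form rejects those variants.
        return str(uuid.UUID(value)) == value.lower()
    except (ValueError, AttributeError, TypeError):
        return False


print(is_valid_connector_id("df7d5ab1-bb15-4f1a-8dc0-c92a9a28a585"))
```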
```python
from unstructured_client import UnstructuredClient

with UnstructuredClient() as uc_client:
    res = uc_client.sources.get_source(request={
        "source_id": "df7d5ab1-bb15-4f1a-8dc0-c92a9a28a585",
    })

    assert res.source_connector_information is not None

    # Handle response
    print(res.source_connector_information)
```

--------------------------------

### Partition files using the Unstructured client

Source: https://github.com/unstructured-io/unstructured-python-client/blob/main/docs/sdks/general/README.md

Demonstrates how to initialize the UnstructuredClient and call the partition method with specific chunking and VLM parameters.

```python
from unstructured_client import UnstructuredClient
from unstructured_client.models import shared

# Open the file in a context manager so the handle is closed afterwards
with UnstructuredClient() as uc_client, open("example.file", "rb") as f:
    res = uc_client.general.partition(request={
        "partition_parameters": {
            "chunking_strategy": "by_title",
            "files": {
                "content": f,
                "file_name": "example.file",
            },
            "split_pdf_cache_tmp_data_dir": "",
            "split_pdf_page_range": [
                1,
                10,
            ],
            "strategy": shared.Strategy.AUTO,
            "vlm_model": "gpt-4o",
            "vlm_model_provider": shared.VLMModelProvider.OPENAI,
        },
    })

    assert res.elements is not None

    # Handle response
    print(res.elements)
```

--------------------------------

### GET /jobs/{job_id}

Source: https://github.com/unstructured-io/unstructured-python-client/blob/main/docs/models/operations/getjobdetailsrequest.md

Retrieves the details of a specific job by its ID.

```APIDOC
## GET /jobs/{job_id}

### Description
Retrieves the details of a specific job using the provided job_id.

### Method
GET

### Endpoint
/jobs/{job_id}

### Parameters
#### Path Parameters
- **job_id** (str) - Required - The unique identifier of the job.

#### Request Body
- **unstructured_api_key** (OptionalNullable[str]) - Optional - The API key for authentication.
```

--------------------------------

### Create and Monitor Jobs with Unstructured Client

Source: https://context7.com/unstructured-io/unstructured-python-client/llms.txt

Demonstrates creating an on-demand job, listing jobs with filters, polling job status, retrieving processing details, handling failed files, downloading output, and cancelling jobs.

```python
import time

from unstructured_client import UnstructuredClient
from unstructured_client.models import operations, shared

with UnstructuredClient(api_key_auth="your-api-key") as client:
    # Create an on-demand job from a template
    response = client.jobs.create_job(
        request=operations.CreateJobRequest(
            body_create_job=shared.BodyCreateJob(
                request_data='{"template_id": "your-template-id"}'
            )
        )
    )
    job_id = response.job_information.id

    # List jobs, optionally filtered
    jobs = client.jobs.list_jobs(
        request=operations.ListJobsRequest(
            workflow_id="workflow-uuid",  # Optional filter
            status="running",             # Optional filter
        )
    )
    for job in jobs.response_list_jobs:
        print(f"Job: {job.id} (Status: {job.status})")

    # Poll the job status until it reaches a terminal state
    while True:
        job_info = client.jobs.get_job(request={"job_id": job_id})
        status = job_info.job_information.status
        print(f"Job status: {status}")
        if status in ["completed", "failed", "cancelled"]:
            break
        time.sleep(10)

    # Get detailed job processing information
    details = client.jobs.get_job_details(request={"job_id": job_id})
    print(f"Files processed: {details.job_details.files_processed}")
    print(f"Files failed: {details.job_details.files_failed}")

    # Get the list of failed files with error details
    failed = client.jobs.get_job_failed_files(request={"job_id": job_id})
    for file in failed.job_failed_files.files:
        print(f"Failed: {file.filename} - {file.error}")

    # Download job output (for workflows with runtime file input);
    # the download request requires both job_id and file_id
    output = client.jobs.download_job_output(
        request={"job_id": job_id, "file_id": "your-file-id"}
    )

    # Cancel a job that is still running
    client.jobs.cancel_job(request={"job_id": job_id})
```

--------------------------------

### Create a job with the Unstructured Client

Source: https://github.com/unstructured-io/unstructured-python-client/blob/main/README.md

Demonstrates initializing the client and creating a job. Use this pattern for standard API interactions.

```python
from unstructured_client import UnstructuredClient

with UnstructuredClient() as uc_client:
    res = uc_client.jobs.create_job(request={
        "body_create_job": {
            "request_data": "",
        },
    })

    assert res.job_information is not None

    # Handle response
    print(res.job_information)
```

--------------------------------

### Get Job Details

Source: https://github.com/unstructured-io/unstructured-python-client/blob/main/docs/sdks/jobs/README.md

Retrieves processing details for a specific job by its ID.

```APIDOC
## GET /api/v1/jobs/{job_id}/details

### Description
Retrieve processing details for a specific job by its ID.

### Method
GET

### Endpoint
/api/v1/jobs/{job_id}/details

### Parameters
#### Path Parameters
- **job_id** (string) - Required - The ID of the job to retrieve details for.

### Response
#### Success Response (200)
- **details** (object) - The processing details of the job.

#### Response Example
```json
{
  "details": {
    "example": "job processing details"
  }
}
```

### Errors
- **errors.HTTPValidationError** (422) - Unprocessable Entity
- **errors.SDKError** (4XX, 5XX) - SDK Error
```

--------------------------------

### Manage SDK resources with context managers

Source: https://github.com/unstructured-io/unstructured-python-client/blob/main/README.md

Shows how to use the client as a context manager in both synchronous and asynchronous applications to ensure proper resource cleanup.

```python
from unstructured_client import UnstructuredClient


def main():
    with UnstructuredClient() as uc_client:
        # Rest of application here...
        pass


# Or when using async:
async def amain():
    async with UnstructuredClient() as uc_client:
        # Rest of application here...
        pass
```

--------------------------------

### Get Job Information

Source: https://github.com/unstructured-io/unstructured-python-client/blob/main/docs/sdks/jobs/README.md

Retrieves detailed information for a specific job by its ID.

```APIDOC
## GET /api/v1/jobs/{job_id}

### Description
Retrieve detailed information for a specific job by its ID.

### Method
GET

### Endpoint
/api/v1/jobs/{job_id}

### Parameters
#### Path Parameters
- **job_id** (string) - Required - The ID of the job to retrieve.

### Response
#### Success Response (200)
- **job_information** (object) - The job information.

#### Response Example
```json
{
  "job_information": {
    "example": "job details"
  }
}
```

### Errors
- **errors.HTTPValidationError** (422) - Unprocessable Entity
- **errors.SDKError** (4XX, 5XX) - SDK Error
```

--------------------------------

### GET /api/v1/workflows/{workflow_id}

Source: https://github.com/unstructured-io/unstructured-python-client/blob/main/docs/sdks/workflows/README.md

Retrieve detailed information for a specific workflow by its ID.

```APIDOC
## GET /api/v1/workflows/{workflow_id}

### Description
Retrieve detailed information for a specific workflow by its ID.

### Method
GET

### Endpoint
/api/v1/workflows/{workflow_id}

### Parameters
#### Path Parameters
- **workflow_id** (string) - Required - The ID of the workflow to retrieve.

#### Query Parameters
None

#### Request Body
None

### Request Example
```python
from unstructured_client import UnstructuredClient

with UnstructuredClient() as uc_client:
    res = uc_client.workflows.get_workflow(request={
        "workflow_id": "d031b0e5-7ca7-4a2b-b3cc-d869d2df3e76",
    })

    assert res.workflow_information is not None

    # Handle response
    print(res.workflow_information)
```

### Response
#### Success Response (200)
- **workflow_information** (object) - Contains detailed information about the workflow.
#### Response Example
```json
{
  "workflow_information": {
    "id": "d031b0e5-7ca7-4a2b-b3cc-d869d2df3e76",
    "name": "Example Workflow",
    "state": "active",
    "created_at": "2023-01-01T12:00:00Z",
    "updated_at": "2023-01-01T12:00:00Z"
  }
}
```

### Errors
- errors.HTTPValidationError (422, application/json)
- errors.SDKError (4XX, 5XX, */*)
```

--------------------------------

### Set Up Debug Logging for Unstructured Client

Source: https://context7.com/unstructured-io/unstructured-python-client/llms.txt

Configure basic logging at the DEBUG level and pass the logger to the UnstructuredClient along with an API key. This is useful for troubleshooting client interactions.

```python
import logging

from unstructured_client import UnstructuredClient

logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger("unstructured_client")

client = UnstructuredClient(
    api_key_auth="your-api-key",
    debug_logger=logger,
)
```

--------------------------------

### GET /download_job_output

Source: https://github.com/unstructured-io/unstructured-python-client/blob/main/docs/models/operations/downloadjoboutputrequest.md

Retrieves the output file for a specific job and file ID.

```APIDOC
## GET /download_job_output

### Description
Retrieves the output file associated with a specific job and file ID.

### Method
GET

### Parameters
#### Query Parameters
- **file_id** (str) - Required - ID of the file to download.
- **job_id** (str) - Required - ID of the job that produced the file.
- **node_id** (OptionalNullable[str]) - Optional - Node ID to retrieve the corresponding output file. If not provided, uses the last node in the workflow.
- **unstructured_api_key** (OptionalNullable[str]) - Optional - The API key for authentication.
```

--------------------------------

### Get Job Details API

Source: https://github.com/unstructured-io/unstructured-python-client/blob/main/docs/sdks/jobs/README.md

Retrieves detailed processing information for a specific job.

```APIDOC
## GET /api/v1/jobs/{job_id}/details/

### Description
Get job processing details.
### Method
GET

### Endpoint
/api/v1/jobs/{job_id}/details/

### Parameters
#### Path Parameters
- **job_id** (string) - Required - The ID of the job to get details for.

### Response
#### Success Response (200)
- **details** (object) - The processing details of the job.

#### Response Example
```json
{
  "details": {
    "example": "job processing details"
  }
}
```

### Errors
- errors.SDKError (4XX, 5XX) - */*
```

--------------------------------

### Run SDK in Python shell with uvx

Source: https://github.com/unstructured-io/unstructured-python-client/blob/main/README.md

Execute the SDK directly in a Python shell using uvx.

```shell
uvx --from unstructured-client python
```

--------------------------------

### Get Job API

Source: https://github.com/unstructured-io/unstructured-python-client/blob/main/docs/sdks/jobs/README.md

Retrieves details for a specific job using its unique identifier.

```APIDOC
## GET /api/v1/jobs/{job_id}/

### Description
Get a specific job.

### Method
GET

### Endpoint
/api/v1/jobs/{job_id}/

### Parameters
#### Path Parameters
- **job_id** (string) - Required - The ID of the job to retrieve.

### Response
#### Success Response (200)
- **job** (object) - The job object with details.

#### Response Example
```json
{
  "job": {
    "example": "job object details"
  }
}
```

### Errors
- errors.SDKError (4XX, 5XX) - */*
```

--------------------------------

### GET /api/v1/destinations/{destination_id}

Source: https://github.com/unstructured-io/unstructured-python-client/blob/main/docs/sdks/destinations/README.md

Retrieves detailed information for a specific destination connector by its ID.

```APIDOC
## GET /api/v1/destinations/{destination_id}

### Description
Retrieve detailed information for a specific destination connector by its ID.

### Method
GET

### Endpoint
/api/v1/destinations/{destination_id}

### Parameters
#### Path Parameters
- **destination_id** (string) - Required - The ID of the destination connector.
### Response
#### Success Response (200)
- **destination_connector_information** (object) - Detailed information about the destination connector.
```

--------------------------------

### Partition a Document with Unstructured Client

Source: https://github.com/unstructured-io/unstructured-python-client/blob/main/_jupyter/README_example.ipynb

Initializes the UnstructuredClient with an API key and partitions a specified PDF file. It demonstrates how to read a file, create a partition request with OCR options, and handle potential SDK errors. The API key is read here from the `UNSTRUCTURED_API_KEY` environment variable; the original notebook calls its own `get_api_key()` helper instead.

```python
import os

from unstructured_client import UnstructuredClient
from unstructured_client.models import operations, shared
from unstructured_client.models.errors import SDKError

s = UnstructuredClient(api_key_auth=os.environ.get("UNSTRUCTURED_API_KEY"))

filename = "../_sample_docs/layout-parser-paper-fast.pdf"

with open(filename, "rb") as f:
    # Note that this currently only supports a single file
    files = shared.Files(
        content=f.read(),
        file_name=filename,
    )

req = operations.PartitionRequest(
    partition_parameters=shared.PartitionParameters(
        files=files,
        # Other partition params
        strategy="ocr_only",
        languages=["eng"],
    )
)

try:
    resp = s.general.partition(request=req)
    print(resp.elements[0])
except SDKError as e:
    print(e)
```

--------------------------------

### POST /api/v1/sources/

Source: https://github.com/unstructured-io/unstructured-python-client/blob/main/docs/sdks/sources/README.md

Create a new source connector using the provided configuration and name.

```APIDOC
## POST /api/v1/sources/

### Description
Create a new source connector using the provided configuration and name.

### Method
POST

### Endpoint
/api/v1/sources/

### Parameters
#### Request Body
- **request** (operations.CreateSourceRequestRequest) - Required - The request object containing source connector details.
- **create_source_connector** (shared.CreateSourceConnector) - Required - Configuration for the new source connector.
- **config** (shared.SourceConnectorConfig) - Required - Configuration details for the source connector.
- **catalog** (string) - Optional - The catalog for the source connector.
- **client_id** (string) - Optional - The client ID for authentication.
- **client_secret** (string) - Optional - The client secret for authentication.
- **host** (string) - Required - The host address for the source connector.
- **volume** (string) - Optional - The volume for the source connector.
- **volume_path** (string) - Optional - The volume path for the source connector.
- **name** (string) - Required - The name of the source connector.
- **type** (shared.SourceConnectorType) - Required - The type of the source connector (e.g., SALESFORCE).

### Request Example
```json
{
  "create_source_connector": {
    "config": {
      "catalog": "",
      "client_id": "",
      "client_secret": "",
      "host": "athletic-nudge.org",
      "volume": "",
      "volume_path": ""
    },
    "name": "",
    "type": "SALESFORCE"
  }
}
```

### Response
#### Success Response (200)
- **source_connector_information** (shared.SourceConnectorInformation) - Information about the newly created source connector.

#### Response Example
```json
{
  "source_connector_information": {
    "id": "a1b2c3d4-e5f6-7890-1234-567890abcdef",
    "name": "My Salesforce Source",
    "type": "SALESFORCE",
    "last_connection_check": null
  }
}
```
```

--------------------------------

### GET /api/v1/destinations/{destination_id}/connection-check

Source: https://github.com/unstructured-io/unstructured-python-client/blob/main/docs/sdks/destinations/README.md

Performs a connection check for a specific destination connector.

```APIDOC
## GET /api/v1/destinations/{destination_id}/connection-check

### Description
Performs a connection check for a specific destination connector by its ID.

### Method
GET

### Endpoint
/api/v1/destinations/{destination_id}/connection-check

### Parameters
#### Path Parameters
- **destination_id** (string) - Required - The ID of the destination connector.

### Response
#### Success Response (200)
- **dag_node_connection_check** (object) - The result of the connection check.
### Errors
- **422** - HTTPValidationError
- **4XX, 5XX** - SDKError
```

--------------------------------

### GET /api/v1/sources/{source_id}

Source: https://github.com/unstructured-io/unstructured-python-client/blob/main/docs/sdks/sources/README.md

Retrieve detailed information for a specific source connector by its ID.

```APIDOC
## GET /api/v1/sources/{source_id}

### Description
Retrieve detailed information for a specific source connector by its ID.

### Method
GET

### Endpoint
/api/v1/sources/{source_id}

### Parameters
#### Path Parameters
- **source_id** (string) - Required - The unique identifier of the source connector.

### Response
#### Success Response (200)
- **source_connector_information** (object) - Detailed information about the source connector.

### Errors
- **422** - HTTPValidationError
- **4XX, 5XX** - SDKError
```

--------------------------------

### List Jobs with UnstructuredClient

Source: https://github.com/unstructured-io/unstructured-python-client/blob/main/docs/sdks/jobs/README.md

Uses the UnstructuredClient to list jobs. Requires an initialized client instance.

```python
from unstructured_client import UnstructuredClient

with UnstructuredClient() as uc_client:
    res = uc_client.jobs.list_jobs(request={})

    assert res.response_list_jobs is not None

    # Handle response
    print(res.response_list_jobs)
```

--------------------------------

### POST /general/v0/general

Source: https://github.com/unstructured-io/unstructured-python-client/blob/main/docs/sdks/general/README.md

The partition endpoint processes documents, allowing for chunking and strategy-based partitioning.

```APIDOC
## POST /general/v0/general

### Description
Processes documents using specified partitioning strategies and options.

### Method
POST

### Endpoint
/general/v0/general

### Parameters
#### Request Body
- **request** (operations.PartitionRequest) - Required - The request object to use for the request.
- **retries** (Optional[utils.RetryConfig]) - Optional - Configuration to override the default retry behavior of the client.

### Request Example
```json
{
  "partition_parameters": {
    "chunking_strategy": "by_title",
    "files": {
      "content": "",
      "file_name": "example.file"
    },
    "split_pdf_cache_tmp_data_dir": "",
    "split_pdf_page_range": [
      1,
      10
    ],
    "strategy": "auto",
    "vlm_model": "gpt-4o",
    "vlm_model_provider": "openai"
  }
}
```

### Response
#### Success Response (200)
- **elements** (list) - The document elements returned after partitioning.

#### Response Example
```json
{
  "elements": [
    {
      "type": "title",
      "text": "Example Document Title",
      "metadata": {
        "file_name": "example.file",
        "file_type": "text/plain",
        "page_number": 1
      }
    }
  ]
}
```

### Errors
- **errors.HTTPValidationError** (422) - Unprocessable Entity
- **errors.ServerError** (5XX) - Server Error
- **errors.SDKError** (4XX) - Client Error
```

--------------------------------

### GET /templates/{template_id}

Source: https://github.com/unstructured-io/unstructured-python-client/blob/main/docs/models/operations/gettemplaterequest.md

Retrieves a specific template by its ID using the Unstructured API.

```APIDOC
## GET /templates/{template_id}

### Description
Retrieves a specific template by its ID.

### Method
GET

### Endpoint
/templates/{template_id}

### Parameters
#### Path Parameters
- **template_id** (str) - Required - The unique identifier of the template.

#### Request Body
- **unstructured_api_key** (OptionalNullable[str]) - Optional - The API key for authentication.
```

--------------------------------

### GET /api/v1/jobs/{job_id}/failed-files

Source: https://github.com/unstructured-io/unstructured-python-client/blob/main/docs/sdks/jobs/README.md

Retrieves a list of files that failed processing for a specific job.

```APIDOC
## GET /api/v1/jobs/{job_id}/failed-files

### Description
Retrieves a list of files that failed processing for a specific job.
### Method
GET

### Endpoint
/api/v1/jobs/{job_id}/failed-files

### Parameters
#### Path Parameters
- **job_id** (string) - Required - The unique identifier of the job.

### Response
#### Success Response (200)
- **job_failed_files** (array) - A list of files that failed during the job execution.
```

--------------------------------

### Run standalone script with uv

Source: https://github.com/unstructured-io/unstructured-python-client/blob/main/README.md

Create a standalone Python script with embedded dependency management using uv.

```python
#!/usr/bin/env -S uv run --script
# /// script
# requires-python = ">=3.9"
# dependencies = [
#     "unstructured-client",
# ]
# ///
from unstructured_client import UnstructuredClient

sdk = UnstructuredClient(
    # SDK arguments
)

# Rest of script here...
```

--------------------------------

### GET /api/v1/jobs/{job_id}/details

Source: https://github.com/unstructured-io/unstructured-python-client/blob/main/docs/sdks/jobs/README.md

Retrieves the detailed information for a specific job by its unique identifier.

```APIDOC
## GET /api/v1/jobs/{job_id}/details

### Description
Retrieves the detailed information for a specific job by its unique identifier.

### Method
GET

### Endpoint
/api/v1/jobs/{job_id}/details

### Parameters
#### Path Parameters
- **job_id** (string) - Required - The unique identifier of the job.

### Response
#### Success Response (200)
- **job_details** (object) - The details of the requested job.
```

--------------------------------

### GET /api/v1/destinations/{destination_id}/check

Source: https://github.com/unstructured-io/unstructured-python-client/blob/main/docs/sdks/destinations/README.md

Retrieves the most recent connection check for the specified destination connector.

```APIDOC
## GET /api/v1/destinations/{destination_id}/check

### Description
Retrieves the most recent connection check for the specified destination connector.
### Method
GET

### Endpoint
/api/v1/destinations/{destination_id}/check
```

--------------------------------

### AzureDestinationConnectorConfigInput

Source: https://github.com/unstructured-io/unstructured-python-client/blob/main/docs/models/shared/azuredestinationconnectorconfiginput.md

Configuration details for connecting to Azure storage.

```APIDOC
## AzureDestinationConnectorConfigInput

### Description
Configuration parameters for the Azure Destination Connector.

### Fields
#### Request Body
- **account_key** (OptionalNullable[str]) - Optional - Azure storage account key.
- **account_name** (OptionalNullable[str]) - Optional - Azure storage account name.
- **connection_string** (OptionalNullable[str]) - Optional - Azure storage connection string.
- **remote_url** (str) - Required - The remote URL for Azure storage.
- **sas_token** (OptionalNullable[str]) - Optional - Shared access signature token for Azure storage.
```

--------------------------------

### POST /api/v1/destinations/

Source: https://github.com/unstructured-io/unstructured-python-client/blob/main/docs/sdks/destinations/README.md

Creates a new destination connector with the specified configuration.

```APIDOC
## POST /api/v1/destinations/

### Description
Create a new destination connector using the provided configuration and name.

### Method
POST

### Endpoint
/api/v1/destinations/

### Parameters
#### Request Body
- **request** (operations.CreateDestinationRequest) - Required - The request object containing the destination connector details.
- **create_destination_connector** (shared.CreateDestinationConnector) - Required - Configuration for the new destination connector.
- **name** (string) - Required - The name of the destination connector.
- **type** (shared.DestinationConnectorType) - Required - The type of the destination connector (e.g., ELASTICSEARCH).
- **config** (shared.MongoDBConnectorConfigInput) - Required - Configuration specific to the MongoDB connector.
      - **uri** (string) - Required - The MongoDB connection URI.
      - **database** (string) - Required - The name of the database.
      - **collection** (string) - Required - The name of the collection.

### Request Example
```json
{
  "create_destination_connector": {
    "name": "",
    "type": "MONGODB",
    "config": {
      "uri": "mongodb+srv://<username>:<password>@<cluster-host>/",
      "database": "",
      "collection": ""
    }
  }
}
```

### Response
#### Success Response (200)
- **destination_connector_information** (shared.DestinationConnectorInformation) - Information about the newly created destination connector.

#### Response Example
```json
{
  "destination_connector_information": {
    "id": "some-uuid",
    "name": "my-mongo-destination",
    "type": "MONGODB",
    "last_error": null,
    "last_success": null,
    "enabled": true
  }
}
```
```

--------------------------------

### GET /api/v1/sources/{source_id}/connection-check

Source: https://github.com/unstructured-io/unstructured-python-client/blob/main/docs/sdks/sources/README.md

Retrieves the most recent connection check for the specified source connector.

```APIDOC
## GET /api/v1/sources/{source_id}/connection-check

### Description
Retrieves the most recent connection check for the specified source connector.

### Method
GET

### Endpoint
/api/v1/sources/{source_id}/connection-check

### Parameters
#### Path Parameters
- **source_id** (string) - Required - The ID of the source connector.

### Response
#### Success Response (200)
- **dag_node_connection_check** (object) - The connection check details.
```

--------------------------------

### Create New Source Connector

Source: https://github.com/unstructured-io/unstructured-python-client/blob/main/docs/sdks/sources/README.md

Creates a new source connector by providing configuration details and the connector type.

```python
from unstructured_client import UnstructuredClient
from unstructured_client.models import shared

with UnstructuredClient() as uc_client:
    res = uc_client.sources.create_source(request={
        "create_source_connector": {
            # These config fields (catalog, host, volume, volume_path, ...)
            # belong to a Databricks Volumes connector, so the type matches it.
            "config": {
                "catalog": "",
                "client_id": "",
                "client_secret": "",
                "host": "athletic-nudge.org",
                "volume": "",
                "volume_path": "",
            },
            "name": "",
            "type": shared.SourceConnectorType.DATABRICKS_VOLUMES,
        },
    })

    assert res.source_connector_information is not None

    # Handle response
    print(res.source_connector_information)
```

--------------------------------

### GET /api/v1/templates/{template_id}

Source: https://github.com/unstructured-io/unstructured-python-client/blob/main/docs/sdks/templates/README.md

Retrieves detailed information and DAG for a specific template using its ID.

```APIDOC
## GET /api/v1/templates/{template_id}

### Description
Retrieve detailed information and DAG for a specific template.

### Method
GET

### Endpoint
/api/v1/templates/{template_id}

### Parameters
#### Path Parameters
- **template_id** (string) - Required - The ID of the template to retrieve.

#### Query Parameters
None

#### Request Body
None

### Request Example
```python
from unstructured_client import UnstructuredClient

with UnstructuredClient() as uc_client:
    res = uc_client.templates.get_template(request={
        "template_id": "",
    })

    assert res.template_detail is not None

    # Handle response
    print(res.template_detail)
```

### Response
#### Success Response (200)
- **template_detail** (object) - Detailed information and DAG for the template.

#### Response Example
```json
{
  "template_detail": {
    "id": "string",
    "name": "string",
    "dag": "string"
  }
}
```

### Errors
- **errors.HTTPValidationError** (422, application/json)
- **errors.SDKError** (4XX, 5XX, */*)
```

--------------------------------

### Perform Connection Check with Unstructured Python SDK

Source: https://github.com/unstructured-io/unstructured-python-client/blob/main/USAGE.md

Demonstrates how to verify a destination connection using either synchronous or asynchronous patterns.

```python
# Synchronous Example
from unstructured_client import UnstructuredClient

with UnstructuredClient() as uc_client:
    res = uc_client.destinations.create_connection_check_destinations(request={
        "destination_id": "cb9e35c1-0b04-4d98-83fa-fa6241323f96",
    })

    assert res.dag_node_connection_check is not None

    # Handle response
    print(res.dag_node_connection_check)
```

```python
# Asynchronous Example
import asyncio
from unstructured_client import UnstructuredClient

async def main():
    async with UnstructuredClient() as uc_client:
        res = await uc_client.destinations.create_connection_check_destinations_async(request={
            "destination_id": "cb9e35c1-0b04-4d98-83fa-fa6241323f96",
        })

        assert res.dag_node_connection_check is not None

        # Handle response
        print(res.dag_node_connection_check)

asyncio.run(main())
```

--------------------------------

### GET /jobs/{job_id}

Source: https://github.com/unstructured-io/unstructured-python-client/blob/main/docs/models/operations/getjobrequest.md

Retrieves the status and details of a specific job using its unique identifier.

```APIDOC
## GET /jobs/{job_id}

### Description
Retrieves the status and details of a specific job using its unique identifier.

### Method
GET

### Endpoint
/jobs/{job_id}

### Parameters
#### Path Parameters
- **job_id** (str) - Required - The unique identifier of the job.

#### Header Parameters
- **unstructured_api_key** (OptionalNullable[str]) - Optional - The API key for authentication.
```

--------------------------------

### GET /jobs/{job_id}/failed_files

Source: https://github.com/unstructured-io/unstructured-python-client/blob/main/docs/models/operations/getjobfailedfilesrequest.md

Retrieves a list of files that failed during the processing of a specified job.

```APIDOC
## GET /jobs/{job_id}/failed_files

### Description
Retrieves the list of files that failed to process for a given job ID.

### Method
GET

### Endpoint
/jobs/{job_id}/failed_files

### Parameters
#### Path Parameters
- **job_id** (str) - Required - The unique identifier of the job.

#### Header Parameters
- **unstructured_api_key** (OptionalNullable[str]) - Optional - The API key for authentication.
```

--------------------------------

### List Destinations with Unstructured Client

Source: https://github.com/unstructured-io/unstructured-python-client/blob/main/docs/sdks/destinations/README.md

Use this snippet to retrieve a list of all destination connectors configured in the system. Ensure the UnstructuredClient is initialized.

```python
from unstructured_client import UnstructuredClient

with UnstructuredClient() as uc_client:
    res = uc_client.destinations.list_destinations(request={})

    assert res.response_list_destinations is not None

    # Handle response
    print(res.response_list_destinations)
```

--------------------------------

### GET /jobs

Source: https://github.com/unstructured-io/unstructured-python-client/blob/main/docs/models/operations/listjobsrequest.md

Retrieves a list of jobs, optionally filtered by status or workflow ID.

```APIDOC
## GET /jobs

### Description
Retrieves a list of jobs, optionally filtered by status or workflow ID.

### Method
GET

### Endpoint
/jobs

### Parameters
#### Query Parameters
- **status** (OptionalNullable[str]) - Optional - Filter jobs by status.
- **workflow_id** (OptionalNullable[str]) - Optional - Filter jobs by workflow ID.

#### Header Parameters
- **unstructured_api_key** (OptionalNullable[str]) - Optional - The API key for authentication.
```

--------------------------------

### Configure Custom HTTP Headers

Source: https://github.com/unstructured-io/unstructured-python-client/blob/main/README.md

Initialize the UnstructuredClient with a custom httpx.Client instance to inject headers into every request.

```python
import httpx
from unstructured_client import UnstructuredClient

# Headers set on the httpx.Client are sent with every request the SDK makes.
http_client = httpx.Client(headers={"x-custom-header": "someValue"})
s = UnstructuredClient(client=http_client)
```

--------------------------------

### POST /api/v1/workflows/

Source: https://github.com/unstructured-io/unstructured-python-client/blob/main/docs/sdks/workflows/README.md

Creates a new custom or auto workflow with specified configuration settings.

```APIDOC
## POST /api/v1/workflows/

### Description
Create a new workflow, either custom or auto, and configure its settings.

### Method
POST

### Endpoint
/api/v1/workflows/

### Parameters
#### Request Body
- **request** (operations.CreateWorkflowRequest) - Required - The request object containing workflow configuration.
- **retries** (utils.RetryConfig) - Optional - SDK call argument (not sent in the HTTP body) that overrides the client's default retry behavior.

### Response
#### Success Response (200)
- **workflow_information** (object) - The created workflow details.
```

--------------------------------

### Get connection check for a source

Source: https://github.com/unstructured-io/unstructured-python-client/blob/main/docs/sdks/sources/README.md

Retrieves the most recent connection check status for a given source connector ID.

```python
from unstructured_client import UnstructuredClient

with UnstructuredClient() as uc_client:
    res = uc_client.sources.get_connection_check_sources(request={
        "source_id": "4df23b66-dae2-44ea-8dd3-329184d5644a",
    })

    assert res.dag_node_connection_check is not None

    # Handle response
    print(res.dag_node_connection_check)
```

--------------------------------

### List Source Connectors

Source: https://github.com/unstructured-io/unstructured-python-client/blob/main/docs/sdks/sources/README.md

Retrieves a list of available source connectors using the UnstructuredClient.

```python
from unstructured_client import UnstructuredClient

with UnstructuredClient() as uc_client:
    res = uc_client.sources.list_sources(request={})

    assert res.response_list_sources is not None

    # Handle response
    print(res.response_list_sources)
```

--------------------------------

### Get Unstructured Job Details

Source: https://github.com/unstructured-io/unstructured-python-client/blob/main/docs/sdks/jobs/README.md

Use this to retrieve the status and details of a specific job by its ID. Ensure the UnstructuredClient is initialized.

```python
from unstructured_client import UnstructuredClient

with UnstructuredClient() as uc_client:
    res = uc_client.jobs.get_job_details(request={
        "job_id": "14cc95f9-4174-46b3-81f5-7089b87a4787",
    })

    assert res.job_details is not None

    # Handle response
    print(res.job_details)
```

--------------------------------

### Qdrant Cloud Destination Connector Configuration

Source: https://github.com/unstructured-io/unstructured-python-client/blob/main/docs/models/shared/destinationconnectorinformationconfig.md

Configuration for the Qdrant Cloud destination connector.

```python
from unstructured_client.models import shared

# Placeholder: replace `...` with a populated QdrantCloudDestinationConnectorConfig.
value: shared.QdrantCloudDestinationConnectorConfig = ...  # values here
```

--------------------------------

### OpenSearch Connector Configuration Fields

Source: https://github.com/unstructured-io/unstructured-python-client/blob/main/docs/models/shared/opensearchconnectorconfig.md

Details the configuration fields for the OpenSearch connector, including the required host list and optional AWS IAM authentication credentials.

```APIDOC
## OpenSearch Connector Configuration Fields

### Description
Defines the configuration fields used to connect to OpenSearch: a required list of hosts plus optional AWS IAM credentials, which replace basic auth when supplied.

### Parameters
#### Request Body
- **hosts** (List[str]) - Required - List of OpenSearch hosts to connect to.
- **aws_access_key_id** (OptionalNullable[str]) - Optional - AWS access key ID for IAM authentication. When provided with aws_secret_access_key, IAM authentication is used instead of basic auth.
- **aws_secret_access_key** (OptionalNullable[str]) - Optional - AWS secret access key for IAM authentication. Required when aws_access_key_id is provided.
- **aws_session_token** (OptionalNullable[str]) - Optional - AWS session token for temporary credentials. Only used when aws_access_key_id and aws_secret_access_key are provided.
```

--------------------------------

### GET /api/v1/workflows/

Source: https://github.com/unstructured-io/unstructured-python-client/blob/main/docs/sdks/workflows/README.md

Retrieve a list of workflows, optionally filtered by source, destination, state, name, and date range, with support for pagination and sorting.

```APIDOC
## GET /api/v1/workflows/

### Description
Retrieve a list of workflows, optionally filtered by source, destination, state, name, and date range, with support for pagination and sorting.

### Method
GET

### Endpoint
/api/v1/workflows/

### Parameters
#### Path Parameters
None

#### Query Parameters
- **source** (string) - Optional - Filter workflows by source.
- **destination** (string) - Optional - Filter workflows by destination.
- **state** (string) - Optional - Filter workflows by state.
- **name** (string) - Optional - Filter workflows by name.
- **start_date** (string) - Optional - Filter workflows by start date (YYYY-MM-DD).
- **end_date** (string) - Optional - Filter workflows by end date (YYYY-MM-DD).
- **sort_by** (string) - Optional - Field to sort workflows by.
- **order** (string) - Optional - Order of sorting (asc or desc).
- **page_number** (integer) - Optional - Page number for pagination.
- **page_size** (integer) - Optional - Number of items per page.

#### Request Body
None

### Request Example
```python
from unstructured_client import UnstructuredClient

with UnstructuredClient() as uc_client:
    res = uc_client.workflows.list_workflows(request={})

    assert res.response_list_workflows is not None

    # Handle response
    print(res.response_list_workflows)
```

### Response
#### Success Response (200)
- **response_list_workflows** (array) - A list of workflow objects.

#### Response Example
```json
{
  "response_list_workflows": [
    {
      "id": "d031b0e5-7ca7-4a2b-b3cc-d869d2df3e76",
      "name": "Example Workflow 1",
      "state": "active",
      "created_at": "2023-01-01T12:00:00Z",
      "updated_at": "2023-01-01T12:00:00Z"
    },
    {
      "id": "a1b2c3d4-e5f6-7890-1234-567890abcdef",
      "name": "Example Workflow 2",
      "state": "inactive",
      "created_at": "2023-01-02T12:00:00Z",
      "updated_at": "2023-01-02T12:00:00Z"
    }
  ]
}
```

### Errors
- **errors.HTTPValidationError** (422, application/json)
- **errors.SDKError** (4XX, 5XX, */*)
```
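
--------------------------------

### Iterate Over Paginated Workflow Lists (Sketch)

The `page_number` and `page_size` query parameters of `GET /api/v1/workflows/` mean that a long workflow list is returned one page at a time. The helper below is a minimal, library-agnostic sketch of draining such an endpoint; it is not part of `unstructured-client`. `iter_pages` and `fetch_page` are hypothetical names: `fetch_page(page_number, page_size)` stands for an adapter you would write around `uc_client.workflows.list_workflows` that returns one page as a plain list. The sketch also assumes that page numbering starts at 1 and that a page shorter than `page_size` is the last one; verify both against the API before relying on it.

```python
from typing import Callable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_pages(
    fetch_page: Callable[[int, int], List[T]],
    page_size: int = 50,
    first_page: int = 1,
) -> Iterator[T]:
    """Yield every item from a page-numbered endpoint, one page at a time.

    `fetch_page` is a hypothetical adapter: it receives (page_number, page_size)
    and must return the items on that page as a list.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    page_number = first_page
    while True:
        items = fetch_page(page_number, page_size)
        yield from items
        # Assumption: a short (or empty) page means there are no more pages.
        if len(items) < page_size:
            return
        page_number += 1


# Demo with an in-memory stand-in for the API: 7 workflows served 3 per page.
FAKE_WORKFLOWS = [f"workflow-{i}" for i in range(7)]


def fake_fetch(page_number: int, page_size: int) -> List[str]:
    start = (page_number - 1) * page_size
    return FAKE_WORKFLOWS[start:start + page_size]


print(len(list(iter_pages(fake_fetch, page_size=3))))  # 7
```

Stopping on a short page costs one extra request when the total is an exact multiple of `page_size`, because the final fetch returns an empty list. If the real response carries a total count, stopping on that count instead would avoid the extra call.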