### Install ZeroEntropy SDK

Source: https://docs.zeroentropy.dev/quickstart

Install the ZeroEntropy SDK using pip for Python or npm for Node.js.

```text
ZeroEntropy SDK Helper

### Description:
ZeroEntropy is a state-of-the-art retrieval API for documents, pages, snippets and reranking. It provides low-latency, high-accuracy search over your private corpus via a simple Python SDK.

ZeroEntropy can be installed using:
• Python: pip install zeroentropy
• Node.js: npm install zeroentropy

### Client Usage
from zeroentropy import ZeroEntropy
client = ZeroEntropy(api_key="your_api_key")

Auth & Configuration:
• ENV VARS read by SDK: ZEROENTROPY_API_KEY
  A missing key triggers an authentication error on instantiation.

Instantiate:
from dotenv import load_dotenv
load_dotenv()
from zeroentropy import AsyncZeroEntropy, ConflictError, HTTPStatusError
zclient = AsyncZeroEntropy()  # picks up ENV VARS

### SDK Structure:
Methods are grouped under the attributes below. On AsyncZeroEntropy every method is async and must be awaited; the synchronous ZeroEntropy client exposes the same methods as regular calls.
zclient.collections
zclient.documents
zclient.status
zclient.queries
zclient.models
Each method returns a structured response defined as a pydantic.BaseModel.

### Collections
• client.collections.add(collection_name: str) -> None
  Always pass the collection name explicitly: client.collections.add(collection_name="my_collection")
  If the collection already exists, a ConflictError is raised, so either check whether it exists first or catch ConflictError (see Common Patterns).
• client.collections.get_list() -> response whose .collection_names is a List[str]
• client.collections.delete(collection_name: str) -> None

### Documents
• client.documents.add(collection_name: str, path: str, content: dict, metadata: dict = None, overwrite: bool = False) -> None
  The add method already handles parsing of PDFs and other files. The content dict takes one of these formats:
  - PDF or other binary file: content={"type":"auto", "base64_data":<base64-encoded file bytes>}
  - Plain text: content={"type":"text", "text":<the document's text>}
  - Pre-split pages: content={"type":"text-pages", "pages":["page 1 content", "page 2 content"]}
  If a document already exists at that path, an error is raised, so check whether it exists first (or pass overwrite=True to replace it).
• client.documents.get_info(collection_name: str, path: str, include_content: bool = False) -> DocumentGetInfoResponse
• client.documents.get_info_list(collection_name: str, limit: int = 1024, path_gt: Optional[str] = None) -> List[DocumentGetInfoListResponse]
• client.documents.update(collection_name: str, path: str, metadata: Optional[dict]) -> UpdateDocumentResponse
• client.documents.delete(collection_name: str, path: Union[str, List[str]]) -> DocumentDeleteResponse

### Queries
• client.queries.top_documents(collection_name: str, query: str, k: int, filter: Optional[dict] = None, include_metadata: bool = False, latency_mode: str = "low") -> response whose .results is a List[DocumentRetrievalResponse]
• client.queries.top_pages(collection_name: str, query: str, k: int, filter: Optional[dict] = None, include_content: bool = False, latency_mode: str = "low") -> response whose .results is a List[PageRetrievalResponse]
• client.queries.top_snippets(collection_name: str, query: str, k: int, filter: Optional[dict] = None, precise_responses: bool = False) -> response whose .results is a List[SnippetResponse]

### Status
• client.status.get_status(collection_name: Optional[str] = None) -> StatusGetStatusResponse

### Models
• client.models.embed(input: Union[str, List[str]], input_type: "query" | "document", model: str, dimensions: Optional[int] = None, encoding_format: "float" | "base64" = "float", latency: Optional["fast" | "slow"] = None) -> ModelEmbedResponse
• client.models.rerank(documents: List[str], model: str, query: str, top_n: Optional[int] = None) -> ModelRerankResponse

Common Patterns:
1. Collections
try:
    await zclient.collections.add(collection_name="my_col")
except ConflictError:
    pass
names = (await zclient.collections.get_list()).collection_names
await zclient.collections.delete(collection_name="my_col")

2. Documents
# Add text
await zclient.documents.add(
    collection_name="col",
    path="doc.txt",
    content={"type":"text","text":text},
    metadata={"source":"notes"},
)

# Add PDF via OCR
with open(path, "rb") as f:
    b64 = base64.b64encode(f.read()).decode()
await zclient.documents.add(
    collection_name="col",
    path="doc.pdf",
    content={"type":"auto","base64_data":b64},
    metadata={"type":"pdf"},
)

# Add CSV lines (one document per line)
with open(path) as f:
    for i, line in enumerate(f.read().splitlines()):
        await zclient.documents.add(
            collection_name="col",
            path=f"{path}_{i}",
            content={"type":"text","text":line},
            metadata={"type":"csv"},
        )

# Delete
await zclient.documents.delete(collection_name="col", path="doc.txt")
```

--------------------------------

### Install Python SDK

Source: https://docs.zeroentropy.dev/models

Install the ZeroEntropy Python SDK using pip. This is the first step to using the API and SDKs.

```bash
pip install zeroentropy
```

--------------------------------

### Install Node.js SDK

Source: https://docs.zeroentropy.dev/models

Install the ZeroEntropy Node.js SDK using npm. This is required for Node.js applications that call the API.

```bash
npm install zeroentropy
```

--------------------------------

### Complete Async Example for ZeroEntropy SDK

Source: https://docs.zeroentropy.dev/quickstart

This example uses the ZeroEntropy SDK asynchronously to add a collection, add a document with metadata, retrieve the collection status, and run a top-k document query. It includes the necessary imports, environment variable loading, and handling of the ConflictError raised when the collection already exists.
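Like the common patterns above, this example passes a `content` dict to `documents.add`. A minimal sketch of building the three documented shapes with small helpers; `text_content`, `pages_content`, and `auto_content` are hypothetical names, not SDK functions. It shows that `base64_data` carries the base64-encoded file bytes, not a filename:

```python
import base64
from typing import Any, Dict, List


# Hypothetical helpers (not part of the zeroentropy SDK) that build the
# `content` dicts accepted by documents.add.
def text_content(text: str) -> Dict[str, Any]:
    """Plain text: the document's text itself goes in "text"."""
    return {"type": "text", "text": text}


def pages_content(pages: List[str]) -> Dict[str, Any]:
    """Pre-split text: one string per page, usable with top_pages queries."""
    return {"type": "text-pages", "pages": list(pages)}


def auto_content(file_bytes: bytes) -> Dict[str, Any]:
    """Binary file such as a PDF: raw bytes encoded as a base64 string."""
    return {"type": "auto", "base64_data": base64.b64encode(file_bytes).decode("ascii")}
```

For a file on disk, `auto_content(Path("doc.pdf").read_bytes())` (with `from pathlib import Path`) can be passed as the `content` argument.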
```python
import asyncio

from dotenv import load_dotenv
from zeroentropy import AsyncZeroEntropy, ConflictError

load_dotenv()  # loads ZEROENTROPY_API_KEY from a .env file
zclient = AsyncZeroEntropy()


async def main():
    # Create the collection, ignoring the error if it already exists
    try:
        await zclient.collections.add(collection_name="my_col")
    except ConflictError:
        pass

    text = "Hello ZeroEntropy"
    await zclient.documents.add(
        collection_name="my_col",
        path="hello.txt",
        content={"type": "text", "text": text},
        metadata={"lang": "en"},
    )

    # Indexing is asynchronous, so the new document may not be counted yet
    status = await zclient.status.get_status(collection_name="my_col")
    print("Indexed:", status.num_indexed_documents)

    docs = await zclient.queries.top_documents(
        collection_name="my_col",
        query="Hello",
        k=1,
        include_metadata=True,
    )
    print(docs.results)


if __name__ == "__main__":
    asyncio.run(main())
```

--------------------------------

### Example Document Metadata

Source: https://docs.zeroentropy.dev/metadata-filtering

This is an example of how to structure metadata for a document. Note the `list:` prefix on list-valued attributes.

```python
{
    "timestamp": "2024-12-12T20:00:45",
    "author": "Nicholas Pipitone",
    "language": "en",
    "list:tags": ["Artificial Intelligence", "Technology", "Documentation"],
    "list:write-permissions": ["admin", "author"],
    "list:read-permissions": ["all"]
}
```

--------------------------------

### Install ZeroEntropy Package (TypeScript/JavaScript)

Source: https://docs.zeroentropy.dev/examples/setup

Install the official ZeroEntropy package for TypeScript or JavaScript using npm.

```bash
npm install zeroentropy
```

--------------------------------

### Install ZeroEntropy Package (Python)

Source: https://docs.zeroentropy.dev/examples/setup

Install the official ZeroEntropy package for Python using pip.

```bash
pip install zeroentropy
```

--------------------------------

### ZeroEntropy SDK Usage

Source: https://docs.zeroentropy.dev/quickstart

This snippet demonstrates the basic usage of the ZeroEntropy SDK, including installation, authentication, and client instantiation.
```APIDOC
## ZeroEntropy SDK Usage

### Description
This section covers the installation, authentication, and client instantiation for the ZeroEntropy Python SDK.

### Installation
- **Python:** `pip install zeroentropy`
- **Node.js:** `npm install zeroentropy`

### Authentication & Configuration
- **Environment Variables:** The SDK reads `ZEROENTROPY_API_KEY` from the environment.
- A missing API key triggers an authentication error on instantiation.

### Client Instantiation
**Synchronous Client:**
```python
from zeroentropy import ZeroEntropy

client = ZeroEntropy(api_key="your_api_key")
```

**Asynchronous Client (using environment variables):**
```python
from dotenv import load_dotenv
load_dotenv()

from zeroentropy import AsyncZeroEntropy

zclient = AsyncZeroEntropy()  # picks up ENV VARS
```

### SDK Structure
Methods are grouped under the following attributes. On `AsyncZeroEntropy` they are coroutines and must be awaited; the synchronous `ZeroEntropy` client exposes the same methods as regular calls.
- `zclient.collections`
- `zclient.documents`
- `zclient.status`
- `zclient.queries`
- `zclient.models`

Each method returns a structured response defined as a `pydantic.BaseModel`.
```

--------------------------------

### Embed text with custom parameters

Source: https://docs.zeroentropy.dev/examples/embed

Use these examples to configure the embedding output dimensions, encoding format, and latency setting.

```python
from zeroentropy import ZeroEntropy

zclient = ZeroEntropy()  # reads ZEROENTROPY_API_KEY from the environment

response = zclient.models.embed(
    model="zembed-1",
    input="What is RAG?",
    input_type="query",
    dimensions=320,
    encoding_format="float",
    latency="fast",
)
```

```typescript
import { ZeroEntropy } from 'zeroentropy'

const zclient = new ZeroEntropy() // reads ZEROENTROPY_API_KEY from the environment

const response = await zclient.models.embed({
  model: "zembed-1",
  input: "What is RAG?",
  input_type: "query",
  dimensions: 320,
  encoding_format: "float",
  latency: "fast",
});
```

--------------------------------

### Send First Query in Python

Source: https://docs.zeroentropy.dev/quickstart

Use this Python snippet to create a collection, add a document, wait for it to be indexed, and then query it. Ensure you have the ZeroEntropy library installed.
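A polling loop that only checks for `"indexed"` never returns if a document ends in one of the failed `index_status` values listed for the `Document` model. A minimal sketch of a bounded variant, assuming you supply a zero-argument callable that returns the current `index_status` string; `wait_until_indexed` is a hypothetical helper, not an SDK method:

```python
import time
from typing import Callable

# Failure states taken from the Document.index_status enum in this reference
FAILED_STATES = {"parsing_failed", "indexing_failed"}


def wait_until_indexed(
    get_status: Callable[[], str],
    timeout_s: float = 60.0,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() until it returns "indexed"; raise on failure or timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status == "indexed":
            return status
        if status in FAILED_STATES:
            raise RuntimeError(f"document ended in state {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout_s}s")
        sleep(interval_s)
```

With the synchronous client, `get_status` could be `lambda: zclient.documents.get_info(collection_name="default", path="docs/document.txt").document.index_status`.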
```python
import time

from zeroentropy import ZeroEntropy

zclient = ZeroEntropy()  # reads ZEROENTROPY_API_KEY from the environment

# Create a collection
collection = zclient.collections.add(collection_name="default")

# Add a text file to the collection
document = zclient.documents.add(
    collection_name="default",
    path="docs/document.txt",
    content={
        "type": "text",
        "text": "My favorite apple is the Granny Smith.",
    },
)

# Wait until the document is indexed, stopping if parsing or indexing fails
while True:
    status = zclient.documents.get_info(collection_name="default", path="docs/document.txt")
    index_status = status.document.index_status
    if index_status == "indexed":
        print("Document is indexed.")
        break
    if index_status in ("parsing_failed", "indexing_failed"):
        raise RuntimeError(f"Document failed to index: {index_status}")
    time.sleep(1)

# Query the collection
response = zclient.queries.top_documents(
    collection_name="default",
    query="What is the best apple?",
    k=1,
)
print(response.results)
```

--------------------------------

### DocumentGetInfoResponse Model

Source: https://docs.zeroentropy.dev/quickstart

Defines the structure of the response returned when getting document information, including document details and optional content.

```python
from typing import Dict, List, Optional, Union

from pydantic import BaseModel


class Document(BaseModel):
    id: str  # UUID of the document
    collection_name: str
    path: str
    file_url: str  # URL to download the raw document
    size: int  # Raw document size in bytes
    metadata: Dict[str, Union[str, List[str]]]  # Metadata key-value pairs
    index_status: str  # Enum: "not_parsed", "parsing", "not_indexed", "indexing", "indexed", "parsing_failed", "indexing_failed"
    num_pages: Optional[int] = None  # Can be null
    content: Optional[str] = None  # Null unless `include_content=True`


# Defined after Document so the field annotation resolves
class DocumentGetInfoResponse(BaseModel):
    document: Document
```

--------------------------------

### Send First Query in TypeScript

Source: https://docs.zeroentropy.dev/quickstart

This TypeScript snippet shows how to create a collection, add a document, wait for indexing, and then query. It requires the ZeroEntropy library to be installed.
```typescript
import { ZeroEntropy } from 'zeroentropy'

const zclient = new ZeroEntropy() // reads ZEROENTROPY_API_KEY from the environment

// Create a collection
const collection = await zclient.collections.add({
  collection_name: "default",
})

// Add a text file to the collection
const document = await zclient.documents.add({
  collection_name: "default",
  path: "docs/document.txt",
  content: {
    type: "text",
    text: "My favorite apple is the Granny Smith.",
  },
})

// Wait until the document is indexed, stopping if parsing or indexing fails
let indexed = false;
while (!indexed) {
  const status = await zclient.documents.getInfo({
    collection_name: "default",
    path: "docs/document.txt",
  });
  const indexStatus = status.document.index_status;
  if (indexStatus === "indexed") {
    console.log("Document is indexed.");
    indexed = true;
  } else if (indexStatus === "parsing_failed" || indexStatus === "indexing_failed") {
    throw new Error(`Document failed to index: ${indexStatus}`);
  } else {
    await new Promise(resolve => setTimeout(resolve, 1000));
  }
}

// Query the collection
const response = await zclient.queries.topDocuments({
  collection_name: "default",
  query: "What is the best apple?",
  k: 1,
})
console.log(response.results)
```

--------------------------------

### Upload documents with pages in Python and TypeScript

Source: https://docs.zeroentropy.dev/examples/upload

Use these examples to upload multi-page text documents to a collection. Ensure the collection exists before adding documents.
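With the `text-pages` content type you supply the page boundaries yourself. If your text is not already paginated, one option is to cut it at paragraph breaks under a size cap. A minimal sketch; the helper name `split_into_pages` and its 2000-character default are illustrative assumptions, not ZeroEntropy limits:

```python
from typing import List


def split_into_pages(text: str, max_chars: int = 2000) -> List[str]:
    """Split text into pages of at most max_chars, preferring paragraph breaks."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    pages: List[str] = []
    current = ""
    for para in text.split("\n\n"):
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars:
            current = candidate  # paragraph still fits on the current page
            continue
        if current:
            pages.append(current)  # close the current page
        # A single paragraph longer than max_chars is hard-split
        while len(para) > max_chars:
            pages.append(para[:max_chars])
            para = para[max_chars:]
        current = para
    if current:
        pages.append(current)
    return pages
```

The result can be passed as `content={"type": "text-pages", "pages": split_into_pages(long_text)}`.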
```python
from zeroentropy import ZeroEntropy

zclient = ZeroEntropy()

response = zclient.collections.add(collection_name="pages")

# Upload text with pages, for top-k page queries
response = zclient.documents.add(
    collection_name="pages",
    path="docs/document_pages.txt",
    content={
        "type": "text-pages",
        "pages": [
            "page 1 content: My favorite apple is the Granny Smith.",
            "page 2 content: Search is a fun problem to work on.",
        ],
    },
)
print(response.message)
```

```typescript
import { ZeroEntropy } from 'zeroentropy'

const zclient = new ZeroEntropy()

// Add a document with pages to a new collection
async function addTextPages() {
  try {
    // Add a new collection
    await zclient.collections.add({
      collection_name: "pages",
    });
    console.log("Collection 'pages' created successfully.");

    // Upload text with pages for top-k page queries
    const response = await zclient.documents.add({
      collection_name: "pages",
      path: "docs/document_pages.txt",
      content: {
        type: "text-pages",
        pages: [
          "page 1 content",
          "page 2 content",
        ],
      },
    });
    console.log(response.message);
  } catch (error) {
    console.error("Error:", error);
  }
}

addTextPages();
```

--------------------------------

### Get Status

Source: https://docs.zeroentropy.dev/quickstart

Retrieves the overall status of the ZeroEntropy system or the status for a specific collection.

```APIDOC
## GET /status/get_status

### Description
Retrieves the overall status of the ZeroEntropy system or the status for a specific collection.

### Method
GET

### Endpoint
`/status/get_status`

### Parameters
#### Query Parameters
- **collection_name** (string) - Optional - The name of the collection to get status for. If not provided, returns the overall status.

### Response
#### Success Response (200)
- **status** (StatusGetStatusResponse) - The status information.
class StatusGetStatusResponse(BaseModel):
    num_documents: int
    num_parsing_documents: int
    num_indexing_documents: int
    num_indexed_documents: int
    num_failed_documents: int
    num_indexed_bytes: int

### Response Example
{
  "num_documents": 100,
  "num_parsing_documents": 0,
  "num_indexing_documents": 5,
  "num_indexed_documents": 95,
  "num_failed_documents": 0,
  "num_indexed_bytes": 1048576
}
```

--------------------------------

### Get Page Info

Source: https://docs.zeroentropy.dev/quickstart

Retrieves information about a specific page within a document, with an option to include its content.

```APIDOC
## POST /documents/get-page-info

### Description
Retrieves information about a specific page within a document, with an option to include its content.

### Method
POST

### Endpoint
`/documents/get-page-info`

### Parameters
#### Request Body
- **collection_name** (string) - Required - The name of the collection.
- **path** (string) - Required - The path to the document.
- **page_index** (integer) - Required - The 0-indexed page number.
- **include_content** (boolean) - Optional - Whether to include the page's content in the response. Defaults to false.

### Response
#### Success Response (200)
- **page** (PageInfo) - Information about the specified page.

class PageInfo(BaseModel):
    page_index: int
    content: Optional[str] = None

### Response Example
{
  "page": {
    "page_index": 2,
    "content": "Content of page 2..."
  }
}
```

--------------------------------

### Get Page Info

Source: https://docs.zeroentropy.dev/api-reference/documents/get-page-info

Retrieves information about a specific page. The request parameters define what information you would like to receive.

```APIDOC
## POST /documents/get-page-info

### Description
Retrieves information about a specific page. The request parameters define what information you would like to receive.

A `404 Not Found` will be returned if either the collection name does not exist, or the document path does not exist within the provided collection.

### Method
POST

### Endpoint
/documents/get-page-info

### Parameters
#### Request Body
- **collection_name** (string) - Required - The name of the collection.
- **path** (string) - Required - The filepath of the document within the collection.
- **page_index** (integer) - Required - The 0-indexed page whose info is requested.
- **include_content** (boolean) - Optional - If true, `content` and `image_url` are populated once the document has finished parsing. Defaults to false.

### Response
#### Success Response (200)
- **page** (object) - Information about the page.
  - **id** (string) - The UUID of the page.
  - **collection_name** (string) - The name of the collection.
  - **path** (string) - The filepath of the document associated with this page.
  - **page_index** (integer) - The 0-indexed page index.
  - **content** (string | null) - The page content, or null.
  - **image_url** (string | null) - A URL to an image of the page, or null.
```

--------------------------------

### Top Pages

Source: https://docs.zeroentropy.dev/api-reference/queries/top-pages

Get the top K pages that match the given query. This endpoint allows detailed search customization through various parameters.

```APIDOC
## POST /queries/top-pages

### Description
Get the top K pages that match the given query.

### Method
POST

### Endpoint
/queries/top-pages

### Parameters
#### Request Body
- **collection_name** (string) - Required - The name of the collection.
- **query** (string) - Required - The natural language query to search with. This cannot exceed 4096 UTF-8 bytes.
- **k** (integer) - Required - The number of pages to return. If not enough pages match your filters, fewer may be returned. This number must be between 1 and 1024, inclusive.
- **filter** (object | null) - Optional - The query filter to apply. See [Metadata Filtering](/metadata-filtering) for more information. If not provided, all documents are searched.
- **include_content** (boolean) - Optional - If set to true, the content of all pages is returned. Defaults to false.
- **include_metadata** (boolean) - Optional - Whether to include the document metadata in the response. Defaults to false.
- **latency_mode** (string) - Optional - Selects between the two latency modes. The `high` mode takes longer but can return more accurate results. Start with the default, `low`, and switch to `high` only if you need better result quality for your use case; testing both is recommended. Enum: [low, high]. Defaults to low.

### Response
#### Success Response (200)
- **results** (array) - An array of page retrieval responses.
- **document_results** (array) - Document information for the results. Each result page references a document path; after deduplicating those paths, this array contains the info for every document referenced by at least one page result.

#### Error Response (400)
- **detail** (string) - Description of Error

#### Error Response (404)
- **detail** (string) - Description of Error

#### Error Response (422)
- **detail** (array) - Contains validation error details.
```

--------------------------------

### Rerank Documents with ZeroEntropy SDK (JavaScript)

Source: https://docs.zeroentropy.dev/models

Use the rerank model to reorder documents by relevance to a given query in JavaScript. Requires an API key and the installed SDK; the client reads the key from the `ZEROENTROPY_API_KEY` environment variable.
```javascript
// Create an API Key at https://dashboard.zeroentropy.dev
// npm install zeroentropy
import ZeroEntropy from 'zeroentropy';
// or: const { ZeroEntropy } = require('zeroentropy');
// (top-level await below requires an ES module)

// Initialize the ZeroEntropy client (reads ZEROENTROPY_API_KEY from env)
const zclient = new ZeroEntropy();

const response = await zclient.models.rerank({
  model: 'zerank-2',
  query: 'What is 2+2?',
  documents: [
    '4',
    'The answer is definitely 1 million.',
  ],
});

console.log(JSON.stringify(response, null, 2));
```

--------------------------------

### Get Page Info

Source: https://docs.zeroentropy.dev/quickstart

Retrieve information for a specific page within a document, optionally including its content. Useful for detailed page-level inspection.

```python
# Inside an async function, with zclient = AsyncZeroEntropy()
page = await zclient.documents.get_page_info(
    collection_name="col",
    path="doc.pdf",
    page_index=2,  # pages are 0-indexed, so this is the third page
    include_content=True,
)
```

--------------------------------

### Filter by list attributes

Source: https://docs.zeroentropy.dev/metadata-filtering

Use the `list:` prefix for list-type attributes to ensure correct filtering. This example excludes documents tagged with 'tech' or 'food'.

```python
# Inside an async function, with zclient = AsyncZeroEntropy()
results = await zclient.queries.top_snippets(
    collection_name="default",
    query="I'm looking for documents about apples",
    k=5,
    filter={
        # Only `true` if "list:tags" contains NEITHER "tech" NOR "food"
        "list:tags": {
            "$nin": ["tech", "food"]
        }
    },
)
```

--------------------------------

### Rerank Documents with ZeroEntropy SDK

Source: https://docs.zeroentropy.dev/models

Use the rerank model to reorder documents by relevance to a given query. Requires an API key and the installed SDK; the client reads the key from the `ZEROENTROPY_API_KEY` environment variable.
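The rerank call scores each input string against the query. A minimal sketch for turning those scores into an ordered list, assuming each result exposes an integer `index` into the `documents` list and a float `relevance_score`. These attribute names are assumptions to check against `ModelRerankResponse`, and `RerankResult` below is a local stand-in, not an SDK type:

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class RerankResult:
    # Assumed shape of one rerank result; a stand-in, not imported from the SDK
    index: int
    relevance_score: float


def rank_documents(
    documents: Sequence[str],
    results: Sequence[RerankResult],
    top_n: Optional[int] = None,
) -> List[Tuple[str, float]]:
    """Return (document, score) pairs sorted by descending relevance."""
    ranked = sorted(results, key=lambda r: r.relevance_score, reverse=True)
    if top_n is not None:
        ranked = ranked[:top_n]
    # Map each result's index back to the original input string
    return [(documents[r.index], r.relevance_score) for r in ranked]
```

Because only attribute access is used, `rank_documents(documents, response.results)` would also work on SDK result objects if they carry those two attributes.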
```python
# Create an API Key at https://dashboard.zeroentropy.dev
# pip install zeroentropy
from zeroentropy import ZeroEntropy

# Initialize the ZeroEntropy client (reads ZEROENTROPY_API_KEY from env)
zclient = ZeroEntropy()

response = zclient.models.rerank(
    model="zerank-2",
    query="What is 2+2?",
    documents=[
        "4",
        "The answer is definitely 1 million.",
    ],
)

print(response.model_dump_json(indent=4))
```

--------------------------------

### Top Snippets

Source: https://docs.zeroentropy.dev/api-reference/queries/top-snippets

Get the top K snippets that match the given query. You can choose between coarse and precise snippets: precise snippets average ~200 characters, while coarse snippets average ~2000 characters. Coarse snippets are the default; use the `precise_responses` parameter to switch.

```APIDOC
## POST /queries/top-snippets

### Description
Get the top K snippets that match the given query. You may choose between coarse and precise snippets: precise snippets average ~200 characters, while coarse snippets average ~2000 characters. Coarse snippets are the default; use the `precise_responses` parameter to switch.

### Method
POST

### Endpoint
/queries/top-snippets

### Parameters
#### Request Body
- **collection_name** (string) - Required - The name of the collection.
- **query** (string) - Required - The natural language query to search with. This cannot exceed 4096 characters (a single UTF-8 codepoint is considered to be 1 character).
- **k** (integer) - Required - The number of snippets to return. If not enough snippets match your filters, fewer may be returned. This number must be between 1 and 128, inclusive.
- **reranker** (string | null) - Optional - The reranker to use after initial retrieval. The default is `null`. You can find available model ids, along with more information, at [/models/rerank](/api-reference/models/rerank).
- **filter** (object | null) - Optional - The query filter to apply.
  See [Metadata Filtering](/metadata-filtering) for more information. If not provided, all documents are searched.
- **precise_responses** (boolean) - Optional - Enable precise responses. Precise responses have higher latency but return much more precise snippets. When `precise_responses` is `true`, snippets average 200 characters; when `false`, they average 2000 characters. The default is `false`.
- **include_document_metadata** (boolean) - Optional - If true, the `document_results` will additionally contain document metadata. This is false by default, because returning metadata can add overhead when there is a lot of it.

### Response
#### Success Response (200)
- **results** (array[SnippetResponse]) - The array of snippets returned by this endpoint. Each snippet result refers to a particular document path and index range. Note that all documents, regardless of filetype, are converted into `UTF-8`-encoded strings. The `start_index` and `end_index` of a snippet refer to the range of characters in that string that were matched by this snippet.
- **document_results** (array[DocumentRetrievalResponse]) - Document information for the documents referenced by the snippet results.

#### Error Response (400)
- **detail** (string) - Description of Error

#### Error Response (404)
- **detail** (string) - Description of Error

#### Error Response (422)
- **detail** (array) - Validation error details (HTTPValidationError).
```

--------------------------------

### OpenAPI Specification for Get Page Info

Source: https://docs.zeroentropy.dev/api-reference/documents/get-page-info

This OpenAPI 3.1.0 specification defines the POST request for the `/documents/get-page-info` endpoint, including request parameters, response schemas, and error handling.

```yaml
openapi: 3.1.0
info:
  title: ZeroEntropy API
  description: This API provides access to ZeroEntropy's SoTA retrieval pipeline. Enjoy!
  version: 0.1.0
servers:
  - url: https://api.zeroentropy.dev/v1
    description: ZeroEntropy API
  - url: https://eu-api.zeroentropy.dev/v1
    description: ZeroEntropy API (EU datacenters)
security: []
paths:
  /documents/get-page-info:
    post:
      tags:
        - Documents
      summary: Get Page Info
      description: >-
        Retrieves information about a specific page. The request parameters define what information you would like to receive.


        A `404 Not Found` will be returned if either the collection name does not exist, or the document path does not exist within the provided collection.
      operationId: get_page_info_documents_get_page_info_post
      requestBody:
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/GetPageInfoRequest'
        required: true
      responses:
        '200':
          description: Successful Response
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/GetPageInfoResponse'
        '400':
          description: Bad Request
          content:
            application/json:
              example:
                detail: Description of Error
        '404':
          description: Not Found
          content:
            application/json:
              example:
                detail: Description of Error
        '422':
          description: Validation Error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/HTTPValidationError'
      security:
        - HTTPBearer: []
components:
  schemas:
    GetPageInfoRequest:
      properties:
        collection_name:
          type: string
          title: Collection Name
          description: The name of the collection.
        path:
          type: string
          title: Path
          description: >-
            The filepath of the document whose page you are requesting. A `404 Not Found` status code will be returned if no document with this path was found.
        page_index:
          type: integer
          title: Page Index
          description: >-
            The specific page index whose info is being requested. Pages are 0-indexed, so that the 1st page of a PDF is of page index 0. You may use the `num_pages` attribute of `/documents/get-document-info` or `/documents/get-document-info-list` to know what the range of valid indices are.


            A `404 Not Found` status code will be returned if no such page index exists.
        include_content:
          type: boolean
          title: Include Content
          description: >-
            If `true`, then the response will have the `content` attribute be a `string`, rather than `null`. This string will contain the full contents of the page.
          default: false
      type: object
      required:
        - collection_name
        - path
        - page_index
      title: GetPageInfoRequest
    GetPageInfoResponse:
      properties:
        page:
          $ref: '#/components/schemas/PageResponse'
      type: object
      required:
        - page
      title: GetPageInfoResponse
    HTTPValidationError:
      properties:
        detail:
          items:
            $ref: '#/components/schemas/ValidationError'
          type: array
          title: Detail
      type: object
      title: HTTPValidationError
    PageResponse:
      properties:
        id:
          type: string
          format: uuid
          title: Id
        collection_name:
          type: string
          title: Collection Name
          description: The name of the collection.
        path:
          type: string
          title: Path
          description: The filepath of the document associated with this page.
        page_index:
          type: integer
          title: Page Index
          description: >-
            The specific page index of this page. Pages are 0-indexed, so that the 1st page of a PDF is of page index 0.
        content:
          anyOf:
            - type: string
            - type: 'null'
          title: Content
          description: >-
            The content of the page. This field will only be provided if `include_content` was set to `true`, and the document has finished parsing. Otherwise, this field will be set to `null`.
        image_url:
          anyOf:
            - type: string
            - type: 'null'
          title: Image Url
          description: >-
            A URL to an image of the page. This field will only be provided if
```

--------------------------------

### Get Page Info

Source: https://docs.zeroentropy.dev/api-reference/documents/get-page-info

Retrieves information about a specific page. The request parameters define what information you would like to receive. A `404 Not Found` will be returned if either the collection name does not exist, or the document path does not exist within the provided collection.

```APIDOC
## POST /documents/get-page-info

### Description
Retrieves information about a specific page.
The request parameters define what information you would like to receive. A `404 Not Found` will be returned if either the collection name does not exist, or the document path does not exist within the provided collection.

### Method
POST

### Endpoint
/documents/get-page-info

### Parameters
#### Request Body
- **collection_name** (string) - Required - The name of the collection.
- **path** (string) - Required - The filepath of the document whose page you are requesting. A `404 Not Found` status code will be returned if no document with this path was found.
- **page_index** (integer) - Required - The specific page index whose info is being requested. Pages are 0-indexed, so that the 1st page of a PDF is of page index 0. You may use the `num_pages` attribute of `/documents/get-document-info` or `/documents/get-document-info-list` to find the range of valid indices. A `404 Not Found` status code will be returned if no such page index exists.
- **include_content** (boolean) - Optional - If `true`, the response's `content` attribute will be a `string` rather than `null`, containing the full contents of the page. (default: false)

### Response
#### Success Response (200)
- **page** (object) - Contains the page details.
  - **id** (string) - The unique identifier for the page.
  - **collection_name** (string) - The name of the collection.
  - **path** (string) - The filepath of the document associated with this page.
  - **page_index** (integer) - The specific page index of this page.
  - **content** (string | null) - The content of the page. This field will only be provided if `include_content` was set to `true`, and the document has finished parsing. Otherwise, this field will be set to `null`.
  - **image_url** (string | null) - A URL to an image of the page. This field will only be provided if `include_content` was set to `true`, and the document has finished parsing. Otherwise, this field will be set to `null`.
#### Error Response (400)
- **detail** (string) - Description of Error

#### Error Response (404)
- **detail** (string) - Description of Error

#### Error Response (422)
- **detail** (array) - Contains a list of validation errors.
```

--------------------------------

### Get Status

Source: https://docs.zeroentropy.dev/api-reference/documents/add-document

Retrieves the current status of document indexing operations.

```APIDOC
## GET /status/get-status

### Description
Retrieves the current status of document indexing operations. Use this endpoint to check the progress of document insertions.

### Method
GET

### Endpoint
/status/get-status

### Parameters
- **collection_name** (string) - Optional - The collection to report on. If not provided, the overall status is returned.

### Response
#### Success Response (200 OK)
- **num_documents** (integer) - Total number of documents.
- **num_parsing_documents** (integer) - Documents currently being parsed.
- **num_indexing_documents** (integer) - Documents currently being indexed.
- **num_indexed_documents** (integer) - Documents that have finished indexing.
- **num_failed_documents** (integer) - Documents whose parsing or indexing failed.
- **num_indexed_bytes** (integer) - Total size of the indexed documents, in bytes.
```

--------------------------------

### Manage Collections with ZeroEntropy

Source: https://docs.zeroentropy.dev/quickstart

Demonstrates adding, listing, and deleting collections. Handles the ConflictError raised when adding a collection that already exists.

```python
import asyncio

from zeroentropy import AsyncZeroEntropy, ConflictError

zclient = AsyncZeroEntropy()


async def main():
    # Ignore the error if the collection already exists
    try:
        await zclient.collections.add(collection_name="my_col")
    except ConflictError:
        pass
    names = (await zclient.collections.get_list()).collection_names
    print(names)
    await zclient.collections.delete(collection_name="my_col")


asyncio.run(main())
```

--------------------------------

### Get Document Info

Source: https://docs.zeroentropy.dev/quickstart

Retrieves information about a specific document, with an option to include its content.

```APIDOC
## GET /documents/get_info

### Description
Retrieves information about a specific document, with an option to include its content.
### Method
GET

### Endpoint
`/documents/get_info`

### Parameters

#### Query Parameters
- **collection_name** (string) - Required - The name of the collection.
- **path** (string) - Required - The path to the document.
- **include_content** (boolean) - Optional - Whether to include the document's content in the response.

### Response

#### Success Response (200)
- **document** (Document) - Information about the document, with the following schema:

class Document(BaseModel):
    id: str
    collection_name: str
    path: str
    file_url: str
    size: int
    metadata: Dict[str, Union[str, List[str]]]
    index_status: str
    num_pages: Optional[int] = None
    content: Optional[str] = None

### Response Example
{
  "document": {
    "id": "uuid-string",
    "collection_name": "col",
    "path": "doc.txt",
    "file_url": "http://example.com/doc.txt",
    "size": 1024,
    "metadata": {"reviewed": "yes"},
    "index_status": "indexed",
    "num_pages": null,
    "content": "Document content here..."
  }
}
```

--------------------------------

### GET /documents

Source: https://docs.zeroentropy.dev/api-reference/documents/get-document-info

Retrieves information about a specific document based on collection and path parameters.

```APIDOC
## GET /documents

### Description
Retrieves information about a specific document. The request parameters define what information you would like to receive.

### Method
GET

### Endpoint
/documents

### Parameters

#### Query Parameters
- **collection** (string) - Required - The name of the collection containing the document.
- **path** (string) - Required - The path to the document within the collection.

### Response

#### Success Response (200)
- **info** (object) - The requested document information.

#### Error Response (404)
- Returned if the collection name does not exist or the document path does not exist within the provided collection.
```

--------------------------------

### Upload PDF File to ZeroEntropy

Source: https://docs.zeroentropy.dev/examples/upload

Use this to upload a PDF file downloaded from a URL.
The file is base64-encoded and sent with the content type `auto`. Metadata can include timestamps and tags.

```python
import base64
from datetime import datetime

import requests
from zeroentropy import ConflictError, ZeroEntropy

zclient = ZeroEntropy()  # reads ZEROENTROPY_API_KEY from the environment

# Create the collection, ignoring the error raised if it already exists
try:
    zclient.collections.add(collection_name="pdfs")
except ConflictError:
    pass

# Download the PDF and fail early on an HTTP error
document = requests.get("https://arxiv.org/pdf/2408.10343.pdf", timeout=60)
document.raise_for_status()

# Convert the raw bytes to a base64 string
base64_content = base64.b64encode(document.content).decode("utf-8")

response = zclient.documents.add(
    collection_name="pdfs",
    path="docs/document.pdf",
    content={
        "type": "auto",
        "base64_data": base64_content,
    },
    metadata={
        "timestamp": datetime.now().isoformat(),
        "list:tags": ["arxiv", "research"],
    },
)
print(response.message)
```

--------------------------------

### Export API Key (Bash)

Source: https://docs.zeroentropy.dev/examples/setup

Export your ZeroEntropy API key as an environment variable on macOS or Linux.

```bash
export ZEROENTROPY_API_KEY="your_api_key"
```

--------------------------------

### Upload Text File to ZeroEntropy

Source: https://docs.zeroentropy.dev/examples/upload

Use this to upload a text file to a specified collection. The collection must exist before the document is added; the example creates it if needed. Metadata can include timestamps and tags.

```python
from datetime import datetime

from zeroentropy import ConflictError, ZeroEntropy

zclient = ZeroEntropy()

# Add a new collection, ignoring the error raised if it already exists
try:
    zclient.collections.add(collection_name="default")
except ConflictError:
    pass

# Add a document to the collection
response = zclient.documents.add(
    collection_name="default",
    path="docs/document.txt",
    content={
        "type": "text",
        "text": "My favorite apple is the Granny Smith.",
    },
    metadata={
        "timestamp": datetime.now().isoformat(),
        "list:tags": ["tag 1", "tag 2"],
    },
)
print(response.message)
```

--------------------------------

### Get Document Info

Source: https://docs.zeroentropy.dev/quickstart

Retrieve information about a specific document, optionally including its content. Useful for inspecting individual documents.
```python
# Run inside an async function, with zclient = AsyncZeroEntropy() as in the quickstart
info = await zclient.documents.get_info(
    collection_name="col",
    path="doc.txt",
    include_content=True,
)
```

--------------------------------

### Upload PDF File to ZeroEntropy (TypeScript)

Source: https://docs.zeroentropy.dev/examples/upload

Use this to upload a PDF file downloaded from a URL. The file is base64-encoded and sent with the content type `auto`. Metadata can include timestamps and tags.

```typescript
import axios from 'axios';
import { ZeroEntropy } from 'zeroentropy';

const zclient = new ZeroEntropy();

// Add a document to a new collection
async function addPdf() {
  try {
    // Add a new collection (throws if it already exists; the catch below then skips the upload)
    await zclient.collections.add({
      collection_name: "pdfs",
    });
    console.log("Collection 'pdfs' created successfully.");

    // Fetch the document as raw bytes
    const documentResponse = await axios.get('https://arxiv.org/pdf/2408.10343.pdf', {
      responseType: 'arraybuffer',
    });

    // Convert the document to base64
    const base64Content = Buffer.from(documentResponse.data).toString('base64');

    // Add the document to the collection
    const response = await zclient.documents.add({
      collection_name: "pdfs",
      path: "docs/document.pdf",
      content: {
        type: "auto",
        base64_data: base64Content,
      },
      metadata: {
        timestamp: new Date().toISOString(),
        "list:tags": ["arxiv", "research"],
      },
    });
    console.log(response.message);
  } catch (error) {
    console.error("Error:", error);
  }
}

addPdf();
```

--------------------------------

### Get System Status

Source: https://docs.zeroentropy.dev/quickstart

Retrieve the overall system status or the status for a specific collection. Useful for monitoring indexing progress and document counts.

```python
# Run inside an async function, with zclient = AsyncZeroEntropy() as in the quickstart
status_all = await zclient.status.get_status()                      # across all collections
status_col = await zclient.status.get_status(collection_name="col")  # one collection
```

--------------------------------

### Add Documents with List Metadata

Source: https://docs.zeroentropy.dev/metadata-filtering

Demonstrates adding documents with `list:`-prefixed metadata attributes. The prefix is required in order to filter on list-type metadata.
```python
# Upload two blog posts (one about tech, one about food) and one untagged file.
await zclient.documents.add(
    collection_name="default",
    path="ai_blog.txt",
    content={
        "type": "text",
        "text": "This is a blog post about artificial intelligence.",
    },
    metadata={
        "list:tags": ["blog", "tech"],
    },
)
await zclient.documents.add(
    collection_name="default",
    path="food_blog.txt",
    content={
        "type": "text",
        "text": "This is a blog post about food.",
    },
    metadata={
        "list:tags": ["blog", "food"],
    },
)
await zclient.documents.add(
    collection_name="default",
    path="empty.txt",
    content={
        "type": "text",
        "text": "This is an empty file with no tags.",
    },
    metadata={},  # Omitting `list:tags` is equivalent to it being an empty array
)
```

--------------------------------

### Embed Queries and Documents with zembed-1

Source: https://docs.zeroentropy.dev/examples/embed

Use the ZeroEntropy SDK to embed a query and a list of documents with the `zembed-1` model. Set `input_type` to `"query"` for queries and `"document"` for documents.

```python
from zeroentropy import ZeroEntropy

zclient = ZeroEntropy()

query = "What is Retrieval Augmented Generation?"
documents = [
    "RAG combines retrieval with generation by conditioning the LLM on external documents.",
    "Retrieval-Augmented Generation is a machine learning technique introduced by Meta AI in 2020.",
    "It uses reinforcement learning to generate music sequences.",
    "RAG can improve factual accuracy by grounding answers in retrieved evidence.",
    "Transformers are a type of deep learning architecture."
]

# Embed the query
query_response = zclient.models.embed(
    model="zembed-1",
    input=query,
    input_type="query",
)

# Embed the documents
docs_response = zclient.models.embed(
    model="zembed-1",
    input=documents,
    input_type="document",
)
```

```typescript
import ZeroEntropy from 'zeroentropy';

const zclient = new ZeroEntropy();

const query = "What is Retrieval Augmented Generation?";
const documents = [
  "RAG combines retrieval with generation by conditioning the LLM on external documents.",
  "Retrieval-Augmented Generation is a machine learning technique introduced by Meta AI in 2020.",
  "It uses reinforcement learning to generate music sequences.",
  "RAG can improve factual accuracy by grounding answers in retrieved evidence.",
  "Transformers are a type of deep learning architecture."
];

// Top-level await requires an ES module (e.g. a .mjs file or "type": "module")

// Embed the query
const queryResponse = await zclient.models.embed({
  model: "zembed-1",
  input: query,
  input_type: "query",
});

// Embed the documents
const docsResponse = await zclient.models.embed({
  model: "zembed-1",
  input: documents,
  input_type: "document",
});
```

--------------------------------

### Get Status

Source: https://docs.zeroentropy.dev/api-reference/status/get-status

Retrieves the current indexing status across all documents or for a specific collection. Returns a `404 Not Found` if a non-existent collection name is provided.

```APIDOC
## GET /status/get-status

### Description
Gets the current indexing status across all documents. If a collection name is passed in, only the documents within that collection are included; otherwise, the status is cumulative across all of your collections.

A `404 Not Found` status code is returned if a collection name is provided but does not exist.

### Method
GET

### Endpoint
/status/get-status

### Query Parameters
- **collection** (string) - Optional - The name of the collection to get the indexing status for.
```

--------------------------------

### Export API Key (Windows PowerShell)

Source: https://docs.zeroentropy.dev/examples/setup

Save your ZeroEntropy API key as a user environment variable on Windows. `setx` only affects terminals opened afterward, so open a new terminal before running the examples.

```powershell
setx ZEROENTROPY_API_KEY "your_api_key"
```

--------------------------------

### Upload Text File to ZeroEntropy (TypeScript)

Source: https://docs.zeroentropy.dev/examples/upload

Use this to upload a text file to a specified collection. Ensure the collection exists or is created first. Metadata can include timestamps and tags.

```typescript
import { ZeroEntropy } from 'zeroentropy';

const zclient = new ZeroEntropy();

// Add a document to a new collection
async function addDocument() {
  try {
    // Add a new collection (throws if it already exists; the catch below then skips the upload)
    await zclient.collections.add({
      collection_name: "default",
    });
    console.log("Collection 'default' created successfully.");

    // Add a document to the collection
    const response = await zclient.documents.add({
      collection_name: "default",
      path: "docs/document.txt",
      content: {
        type: "text",
        text: "My favorite apple is the Granny Smith.",
      },
      metadata: {
        timestamp: new Date().toISOString(),
        "list:tags": ["tag 1", "tag 2"],
      },
    });
    console.log(response.message);
  } catch (error) {
    console.error("Error:", error);
  }
}

addDocument();
```
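
--------------------------------

### Build Base64 PDF Content from a Local File (Sketch)

The PDF upload examples download the file over HTTP before base64-encoding it. For a file already on disk, a minimal sketch could build the same `content` dict (`{"type": "auto", "base64_data": ...}`) locally, using the same `base64.b64encode(...).decode("utf-8")` step. `build_auto_content` and `build_auto_content_from_file` are hypothetical helpers for illustration, not part of the ZeroEntropy SDK.

```python
import base64
from pathlib import Path


def build_auto_content(pdf_bytes: bytes) -> dict:
    """Build the `content` argument for documents.add with type "auto".

    The raw bytes are base64-encoded and decoded to a str, mirroring the
    upload example above.
    """
    return {
        "type": "auto",
        "base64_data": base64.b64encode(pdf_bytes).decode("utf-8"),
    }


def build_auto_content_from_file(path: str) -> dict:
    """Read a local file (hypothetical path) and wrap it for upload."""
    return build_auto_content(Path(path).read_bytes())
```

The result can be passed as the `content` argument, for example `zclient.documents.add(collection_name="pdfs", path="docs/report.pdf", content=build_auto_content_from_file("report.pdf"))`, where `report.pdf` is a placeholder path.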
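
--------------------------------

### Validate a Page Index Before Calling get-page-info (Sketch)

The `/documents/get-page-info` reference says pages are 0-indexed and bounded by the document's `num_pages`. A hedged sketch that checks the index locally and then builds the JSON request body from the documented fields, assuming `num_pages` was read from `/documents/get-document-info` and may be `None` (it is `Optional[int]` in the `Document` model) while the document is still being parsed. `build_page_info_body` is a hypothetical helper; the HTTP call itself (base URL, authentication header) is deliberately left out.

```python
from typing import Optional


def build_page_info_body(
    collection_name: str,
    path: str,
    page_index: int,
    num_pages: Optional[int],
    include_content: bool = False,
) -> dict:
    """Return a request body for POST /documents/get-page-info.

    Pages are 0-indexed, so valid indices are 0 .. num_pages - 1.
    If num_pages is None (unknown), only negative indices are rejected
    and the server is left to answer 404 for out-of-range ones.
    """
    if page_index < 0:
        raise ValueError(f"page_index must be >= 0, got {page_index}")
    if num_pages is not None and page_index >= num_pages:
        raise ValueError(
            f"page_index {page_index} is out of range for a document "
            f"with {num_pages} page(s)"
        )
    return {
        "collection_name": collection_name,
        "path": path,
        "page_index": page_index,
        "include_content": include_content,
    }
```

Checking locally avoids a round trip that would only return `404 Not Found`; the server remains the authority when `num_pages` is not yet known.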
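
--------------------------------

### Rank Documents by Cosine Similarity (Sketch)

The `zembed-1` example stops once the query and documents are embedded. A minimal sketch of a common next step, ranking documents by cosine similarity to the query, assuming the embeddings have already been extracted from `query_response` and `docs_response` as plain lists of floats (the field layout of `ModelEmbedResponse` is not described in this section, so that extraction is omitted). Cosine similarity is a conventional choice here, not something the source prescribes; `cosine_similarity` and `rank_by_similarity` are hypothetical helpers.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 for a zero vector)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query_vec: Sequence[float], doc_vecs: Sequence[Sequence[float]]
) -> List[Tuple[int, float]]:
    """Return (document index, score) pairs sorted from most to least similar."""
    scores = [(i, cosine_similarity(query_vec, d)) for i, d in enumerate(doc_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

With vectors extracted from the two responses, `rank_by_similarity(query_vec, doc_vecs)[0][0]` would be the index into `documents` of the closest match.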