### Install dify-dataset-sdk using pip

Source: https://github.com/leekjay/dify-knowledge-sdk/blob/master/README.md

Installs the Dify Knowledge Base SDK using pip, the Python package installer. This is the primary way to get the library into your Python environment.

```bash
pip install dify-dataset-sdk
```

--------------------------------

### Clone Repository and Install Dependencies (Bash)

Source: https://github.com/leekjay/dify-knowledge-sdk/blob/master/README.md

Commands to clone the Dify SDK repository from GitHub and install its development dependencies using pip. This sets up the local environment for development.

```bash
# Clone the repository
git clone https://github.com/LeekJay/dify-dataset-sdk.git
cd dify-dataset-sdk

# Install dependencies
pip install -e ".[dev]"
```

--------------------------------

### Advanced Retrieval Methods (Semantic, Hybrid, Full-Text)

Source: https://github.com/leekjay/dify-knowledge-sdk/blob/master/README.md

Shows examples of performing different types of searches within a dataset: semantic search, hybrid search (combining semantic and full-text), and pure full-text search, with configurable parameters like top_k and score_threshold.

```python
# Semantic search
results = client.retrieve(
    dataset_id=dataset_id,
    query="How to implement authentication?",
    retrieval_config={
        "search_method": "semantic_search",
        "top_k": 5,
        "score_threshold": 0.7
    }
)

# Hybrid search (combining semantic and full-text)
results = client.retrieve(
    dataset_id=dataset_id,
    query="API documentation",
    retrieval_config={
        "search_method": "hybrid_search",
        "top_k": 10,
        "rerank_model": {
            "model": "rerank-multilingual-v2.0",
            "mode": "reranking_model"
        }
    }
)

# Full-text search
results = client.retrieve(
    dataset_id=dataset_id,
    query="database configuration",
    retrieval_config={"search_method": "full_text_search", "top_k": 5}
)
```
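The three calls above differ only in their `retrieval_config` dictionary. As a minimal sketch, a small helper can build these dictionaries consistently and reject misspelled search methods before any request is sent. `build_retrieval_config` and `SEARCH_METHODS` are hypothetical names, not part of `dify_dataset_sdk`; the key names simply mirror the examples above.

```python
from typing import Any, Dict, Optional

# The three search methods used in the examples above
SEARCH_METHODS = {"semantic_search", "hybrid_search", "full_text_search"}


def build_retrieval_config(
    search_method: str,
    top_k: int = 5,
    score_threshold: Optional[float] = None,
    rerank_model: Optional[Dict[str, str]] = None,
) -> Dict[str, Any]:
    """Build a retrieval_config dict using the keys shown in the README examples.

    Hypothetical helper; it is not part of dify_dataset_sdk.
    """
    if search_method not in SEARCH_METHODS:
        raise ValueError(f"unknown search_method: {search_method!r}")
    if top_k < 1:
        raise ValueError("top_k must be a positive integer")
    config: Dict[str, Any] = {"search_method": search_method, "top_k": top_k}
    # Only include optional keys when they are actually set
    if score_threshold is not None:
        config["score_threshold"] = score_threshold
    if rerank_model is not None:
        config["rerank_model"] = rerank_model
    return config
```

With this helper, the semantic search above becomes `client.retrieve(dataset_id=dataset_id, query="How to implement authentication?", retrieval_config=build_retrieval_config("semantic_search", top_k=5, score_threshold=0.7))`. Validation happens locally, so a typo such as `"semantic"` fails immediately instead of producing a server-side error.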

--------------------------------

### Initialize DifyDatasetClient and manage datasets

Source: https://github.com/leekjay/dify-knowledge-sdk/blob/master/README.md

Demonstrates initializing the DifyDatasetClient with an API key and base URL. It shows how to create a new dataset, list existing datasets with pagination, and delete a dataset.

```python
from dify_dataset_sdk import DifyDatasetClient

# Initialize the client with API key and custom base URL
client = DifyDatasetClient(
    api_key="your-api-key",
    base_url="https://your-custom-dify-instance.com",
    timeout=60.0  # Custom timeout in seconds
)

# Create a new dataset (knowledge base)
dataset = client.create_dataset(
    name="My Knowledge Base",
    permission="only_me"
)

# Create a dataset with description
dataset = client.create_dataset(
    name="Technical Documentation",
    permission="only_me",
    description="Internal technical docs"
)

# List datasets with pagination
datasets = client.list_datasets(page=1, limit=20)

# Delete a dataset (ensure dataset_id is defined)
# client.delete_dataset(dataset_id)

# Close the client
client.close()
```
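`list_datasets(page=..., limit=...)` returns a single page. To walk through every page, a generic loop can keep requesting pages until the server reports that no more remain. This is a sketch: `iter_all_pages` is a hypothetical helper, and the page-fetching callable it expects is something you write yourself around the SDK call.

```python
from typing import Callable, Iterator, List, Tuple, TypeVar

T = TypeVar("T")


def iter_all_pages(
    fetch_page: Callable[[int, int], Tuple[List[T], bool]],
    limit: int = 20,
) -> Iterator[T]:
    """Yield every item by requesting pages 1, 2, 3, ... until has_more is False.

    fetch_page(page, limit) must return (items_on_that_page, has_more).
    """
    page = 1
    while True:
        items, has_more = fetch_page(page, limit)
        yield from items
        # Stop when the server says there is nothing more, or on an empty page
        if not has_more or not items:
            break
        page += 1
```

For the SDK, `fetch_page` could be `lambda page, limit: (r.data, r.has_more)` applied to `r = client.list_datasets(page=page, limit=limit)`. The `.data` and `.has_more` attribute names are an assumption based on the shape of Dify's REST list responses; check them against the SDK's response model before relying on them.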

--------------------------------

### Run Tests with Pytest (Bash)

Source: https://github.com/leekjay/dify-knowledge-sdk/blob/master/README.md

Instructions for running tests using the pytest framework. Includes commands for running all tests, specific files, and with verbose output.

```bash
# Run all tests
pytest

# Run specific test file
python tests/test_all_39_apis.py

# Run with verbose output
pytest -v
```

--------------------------------

### Client Configuration

Source: https://github.com/leekjay/dify-knowledge-sdk/blob/master/README.md

Details on how to configure and initialize the Dify Dataset Client.

````APIDOC
## Client Configuration

### Description
Initialize the Dify Dataset Client with your API key and optional parameters.

### Parameters
- **api_key** (string) - Required - Your Dify API key.
- **base_url** (string) - Optional - API base URL. Default: `"https://api.dify.ai"`.
- **timeout** (float) - Optional - Request timeout in seconds. Default: `30.0`.

### Signature
```python
DifyDatasetClient(
    api_key: str,                           # Required: Your Dify API key
    base_url: str = "https://api.dify.ai",  # Optional: API base URL
    timeout: float = 30.0,                  # Optional: Request timeout in seconds
)
```
````

--------------------------------

### Create documents from text and files

Source: https://github.com/leekjay/dify-knowledge-sdk/blob/master/README.md

Shows how to create documents within a dataset. Supports creating documents directly from plain text content or from local files like PDFs, with options for indexing techniques and processing rules.

```python
from dify_dataset_sdk import DifyDatasetClient

# Assume client is initialized and dataset_id is available
# client = DifyDatasetClient(api_key="your-api-key")
# dataset_id = dataset.id

# Create a document from text
doc_response = client.create_document_by_text(
    dataset_id=dataset_id,
    name="Sample Document",
    text="This is a sample document for the knowledge base.",
    indexing_technique="high_quality"
)

# Create document from text with custom processing mode
doc_response = client.create_document_by_text(
    dataset_id=dataset_id,
    name="API Documentation",
    text="Complete API documentation content...",
    indexing_technique="high_quality",
    process_rule_mode="automatic"
)

# Create document from a local file
doc_response = client.create_document_by_file(
    dataset_id=dataset_id,
    file_path="./documentation.pdf",
    indexing_technique="high_quality"
)
```

--------------------------------

### Initialize DifyDatasetClient (Python)

Source: https://github.com/leekjay/dify-knowledge-sdk/blob/master/README.md

Configuration for the DifyDatasetClient, specifying the API key, optional base URL, and request timeout. This client is used to interact with the Dify API.

```python
from dify_dataset_sdk import DifyDatasetClient

client = DifyDatasetClient(
    api_key="your-api-key",          # Required: Your Dify API key
    base_url="https://api.dify.ai",  # Optional: API base URL (default: "https://api.dify.ai")
    timeout=30.0,                    # Optional: Request timeout in seconds (default: 30.0)
)
```
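Hard-coding the API key as above is fine for a quick test, but in practice it is usually read from the environment. The sketch below collects the constructor's keyword arguments from environment variables and passes optional settings only when they are set, so the client's own defaults still apply otherwise. `client_kwargs_from_env` is hypothetical, and `DIFY_API_KEY`, `DIFY_BASE_URL`, and `DIFY_TIMEOUT` are variable names chosen for this sketch, not names the SDK is known to read.

```python
import os
from typing import Any, Dict, Mapping, Optional


def client_kwargs_from_env(env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Collect DifyDatasetClient keyword arguments from environment variables.

    DIFY_API_KEY, DIFY_BASE_URL and DIFY_TIMEOUT are hypothetical names chosen
    for this sketch.
    """
    env = os.environ if env is None else env
    api_key = env.get("DIFY_API_KEY")
    if not api_key:
        raise RuntimeError("DIFY_API_KEY is not set")
    kwargs: Dict[str, Any] = {"api_key": api_key}
    # Only pass optional settings when present, so the client's defaults apply
    if env.get("DIFY_BASE_URL"):
        kwargs["base_url"] = env["DIFY_BASE_URL"]
    if env.get("DIFY_TIMEOUT"):
        kwargs["timeout"] = float(env["DIFY_TIMEOUT"])
    return kwargs
```

Usage: `client = DifyDatasetClient(**client_kwargs_from_env())`.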

--------------------------------

### Manage Knowledge Tags and Bind Datasets

Source: https://github.com/leekjay/dify-knowledge-sdk/blob/master/README.md

Demonstrates creating knowledge tags, binding datasets to these tags, listing all tags, retrieving tags for a specific dataset, and filtering datasets by tags using the Dify SDK.

```python
# Create knowledge tags
tag = client.create_knowledge_tag(name="Technical Documentation")
dept_tag = client.create_knowledge_tag(name="Engineering Department")

# Bind datasets to tags
client.bind_dataset_to_tag(dataset_id, [tag.id, dept_tag.id])

# List all knowledge tags
tags = client.list_knowledge_tags()

# Get tags for a specific dataset
dataset_tags = client.get_dataset_tags(dataset_id)

# Filter datasets by tags
filtered_datasets = client.list_datasets(tag_ids=[tag.id])
```
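Running the tag-creation code twice may create duplicate tags or raise a conflict, depending on how the server treats repeated names. A minimal get-or-create sketch looks up an existing tag by name first. `get_or_create_tag` is hypothetical, and it assumes `list_knowledge_tags()` returns an iterable of objects with a `.name` attribute; if the SDK wraps the list in a response object, unwrap it first.

```python
def get_or_create_tag(client, name: str):
    """Return an existing knowledge tag named `name`, creating it only if missing.

    Assumes client.list_knowledge_tags() returns an iterable of tag objects
    exposing `.name` (and `.id`), and that create_knowledge_tag(name=...) exists
    as in the README example.
    """
    for tag in client.list_knowledge_tags():
        if tag.name == name:
            return tag
    return client.create_knowledge_tag(name=name)
```

With this helper, `tag = get_or_create_tag(client, "Technical Documentation")` can be rerun safely before `client.bind_dataset_to_tag(dataset_id, [tag.id])`. Note that the lookup is not atomic: two processes racing on the same name can still both create it.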

--------------------------------

### Health Monitoring

Source: https://github.com/leekjay/dify-knowledge-sdk/blob/master/README.md

Provides an overview of how to monitor the SDK's performance and API health.

```APIDOC
## Health Monitoring

### Description
Monitor SDK performance and API health by tracking requests, errors, and response times.

### Example Usage
```python
class SDKMonitor:
    def __init__(self, client):
        self.client = client
        self.metrics = {"requests": 0, "errors": 0, "avg_response_time": 0}

    def health_check(self):
        try:
            start_time = time.time()
            self.client.list_datasets(limit=1)
            response_time = time.time() - start_time
            return {"status": "healthy", "response_time": response_time}
        except Exception as e:
            return {"status": "unhealthy", "error": str(e)}
```
```

--------------------------------

### Batch Document Upload using ThreadPoolExecutor

Source: https://github.com/leekjay/dify-knowledge-sdk/blob/master/README.md

Shows how to efficiently process multiple documents in parallel using Python's `concurrent.futures.ThreadPoolExecutor` for uploading documents to a dataset with specified indexing quality.

```python
from concurrent.futures import ThreadPoolExecutor

# Assumes `client`, `dataset_id`, and `file_list` (a list of file paths) are already defined

def upload_document(file_path):
    return client.create_document_by_file(
        dataset_id=dataset_id,
        file_path=file_path,
        indexing_technique="high_quality"
    )

# Parallel document upload
with ThreadPoolExecutor(max_workers=3) as executor:
    futures = [executor.submit(upload_document, file) for file in file_list]
    results = [future.result() for future in futures]
```
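In the snippet above, one failed upload makes `future.result()` raise, so the results of the other files are lost from `results`. The sketch below keeps going, collecting successes and failures separately, and uses `concurrent.futures.as_completed` to handle each upload as soon as it finishes. `upload_all` is a hypothetical helper; it takes any single-file upload function, such as `upload_document` above.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Any, Callable, Dict, Iterable, Tuple


def upload_all(
    upload: Callable[[str], Any],
    file_paths: Iterable[str],
    max_workers: int = 3,
) -> Tuple[Dict[str, Any], Dict[str, Exception]]:
    """Run `upload` for every path in parallel, separating successes from failures."""
    succeeded: Dict[str, Any] = {}
    failed: Dict[str, Exception] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        future_to_path = {executor.submit(upload, path): path for path in file_paths}
        # as_completed yields futures in completion order, not submission order
        for future in as_completed(future_to_path):
            path = future_to_path[future]
            try:
                succeeded[path] = future.result()
            except Exception as exc:  # record the failure and keep going
                failed[path] = exc
    return succeeded, failed
```

Usage: `succeeded, failed = upload_all(upload_document, file_list)`, after which the paths in `failed` can be logged or retried.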

--------------------------------

### Knowledge Tag Management API

Source: https://github.com/leekjay/dify-knowledge-sdk/blob/master/README.md

APIs for creating, listing, and binding knowledge tags to datasets.

````APIDOC
## POST /api/knowledge/tags

### Description
Creates a new knowledge tag.

### Method
POST

### Endpoint
/api/knowledge/tags

### Parameters
#### Request Body
- **name** (string) - Required - The name of the knowledge tag.

### Request Example
```json
{
  "name": "Technical Documentation"
}
```

### Response
#### Success Response (200)
- **id** (string) - The ID of the created tag.
- **name** (string) - The name of the created tag.

### Response Example
```json
{
  "id": "tag_123",
  "name": "Technical Documentation"
}
```

## POST /api/datasets/{dataset_id}/tags

### Description
Binds one or more knowledge tags to a specific dataset.

### Method
POST

### Endpoint
/api/datasets/{dataset_id}/tags

### Parameters
#### Path Parameters
- **dataset_id** (string) - Required - The ID of the dataset.

#### Request Body
- **tag_ids** (array of strings) - Required - A list of tag IDs to bind to the dataset.

### Request Example
```json
{
  "tag_ids": ["tag_123", "tag_456"]
}
```

### Response
#### Success Response (200)
- **message** (string) - Confirmation message.

### Response Example
```json
{
  "message": "Tags bound successfully."
}
```

## GET /api/knowledge/tags

### Description
Lists all available knowledge tags.

### Method
GET

### Endpoint
/api/knowledge/tags

### Response
#### Success Response (200)
- **tags** (array of objects) - A list of knowledge tags, each with 'id' and 'name'.

### Response Example
```json
{
  "tags": [
    {"id": "tag_123", "name": "Technical Documentation"},
    {"id": "tag_456", "name": "Engineering Department"}
  ]
}
```

## GET /api/datasets/{dataset_id}/tags

### Description
Retrieves all knowledge tags associated with a specific dataset.

### Method
GET

### Endpoint
/api/datasets/{dataset_id}/tags

### Parameters
#### Path Parameters
- **dataset_id** (string) - Required - The ID of the dataset.

### Response
#### Success Response (200)
- **tags** (array of objects) - A list of tags associated with the dataset, each with 'id' and 'name'.

### Response Example
```json
{
  "tags": [
    {"id": "tag_123", "name": "Technical Documentation"}
  ]
}
```

## GET /api/datasets

### Description
Lists datasets, with an option to filter by knowledge tags.

### Method
GET

### Endpoint
/api/datasets

### Parameters
#### Query Parameters
- **tag_ids** (array of strings) - Optional - Filters datasets by a list of tag IDs.

### Response
#### Success Response (200)
- **datasets** (array of objects) - A list of datasets matching the filter criteria.

### Response Example
```json
{
  "datasets": [
    {"id": "dataset_abc", "name": "Dataset A"},
    {"id": "dataset_def", "name": "Dataset B"}
  ]
}
```
````

--------------------------------

### Format and Check Code with Ruff (Bash)

Source: https://github.com/leekjay/dify-knowledge-sdk/blob/master/README.md

Commands to format code and fix lint issues with Ruff, and to type-check the package with mypy. These commands help maintain code quality and consistency.

```bash
# Format code
ruff format dify_dataset_sdk/

# Check and fix issues
ruff check --fix dify_dataset_sdk/

# Type checking
mypy dify_dataset_sdk/
```

--------------------------------

### Supported File Types

Source: https://github.com/leekjay/dify-knowledge-sdk/blob/master/README.md

Lists the file types that the SDK supports for uploading data.

```APIDOC
## Supported File Types

### Description
The SDK supports uploading data from various file formats.

### File Types
- `txt` - Plain text files
- `md`, `markdown` - Markdown files
- `pdf` - PDF documents
- `html` - HTML files
- `xlsx` - Excel spreadsheets
- `docx` - Word documents
- `csv` - CSV files
```
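Checking extensions locally before calling `create_document_by_file` avoids a round trip for files the SDK cannot handle. This sketch hard-codes the list above; the server's actual support may differ by Dify version, so treat it as a pre-filter rather than the final word. `is_supported_file`, `split_supported`, and `SUPPORTED_EXTENSIONS` are hypothetical names.

```python
from pathlib import Path
from typing import Iterable, List, Tuple

# Extensions taken from the "Supported File Types" list above
SUPPORTED_EXTENSIONS = {"txt", "md", "markdown", "pdf", "html", "xlsx", "docx", "csv"}


def is_supported_file(file_path: str) -> bool:
    """Return True if the file's extension is in the supported list (case-insensitive)."""
    suffix = Path(file_path).suffix.lower().lstrip(".")
    return suffix in SUPPORTED_EXTENSIONS


def split_supported(file_paths: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Partition paths into (supported, unsupported) lists, preserving order."""
    supported: List[str] = []
    unsupported: List[str] = []
    for path in file_paths:
        (supported if is_supported_file(path) else unsupported).append(path)
    return supported, unsupported
```

For example, `to_upload, skipped = split_supported(file_list)` lets you upload only `to_upload` and report `skipped`.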

--------------------------------

### Error Handling with Retry

Source: https://github.com/leekjay/dify-knowledge-sdk/blob/master/README.md

Demonstrates how to implement automatic retry mechanisms for robust error handling.

````APIDOC
## Error Handling with Retry

### Description
Implement robust error handling with automatic retry using exponential backoff for network-related errors.

### Example Usage
```python
import time

from dify_dataset_sdk.exceptions import DifyConnectionError, DifyTimeoutError


def safe_operation_with_retry(operation, max_retries=3):
    """Call operation(), retrying timeouts and connection errors with exponential backoff."""
    for attempt in range(max_retries):
        try:
            return operation()
        except (DifyTimeoutError, DifyConnectionError):
            if attempt == max_retries - 1:
                raise  # Out of retries: re-raise the original exception
            time.sleep(2 ** attempt)  # Exponential backoff: 1s, 2s, 4s, ...


# Usage: datasets = safe_operation_with_retry(lambda: client.list_datasets(page=1, limit=20))
```
````

--------------------------------

### Rate Limits

Source: https://github.com/leekjay/dify-knowledge-sdk/blob/master/README.md

Information regarding API rate limits and the SDK's handling of them.

```APIDOC
## Rate Limits

### Description
Users must adhere to Dify's API rate limits. The SDK is designed with built-in error handling for rate limit responses.
```
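One way to stay under a rate limit on the client side is to space out requests. The sketch below enforces a minimum interval between calls across threads, so it also works with the `ThreadPoolExecutor` upload examples. `MinIntervalThrottle` is hypothetical, and the interval you need depends on your Dify plan or deployment, which this document does not specify. The clock and sleep functions are injectable so the behavior can be tested without waiting.

```python
import threading
import time
from typing import Callable


class MinIntervalThrottle:
    """Allow at most one call per `min_interval` seconds, across threads."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._lock = threading.Lock()
        self._next_allowed = float("-inf")

    def wait(self) -> None:
        """Block until the next call is allowed, then reserve the following slot."""
        # Holding the lock while sleeping serializes callers, which is the point
        with self._lock:
            now = self._clock()
            if now < self._next_allowed:
                self._sleep(self._next_allowed - now)
                now = self._next_allowed
            self._next_allowed = now + self.min_interval
```

Usage: create one shared `throttle = MinIntervalThrottle(0.5)` and call `throttle.wait()` immediately before each `client.*` request, for example at the top of `upload_document`.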

--------------------------------

### Batch Processing API

Source: https://github.com/leekjay/dify-knowledge-sdk/blob/master/README.md

Demonstrates efficient processing of multiple documents using parallel operations.

````APIDOC
## Parallel Document Upload

### Description
Uploads multiple documents to a dataset concurrently using a thread pool executor.

### Method
POST (implicitly via `create_document_by_file` calls)

### Endpoint
`/api/datasets/{dataset_id}/documents` (for `create_document_by_file`)

### Parameters (for `create_document_by_file`)
#### Path Parameters
- **dataset_id** (string) - Required - The ID of the dataset to upload to.

#### Request Body (for `create_document_by_file`)
- **file_path** (string) - Required - The path to the document file.
- **indexing_technique** (string) - Optional - The technique to use for indexing (e.g., "high_quality").

### Code Example
```python
from concurrent.futures import ThreadPoolExecutor

def upload_document(client, dataset_id, file_path):
    """Helper function to upload a single document."""
    try:
        return client.create_document_by_file(
            dataset_id=dataset_id,
            file_path=file_path,
            indexing_technique="high_quality"
        )
    except Exception as e:
        print(f"Error uploading {file_path}: {e}")
        return None

# Assume 'client' is an initialized Dify SDK client instance
# Assume 'dataset_id' is the target dataset ID
# 'file_list' holds the paths of the files to upload (replace with your own)
file_list = ["/path/to/doc1.pdf", "/path/to/doc2.docx", "/path/to/doc3.txt"]

print("Starting parallel document upload...")
with ThreadPoolExecutor(max_workers=3) as executor:
    # Submit upload tasks to the executor
    futures = [
        executor.submit(upload_document, client, dataset_id, file)
        for file in file_list
    ]

    # Collect results in submission order; failed uploads return None and are skipped
    results = []
    for future in futures:
        result = future.result()
        if result:
            results.append(result)

print(f"Successfully uploaded {len(results)} documents.")
```

### Response (for `create_document_by_file`)
#### Success Response (200)
- **document_id** (string) - The ID of the uploaded document.
- **status** (string) - The status of the document upload.

### Response Example (for `create_document_by_file`)
```json
{
  "document_id": "doc_uuid_123",
  "status": "uploaded"
}
```
````

--------------------------------

### Manage Metadata Fields and Update Documents

Source: https://github.com/leekjay/dify-knowledge-sdk/blob/master/README.md

Illustrates how to create custom metadata fields (e.g., 'category', 'priority') for datasets and subsequently update document metadata with specific values using the SDK.

```python
# Create metadata fields
category_field = client.create_metadata_field(
    dataset_id=dataset_id,
    field_type="string",
    name="category"
)

priority_field = client.create_metadata_field(
    dataset_id=dataset_id,
    field_type="number",
    name="priority"
)

# Update document metadata
metadata_operations = [
    {
        "document_id": document_id,
        "metadata_list": [
            {
                "id": category_field.id,
                "value": "technical",
                "name": "category"
            },
            {
                "id": priority_field.id,
                "value": "5",
                "name": "priority"
            }
        ]
    }
]

client.update_document_metadata(dataset_id, metadata_operations)
```
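Writing the nested `metadata_operations` structure by hand is error-prone once several fields are involved. A minimal sketch of a builder takes a name-to-field-ID map and a name-to-value map for one document. `build_metadata_operations` is hypothetical. It converts every value to a string to match the example above, where the number field `priority` is sent as `"5"`; whether the API also accepts native numbers is not stated here.

```python
from typing import Any, Dict, List, Mapping


def build_metadata_operations(
    document_id: str,
    field_ids: Mapping[str, str],
    values: Mapping[str, Any],
) -> List[Dict[str, Any]]:
    """Build the metadata_operations payload shown above for one document.

    field_ids maps field name -> field id (e.g. {"category": category_field.id});
    values maps field name -> value. Values are converted to strings.
    """
    missing = set(values) - set(field_ids)
    if missing:
        raise KeyError(f"no metadata field id for: {sorted(missing)}")
    metadata_list = [
        {"id": field_ids[name], "value": str(value), "name": name}
        for name, value in values.items()
    ]
    return [{"document_id": document_id, "metadata_list": metadata_list}]
```

Usage: `client.update_document_metadata(dataset_id, build_metadata_operations(document_id, {"category": category_field.id, "priority": priority_field.id}, {"category": "technical", "priority": 5}))`.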

--------------------------------

### Progress Monitoring API

Source: https://github.com/leekjay/dify-knowledge-sdk/blob/master/README.md

API for monitoring the status of document indexing within a dataset.

````APIDOC
## GET /api/datasets/{dataset_id}/documents/{batch_id}/indexing-status

### Description
Retrieves the indexing status of the documents created in a given upload batch.

### Method
GET

### Endpoint
/api/datasets/{dataset_id}/documents/{batch_id}/indexing-status

### Parameters
#### Path Parameters
- **dataset_id** (string) - Required - The ID of the dataset.
- **batch_id** (string) - Required - The ID of the indexing batch, returned when the documents were created.

### Response
#### Success Response (200)
- **data** (array of objects) - Information about the indexing status.
  - **indexing_status** (string) - The current status of indexing (e.g., "completed", "processing").
  - **completed_segments** (integer) - The number of segments that have been processed.
  - **total_segments** (integer) - The total number of segments to process.

### Response Example
```json
{
  "data": [
    {
      "indexing_status": "completed",
      "completed_segments": 100,
      "total_segments": 100
    }
  ]
}
```
````

--------------------------------

### Comprehensive Error Handling with Dify SDK Exceptions

Source: https://github.com/leekjay/dify-knowledge-sdk/blob/master/README.md

Demonstrates robust error handling by catching specific exception types provided by the Dify SDK, such as authentication errors, validation errors, not found errors, and general API errors, along with their attributes.

```python
from dify_dataset_sdk.exceptions import (
    DifyAPIError,
    DifyAuthenticationError,
    DifyValidationError,
    DifyNotFoundError,
    DifyConflictError,
    DifyServerError,
    DifyConnectionError,
    DifyTimeoutError
)

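try:
    dataset = client.create_dataset(name="Test Dataset")
except DifyAuthenticationError:
    print("Invalid API key")
except DifyValidationError as e:
    print(f"Validation error: {e}")
except DifyNotFoundError as e:
    print(f"Not found: {e}")
except DifyConflictError as e:
    print(f"Conflict: {e}")  # e.g., duplicate dataset name
except (DifyConnectionError, DifyTimeoutError) as e:
    print(f"Network problem: {e}")
except DifyServerError as e:
    print(f"Server error: {e}")
except DifyAPIError as e:
    # Base class last, so the more specific handlers above take precedence
    print(f"API error: {e}")
    print(f"Status code: {e.status_code}")
    print(f"Error code: {e.error_code}")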
```

--------------------------------

### Monitor Document Indexing Progress

Source: https://github.com/leekjay/dify-knowledge-sdk/blob/master/README.md

Provides a code snippet to retrieve the indexing status of documents within a dataset and display progress information such as completed and total segments.

```python
# Monitor document indexing progress.
# `batch_id` is the batch identifier returned when the document was created.
status = client.get_document_indexing_status(dataset_id, batch_id)

if status.data:
    indexing_info = status.data[0]
    print(f"Status: {indexing_info.indexing_status}")
    print(f"Progress: {indexing_info.completed_segments}/{indexing_info.total_segments}")
```
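The snippet above checks the status once. To block until indexing finishes, the call can be polled with a timeout. This is a sketch: `wait_for_indexing` is hypothetical, `"completed"` comes from the API description above, and treating `"error"` as the failure state is an assumption to adjust for your server. The status object is expected to expose `.data[0].indexing_status` as in the snippet above; clock and sleep are injectable for testing.

```python
import time
from typing import Any, Callable, Tuple


def wait_for_indexing(
    get_status: Callable[[], Any],
    poll_interval: float = 2.0,
    timeout: float = 300.0,
    done_states: Tuple[str, ...] = ("completed",),
    failed_states: Tuple[str, ...] = ("error",),  # assumed failure state
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Any:
    """Poll get_status() until the first entry in `.data` reaches a terminal state."""
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status.data:
            info = status.data[0]
            if info.indexing_status in done_states:
                return info
            if info.indexing_status in failed_states:
                raise RuntimeError(f"indexing failed: {info.indexing_status}")
        if clock() >= deadline:
            raise TimeoutError("indexing did not finish before the timeout")
        sleep(poll_interval)
```

Usage: `info = wait_for_indexing(lambda: client.get_document_indexing_status(dataset_id, batch_id))`, then print `info.completed_segments` and `info.total_segments` as above.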

--------------------------------

### Custom document processing rules

Source: https://github.com/leekjay/dify-knowledge-sdk/blob/master/README.md

Illustrates how to define and use custom processing rules when creating documents from files. This allows for fine-grained control over text cleaning and segmentation, such as removing extra spaces or URLs, and setting segment size.

```python
from dify_dataset_sdk import DifyDatasetClient

# Assume client is initialized and dataset_id is available
# client = DifyDatasetClient(api_key="your-api-key")
# dataset_id = dataset.id

# Custom processing configuration
process_rule_config = {
    "rules": {
        "pre_processing_rules": [
            {"id": "remove_extra_spaces", "enabled": True},
            {"id": "remove_urls_emails", "enabled": True}
        ],
        "segmentation": {
            "separator": "###",
            "max_tokens": 500
        }
    }
}

doc_response = client.create_document_by_file(
    dataset_id=dataset_id,
    file_path="document.txt",
    process_rule_mode="custom",
    process_rule_config=process_rule_config
)
```
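When several uploads need slightly different rules, building `process_rule_config` through a function keeps the nested structure consistent. The sketch below reproduces the dictionary above by default. `build_process_rule_config` is hypothetical, and it deliberately accepts only the two rule IDs shown in this README so that typos fail early; Dify may support other rule IDs, in which case extend `known_rules`.

```python
from typing import Any, Dict, Iterable


def build_process_rule_config(
    separator: str = "###",
    max_tokens: int = 500,
    enabled_rules: Iterable[str] = ("remove_extra_spaces", "remove_urls_emails"),
) -> Dict[str, Any]:
    """Build a process_rule_config dict in the shape used in the example above.

    Rules named in `enabled_rules` are enabled; the other known rules are disabled.
    """
    known_rules = ("remove_extra_spaces", "remove_urls_emails")
    enabled = set(enabled_rules)
    unknown = enabled - set(known_rules)
    if unknown:
        raise ValueError(f"unknown pre-processing rule ids: {sorted(unknown)}")
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    return {
        "rules": {
            "pre_processing_rules": [
                {"id": rule_id, "enabled": rule_id in enabled} for rule_id in known_rules
            ],
            "segmentation": {"separator": separator, "max_tokens": max_tokens},
        }
    }
```

Usage: pass `process_rule_config=build_process_rule_config(max_tokens=800)` together with `process_rule_mode="custom"` to `create_document_by_file`.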

--------------------------------

### Manage document segments

Source: https://github.com/leekjay/dify-knowledge-sdk/blob/master/README.md

Provides methods for segment management within a dataset and document. Includes creating multiple segments with content, answers, and keywords, listing existing segments, updating a segment's details, and deleting a specific segment.

```python
from dify_dataset_sdk import DifyDatasetClient

# Assume client is initialized, dataset_id, document_id, and segment_id are available
# client = DifyDatasetClient(api_key="your-api-key")
# dataset_id = dataset.id
# document_id = doc_response.id
# segment_id = segments[0].id

# Create segments
segments_data = [
    {
        "content": "First segment content",
        "answer": "Answer for first segment",
        "keywords": ["keyword1", "keyword2"]
    },
    {
        "content": "Second segment content",
        "answer": "Answer for second segment",
        "keywords": ["keyword3", "keyword4"]
    }
]

segments = client.create_segments(dataset_id, document_id, segments_data)

# List segments
segments = client.list_segments(dataset_id, document_id)

# Update a segment
client.update_segment(
    dataset_id=dataset_id,
    document_id=document_id,
    segment_id=segment_id,
    segment_data={
        "content": "Updated content",
        "keywords": ["updated", "keywords"],
        "enabled": True
    }
)

# Delete a segment
# client.delete_segment(dataset_id, document_id, segment_id)
```
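Segments are often generated from existing question/answer pairs, and sending a large list in one request may be undesirable. As a sketch, the helpers below turn `(question, answer)` tuples into dicts with the `content`, `answer`, and `keywords` keys used above, then split them into batches. `qa_pairs_to_segments` and `batched` are hypothetical, and no server-side limit on segments per request is implied; pick the batch size yourself.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Optional, Sequence, Tuple


def qa_pairs_to_segments(
    pairs: Iterable[Tuple[str, str]],
    keywords_for: Optional[Callable[[str], List[str]]] = None,
) -> List[Dict]:
    """Turn (question, answer) tuples into segment dicts; keywords are optional."""
    segments = []
    for question, answer in pairs:
        segment: Dict = {"content": question, "answer": answer}
        if keywords_for is not None:
            segment["keywords"] = keywords_for(question)
        segments.append(segment)
    return segments


def batched(items: Sequence[Dict], size: int) -> Iterator[List[Dict]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])
```

Usage: `for chunk in batched(qa_pairs_to_segments(pairs), 20): client.create_segments(dataset_id, document_id, chunk)`.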

--------------------------------

### Advanced Retrieval API

Source: https://github.com/leekjay/dify-knowledge-sdk/blob/master/README.md

APIs for performing semantic, hybrid, and full-text searches on datasets.

````APIDOC
## POST /api/datasets/{dataset_id}/retrieve

### Description
Performs a retrieval operation on a dataset using specified search methods and configurations.

### Method
POST

### Endpoint
/api/datasets/{dataset_id}/retrieve

### Parameters
#### Path Parameters
- **dataset_id** (string) - Required - The ID of the dataset to search within.

#### Request Body
- **query** (string) - Required - The search query.
- **retrieval_config** (object) - Required - Configuration for the retrieval process.
  - **search_method** (string) - Required - The search method to use (e.g., "semantic_search", "hybrid_search", "full_text_search").
  - **top_k** (integer) - Optional - The number of results to return.
  - **score_threshold** (float) - Optional - The minimum score for results (for semantic search).
  - **rerank_model** (object) - Optional - Configuration for reranking results (for hybrid search).
    - **model** (string) - Required - The name of the rerank model.
    - **mode** (string) - Required - The mode of the rerank model.

### Request Example
```json
{
  "query": "How to implement authentication?",
  "retrieval_config": {
    "search_method": "semantic_search",
    "top_k": 5,
    "score_threshold": 0.7
  }
}
```

### Response
#### Success Response (200)
- **results** (array of objects) - The search results.

### Response Example
```json
{
  "results": [
    {
      "content": "Authentication can be implemented using OAuth 2.0...",
      "score": 0.92,
      "metadata": {}
    }
  ]
}
```
````

--------------------------------

### Error Handling

Source: https://github.com/leekjay/dify-knowledge-sdk/blob/master/README.md

Details on the exception types provided by the SDK for robust error management.

````APIDOC
## Error Handling Overview

The Dify SDK provides a set of custom exception classes to handle various API-related errors gracefully.

### Exception Types

- **DifyAPIError**: Base class for all Dify API errors.
- **DifyAuthenticationError**: Raised for authentication failures (e.g., invalid API key).
- **DifyValidationError**: Raised when request data fails validation.
- **DifyNotFoundError**: Raised when a requested resource is not found.
- **DifyConflictError**: Raised when an operation conflicts with existing resources (e.g., duplicate names).
- **DifyServerError**: Raised for server-side errors.
- **DifyConnectionError**: Raised for network connection issues.
- **DifyTimeoutError**: Raised when an API request times out.

### Usage Example
```python
from dify_dataset_sdk.exceptions import (
    DifyAPIError,
    DifyAuthenticationError,
    DifyValidationError,
    DifyNotFoundError,
    DifyConflictError,
    DifyServerError,
    DifyConnectionError,
    DifyTimeoutError
)

try:
    # Attempt an API operation that might fail
    client.create_dataset(name="My Dataset")
except DifyAuthenticationError:
    print("Authentication failed. Please check your API key.")
except DifyValidationError as e:
    print(f"Validation error occurred: {e}")
except DifyConflictError as e:
    print(f"Conflict detected: {e}. Resource might already exist.")
except DifyNotFoundError as e:
    print(f"Resource not found: {e}")
except DifyServerError as e:
    print(f"Server error: {e}. Status code: {e.status_code}")
except DifyConnectionError:
    print("Could not connect to the Dify API. Check your network connection.")
except DifyTimeoutError:
    print("The request to the Dify API timed out.")
except DifyAPIError as e:
    print(f"An unexpected API error occurred: {e}")
    print(f"Error Code: {e.error_code}")
    print(f"Status Code: {e.status_code}")
except Exception as e:
    print(f"An unexpected error occurred: {e}")
```
````

--------------------------------

### Monitor SDK Performance and API Health (Python)

Source: https://github.com/leekjay/dify-knowledge-sdk/blob/master/README.md

This class provides health monitoring for the SDK by tracking requests, errors, and average response times. The `health_check` method performs a simple API call to gauge the connection and response status.

```python
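import time


class SDKMonitor:
    def __init__(self, client):
        self.client = client
        self.metrics = {"requests": 0, "errors": 0, "avg_response_time": 0.0}

    def health_check(self):
        """Make a lightweight API call and record request count, errors, and latency."""
        self.metrics["requests"] += 1
        start_time = time.perf_counter()
        try:
            self.client.list_datasets(limit=1)
        except Exception as e:
            self.metrics["errors"] += 1
            return {"status": "unhealthy", "error": str(e)}
        response_time = time.perf_counter() - start_time
        # Running average of response time over successful checks
        successes = self.metrics["requests"] - self.metrics["errors"]
        avg = self.metrics["avg_response_time"]
        self.metrics["avg_response_time"] = avg + (response_time - avg) / successes
        return {"status": "healthy", "response_time": response_time}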
```

--------------------------------

### List documents in a dataset

Source: https://github.com/leekjay/dify-knowledge-sdk/blob/master/README.md

Demonstrates how to retrieve the list of documents in a specific dataset and print the total number of documents.

```python
from dify_dataset_sdk import DifyDatasetClient

# Assume client is initialized and dataset_id is available
# client = DifyDatasetClient(api_key="your-api-key")
# dataset_id = dataset.id

# List all documents
documents = client.list_documents(dataset_id)
print(f"Total documents: {documents.total}")
```
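Beyond the total, it is often useful to see how many documents are still being indexed. A minimal sketch counts documents per indexing status. `count_by_status` is hypothetical, and it assumes each document object may expose an `indexing_status` attribute (as in Dify's document listings); documents without one are grouped under `"unknown"`.

```python
from collections import Counter
from typing import Any, Dict, Iterable


def count_by_status(documents: Iterable[Any]) -> Dict[str, int]:
    """Count documents per indexing status; items without one count as 'unknown'."""
    counts = Counter(
        getattr(doc, "indexing_status", None) or "unknown" for doc in documents
    )
    return dict(counts)
```

Usage: `print(count_by_status(documents.data))`, where `.data` as the list of document objects is an assumption about the SDK's response model.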

--------------------------------

### Metadata Management API

Source: https://github.com/leekjay/dify-knowledge-sdk/blob/master/README.md

APIs for creating metadata fields and updating document metadata.

````APIDOC
## POST /api/datasets/{dataset_id}/metadata-fields

### Description
Creates a new metadata field for a dataset.

### Method
POST

### Endpoint
/api/datasets/{dataset_id}/metadata-fields

### Parameters
#### Path Parameters
- **dataset_id** (string) - Required - The ID of the dataset.

#### Request Body
- **name** (string) - Required - The name of the metadata field.
- **field_type** (string) - Required - The type of the metadata field (e.g., "string", "number").

### Request Example
```json
{
  "name": "category",
  "field_type": "string"
}
```

### Response
#### Success Response (200)
- **id** (string) - The ID of the created metadata field.
- **name** (string) - The name of the metadata field.
- **field_type** (string) - The type of the metadata field.

### Response Example
```json
{
  "id": "field_xyz",
  "name": "category",
  "field_type": "string"
}
```

## POST /api/datasets/{dataset_id}/documents/metadata

### Description
Updates the metadata for one or more documents within a dataset.

### Method
POST

### Endpoint
/api/datasets/{dataset_id}/documents/metadata

### Parameters
#### Path Parameters
- **dataset_id** (string) - Required - The ID of the dataset.

#### Request Body
- **metadata_operations** (array of objects) - Required - A list of operations to update document metadata.
  - **document_id** (string) - Required - The ID of the document to update.
  - **metadata_list** (array of objects) - Required - A list of metadata key-value pairs to apply.
    - **id** (string) - Required - The ID of the metadata field.
    - **value** (string) - Required - The value for the metadata field.
    - **name** (string) - Required - The name of the metadata field.

### Request Example
```json
{
  "metadata_operations": [
    {
      "document_id": "doc_789",
      "metadata_list": [
        {
          "id": "field_xyz",
          "value": "technical",
          "name": "category"
        }
      ]
    }
  ]
}
```

### Response
#### Success Response (200)
- **message** (string) - Confirmation message.

### Response Example
```json
{
  "message": "Document metadata updated successfully."
}
```
````

--------------------------------

### Implement Error Handling with Retry (Python)

Source: https://github.com/leekjay/dify-knowledge-sdk/blob/master/README.md

This function implements robust error handling with automatic retries for operations that might fail due to timeouts or connection errors. It uses exponential backoff to increase wait times between retries, improving resilience.

```python
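import time

from dify_dataset_sdk.exceptions import DifyConnectionError, DifyTimeoutError


def safe_operation_with_retry(operation, max_retries=3):
    """Call operation(), retrying timeouts and connection errors with exponential backoff."""
    for attempt in range(max_retries):
        try:
            return operation()
        except (DifyTimeoutError, DifyConnectionError):
            if attempt == max_retries - 1:
                raise  # Out of retries: re-raise the original exception
            time.sleep(2 ** attempt)  # Exponential backoff: 1s, 2s, 4s, ...


# Usage: datasets = safe_operation_with_retry(lambda: client.list_datasets(page=1, limit=20))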
```
