### Install Textractor

Source: https://aws-samples.github.io/amazon-textract-textractor/_sources/notebooks/tabular_data_linearization.ipynb.txt

Install Textractor before running the examples below. The package is published on PyPI as `amazon-textract-textractor`, not `textractor`.

```bash
pip install amazon-textract-textractor
```

--------------------------------

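### Query a Single Field

Source: https://aws-samples.github.io/amazon-textract-textractor/_sources/notebooks/using_queries.ipynb.txt

Asks a natural-language question about a document using the Queries feature. The first positional argument of `Textractor` is an AWS profile name, so pass the region as `region_name`.

```python
from textractor import Textractor
from textractor.data.constants import TextractFeatures

extractor = Textractor(region_name="us-east-1")

# Synchronous call: use an image or a single-page document
document = extractor.analyze_document(
    file_source="invoice.png",
    features=[TextractFeatures.QUERIES],
    queries=["What is the invoice number?"],
)
print(document.queries)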
```
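
In the raw Textract response, each query is a `QUERY` block linked to its answers (`QUERY_RESULT` blocks) through a relationship of type `ANSWER`. The sketch below pairs questions with answers from a response dictionary, for example one saved from boto3. The helper name `query_answers` and the trimmed sample response are illustrative assumptions, not part of Textractor.

```python
from typing import Dict, List, Tuple


def query_answers(response: dict) -> List[Tuple[str, List[str]]]:
    """Hypothetical helper: pair each QUERY block's question with its QUERY_RESULT texts."""
    blocks_by_id: Dict[str, dict] = {b["Id"]: b for b in response["Blocks"]}
    pairs = []
    for block in response["Blocks"]:
        if block["BlockType"] != "QUERY":
            continue
        answers = []
        # Answers hang off the QUERY block via relationships of type ANSWER
        for rel in block.get("Relationships", []):
            if rel["Type"] == "ANSWER":
                answers.extend(blocks_by_id[i]["Text"] for i in rel["Ids"])
        pairs.append((block["Query"]["Text"], answers))
    return pairs


# Made-up, heavily trimmed response used only for illustration
sample = {
    "Blocks": [
        {"Id": "q1", "BlockType": "QUERY",
         "Query": {"Text": "What is the invoice number?"},
         "Relationships": [{"Type": "ANSWER", "Ids": ["r1"]}]},
        {"Id": "r1", "BlockType": "QUERY_RESULT", "Text": "INV-001", "Confidence": 97.5},
    ]
}
print(query_answers(sample))  # [('What is the invoice number?', ['INV-001'])]
```

A query Textract could not answer simply has no `ANSWER` relationship, so it comes back with an empty list.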

--------------------------------

### Define Queries

Source: https://aws-samples.github.io/amazon-textract-textractor/_sources/notebooks/using_queries.ipynb.txt

Textractor accepts queries as a list of plain question strings, passed through the `queries` argument of `analyze_document` or `start_document_analysis`.

```python
# Each string is sent to Textract as one query
queries = ["What is the total amount?"]
```

--------------------------------

### Direct Path Query Example

Source: https://aws-samples.github.io/amazon-textract-textractor/_sources/notebooks/using_queries.ipynb.txt

Passes a local file path directly as `file_source`; Textractor reads the file for you.

```python
from textractor import Textractor
from textractor.data.constants import TextractFeatures

extractor = Textractor(region_name="us-east-1")
document = extractor.analyze_document(
    file_source="path/to/your/document.png",
    features=[TextractFeatures.QUERIES],
    queries=["What is the total amount?"],
)
print(document.queries)
```

--------------------------------

### Filter Table Rows

Source: https://aws-samples.github.io/amazon-textract-textractor/_sources/notebooks/using_queries.ipynb.txt

Finds the rows of a detected table that contain a given value by converting the table to a pandas DataFrame. Requires pandas.

```python
# Assumes `document` came from analyze_document(..., features=[TextractFeatures.TABLES])
df = document.tables[0].to_pandas()

# Keep rows whose first column equals the value ("value_to_find" is a placeholder)
result = df[df.iloc[:, 0] == "value_to_find"]
print(result)
```

--------------------------------

### Ask Several Questions in One Call

Source: https://aws-samples.github.io/amazon-textract-textractor/_sources/notebooks/using_queries.ipynb.txt

Sends several queries in a single request; Textract answers each question independently.

```python
from textractor import Textractor
from textractor.data.constants import TextractFeatures

extractor = Textractor(region_name="us-east-1")

queries = ["What is the invoice number?", "What is the invoice date?"]
document = extractor.analyze_document(
    file_source="invoice.png",
    features=[TextractFeatures.QUERIES],
    queries=queries,
)
print(document.queries)
```

--------------------------------

### Start Document Analysis (Example)

Source: https://aws-samples.github.io/amazon-textract-textractor/_sources/notebooks/layout_analysis_for_text_linearization.ipynb.txt

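Starts an asynchronous Textract analysis job with boto3. `NotificationChannel` is optional; include it only if Textract should publish job completion to an SNS topic, in which case the topic and role ARNs must be valid.

```python
import boto3

textract_client = boto3.client("textract", region_name="us-east-1")

response = textract_client.start_document_analysis(
    DocumentLocation={'S3Object': {'Bucket': 'your-bucket-name', 'Name': 'your-document.pdf'}},
    FeatureTypes=['TABLES'],
    # Optional: Textract publishes job completion to this SNS topic
    NotificationChannel={'SNSTopicArn': 'your-sns-topic-arn', 'RoleArn': 'your-sns-role-arn'}
)
job_id = response['JobId']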
```
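
`start_document_analysis` returns immediately with a `JobId`. Without an SNS channel, one common alternative is to poll `get_document_analysis` until `JobStatus` is no longer `IN_PROGRESS`. A minimal sketch, assuming a boto3 Textract client; the helper name `wait_for_job`, the poll interval, and the `FakeClient` stand-in are illustrative assumptions.

```python
import time


def wait_for_job(client, job_id: str, poll_seconds: float = 5.0, max_polls: int = 120) -> str:
    """Hypothetical helper: poll get_document_analysis until the job finishes."""
    for _ in range(max_polls):
        # MaxResults=1 keeps each status check small
        status = client.get_document_analysis(JobId=job_id, MaxResults=1)["JobStatus"]
        if status != "IN_PROGRESS":
            return status  # SUCCEEDED, FAILED or PARTIAL_SUCCESS
        time.sleep(poll_seconds)
    raise TimeoutError(f"Job {job_id} still IN_PROGRESS after {max_polls} polls")


class FakeClient:
    """Stand-in for boto3's Textract client: IN_PROGRESS twice, then SUCCEEDED."""

    def __init__(self):
        self.calls = 0

    def get_document_analysis(self, JobId, MaxResults=None):
        self.calls += 1
        return {"JobStatus": "IN_PROGRESS" if self.calls < 3 else "SUCCEEDED"}


print(wait_for_job(FakeClient(), "demo-job", poll_seconds=0))  # SUCCEEDED
```

With a real client you would call `wait_for_job(textract_client, job_id)` and fetch the blocks only once the status is `SUCCEEDED`.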

--------------------------------

### List All Pages

Source: https://aws-samples.github.io/amazon-textract-textractor/_sources/notebooks/using_queries.ipynb.txt

Runs asynchronous text detection on a multi-page PDF in S3 and iterates over the resulting pages.

```python
from textractor import Textractor

txtr = Textractor(region_name="us-east-1")

# Asynchronous job; results are fetched when the document is first accessed
document = txtr.start_document_text_detection(file_source="s3://your-bucket-name/document.pdf")

print(f"{len(document.pages)} page(s)")
for page in document.pages:
    print(page)
```

--------------------------------

### Get Text After a Keyword

Source: https://aws-samples.github.io/amazon-textract-textractor/_sources/notebooks/using_queries.ipynb.txt

Runs OCR, then keeps the text that follows the first occurrence of "Invoice" in the document's linearized text.

```python
from textractor import Textractor

extractor = Textractor(region_name="us-east-1")
document = extractor.detect_document_text(file_source="invoice.png")

# document.text is the linearized text of the whole document
_, found, after = document.text.partition("Invoice")
print(after if found else "Keyword not found")
```
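
Often you only want the value printed next to a label on the same line, matched regardless of case. A small standard-library sketch that works on any linearized text string; the helper name `text_after` is a hypothetical choice, not a Textractor function.

```python
import re


def text_after(text: str, keyword: str, ignore_case: bool = True) -> str:
    """Hypothetical helper: rest of the line after the first match of keyword, or ""."""
    flags = re.IGNORECASE if ignore_case else 0
    # re.escape treats the keyword literally, even if it contains "." or "$"
    match = re.search(re.escape(keyword), text, flags)
    if match is None:
        return ""
    return text[match.end():].split("\n", 1)[0].strip()


print(text_after("ACME Corp\nInvoice INV-001\nTotal $42.00", "invoice"))  # INV-001
```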

--------------------------------

### Find a Specific Word

Source: https://aws-samples.github.io/amazon-textract-textractor/_sources/notebooks/using_queries.ipynb.txt

Runs OCR and searches the detected words for a case-insensitive exact match.

```python
from textractor import Textractor

extractor = Textractor(region_name="us-east-1")
document = extractor.detect_document_text(file_source="invoice.png")

# Compare each detected word with the target, ignoring case
matches = [word for word in document.words if word.text.lower() == "invoice"]

for word in matches:
    print(word.text, word.bbox)
```

--------------------------------

### Phrase Queries as Questions

Source: https://aws-samples.github.io/amazon-textract-textractor/_sources/notebooks/using_queries.ipynb.txt

Textract queries are natural-language questions, so phrase the field you want, such as an invoice ID, as a question rather than a bare label.

```python
# A full question is more specific than the bare label "Invoice ID"
queries = ["What is the invoice ID?"]
```

--------------------------------

### Find Key-Value Pairs

Source: https://aws-samples.github.io/amazon-textract-textractor/_sources/notebooks/using_queries.ipynb.txt

Runs form (key-value) extraction and prints the detected pairs. To pull out a single field such as "Invoice Number", a query is often the simpler route.

```python
from textractor import Textractor
from textractor.data.constants import TextractFeatures

extractor = Textractor(region_name="us-east-1")
document = extractor.analyze_document(
    file_source="path/to/your/document.png",
    features=[TextractFeatures.FORMS],
)

# Each entry pairs a detected key (e.g. "Invoice Number") with its value
print(document.key_values)
```
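
In the raw FORMS response, each pair is represented by two `KEY_VALUE_SET` blocks: one whose `EntityTypes` contains `KEY` and one whose `EntityTypes` contains `VALUE`. The key block links to its value through a `VALUE` relationship, and both link to their words through `CHILD` relationships. The sketch below builds a plain dictionary from a response dictionary. The helper name and sample data are illustrative assumptions, and duplicate keys simply overwrite one another.

```python
from typing import Dict, List


def key_value_pairs(response: dict) -> Dict[str, str]:
    """Hypothetical helper: map each KEY block's text to its linked VALUE block's text."""
    blocks = {b["Id"]: b for b in response["Blocks"]}

    def related(block: dict, rel_type: str) -> List[str]:
        return [i for rel in block.get("Relationships", []) if rel["Type"] == rel_type for i in rel["Ids"]]

    def text_of(block: dict) -> str:
        # Join the WORD children; selection elements (checkboxes) are skipped
        return " ".join(blocks[i]["Text"] for i in related(block, "CHILD") if blocks[i]["BlockType"] == "WORD")

    pairs = {}
    for block in response["Blocks"]:
        if block["BlockType"] == "KEY_VALUE_SET" and "KEY" in block.get("EntityTypes", []):
            values = [text_of(blocks[i]) for i in related(block, "VALUE")]
            pairs[text_of(block)] = " ".join(values)
    return pairs


sample = {"Blocks": [
    {"Id": "k", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
     "Relationships": [{"Type": "VALUE", "Ids": ["v"]}, {"Type": "CHILD", "Ids": ["w1", "w2"]}]},
    {"Id": "v", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
     "Relationships": [{"Type": "CHILD", "Ids": ["w3"]}]},
    {"Id": "w1", "BlockType": "WORD", "Text": "Invoice"},
    {"Id": "w2", "BlockType": "WORD", "Text": "Number:"},
    {"Id": "w3", "BlockType": "WORD", "Text": "INV-001"},
]}
print(key_value_pairs(sample))  # {'Invoice Number:': 'INV-001'}
```

Detected keys keep their punctuation (here the trailing colon), so a lookup should normalize keys before comparing.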

--------------------------------

### Query a Document in S3

Source: https://aws-samples.github.io/amazon-textract-textractor/_sources/notebooks/using_queries.ipynb.txt

Passes an S3 URI as `file_source` so that Textract reads the object directly from S3. For multi-page PDFs, use `start_document_analysis` instead.

```python
from textractor import Textractor
from textractor.data.constants import TextractFeatures

extractor = Textractor(region_name="us-east-1")
document = extractor.analyze_document(
    file_source="s3://your-bucket-name/document.png",
    features=[TextractFeatures.QUERIES],
    queries=["What is the total amount?"],
)
print(document.queries)
```

--------------------------------

### Authenticate with a Named Profile

Source: https://aws-samples.github.io/amazon-textract-textractor/_sources/notebooks/using_queries.ipynb.txt

`Textractor` does not take access keys as arguments. It relies on the standard AWS credential chain, so select credentials with a named profile (or environment variables) and keep secrets out of code.

```python
from textractor import Textractor
from textractor.data.constants import TextractFeatures

# Credentials come from the "default" profile in ~/.aws/credentials
extractor = Textractor(profile_name="default")

document = extractor.analyze_document(
    file_source="invoice.png",
    features=[TextractFeatures.QUERIES],
    queries=["What is the invoice number?"],
)
print(document.queries)
```

--------------------------------

### Linearize a Document to Markdown and HTML

Source: https://aws-samples.github.io/amazon-textract-textractor/_sources/notebooks/document_linearization_to_markdown_or_html.ipynb.txt

Analyzes a document with layout and table detection, then linearizes the resulting `Document` to Markdown and HTML.

```python
from textractor import Textractor
from textractor.data.constants import TextractFeatures

extractor = Textractor(region_name="us-east-1")

# Layout analysis lets the linearizer recognize titles, headers, lists and paragraphs
document = extractor.analyze_document(
    file_source="path/to/your/document.png",
    features=[TextractFeatures.LAYOUT, TextractFeatures.TABLES],
)

# Convert to Markdown
markdown_output = document.to_markdown()
print("--- Markdown Output ---")
print(markdown_output)

# Convert to HTML
html_output = document.to_html()
print("--- HTML Output ---")
print(html_output)
```

--------------------------------

### Install Amazon Textract Textractor from Source

Source: https://aws-samples.github.io/amazon-textract-textractor/_sources/installation.rst.txt

Install the package in editable mode after cloning the repository and installing requirements.

```bash
pip install -e .
```

--------------------------------

### Get Signature Information from a Page

Source: https://aws-samples.github.io/amazon-textract-textractor/_sources/notebooks/signature_detection.ipynb.txt

Retrieves the signatures detected on the first page of a document. Signature detection requires the `SIGNATURES` feature.

```python
from textractor import Textractor
from textractor.data.constants import TextractFeatures

extractor = Textractor(region_name="us-east-1")
document = extractor.analyze_document(
    file_source="document.png",
    features=[TextractFeatures.SIGNATURES],
)

# Signatures on page 1 (document.pages is a 0-indexed list)
for signature in document.pages[0].signatures:
    print(f"Signature found at: {signature.bbox}")
    print(f"Confidence: {signature.confidence}")
```

--------------------------------

### Install Requirements from Source

Source: https://aws-samples.github.io/amazon-textract-textractor/_sources/installation.rst.txt

After cloning the repository, install the necessary requirements using the provided requirements.txt file.

```bash
pip install -r requirements.txt
```

--------------------------------

### Get Document Analysis Results (Example)

Source: https://aws-samples.github.io/amazon-textract-textractor/_sources/notebooks/layout_analysis_for_text_linearization.ipynb.txt

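Retrieves the results of an asynchronous analysis job by its `JobId`. This simplified call returns only the first batch of results (up to 1,000 blocks); check `JobStatus` first, and follow `NextToken` to collect the rest.

```python
import boto3

textract_client = boto3.client("textract")
response = textract_client.get_document_analysis(JobId='your-job-id')
print(response['JobStatus'])
results = response.get('Blocks', [])  # absent while the job is still running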
```
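
Large results are split across several responses linked by `NextToken`. A minimal sketch that collects every block of a finished job, assuming a boto3 Textract client; the helper name `get_all_blocks` and the two-page `FakeClient` stand-in are illustrative assumptions.

```python
from typing import List


def get_all_blocks(client, job_id: str) -> List[dict]:
    """Hypothetical helper: follow NextToken until every page of results is collected."""
    blocks: List[dict] = []
    kwargs = {"JobId": job_id}
    while True:
        response = client.get_document_analysis(**kwargs)
        blocks.extend(response.get("Blocks", []))
        next_token = response.get("NextToken")
        if not next_token:
            return blocks
        kwargs["NextToken"] = next_token


class FakeClient:
    """Stand-in for boto3's Textract client that serves two result pages."""

    pages = {
        None: {"Blocks": [{"Id": "a"}, {"Id": "b"}], "NextToken": "t1"},
        "t1": {"Blocks": [{"Id": "c"}]},
    }

    def get_document_analysis(self, JobId, NextToken=None):
        return self.pages[NextToken]


print([b["Id"] for b in get_all_blocks(FakeClient(), "demo-job")])  # ['a', 'b', 'c']
```

Call this only after `JobStatus` is `SUCCEEDED`; while the job is running, the responses carry no blocks.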

--------------------------------

### Python: Example Usage

Source: https://aws-samples.github.io/amazon-textract-textractor/_sources/notebooks/document_linearization_to_markdown_or_html.ipynb.txt

Shows how a `notebook_to_markdown` helper would be called. The helper is not defined in this snippet and is not part of Textractor; uncomment the lines once such a helper is available.

```python
# Example usage:
# notebook_path = 'path/to/your/notebook.ipynb'
# markdown_content = notebook_to_markdown(notebook_path)
# print(markdown_content)
```
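
One possible `notebook_to_markdown`, assuming the standard Jupyter `.ipynb` JSON layout: a top-level `cells` list whose entries have a `cell_type` and a `source` stored either as one string or as a list of lines. The function name comes from the snippet above, but its behavior here (markdown cells copied as-is, code cells wrapped in fenced blocks labeled with a fixed language) is an assumption.

```python
import json
from typing import Union

FENCE = "`" * 3  # built indirectly so the fence marker is easy to change


def _source_text(source: Union[str, list]) -> str:
    # nbformat stores cell source either as one string or as a list of lines
    return source if isinstance(source, str) else "".join(source)


def notebook_to_markdown(notebook_path: str, code_lang: str = "python") -> str:
    """Hypothetical helper: markdown cells as-is, code cells as fenced blocks."""
    with open(notebook_path, encoding="utf-8") as f:
        notebook = json.load(f)
    parts = []
    for cell in notebook.get("cells", []):
        text = _source_text(cell.get("source", "")).rstrip()
        if not text:
            continue  # skip empty cells
        if cell.get("cell_type") == "code":
            parts.append(f"{FENCE}{code_lang}\n{text}\n{FENCE}")
        else:
            parts.append(text)
    return "\n\n".join(parts) + "\n"
```

Cell outputs are ignored. Rendering them would require reading each code cell's `outputs` list as well.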

--------------------------------

### Install amazon-textract-textractor

Source: https://aws-samples.github.io/amazon-textract-textractor/_sources/notebooks/simple_ocr.ipynb.txt

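Install the package with pip. If you process local PDF files, also install the `pdfium` extra, which lets Textractor convert PDF pages to images. Quote the argument so that shells such as zsh do not expand the brackets.

```bash
pip install amazon-textract-textractor
```

```bash
pip install "amazon-textract-textractor[pdfium]"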
```

--------------------------------

### Example Usage: Converting Extracted Table

Source: https://aws-samples.github.io/amazon-textract-textractor/_sources/notebooks/table_data_to_various_formats.ipynb.txt

Converts sample table data (a list of row dictionaries) to CSV, JSON, and Parquet with pandas. Parquet output also requires `pyarrow` or `fastparquet`.

```python
import pandas as pd

sample_table_data = [
    {'Column1': 'Row1Value1', 'Column2': 'Row1Value2'},
    {'Column1': 'Row2Value1', 'Column2': 'Row2Value2'}
]
df = pd.DataFrame(sample_table_data)

# Convert to CSV (returned as a string when no path is given)
csv_output = df.to_csv(index=False)
print("--- CSV Output ---")
print(csv_output)

# Convert to JSON: one object per row
json_output = df.to_json(orient="records")
print("\n--- JSON Output ---")
print(json_output)

# Convert to Parquet (saves to a file named 'output.parquet')
df.to_parquet("output.parquet", index=False)
print("\n--- Parquet Output ---")
print("Table data saved to output.parquet")
```

--------------------------------

### Python Image Processing Setup

Source: https://aws-samples.github.io/amazon-textract-textractor/_sources/notebooks/visualizing_results.ipynb.txt

Basic imports required for image manipulation and drawing using Pillow. Ensure Pillow is installed (`pip install Pillow`).

```python
from PIL import Image, ImageDraw
import json
```
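
Textract reports each `BoundingBox` as ratios of the page size (`Left`, `Top`, `Width`, `Height` between 0 and 1), so you must scale it to pixels before drawing. A minimal sketch; the helper name `bbox_to_pixels` is a hypothetical choice.

```python
from typing import Dict, Tuple


def bbox_to_pixels(bbox: Dict[str, float], image_size: Tuple[int, int]) -> Tuple[int, int, int, int]:
    """Hypothetical helper: convert a ratio-based Textract BoundingBox to pixel corners."""
    width, height = image_size
    left = bbox["Left"] * width
    top = bbox["Top"] * height
    right = left + bbox["Width"] * width
    bottom = top + bbox["Height"] * height
    return round(left), round(top), round(right), round(bottom)


print(bbox_to_pixels({"Left": 0.1, "Top": 0.2, "Width": 0.5, "Height": 0.25}, (1000, 800)))  # (100, 160, 600, 360)
```

With Pillow, `ImageDraw.Draw(image).rectangle(bbox_to_pixels(block["Geometry"]["BoundingBox"], image.size), outline="red")` then outlines a block on the page image.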

--------------------------------

### Load a Saved Textract Response

Source: https://aws-samples.github.io/amazon-textract-textractor/_sources/notebooks/using_queries.ipynb.txt

Rebuilds a `Document` from a Textract JSON response saved earlier, so you can inspect results without calling the API again.

```python
from textractor.entities.document import Document

# Path to a Textract response previously saved as JSON
document = Document.open("path/to/your/textract_response.json")
print(f"Found {len(document.pages)} pages.")
```

--------------------------------

### Extract Tables, Forms, and Text

Source: https://aws-samples.github.io/amazon-textract-textractor/_sources/notebooks/using_queries.ipynb.txt

Requests table and form analysis in one call, then reads each result type from the returned `Document`.

```python
from textractor import Textractor
from textractor.data.constants import TextractFeatures

extractor = Textractor(region_name="us-east-1")
document = extractor.analyze_document(
    file_source="path/to/your/document.png",
    features=[TextractFeatures.TABLES, TextractFeatures.FORMS],
)

tables = document.tables        # all detected tables
forms = document.key_values     # key-value pairs from the FORMS feature
text = document.text            # linearized text of the whole document
```

--------------------------------

### Start Expense Analysis (Async)

Source: https://aws-samples.github.io/amazon-textract-textractor/commandline.html

Initiates asynchronous analysis for expense documents. Requires input files to be in S3 or uploaded using `--s3-upload-path`.

```bash
textractor start-expense-analysis [-h] [--s3-upload-path S3_UPLOAD_PATH]
                                  [--s3-output-path S3_OUTPUT_PATH]
                                  [--profile-name PROFILE_NAME]
                                  [--region-name REGION_NAME]
                                  file_source

```

--------------------------------

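### Get Layout-Aware Text

Source: https://aws-samples.github.io/amazon-textract-textractor/_sources/notebooks/layout_analysis_for_text_linearization.ipynb.txt

Textract's layout analysis has no "product instruction" category, and Textractor has no `get_text_by_product_instruction()` method. For user manuals and guides, run layout analysis and read the linearized text, in which titles, section headers, lists, and paragraphs are kept as separate blocks.

```python
# Assumes `document` came from analyze_document(..., features=[TextractFeatures.LAYOUT])
print(document.get_text())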
```
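
Layout analysis labels page regions with `LAYOUT_*` block types (for example `LAYOUT_SECTION_HEADER`, `LAYOUT_LIST`, and `LAYOUT_TEXT`), each linked to its `LINE` blocks through `CHILD` relationships. That makes it straightforward to pull, say, every section header out of a manual. A sketch over a raw response dictionary; the helper name and sample data are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, List


def text_by_layout_type(response: dict) -> Dict[str, List[str]]:
    """Hypothetical helper: group LAYOUT_* block text by block type."""
    blocks = {b["Id"]: b for b in response["Blocks"]}
    grouped: Dict[str, List[str]] = defaultdict(list)
    for block in response["Blocks"]:
        if not block["BlockType"].startswith("LAYOUT_"):
            continue
        child_ids = [i for rel in block.get("Relationships", []) if rel["Type"] == "CHILD" for i in rel["Ids"]]
        # Join the text of the LINE children in the order Textract listed them
        lines = [blocks[i]["Text"] for i in child_ids if blocks[i]["BlockType"] == "LINE"]
        grouped[block["BlockType"]].append(" ".join(lines))
    return dict(grouped)


sample = {"Blocks": [
    {"Id": "h", "BlockType": "LAYOUT_SECTION_HEADER", "Relationships": [{"Type": "CHILD", "Ids": ["l1"]}]},
    {"Id": "l1", "BlockType": "LINE", "Text": "Safety instructions"},
    {"Id": "t", "BlockType": "LAYOUT_TEXT", "Relationships": [{"Type": "CHILD", "Ids": ["l2", "l3"]}]},
    {"Id": "l2", "BlockType": "LINE", "Text": "Unplug the device"},
    {"Id": "l3", "BlockType": "LINE", "Text": "before cleaning."},
]}
print(text_by_layout_type(sample)["LAYOUT_SECTION_HEADER"])  # ['Safety instructions']
```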

--------------------------------

### Initialize Bedrock Client

Source: https://aws-samples.github.io/amazon-textract-textractor/notebooks/textractor_for_large_language_models.html

Creates a Bedrock runtime client in `us-west-2` and a helper that sends a document's text plus a question to Claude Instant using the text-completions request format.

```python
import json

import boto3

# Bedrock runtime client; the regional endpoint is derived from region_name
bedrock = boto3.client(service_name="bedrock-runtime", region_name="us-west-2")


def get_response_from_claude(context, prompt_data):
    # Claude's text-completions format expects "\n\nHuman:" ... "\n\nAssistant:" turns
    prompt = (
        f"\n\nHuman: Given the following document:\n{context}\n"
        f"Answer the following:\n{prompt_data}\n\nAssistant:"
    )
    body = json.dumps({
        "prompt": prompt,
        "max_tokens_to_sample": 2000,
        "top_k": 1,
    })
    model_id = "anthropic.claude-instant-v1"  # change this to use a different model version

    response = bedrock.invoke_model(
        body=body, modelId=model_id, accept="application/json", contentType="application/json"
    )
    response_body = json.loads(response["body"].read())
    return response_body.get("completion")
```

--------------------------------

### Initialize Textractor and Analyze Document with Queries

Source: https://aws-samples.github.io/amazon-textract-textractor/notebooks/using_queries.html

Initializes the Textractor client and calls `analyze_document` with `TextractFeatures.QUERIES` and a list of queries. This example passes a Pillow image as the document source.

```python
from PIL import Image
from textractor import Textractor
from textractor.data.constants import TextractFeatures

extractor = Textractor(profile_name="default")

queries = ["What is the date?"]  # replace with your own questions
document = extractor.analyze_document(
    file_source=Image.open("../../../tests/fixtures/form.png"),
    features=[TextractFeatures.QUERIES],
    queries=queries
)
```

--------------------------------

### Example Usage: Analyze and Print Tables

Source: https://aws-samples.github.io/amazon-textract-textractor/_sources/notebooks/tabular_data_linearization.ipynb.txt

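Demonstrates the complete workflow: analyzing a document in S3 for tables and then printing each detected table. Replace 'your-bucket-name' and 'your-document.png' with your bucket and object key; this synchronous call expects an image or a single-page document.

```python
from textractor import Textractor
from textractor.data.constants import TextractFeatures

bucket = 'your-bucket-name'
document_key = 'your-document.png'

extractor = Textractor(region_name="us-east-1")
document = extractor.analyze_document(
    file_source=f"s3://{bucket}/{document_key}",
    features=[TextractFeatures.TABLES],
)
# Print each table as a DataFrame (requires pandas)
for table in document.tables:
    print(table.to_pandas())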
```
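
In the raw response, a `TABLE` block lists its `CELL` blocks as `CHILD` relationships. Each cell carries 1-based `RowIndex` and `ColumnIndex` values and links to its `WORD` blocks in the same way. The sketch below rebuilds each table as a grid of strings from a response dictionary. The helper name and sample data are illustrative assumptions, and merged cells (`RowSpan`/`ColumnSpan` > 1) are not expanded.

```python
from typing import Dict, List


def table_grids(response: dict) -> List[List[List[str]]]:
    """Hypothetical helper: rebuild each TABLE block as a row-major grid of cell text."""
    blocks: Dict[str, dict] = {b["Id"]: b for b in response["Blocks"]}

    def child_ids(block: dict) -> List[str]:
        return [i for rel in block.get("Relationships", []) if rel["Type"] == "CHILD" for i in rel["Ids"]]

    grids = []
    for block in response["Blocks"]:
        if block["BlockType"] != "TABLE":
            continue
        cells = [blocks[i] for i in child_ids(block) if blocks[i]["BlockType"] == "CELL"]
        n_rows = max((c["RowIndex"] for c in cells), default=0)
        n_cols = max((c["ColumnIndex"] for c in cells), default=0)
        grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
        for cell in cells:
            words = [blocks[i]["Text"] for i in child_ids(cell) if blocks[i]["BlockType"] == "WORD"]
            # Textract indices are 1-based; Python lists are 0-based
            grid[cell["RowIndex"] - 1][cell["ColumnIndex"] - 1] = " ".join(words)
        grids.append(grid)
    return grids


sample = {"Blocks": [
    {"Id": "t", "BlockType": "TABLE", "Relationships": [{"Type": "CHILD", "Ids": ["c11", "c12", "c21", "c22"]}]},
    {"Id": "c11", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 1, "Relationships": [{"Type": "CHILD", "Ids": ["w1"]}]},
    {"Id": "c12", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 2, "Relationships": [{"Type": "CHILD", "Ids": ["w2"]}]},
    {"Id": "c21", "BlockType": "CELL", "RowIndex": 2, "ColumnIndex": 1, "Relationships": [{"Type": "CHILD", "Ids": ["w3", "w4"]}]},
    {"Id": "c22", "BlockType": "CELL", "RowIndex": 2, "ColumnIndex": 2},
    {"Id": "w1", "BlockType": "WORD", "Text": "Item"},
    {"Id": "w2", "BlockType": "WORD", "Text": "Price"},
    {"Id": "w3", "BlockType": "WORD", "Text": "Blue"},
    {"Id": "w4", "BlockType": "WORD", "Text": "pen"},
]}
print(table_grids(sample))  # [[['Item', 'Price'], ['Blue pen', '']]]
```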

--------------------------------

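### Query a Multi-Page PDF Asynchronously

Source: https://aws-samples.github.io/amazon-textract-textractor/_sources/notebooks/using_queries.ipynb.txt

Multi-page PDFs go through the asynchronous API. `start_document_analysis` starts the job and returns a document whose results are fetched when they are first accessed.

```python
from textractor import Textractor
from textractor.data.constants import TextractFeatures

extractor = Textractor(region_name="us-east-1")
document = extractor.start_document_analysis(
    file_source="s3://your-bucket-name/your-document.pdf",
    features=[TextractFeatures.QUERIES],
    queries=["What is the total amount?"],
)
print(document.queries)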
```

--------------------------------

### Upload a Local File for Asynchronous Analysis

Source: https://aws-samples.github.io/amazon-textract-textractor/_sources/notebooks/using_queries.ipynb.txt

Asynchronous Textract jobs read their input from S3. If you pass a local path together with `s3_upload_path`, Textractor uploads the file before it starts the job.

```python
from textractor import Textractor
from textractor.data.constants import TextractFeatures

txt = Textractor(region_name="YOUR_REGION")

# Textractor uploads the local file under this S3 prefix, then starts the job
doc = txt.start_document_analysis(
    file_source="YOUR_DOCUMENT_NAME.pdf",
    s3_upload_path="s3://YOUR_BUCKET_NAME/uploads/",
    features=[TextractFeatures.QUERIES],
    queries=["What is the invoice number?"],
)

# Print the query results
print(doc.queries)
```

--------------------------------

### Read Query Answers

Source: https://aws-samples.github.io/amazon-textract-textractor/_sources/notebooks/using_queries.ipynb.txt

Reads the answer for each query from an analyzed document. A query that Textract could not answer has no result.

```python
# Assumes `document` came from analyze_document(..., features=[TextractFeatures.QUERIES], queries=...)
for query in document.queries:
    answer = query.result.answer if query.result else None
    print(f"{query.query}: {answer}")
```

--------------------------------

### Access Column Content

Source: https://aws-samples.github.io/amazon-textract-textractor/_sources/notebooks/tabular_data_linearization.ipynb.txt

Accesses the content of a specific column. This example shows how to get the text content of all cells in the first column.

```python
column_texts = [cell.text for cell in tables[0].columns[0]]
```

--------------------------------

### Access Row Content

Source: https://aws-samples.github.io/amazon-textract-textractor/_sources/notebooks/tabular_data_linearization.ipynb.txt

Accesses the content of a specific row. This example shows how to get the text content of all cells in the first row.

```python
row_texts = [cell.text for cell in tables[0].rows[0]]
```

--------------------------------

### Find Pages with Many Words

Source: https://aws-samples.github.io/amazon-textract-textractor/_sources/notebooks/using_queries.ipynb.txt

Runs text detection on a document in S3 and keeps the pages that contain more than 100 words.

```python
from textractor import Textractor

extractor = Textractor(region_name="us-east-1")
document = extractor.start_document_text_detection(
    file_source="s3://your-bucket-name/your-document-key.pdf"
)

# Keep pages with more than 100 detected words
query_result = [page for page in document.pages if len(page.words) > 100]
print(f"Found {len(query_result)} pages matching the query.")
```
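
The same filter can be applied to a raw asynchronous response by counting `WORD` blocks per `Page`. A sketch over a response dictionary; the helper name is hypothetical, and blocks without a `Page` key are assumed to belong to page 1.

```python
from collections import Counter
from typing import List


def pages_with_many_words(response: dict, min_words: int = 100) -> List[int]:
    """Hypothetical helper: 1-based page numbers with more than min_words WORD blocks."""
    counts = Counter(b.get("Page", 1) for b in response["Blocks"] if b["BlockType"] == "WORD")
    return sorted(page for page, n in counts.items() if n > min_words)


# Made-up response: 150 words on page 1, 20 words on page 2
sample = {"Blocks": [{"BlockType": "WORD", "Page": 1}] * 150 + [{"BlockType": "WORD", "Page": 2}] * 20}
print(pages_with_many_words(sample))  # [1]
```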

--------------------------------

### Analyze an Expense Document

Source: https://aws-samples.github.io/amazon-textract-textractor/_sources/notebooks/using_analyze_expense.ipynb.txt

Extracts summary fields and line items from an invoice or receipt. The AnalyzeExpense API accepts only the document, with no feature-type or minimum-confidence options, so apply any confidence threshold to the results after the call.

```python
from textractor import Textractor

extractor = Textractor(region_name="us-east-1")

# Synchronous expense analysis of a single invoice or receipt image
document = extractor.analyze_expense(file_source="document.png")

expense = document.expense_documents[0]
print(expense.summary_fields)
print(expense.line_items_groups)
```
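
Confidence filtering can be done on the raw AnalyzeExpense response, where each entry in `ExpenseDocuments[*].SummaryFields` has a normalized `Type` (such as `TOTAL`) and a `ValueDetection` with `Text` and a 0-100 `Confidence`. A sketch under those assumptions; the helper name, the 90 threshold, and the sample data are illustrative.

```python
from typing import Dict, List


def confident_summary_fields(response: dict, min_confidence: float = 90.0) -> List[Dict[str, str]]:
    """Hypothetical helper: keep SummaryFields whose value confidence meets the threshold."""
    kept = []
    for expense_doc in response.get("ExpenseDocuments", []):
        for field in expense_doc.get("SummaryFields", []):
            value = field.get("ValueDetection", {})
            if value.get("Confidence", 0.0) >= min_confidence:
                kept.append({"type": field["Type"]["Text"], "value": value.get("Text", "")})
    return kept


sample = {"ExpenseDocuments": [{"SummaryFields": [
    {"Type": {"Text": "TOTAL"}, "ValueDetection": {"Text": "$42.00", "Confidence": 98.1}},
    {"Type": {"Text": "VENDOR_NAME"}, "ValueDetection": {"Text": "Acme", "Confidence": 61.0}},
]}]}
print(confident_summary_fields(sample))  # [{'type': 'TOTAL', 'value': '$42.00'}]
```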

--------------------------------

### Get Signature Bounding Box

Source: https://aws-samples.github.io/amazon-textract-textractor/_sources/notebooks/signature_detection.ipynb.txt

Prints the bounding box of every signature detected in a document, using the `SIGNATURES` feature.

```python
from textractor import Textractor
from textractor.data.constants import TextractFeatures

if __name__ == "__main__":
    extractor = Textractor(region_name="us-east-1")
    doc = extractor.analyze_document(
        file_source="path/to/your/document.png",
        features=[TextractFeatures.SIGNATURES],
    )
    for signature in doc.signatures:
        print(f"Signature bounding box: {signature.bbox}")
```

--------------------------------

### Get Signature Field Information

Source: https://aws-samples.github.io/amazon-textract-textractor/_sources/notebooks/signature_detection.ipynb.txt

Extracts the geometry and confidence score of each signature directly from the raw Textract response. The feature type is `SIGNATURES`, and detected signatures come back as blocks of type `SIGNATURE`.

```python
import boto3

def get_signature_field_info(bucket, document):

    client = boto3.client('textract')

    # Call analyze_document with signature detection enabled
    response = client.analyze_document(
        Document={'S3Object': {'Bucket': bucket, 'Name': document}},
        FeatureTypes=['SIGNATURES']
    )

    # Collect one entry per SIGNATURE block
    signature_fields = []
    for block in response['Blocks']:
        if block['BlockType'] == 'SIGNATURE':
            signature_fields.append({
                'page': block.get('Page', 1),  # default to page 1 if Page is absent
                'id': block['Id'],
                'confidence': block['Confidence'],
                'geometry': block['Geometry']
            })

    return signature_fields


# Example usage:
# bucket_name = 'your-s3-bucket-name'
# document_name = 'your-document-name.pdf'
# signature_info = get_signature_field_info(bucket_name, document_name)
# print(f"Found {len(signature_info)} signature fields.")
# for info in signature_info:
#     print(f"Page: {info['page']}, ID: {info['id']}, Confidence: {info['confidence']}")
```

--------------------------------

### Get Invoice IDs with Expense Analysis

Source: https://aws-samples.github.io/amazon-textract-textractor/_sources/notebooks/using_queries.ipynb.txt

Runs asynchronous expense analysis on an invoice in S3 and prints its summary fields. AnalyzeExpense reports the invoice number under the normalized field type `INVOICE_RECEIPT_ID`.

```python
from textractor import Textractor

client = Textractor(region_name="us-east-1")

# Asynchronous expense analysis of a (possibly multi-page) invoice in S3
document = client.start_expense_analysis(
    file_source="s3://textract-sample-data/invoice/sample-invoice.pdf",
)

# Look for the INVOICE_RECEIPT_ID entry among the summary fields
for expense in document.expense_documents:
    print(expense.summary_fields)
```

--------------------------------

### Load Document and Get Tables

Source: https://aws-samples.github.io/amazon-textract-textractor/_sources/notebooks/tabular_data_linearization.ipynb.txt

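Analyzes a document for tables and counts them. Ensure the `amazon-textract-textractor` package is installed and the document path is correct.

```python
from textractor import Textractor
from textractor.data.constants import TextractFeatures

extractor = Textractor(region_name="us-east-1")
document = extractor.analyze_document(
    file_source="path/to/your/document.png",
    features=[TextractFeatures.TABLES],
)
tables = document.tables
print(f"Found {len(tables)} tables.")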
```

--------------------------------

### Extract and Print Tables

Source: https://aws-samples.github.io/amazon-textract-textractor/_sources/notebooks/tabular_data_linearization.ipynb.txt

Analyzes a document for tables and prints each one as tab-separated rows. Requires pandas.

```python
from textractor import Textractor
from textractor.data.constants import TextractFeatures

txtr = Textractor(region_name="us-east-2")
doc = txtr.analyze_document(file_source="sample.png", features=[TextractFeatures.TABLES])

for i, table in enumerate(doc.tables):
    print(f"\n--- Table {i+1} ---")
    # One tab-separated line per table row
    for row in table.to_pandas().itertuples(index=False):
        print('\t'.join(str(value) for value in row))
```

--------------------------------

### Iterate Over Pages and Lines

Source: https://aws-samples.github.io/amazon-textract-textractor/_sources/notebooks/using_queries.ipynb.txt

Runs OCR and walks through the detected lines, page by page. This is a foundational step for many extraction tasks.

```python
from textractor import Textractor

txt = Textractor(region_name="us-east-1")
document = txt.detect_document_text(file_source="path/to/your/document.png")

# Print every detected line, grouped by page
for page in document.pages:
    for line in page.lines:
        print(line.text)
```

--------------------------------

### Access Cell Content

Source: https://aws-samples.github.io/amazon-textract-textractor/_sources/notebooks/tabular_data_linearization.ipynb.txt

Accesses the text content of a specific cell within a table. This example shows how to get the text from the first cell.

```python
cell_text = tables[0].cells[0].text
```

--------------------------------

### Get Table as Pandas DataFrame

Source: https://aws-samples.github.io/amazon-textract-textractor/_sources/notebooks/tabular_data_linearization_continued.ipynb.txt

Converts a table into a Pandas DataFrame for convenient data manipulation and analysis. Requires the pandas library to be installed.

```python
import pandas as pd

df = tables[0].to_pandas()
print(df)
```

--------------------------------

### Get Specific Pages as Forms

Source: https://aws-samples.github.io/amazon-textract-textractor/_sources/notebooks/document_linearization_to_markdown_or_html.ipynb.txt

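Collects the key-value pairs found on pages 3 to 5 (inclusive). `document.pages` is a 0-indexed list, so these pages are the slice `[2:5]`.

```python
# Assumes `document` came from an analysis with TextractFeatures.FORMS
pages_3_to_5_forms = [kv for page in document.pages[2:5] for kv in page.key_values]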
```

--------------------------------

### Combine Queries with Other Features

Source: https://aws-samples.github.io/amazon-textract-textractor/_sources/notebooks/using_queries.ipynb.txt

Feature types are selected per call through the `features` argument. Include `TextractFeatures.QUERIES` alongside any other features you need, such as `FORMS`, to control what Textract analyzes.

```python
from textractor import Textractor
from textractor.data.constants import TextractFeatures

extractor = Textractor(region_name="us-east-1")

# Select feature types per call; QUERIES can be combined with FORMS, TABLES, etc.
document = extractor.analyze_document(
    file_source="path/to/your/document.png",
    features=[TextractFeatures.QUERIES, TextractFeatures.FORMS],
    queries=["What is the payment method?"],
)

# Results for each requested feature are available on the same Document
print(document.queries)
print(document.key_values)
```

--------------------------------

### Get Specific Pages as Tables

Source: https://aws-samples.github.io/amazon-textract-textractor/_sources/notebooks/document_linearization_to_markdown_or_html.ipynb.txt

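Collects the tables found on pages 3 to 5 (inclusive). `document.pages` is a 0-indexed list, so these pages are the slice `[2:5]`.

```python
# Assumes `document` came from an analysis with TextractFeatures.TABLES
pages_3_to_5_tables = [table for page in document.pages[2:5] for table in page.tables]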
```

--------------------------------

### Get Specific Pages as Images

Source: https://aws-samples.github.io/amazon-textract-textractor/_sources/notebooks/document_linearization_to_markdown_or_html.ipynb.txt

Collects the page images for pages 3 to 5 (inclusive). `document.pages` is a 0-indexed list, so these pages are the slice `[2:5]`. A page image is only available if it was kept during analysis.

```python
pages_3_to_5_images = [page.image for page in document.pages[2:5]]
```

--------------------------------

### Find Key-Value Pairs in a Multi-Page PDF

Source: https://aws-samples.github.io/amazon-textract-textractor/_sources/notebooks/using_queries.ipynb.txt

Runs asynchronous form analysis on a PDF in S3 and prints the detected key-value pairs, including one such as "Invoice Number".

```python
from textractor import Textractor
from textractor.data.constants import TextractFeatures

txt = Textractor(region_name="us-east-1")

# Asynchronous analysis reads the PDF from S3
doc = txt.start_document_analysis(
    file_source="s3://your-bucket-name/document.pdf",
    features=[TextractFeatures.FORMS],
)
print(doc.key_values)
```

--------------------------------

### Initialize Textractor for Visualization

Source: https://aws-samples.github.io/amazon-textract-textractor/_sources/notebooks/visualizing_results.ipynb.txt

Runs text detection while keeping the page image, then draws the detected words on the page. Ensure your AWS credentials and Textract permissions are configured.

```python
from textractor import Textractor

# Initialize Textractor with your AWS region
extractor = Textractor(region_name="us-east-1")

# save_image=True keeps the page image so results can be drawn on it
document = extractor.detect_document_text(
    file_source="path/to/your/document.png",
    save_image=True,
)

# Draw bounding boxes for every detected word
document.words.visualize()
```

--------------------------------

### Get Signature Field Information

Source: https://aws-samples.github.io/amazon-textract-textractor/_sources/notebooks/signature_detection.ipynb.txt

Retrieves the bounding box and confidence score of each signature detected in a document stored in S3.

```python
from textractor import Textractor
from textractor.data.constants import TextractFeatures

def get_signature_field_info(bucket, document):
    """Gets information about signatures detected in a document.

    Args:
        bucket (str): The S3 bucket name.
        document (str): The S3 object key for the document.

    Returns:
        list: A list of dictionaries, one per detected signature.
    """
    extractor = Textractor(region_name="us-east-1")
    doc = extractor.analyze_document(
        file_source=f"s3://{bucket}/{document}",
        features=[TextractFeatures.SIGNATURES],
    )
    return [
        {"confidence": signature.confidence, "bbox": signature.bbox}
        for signature in doc.signatures
    ]
```

--------------------------------

### Filter Detected Words

Source: https://aws-samples.github.io/amazon-textract-textractor/_sources/notebooks/using_queries.ipynb.txt

Filters extracted entities with an ordinary list comprehension; here, the detected words whose text matches a value.

```python
# Assumes `document` came from a Textractor call such as detect_document_text
results = [word for word in document.words if word.text == "value"]
print(len(results))
```

--------------------------------

### Get Signature Fields from a Document

Source: https://aws-samples.github.io/amazon-textract-textractor/_sources/notebooks/signature_detection.ipynb.txt

Retrieves the signatures in a document stored in S3 by running analysis with the `SIGNATURES` feature.

```python
from textractor import Textractor
from textractor.data.constants import TextractFeatures

def get_signature_fields(bucket: str, document: str):
    """Gets signature fields from a document.

    Args:
        bucket: The S3 bucket name.
        document: The S3 object key for the document.

    Returns:
        The signatures detected in the document.
    """
    extractor = Textractor(region_name="us-east-1")
    doc = extractor.analyze_document(
        file_source=f"s3://{bucket}/{document}",
        features=[TextractFeatures.SIGNATURES],
    )
    return doc.signatures
```

--------------------------------

### Initialize Textractor and Pass Queries per Call

Source: https://aws-samples.github.io/amazon-textract-textractor/_sources/notebooks/using_queries.ipynb.txt

`Textractor` is configured only with credentials and region settings. Queries are passed to each analysis call, so different documents can use different questions.

```python
from textractor import Textractor
from textractor.data.constants import TextractFeatures

txt = Textractor(region_name="us-east-1")
document = txt.analyze_document(
    file_source="path/to/your/document.png",
    features=[TextractFeatures.QUERIES],
    queries=["What is the invoice number?", "What is the total amount?", "What is the vendor name?"],
)
```

--------------------------------

### Process Document with Textractor

Source: https://aws-samples.github.io/amazon-textract-textractor/_sources/notebooks/tabular_data_linearization.ipynb.txt

Analyzes a document for tables and key-value pairs, then reads the text, tables, and pairs from the result. Converting tables to JSON requires pandas.

```python
from textractor import Textractor
from textractor.data.constants import TextractFeatures

txt = Textractor(profile_name="your-profile-name")
doc = txt.analyze_document(
    file_source="path/to/your/document.png",
    features=[TextractFeatures.TABLES, TextractFeatures.FORMS],
)

# Linearized text of the whole document
all_text = doc.text

# Tables and key-value pairs
tables = doc.tables
key_value_pairs = doc.key_values

# Convert each table to a JSON string of row records via pandas
tables_json = [table.to_pandas().to_json(orient="records") for table in tables]

print("Tables:", tables_json)
print("Key-Value Pairs:", key_value_pairs)
```

--------------------------------

### Get Table as Pandas DataFrame

Source: https://aws-samples.github.io/amazon-textract-textractor/_sources/notebooks/tabular_data_linearization.ipynb.txt

Converts a detected table into a Pandas DataFrame for easier data manipulation and analysis. Requires the pandas library to be installed.

```python
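# Assumes `document` was returned by analyze_document(..., features=[TextractFeatures.TABLES]);
# to_pandas() requires the pandas package to be installed
df = document.tables[0].to_pandas()
print(df)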
```
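Once a table is in row form, it can be linearized as a Markdown table for use in an LLM prompt. That is the idea behind tabular data linearization. The rows can come, for example, from `[list(df.columns)] + df.astype(str).values.tolist()`. Below is a minimal standard-library sketch that assumes the first row is the header and every cell is a string. `table_to_markdown` is a hypothetical helper, not a Textractor method.

```python
from typing import List


def table_to_markdown(rows: List[List[str]]) -> str:
    """Render rows (first row = header) as a Markdown table (hypothetical helper)."""
    if not rows:
        return ""
    width = max(len(row) for row in rows)
    # Pad short rows so every line has the same number of columns
    padded = [row + [""] * (width - len(row)) for row in rows]
    # Escape pipes so cell text cannot break the table structure
    cells = [[cell.replace("|", "\\|").strip() for cell in row] for row in padded]
    header, body = cells[0], cells[1:]
    lines = ["| " + " | ".join(header) + " |", "|" + "---|" * width]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)


print(table_to_markdown([["Item", "Qty", "Price"], ["Widget", "2", "$10.00"], ["Gadget", "1"]]))
```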

--------------------------------

### Initialize Textract Client (Python)

Source: https://aws-samples.github.io/amazon-textract-textractor/_sources/notebooks/signature_detection.ipynb.txt

Initialize the Amazon Textract client with boto3 and build the `Document` parameter that references a file in S3. Both are prerequisites for calling Textract APIs such as AnalyzeDocument.

```python
import boto3

textract_client = boto3.client('textract')
document = {
    'S3Object': {
        'Bucket': 'your-bucket-name',
        'Name': 'your-document-name'
    }
}
```
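The `Document` argument takes either an `S3Object` reference or raw `Bytes`. A small helper can build the right shape from an `s3://bucket/key` URI or from a local path. `make_document_param` is a hypothetical name used only for this sketch.

```python
from pathlib import Path


def make_document_param(source: str) -> dict:
    """Return a Textract Document parameter for an s3:// URI or a local file path.

    Hypothetical helper for illustration; not part of boto3 or Textractor.
    """
    if source.startswith("s3://"):
        # Split "s3://bucket/key/with/slashes" into the bucket and the object key
        bucket, _, key = source[len("s3://"):].partition("/")
        return {"S3Object": {"Bucket": bucket, "Name": key}}
    # Anything else is treated as a local file and sent inline as raw bytes
    return {"Bytes": Path(source).read_bytes()}


document = make_document_param("s3://your-bucket-name/folder/your-document-name.png")
print(document)
# e.g. textract_client.analyze_document(Document=document, FeatureTypes=["FORMS"])
```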

--------------------------------

### Claude LLM Integration Setup

Source: https://aws-samples.github.io/amazon-textract-textractor/notebooks/tabular_data_linearization_continued.html

Sets up a Bedrock Runtime client with boto3 and defines a helper that asks an Anthropic Claude model a question about a document.

```python
import json
import os

import boto3

os.environ["AWS_DEFAULT_REGION"] = "us-west-2"

bedrock = boto3.client(
    service_name="bedrock-runtime",
    region_name="us-west-2",
    endpoint_url="https://bedrock-runtime.us-west-2.amazonaws.com",
)


def get_response_from_claude(context, prompt_data):
    # Claude text-completion models expect the "\n\nHuman: ...\n\nAssistant:" prompt format
    prompt = (
        f"\n\nHuman: Given the following document:\n{context}\n"
        f"Answer the following:\n{prompt_data}\n\nAssistant:"
    )
    body = json.dumps({
        "prompt": prompt,
        "max_tokens_to_sample": 2000,
        "top_k": 1,
    })
    model_id = "anthropic.claude-instant-v1"  # change this to use a different version from the model provider

    response = bedrock.invoke_model(
        body=body, modelId=model_id, accept="*/*", contentType="application/json"
    )
    response_body = json.loads(response.get("body").read())
    return response_body.get("completion")
```

--------------------------------

### Get Document Text with Layout (Final Example)

Source: https://aws-samples.github.io/amazon-textract-textractor/_sources/notebooks/layout_analysis_for_text_linearization.ipynb.txt

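A function that linearizes a document's text from the layout blocks returned by AnalyzeDocument with the `LAYOUT` feature. Layout blocks (`LAYOUT_TITLE`, `LAYOUT_TEXT`, `LAYOUT_TABLE`, and so on) carry no text of their own. The function therefore follows each block's `CHILD` relationships to its `LINE` blocks.

```python
def get_document_text_with_layout_final(response):
    # Index blocks by Id so CHILD relationships can be resolved
    blocks_by_id = {block["Id"]: block for block in response["Blocks"]}
    text = ""
    for block in response["Blocks"]:
        # Only layout blocks are used as units; also emitting every LINE and
        # WORD block would repeat the same text several times
        if not block["BlockType"].startswith("LAYOUT_"):
            continue
        lines = []
        for relationship in block.get("Relationships", []):
            if relationship["Type"] != "CHILD":
                continue
            for child_id in relationship["Ids"]:
                child = blocks_by_id.get(child_id)
                # LAYOUT_LIST children are LAYOUT_TEXT blocks, which are handled on their own
                if child is not None and child["BlockType"] == "LINE":
                    lines.append(child["Text"])
        if lines:
            text += "\n".join(lines) + "\n\n"
    return text
```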

--------------------------------

### Get Specific Pages as CSV

Source: https://aws-samples.github.io/amazon-textract-textractor/_sources/notebooks/document_linearization_to_markdown_or_html.ipynb.txt

Retrieves the content from a range of pages as a single CSV file. Specify start and end page numbers (inclusive).

```python
pages_3_to_5_csv = document.get_pages_csv(start_page=3, end_page=5)
```
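You may be working from a raw AnalyzeDocument or GetDocumentAnalysis response rather than a Textractor document. In that case, the same page-range CSV export can be approximated with the standard library:

- Each `TABLE` block lists its `CELL` blocks as children.
- Each cell carries a `RowIndex` and a `ColumnIndex`.
- Each cell's own children are the `WORD` blocks that hold its text.

The sketch below uses the hypothetical name `tables_to_csv`. It ignores merged cells, writes a blank row between tables, and uses a made-up sample response.

```python
import csv
import io


def tables_to_csv(response: dict, start_page: int, end_page: int) -> str:
    """Write every table on pages start_page..end_page (inclusive) as CSV text.

    Hypothetical helper for illustration only.
    """
    blocks = {b["Id"]: b for b in response["Blocks"]}

    def child_ids(block):
        for rel in block.get("Relationships", []):
            if rel["Type"] == "CHILD":
                yield from rel["Ids"]

    out = io.StringIO()
    writer = csv.writer(out)
    first = True
    for block in response["Blocks"]:
        # A missing Page is treated as page 1
        if block["BlockType"] != "TABLE" or not start_page <= block.get("Page", 1) <= end_page:
            continue
        grid = {}
        for cell_id in child_ids(block):
            cell = blocks[cell_id]
            if cell["BlockType"] != "CELL":
                continue
            words = [blocks[w]["Text"] for w in child_ids(cell) if blocks[w]["BlockType"] == "WORD"]
            grid[(cell["RowIndex"], cell["ColumnIndex"])] = " ".join(words)
        if not grid:
            continue
        if not first:
            writer.writerow([])  # blank row between tables
        first = False
        n_rows = max(r for r, _ in grid)
        n_cols = max(c for _, c in grid)
        for r in range(1, n_rows + 1):
            writer.writerow([grid.get((r, c), "") for c in range(1, n_cols + 1)])
    return out.getvalue()


sample = {"Blocks": [
    {"Id": "t1", "BlockType": "TABLE", "Page": 2,
     "Relationships": [{"Type": "CHILD", "Ids": ["c11", "c12", "c21", "c22"]}]},
    {"Id": "c11", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 1,
     "Relationships": [{"Type": "CHILD", "Ids": ["w1"]}]},
    {"Id": "c12", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 2,
     "Relationships": [{"Type": "CHILD", "Ids": ["w2"]}]},
    {"Id": "c21", "BlockType": "CELL", "RowIndex": 2, "ColumnIndex": 1,
     "Relationships": [{"Type": "CHILD", "Ids": ["w3", "w4"]}]},
    {"Id": "c22", "BlockType": "CELL", "RowIndex": 2, "ColumnIndex": 2},
    {"Id": "w1", "BlockType": "WORD", "Text": "Item"},
    {"Id": "w2", "BlockType": "WORD", "Text": "Price"},
    {"Id": "w3", "BlockType": "WORD", "Text": "Blue"},
    {"Id": "w4", "BlockType": "WORD", "Text": "widget"},
]}
print(tables_to_csv(sample, start_page=2, end_page=2))
```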

--------------------------------

### Query a Multi-Page Document Asynchronously

Source: https://aws-samples.github.io/amazon-textract-textractor/_sources/notebooks/using_queries.ipynb.txt

Multi-page documents must go through the asynchronous StartDocumentAnalysis API. When the file is local, Textractor first uploads it to the S3 prefix given in `s3_upload_path`.

```python
from textractor import Textractor
from textractor.data.constants import TextractFeatures

extractor = Textractor(region_name="us-east-1")
document = extractor.start_document_analysis(
    file_source="document.pdf",
    features=[TextractFeatures.QUERIES],
    queries=["What is the total amount?"],
    s3_upload_path="s3://your-bucket-name/textract-input/",
)
print(document.queries)
```

--------------------------------

### Extract All Text from a Document

Source: https://aws-samples.github.io/amazon-textract-textractor/_sources/notebooks/using_queries.ipynb.txt

This snippet extracts all of the text in a document. Plain text extraction needs only DetectDocumentText; no query is involved.

```python
from textractor import Textractor

extractor = Textractor(profile_name="default")
document = extractor.detect_document_text(file_source="path/to/your/document.png")

# All text in the document, linearized in reading order
print(document.get_text())
```

--------------------------------

### Get Specific Pages as PDF

Source: https://aws-samples.github.io/amazon-textract-textractor/_sources/notebooks/document_linearization_to_markdown_or_html.ipynb.txt

Retrieves the content from a range of pages as a single PDF file. Specify start and end page numbers (inclusive).

```python
pages_3_to_5_pdf = document.get_pages_pdf(start_page=3, end_page=5)
```

--------------------------------

### Get Document Analysis Results

Source: https://aws-samples.github.io/amazon-textract-textractor/_sources/notebooks/layout_analysis_for_text_linearization.ipynb.txt

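Retrieves the results of an asynchronous document analysis job using the `JobId` returned by `start_document_analysis`. Blocks are only available once `JobStatus` is `SUCCEEDED`.

```python
import boto3

textract = boto3.client("textract")

# job_id is the JobId returned by start_document_analysis
response = textract.get_document_analysis(JobId=job_id)

# Process the results once the job has finished
if response["JobStatus"] == "SUCCEEDED":
    for item in response["Blocks"]:
        if item["BlockType"] == "LINE":
            print(item["Text"])
```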
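`GetDocumentAnalysis` returns at most one batch of results per call (up to 1,000 blocks by default). While more remain, it also returns a `NextToken`, so a complete read loops until the token is absent. The sketch below takes the fetch call as a parameter so it can be exercised without AWS. With boto3 you would pass `lambda **kw: textract.get_document_analysis(JobId=job_id, **kw)`. `collect_all_blocks` is a hypothetical helper name, and the fake pages are made up.

```python
from typing import Callable, List


def collect_all_blocks(fetch: Callable[..., dict]) -> List[dict]:
    """Gather Blocks from every page of a paginated GetDocumentAnalysis result.

    Hypothetical helper: `fetch(**kwargs)` must behave like get_document_analysis
    with JobId already bound.
    """
    blocks: List[dict] = []
    kwargs = {}
    while True:
        response = fetch(**kwargs)
        if response["JobStatus"] != "SUCCEEDED":
            # IN_PROGRESS, FAILED and PARTIAL_SUCCESS need separate handling
            raise RuntimeError(f"Job status is {response['JobStatus']}")
        blocks.extend(response.get("Blocks", []))
        token = response.get("NextToken")
        if not token:
            return blocks
        kwargs = {"NextToken": token}


# Two fake result pages linked by a NextToken
fake_pages = {
    None: {"JobStatus": "SUCCEEDED", "Blocks": [{"Id": "1"}], "NextToken": "t1"},
    "t1": {"JobStatus": "SUCCEEDED", "Blocks": [{"Id": "2"}]},
}
blocks = collect_all_blocks(lambda NextToken=None: fake_pages[NextToken])
print([b["Id"] for b in blocks])
```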

--------------------------------

### Process Signatures from S3

Source: https://aws-samples.github.io/amazon-textract-textractor/_sources/notebooks/signature_detection.ipynb.txt

This example detects signatures in a PDF stored in an S3 bucket by starting an asynchronous analysis with the `SIGNATURES` feature. Ensure the `amazon-textract-textractor` package is installed and AWS credentials are configured.

```python
from textractor import Textractor
from textractor.data.constants import TextractFeatures

extractor = Textractor(profile_name="default")
document = extractor.start_document_analysis(
    file_source="s3://your-bucket-name/path/to/your/document.pdf",
    features=[TextractFeatures.SIGNATURES],
)

for signature in document.signatures:
    print(f"Signature found at: {signature.bbox}")
```

--------------------------------

### Basic Signature Detection Query

Source: https://aws-samples.github.io/amazon-textract-textractor/_sources/notebooks/signature_detection.ipynb.txt

This example lists the signatures in a previously saved AnalyzeDocument JSON response that was requested with the `SIGNATURES` feature. `Document.open` parses the Textract response, not the original PDF.

```python
from textractor.entities.document import Document

doc = Document.open("path/to/your/analyze_document_response.json")

for signature in doc.signatures:
    print(f"Signature found at: {signature.bbox}")
```

--------------------------------

### Get Signature Information with Confidence Score

Source: https://aws-samples.github.io/amazon-textract-textractor/_sources/notebooks/signature_detection.ipynb.txt

This example retrieves each detected signature along with its confidence score, which is useful for filtering signatures based on reliability.

```python
from textractor import Textractor
from textractor.data.constants import TextractFeatures

extractor = Textractor(profile_name="default", region_name="us-east-1")
document = extractor.analyze_document(file_source="/path/to/your/document.png", features=[TextractFeatures.SIGNATURES])
for signature in document.signatures:
    print(f"Signature ID: {signature.id}, Confidence: {signature.confidence}")
```

--------------------------------

### Advanced Querying with Specific Keys

Source: https://aws-samples.github.io/amazon-textract-textractor/_sources/notebooks/using_queries.ipynb.txt

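This example looks up a specific key among the detected key-value pairs, which is useful when you know the field name you need. `Document.get` matches keys by string similarity, so small OCR differences can still match.

```python
from textractor import Textractor
from textractor.data.constants import TextractFeatures

extractor = Textractor(region_name="us-west-2")
doc = extractor.analyze_document(
    file_source="s3://textract-sample-data/sample-us-west-2/invoice.png",
    features=[TextractFeatures.FORMS],
)

# Query for a specific key
results = doc.get("invoice")

# Print the results
for result in results:
    print(result)
```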

--------------------------------

### Get Table as Pandas DataFrame

Source: https://aws-samples.github.io/amazon-textract-textractor/_sources/notebooks/tabular_data_linearization.ipynb.txt

Converts a specific linearized table into a Pandas DataFrame for easier data manipulation and analysis. Requires the pandas library to be installed.

```python
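import pandas as pd

# Assumes linearized_tables[0] holds one table as a list of rows, each a list of cell values
df = pd.DataFrame(linearized_tables[0])
print(df)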
```

--------------------------------

### Example Usage: Detect and Process Signatures

Source: https://aws-samples.github.io/amazon-textract-textractor/_sources/notebooks/signature_detection.ipynb.txt

This example shows how to call the detect_signatures and process_signature_detection_results functions with a sample S3 bucket and document name. Make sure to replace 'your-bucket-name' and 'your-document.pdf' with your actual S3 details.

```python
if __name__ == "__main__":
    bucket_name = "your-bucket-name"
    document_name = "your-document.pdf"

    signature_response = detect_signatures(bucket_name, document_name)
    process_signature_detection_results(signature_response)

```
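The two functions called above are defined elsewhere in the notebook and are not shown here. Below is a minimal sketch of what they might look like, assuming boto3 and AnalyzeDocument's `SIGNATURES` feature. Both bodies are illustrative, not the notebook's code. The 50.0 default confidence threshold is an arbitrary assumption. Synchronous AnalyzeDocument handles single-page documents only, so a multi-page PDF would need StartDocumentAnalysis instead. The processing step only reads `SIGNATURE` blocks, so it can be tried on the made-up response included here.

```python
def detect_signatures(bucket_name: str, document_name: str) -> dict:
    """Run AnalyzeDocument with the SIGNATURES feature on an S3 object (hypothetical sketch)."""
    import boto3  # imported here so the processing function below works without boto3

    client = boto3.client("textract")
    return client.analyze_document(
        Document={"S3Object": {"Bucket": bucket_name, "Name": document_name}},
        FeatureTypes=["SIGNATURES"],
    )


def process_signature_detection_results(response: dict, min_confidence: float = 50.0) -> list:
    """Print and return SIGNATURE blocks at or above min_confidence (hypothetical sketch)."""
    signatures = [
        block
        for block in response.get("Blocks", [])
        if block["BlockType"] == "SIGNATURE" and block["Confidence"] >= min_confidence
    ]
    for sig in signatures:
        box = sig["Geometry"]["BoundingBox"]
        print(f"Signature {sig['Id']} on page {sig.get('Page', 1)}: "
              f"confidence {sig['Confidence']:.1f}, left={box['Left']:.2f}, top={box['Top']:.2f}")
    return signatures


sample = {"Blocks": [
    {"Id": "s1", "BlockType": "SIGNATURE", "Confidence": 92.3, "Page": 1,
     "Geometry": {"BoundingBox": {"Left": 0.6, "Top": 0.8, "Width": 0.2, "Height": 0.05}}},
    {"Id": "s2", "BlockType": "SIGNATURE", "Confidence": 31.0, "Page": 1,
     "Geometry": {"BoundingBox": {"Left": 0.1, "Top": 0.8, "Width": 0.2, "Height": 0.05}}},
]}
process_signature_detection_results(sample)
```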

--------------------------------

### AnalyzeDocument API Call

Source: https://aws-samples.github.io/amazon-textract-textractor/_sources/notebooks/document_linearization_to_markdown_or_html.ipynb.txt

This is a basic example of calling the AnalyzeDocument API to get document analysis results. Ensure you have the necessary AWS credentials and permissions configured.

```python
import boto3

client = boto3.client('textract')

response = client.analyze_document(
    Document={'S3Object': {'Bucket': 'YOUR_BUCKET_NAME', 'Name': 'YOUR_DOCUMENT_NAME'}},
    FeatureTypes=['FORMS', 'TABLES']
)

# Process the response here
print(response)
```
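With `FORMS` enabled, key-value pairs arrive as `KEY_VALUE_SET` blocks:

- A block whose `EntityTypes` contains `KEY` links to its value block through a `VALUE` relationship.
- Both the key and the value link to their `WORD` (and `SELECTION_ELEMENT`) blocks through `CHILD` relationships.

Below is a minimal sketch that turns a response into a list of (key, value) strings. `extract_key_values` is a hypothetical helper that renders selection elements as their `SelectionStatus`. The sample response is made up.

```python
from typing import List, Tuple


def extract_key_values(response: dict) -> List[Tuple[str, str]]:
    """Return (key text, value text) pairs from an AnalyzeDocument FORMS response.

    Hypothetical helper for illustration only.
    """
    blocks = {b["Id"]: b for b in response["Blocks"]}

    def related(block: dict, rel_type: str) -> List[str]:
        return [i for r in block.get("Relationships", []) if r["Type"] == rel_type for i in r["Ids"]]

    def text_of(block: dict) -> str:
        parts = []
        for child_id in related(block, "CHILD"):
            child = blocks[child_id]
            if child["BlockType"] == "WORD":
                parts.append(child["Text"])
            elif child["BlockType"] == "SELECTION_ELEMENT":
                parts.append(child["SelectionStatus"])  # SELECTED or NOT_SELECTED
        return " ".join(parts)

    pairs = []
    for block in response["Blocks"]:
        if block["BlockType"] == "KEY_VALUE_SET" and "KEY" in block.get("EntityTypes", []):
            value_text = " ".join(text_of(blocks[v]) for v in related(block, "VALUE"))
            pairs.append((text_of(block), value_text))
    return pairs


sample = {"Blocks": [
    {"Id": "k1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
     "Relationships": [{"Type": "VALUE", "Ids": ["v1"]}, {"Type": "CHILD", "Ids": ["w1", "w2"]}]},
    {"Id": "v1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
     "Relationships": [{"Type": "CHILD", "Ids": ["w3"]}]},
    {"Id": "w1", "BlockType": "WORD", "Text": "Invoice"},
    {"Id": "w2", "BlockType": "WORD", "Text": "No:"},
    {"Id": "w3", "BlockType": "WORD", "Text": "INV-001"},
]}
print(extract_key_values(sample))  # [('Invoice No:', 'INV-001')]
```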

--------------------------------

### Search for a Word in a Document

Source: https://aws-samples.github.io/amazon-textract-textractor/_sources/notebooks/using_queries.ipynb.txt

Searches the words of a document for the closest matches to a keyword. `search_words` compares words by string similarity, and `top_k` limits how many matches are returned.

```python
from textractor import Textractor

extractor = Textractor(profile_name="default")
document = extractor.detect_document_text(file_source="path/to/your/document.png")

# Example: find up to 10 occurrences of 'invoice'
results = document.search_words("invoice", top_k=10)

for result in results:
    print(f"Found '{result.text}', bounding box: {result.bbox}")
```

--------------------------------

### Find a Word and Get its Page Number

Source: https://aws-samples.github.io/amazon-textract-textractor/_sources/notebooks/finding_words_within_a_document.ipynb.txt

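This example finds a word and reports the pages where it appears, which is helpful for organizing and referencing search results. The PDF may have several pages, so the asynchronous text detection API is used, and the local file is first uploaded to `s3_upload_path`.

```python
from textractor import Textractor

extractor = Textractor(profile_name="default")
document = extractor.start_document_text_detection(
    file_source="path/to/your/document.pdf",
    s3_upload_path="s3://your-bucket-name/textract-input/",
)

# Find all occurrences of "data" and report their page numbers
for page_number, page in enumerate(document.pages, start=1):
    for word in page.words:
        if word.text.lower() == "data":
            print(f"Found 'data' on page: {page_number}")
```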
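You can also work directly on the combined `Blocks` of a multi-page asynchronous response, such as the pages of a GetDocumentTextDetection result. There, every `WORD` block carries a `Page` number, so a case-insensitive search with page numbers needs only the standard library. `find_word_pages` is a hypothetical helper. It strips surrounding punctuation so that `data,` still matches `data`. The sample blocks are made up.

```python
import string
from typing import List


def find_word_pages(blocks: List[dict], target: str) -> List[int]:
    """Return sorted, de-duplicated page numbers where `target` appears as a WORD.

    Hypothetical helper for illustration only.
    """
    target = target.lower()
    pages = set()
    for block in blocks:
        if block["BlockType"] != "WORD":
            continue
        # Compare case-insensitively, ignoring leading/trailing punctuation
        if block["Text"].strip(string.punctuation).lower() == target:
            pages.add(block.get("Page", 1))
    return sorted(pages)


sample_blocks = [
    {"BlockType": "WORD", "Text": "Data,", "Page": 1},
    {"BlockType": "WORD", "Text": "metadata", "Page": 2},
    {"BlockType": "WORD", "Text": "data", "Page": 3},
    {"BlockType": "LINE", "Text": "data", "Page": 4},
]
print(find_word_pages(sample_blocks, "data"))  # [1, 3]
```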

--------------------------------

### Example Usage of Signature Detection

Source: https://aws-samples.github.io/amazon-textract-textractor/_sources/notebooks/signature_detection.ipynb.txt

Demonstrates signature detection on a local image with boto3. The file is sent as raw bytes to AnalyzeDocument with the `SIGNATURES` feature, and the `SIGNATURE` blocks are read from the response. Ensure you have AWS credentials configured.

```python
import boto3

# Initialize Textract client
textract_client = boto3.client('textract')

# Load the document from a local file
with open("signature.png", "rb") as document_file:
    document_bytes = document_file.read()

document = {
    'Bytes': document_bytes
}

response = textract_client.analyze_document(Document=document, FeatureTypes=['SIGNATURES'])
signatures = [block for block in response['Blocks'] if block['BlockType'] == 'SIGNATURE']

if signatures:
    print(f"Detected {len(signatures)} signature fields:")
    for sig in signatures:
        print(f"- ID: {sig['Id']}, Geometry: {sig['Geometry']}")
else:
    print("No signature fields detected.")
```

--------------------------------

### Install Textractor with Multiple Extras

Source: https://aws-samples.github.io/amazon-textract-textractor/_sources/installation.rst.txt

Install Textractor with several extras, such as pdf and torch, by listing them separated by commas. Quote the argument so that shells such as zsh do not interpret the square brackets.

```bash
pip install "amazon-textract-textractor[pdf,torch]"
```

--------------------------------

### Get Table as Pandas DataFrame

Source: https://aws-samples.github.io/amazon-textract-textractor/_sources/notebooks/tabular_data_linearization_continued.ipynb.txt

Retrieves a specific table from the document as a Pandas DataFrame for convenient data manipulation and analysis. Requires the 'pandas' library to be installed.

```python
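from textractor import Textractor
from textractor.data.constants import TextractFeatures

extractor = Textractor(profile_name="default")

# PDFs stored in S3 are analyzed with the asynchronous API
document = extractor.start_document_analysis(
    file_source="s3://amazon-textract-sample-data/sample_invoice.pdf",
    features=[TextractFeatures.TABLES],
)

tables = document.tables

# Assuming you want the first table; to_pandas() requires pandas to be installed
df = tables[0].to_pandas()
print(df)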
```

--------------------------------

### Get Specific Pages as JSON

Source: https://aws-samples.github.io/amazon-textract-textractor/_sources/notebooks/document_linearization_to_markdown_or_html.ipynb.txt

Retrieves the analysis results for a range of pages as a single JSON object. Specify start and end page numbers (inclusive).

```python
pages_3_to_5_json = document.get_pages_json(start_page=3, end_page=5)
```

--------------------------------

### Initialize Textractor

Source: https://aws-samples.github.io/amazon-textract-textractor/_sources/notebooks/using_analyze_id.ipynb.txt

Instantiate the Textractor client. This is the first step before performing any Textract operations.

```python
from textractor import Textractor

txt = Textractor(profile_name="your-profile-name", region_name="your-region-name")
```

--------------------------------

### Analyze ID with Specific Configuration

Source: https://aws-samples.github.io/amazon-textract-textractor/_sources/notebooks/using_analyze_id.ipynb.txt

This example analyzes an identity document with an explicitly configured AWS profile and region, then saves the raw AnalyzeID response as JSON. Adjust 'us-east-1' and 'default' as needed.

```python
import json
import boto3

# Session bound to a specific profile and region
session = boto3.Session(profile_name="default", region_name="us-east-1")
textract = session.client("textract")

with open("path/to/your/id_document.png", "rb") as id_file:
    response = textract.analyze_id(DocumentPages=[{"Bytes": id_file.read()}])
with open("analyze_id_output_configured.json", "w") as out_file:
    json.dump(response, out_file, indent=2, default=str)
```
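The AnalyzeID response nests its results under `IdentityDocuments`. Each entry has a list of `IdentityDocumentFields`:

- `Type.Text` is the normalized field name, such as `FIRST_NAME` or `DATE_OF_BIRTH`.
- `ValueDetection` holds the extracted `Text` and its `Confidence`.

Below is a sketch that flattens one document into a dictionary and optionally drops low-confidence values. `flatten_id_fields` and its threshold parameter are hypothetical, and the sample values are made up.

```python
from typing import Dict


def flatten_id_fields(response: dict, doc_index: int = 0, min_confidence: float = 0.0) -> Dict[str, str]:
    """Map each normalized field name to its detected value for one identity document.

    Hypothetical helper for illustration only.
    """
    document = response["IdentityDocuments"][doc_index]
    fields = {}
    for field in document["IdentityDocumentFields"]:
        value = field["ValueDetection"]
        # Skip values below the chosen confidence threshold
        if value.get("Confidence", 0.0) >= min_confidence:
            fields[field["Type"]["Text"]] = value["Text"]
    return fields


sample = {"IdentityDocuments": [{"IdentityDocumentFields": [
    {"Type": {"Text": "FIRST_NAME"}, "ValueDetection": {"Text": "JANE", "Confidence": 98.9}},
    {"Type": {"Text": "DATE_OF_BIRTH"}, "ValueDetection": {"Text": "01/01/1990", "Confidence": 97.1}},
    {"Type": {"Text": "MIDDLE_NAME"}, "ValueDetection": {"Text": "", "Confidence": 40.2}},
]}]}
print(flatten_id_fields(sample, min_confidence=50))
```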

--------------------------------

### Detect Signatures in a Document (Node.js)

Source: https://aws-samples.github.io/amazon-textract-textractor/_sources/notebooks/signature_detection.ipynb.txt

This Node.js example detects signatures with the AWS SDK for JavaScript (v2). Signature detection requires the AnalyzeDocument operation with the `SIGNATURES` feature type, because DetectDocumentText does not return `SIGNATURE` blocks. Ensure the `aws-sdk` package is installed and configured.

```javascript
const AWS = require('aws-sdk');
const fs = require('fs');

AWS.config.update({ region: 'us-east-1' });

const textract = new AWS.Textract();

fs.readFile('invoice.png', (err, data) => {
    if (err) throw err;

    const params = {
        Document: {
            Bytes: data
        },
        FeatureTypes: ['SIGNATURES']
    };

    textract.analyzeDocument(params, (err, response) => {
        if (err) {
            console.log(err);
            return;
        }
        response.Blocks.forEach((block) => {
            if (block.BlockType === 'SIGNATURE') {
                console.log(`Found signature: ${block.Id}`);
            }
        });
    });
});
```

--------------------------------

### Install Amazon Textract Textractor from PyPI

Source: https://aws-samples.github.io/amazon-textract-textractor/_sources/installation.rst.txt

Install the base package from PyPI. Use extras like [pdfium] for PDF rasterization.

```bash
pip install amazon-textract-textractor
```

--------------------------------

### Process Signatures from an Image

Source: https://aws-samples.github.io/amazon-textract-textractor/_sources/notebooks/signature_detection.ipynb.txt

This example detects signatures in a local image file with Amazon Textract through Textractor. Ensure the `amazon-textract-textractor` package is installed and the image path is correct.

```python
from textractor import Textractor
from textractor.data.constants import TextractFeatures

image_path = "/path/to/your/image.png"
extractor = Textractor(profile_name="default")
document = extractor.analyze_document(
    file_source=image_path,
    features=[TextractFeatures.SIGNATURES],
)

# Iterate through detected signatures and print their information
for signature in document.signatures:
    print(f"Detected signature: {signature.bbox}")
```

--------------------------------

### Linearizing a Document with Specific Configurations

Source: https://aws-samples.github.io/amazon-textract-textractor/_sources/notebooks/tabular_data_linearization.ipynb.txt

This example analyzes a document with Textractor and linearizes it to text with a `TextLinearizationConfig`. The configuration controls details such as hiding figure content and prefixing titles and section headers. Adjust the document path as needed.

```python
from textractor import Textractor
from textractor.data.constants import TextractFeatures
from textractor.data.text_linearization_config import TextLinearizationConfig

document = Textractor(profile_name="default").analyze_document(file_source="path/to/your/document.png", features=[TextractFeatures.LAYOUT])
config = TextLinearizationConfig(hide_figure_layout=True, title_prefix="# ", section_header_prefix="## ")
print(document.get_text(config=config))
```

--------------------------------

### Get Signature Confidence Score

Source: https://aws-samples.github.io/amazon-textract-textractor/_sources/notebooks/signature_detection.ipynb.txt

This example demonstrates how to retrieve the confidence score for detected signatures. The confidence score indicates the likelihood that the detected field is indeed a signature.

```python
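import boto3


def get_signature_confidence(document_path):
    textract = boto3.client("textract")
    # Synchronous call: the file is sent as bytes and must be a single-page document
    with open(document_path, "rb") as document_file:
        response = textract.analyze_document(
            Document={"Bytes": document_file.read()},
            FeatureTypes=["SIGNATURES"],
        )
    for block in response["Blocks"]:
        if block["BlockType"] == "SIGNATURE":
            print(f"Signature on page {block.get('Page', 1)} has confidence: {block['Confidence']:.2f}")


get_signature_confidence("path/to/your/document.png")
```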