### Install budgetnlp Package

Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/examples/costco_audit_complexity.md

Install the budgetnlp library using pip. This is a prerequisite for running the analysis script.

```bash
pip install budgetnlp
```

--------------------------------

### Install secfiler Package

Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/examples/construct_sec_documents.md

Install the secfiler package using pip. This is the first step to using its functionalities.

```bash
pip install secfiler
```

--------------------------------

### Example: Download IBM 10-K Submissions

Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/portfolio/portfolio.md

Example demonstrating how to download all IBM 10-K filings between January 1, 2019, and January 1, 2024, using the 'sec' provider.

```python
portfolio = Portfolio('ibm')
portfolio.download_submissions(filing_date=('2019-01-01', '2024-01-01'), submission_type='10-K',
                          provider='sec')
```

--------------------------------

### Install txt2dataset

Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/examples/gm_impairment_event.md

Install the txt2dataset library using pip. This is a prerequisite for running the data extraction and structuring scripts.

```bash
pip install txt2dataset
```

--------------------------------

### Example: Download Apple Graphics with Datamule

Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/portfolio/portfolio.md

Example showing how to download only the 'GRAPHIC' documents from Apple's submissions using the 'datamule-tar' provider.

```python
portfolio = Portfolio('apple_graphics')
portfolio.download_submissions(ticker='AAPL',document_type="GRAPHIC",
                          provider='datamule-tar')
```

--------------------------------

### Example Usage

Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/portfolio/document.md

Provides a Python code example demonstrating how to download SEC submissions and process documents using the Datamule library.

```python
from datamule import Portfolio
from time import time
from datamule.tags.config import set_dictionaries

set_dictionaries(['13fhr_information_table_cusips'])

portfolio = Portfolio('13fhr')
portfolio.download_submissions(submission_type=['13F-HR'],filing_date=('2008-09-01','2008-09-30'))

for sub in portfolio:
    for doc in sub:
        results = doc.text.tags.cusips
        if results is not None:
            print(results)
```

--------------------------------

### Example: Download IBM Form 4s with Datamule

Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/portfolio/portfolio.md

Example demonstrating the download of all IBM Form 4 and 4/A submissions since 1994 using the 'datamule-sgml' provider.

```python
portfolio = Portfolio('ibm')
portfolio.download_submissions(submission_type=['4','4/A'],
                          provider='datamule-sgml')
```

--------------------------------

### Install Datamule Python Package

Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/examples/quickstart.md

Install the datamule package using pip. This is the first step before using any of its functionalities.

```bash
pip install datamule
```

--------------------------------

### Quickstart: Initialize Portfolio and Download Filings

Source: https://github.com/john-friedman/datamule-python/blob/main/readme.md

Initialize a Portfolio object for a given ticker and download SEC submissions of a specific type.

```python
from datamule import Portfolio

portfolio = Portfolio('amzn')
portfolio.download_submissions(ticker='AMZN',submission_type='10-K')
```

--------------------------------

### Datamule Submission Loading Start

Source: https://github.com/john-friedman/datamule-python/blob/main/examples/parse_fiscal_year_inline_xbrl.ipynb

This output indicates the start of loading submission data.

```text
Loading submissions
```

--------------------------------

### Example: Chained Text Filtering

Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/portfolio/portfolio.md

Demonstrates how to chain multiple `filter_text` calls to progressively narrow down submissions before downloading. This example filters for 'climate change' and 'drought' within 10-K filings for a specific date range.

```python
portfolio.filter_text("climate change", filing_date=('2019-01-01', '2019-01-31'), submission_type='10-K')
portfolio.filter_text('drought', filing_date=('2019-01-01', '2019-01-31'), submission_type='10-K')
portfolio.download_submissions(filing_date=('2019-01-01', '2019-01-31'), submission_type='10-K')
```
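The narrowing effect of chained filters can be pictured with plain Python sets. This is a minimal sketch, not datamule's implementation: the `filings` dictionary, its keys, and the standalone `filter_text` helper are hypothetical, and it assumes each chained call keeps only submissions that also matched every earlier call.

```python
# Hypothetical in-memory corpus: identifier -> filing text.
filings = {
    "filing-a": "Climate change and drought reduced crop yields.",
    "filing-b": "Climate change may affect our coastal facilities.",
    "filing-c": "A prolonged drought raised water costs.",
}


def filter_text(candidates, phrase, corpus):
    """Keep only the candidate identifiers whose text contains the phrase."""
    phrase = phrase.lower()
    return {key for key in candidates if phrase in corpus[key].lower()}


# Start from every filing, then narrow with each successive filter.
matches = set(filings)
matches = filter_text(matches, "climate change", filings)
matches = filter_text(matches, "drought", filings)

print(sorted(matches))
```

Under that assumption, the order of the two calls does not change the final set, only how many candidates the second filter has to scan.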

--------------------------------

### Example: Delete and Download Submissions

Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/portfolio/portfolio.md

Demonstrates creating a Portfolio object, deleting its folder, downloading submissions, and then deleting and downloading again. This shows a common workflow for managing portfolio data.

```python
from datamule import Portfolio
port = Portfolio('deletetest')
port.delete()
port.download_submissions(ticker='MSFT',submission_type='10-K')
port.delete()
port.download_submissions(ticker='MSFT',submission_type='10-K')
```

--------------------------------

### Get Sections by Title Regex

Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/portfolio/document.md

Retrieves all sections whose titles start with the specified regex pattern.

```python
get_section(title_regex= r"income.*", format='dict')
```
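To make "titles start with the pattern" concrete, here is a small sketch using only the standard `re` module. The `sections` dictionary and the `sections_starting_with` helper are hypothetical, and the sketch assumes anchoring at the first character (as `re.match` does); the library's actual matching rules are not confirmed here.

```python
import re

# Hypothetical section titles mapped to placeholder text.
sections = {
    "income taxes": "text of the income taxes section",
    "income statement": "text of the income statement section",
    "net income per share": "text of the per-share section",
    "risk factors": "text of the risk factors section",
}


def sections_starting_with(sections, title_regex):
    """Return sections whose title matches the pattern from its first character."""
    pattern = re.compile(title_regex)
    # re.match anchors at the start of the string, unlike re.search.
    return {title: text for title, text in sections.items() if pattern.match(title)}


matched = sections_starting_with(sections, r"income.*")
print(list(matched))
```

With start anchoring, "net income per share" is excluded even though it contains the word "income".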

--------------------------------

### Iterate Documents by Type

Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/portfolio/portfolio.md

Iterates through documents in a portfolio based on their type. Example shows printing the path of each '10-K' document.

```python
for document in portfolio.document_type('10-K'):
    print(document.path)
```

--------------------------------

### Get Tables by Description Regex

Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/portfolio/document.md

Fetches tables from a document where the description matches a given regex pattern. Requires downloading submissions first.

```python
from datamule import Portfolio

portfolio = Portfolio('DEFM14A')
portfolio.download_submissions(cik='943324', submission_type='DEFM14A', document_type='DEFM14A', filing_date=('2011-10-25','2011-10-25'))
for doc in portfolio.document_type('DEFM14A'):
    print(str(doc.get_tables(description_regex=r'(?i)golden parachute')[0]))
```

--------------------------------

### Test SEC Document Reconstruction with datamule

Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/examples/construct_sec_documents.md

This example demonstrates how to fetch an SEC document using datamule's Submission class, save the original XML, and then reconstruct and save it using secfiler's construct_document function. It's useful for testing the efficacy of the reconstruction process.

```python
from datamule import Submission
from secfiler import construct_document

sub = Submission(url='https://www.sec.gov/Archives/edgar/data/789019/000078901926000028/0000789019-26-000028.txt')

for doc in sub:
    if doc.type == '4':
        with open('original.xml', 'wb') as f:
            f.write(doc.content)
        xml = construct_document(doc.tables, '4')
        with open('reconstructed.xml', 'wb') as f:
            f.write(xml)
```

--------------------------------

### Load and Display cik_cusip_crosswalk Dataset

Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/datasets.md

Imports the cik_cusip_crosswalk dataset and displays the first few rows using pandas. Ensure pandas is installed.

```python
from datamule.datasets import cik_cusip_crosswalk
import pandas as pd

print(pd.DataFrame(cik_cusip_crosswalk).head())
```

--------------------------------

### Iterate Submissions in Portfolio

Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/portfolio/portfolio.md

Iterates through all submissions within a portfolio. Example shows printing the path of each submission.

```python
for submission in portfolio:
    print(submission.path)
```

--------------------------------

### Import Portfolio from Datamule

Source: https://github.com/john-friedman/datamule-python/blob/main/examples/fundamentals.ipynb

Imports the Portfolio class from the datamule library. This is a common starting point for using the library's features.

```python
from datamule import Portfolio
```

--------------------------------

### Financial Data Structure Example

Source: https://github.com/john-friedman/datamule-python/blob/main/examples/fundamentals.ipynb

This snippet displays a sample of the financial data structure, including balance sheet and income statement items with their respective values and periods.

```text
period_start_date
{'balanceSheet': {'cashAndEquivalents': [{'value': '1056000000', 'period_start_date': '2022-03-31', 'period_end_date': None}, {'value': '943000000', 'period_start_date': '2021-12-31', 'period_end_date': None}, {'value': '1056000000', 'period_start_date': '2022-03-31', 'period_end_date': None}, {'value': '943000000', 'period_start_date': '2021-12-31', 'period_end_date': None}], 'accountsReceivableNet': [{'value': '1523000000', 'period_start_date': '2022-03-31', 'period_end_date': None}, {'value': '1418000000', 'period_start_date': '2021-12-31', 'period_end_date': None}], 'inventoryNet': [{'value': '688000000', 'period_start_date': '2022-03-31', 'period_end_date': None}, {'value': '604000000', 'period_start_date': '2021-12-31', 'period_end_date': None}, {'value': '688000000', 'period_start_date': '2022-03-31', 'period_end_date': None}, {'value': '604000000', 'period_start_date': '2021-12-31', 'period_end_date': None}], 'totalAssets': [{'value': '34648000000', 'period_start_date': '2022-03-31', 'period_end_date': None}, {'value': '32963000000', 'period_start_date': '2021-12-31', 'period_end_date': None}, {'value': '17305000000', 'period_start_date': '2022-03-31', 'period_end_date': None}, {'value': '16512000000', 'period_start_date': '2021-12-31', 'period_end_date': None}, {'value': '8544000000', 'period_start_date': '2022-03-31', 'period_end_date': None}, {'value': '7728000000', 'period_start_date': '2021-12-31', 'period_end_date': None}, {'value': '4609000000', 'period_start_date': '2022-03-31', 'period_end_date': None}, {'value': '4545000000', 'period_start_date': '2021-12-31', 'period_end_date': None}, {'value': '3495000000', 'period_start_date': '2022-03-31', 'period_end_date': None}, {'value': '3466000000', 'period_start_date': '2021-12-31', 'period_end_date': None}, {'value': '695000000', 'period_start_date': '2022-03-31', 'period_end_date': None}, {'value': '712000000', 'period_start_date': '2021-12-31', 'period_end_date': None}, {'value': '34648000000', 
'period_start_date': '2022-03-31', 'period_end_date': None}, {'value': '32963000000', 'period_start_date': '2021-12-31', 'period_end_date': None}], 'goodwill': [{'value': '1182000000', 'period_start_date': '2022-03-31', 'period_end_date': None}, {'value': '1177000000', 'period_start_date': '2021-12-31', 'period_end_date': None}], 'accumulatedDepreciationPpe': [{'value': '8734000000', 'period_start_date': '2022-03-31', 'period_end_date': None}, {'value': '8486000000', 'period_start_date': '2021-12-31', 'period_end_date': None}], 'currentLiabilities': [{'value': '5753000000', 'period_start_date': '2022-03-31', 'period_end_date': None}, {'value': '4732000000', 'period_start_date': '2021-12-31', 'period_end_date': None}], 'accountsPayable': [{'value': '1288000000', 'period_start_date': '2022-03-31', 'period_end_date': None}, {'value': '1153000000', 'period_start_date': '2021-12-31', 'period_end_date': None}], 'stockholdersEquity': [{'value': '3044000000', 'period_start_date': '2022-03-31', 'period_end_date': None}, {'value': '2798000000', 'period_start_date': '2021-12-31', 'period_end_date': None}], 'currentAssets': [{'value': '6142000000', 'period_start_date': '2022-03-31', 'period_end_date': None}, {'value': '5356000000', 'period_start_date': '2021-12-31', 'period_end_date': None}], 'currentRatio': [{'value': 1.0676168955327656, 'period_start_date': '2022-03-31', 'period_end_date': None}, {'value': 1.1318681318681318, 'period_start_date': '2021-12-31', 'period_end_date': None}], 'bookValuePerShare': [{'value': 4.557843886495044, 'period_start_date': '2022-03-31', 'period_end_date': None}, {'value': 4.196200885993774, 'period_start_date': '2021-12-31', 'period_end_date': None}]}, 'incomeStatement': {'totalRevenues': [{'value': '835000000', 'period_start_date': '2022-01-01', 'period_end_date': '2022-03-31'}, {'value': '707000000', 'period_start_date': '2021-01-01', 'period_end_date': '2021-03-31'}, {'value': '20170000
```
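The `currentRatio` entries in the sample appear to equal `currentAssets` divided by `currentLiabilities` for the same period. That is an inference from the numbers shown, not a documented formula. The sketch below reproduces the arithmetic with values copied from the sample; the `current_ratio_by_period` helper is hypothetical.

```python
# Values copied from the sample above; the structure stores them as strings.
current_assets = [
    {"value": "6142000000", "period_start_date": "2022-03-31"},
    {"value": "5356000000", "period_start_date": "2021-12-31"},
]
current_liabilities = [
    {"value": "5753000000", "period_start_date": "2022-03-31"},
    {"value": "4732000000", "period_start_date": "2021-12-31"},
]


def current_ratio_by_period(assets, liabilities):
    """Divide current assets by current liabilities for each shared period."""
    liabilities_by_date = {e["period_start_date"]: int(e["value"]) for e in liabilities}
    return {
        e["period_start_date"]: int(e["value"]) / liabilities_by_date[e["period_start_date"]]
        for e in assets
        if e["period_start_date"] in liabilities_by_date
    }


ratios = current_ratio_by_period(current_assets, current_liabilities)
for period, ratio in ratios.items():
    print(period, round(ratio, 4))
```

Because the raw values are strings, they need converting to numbers before any arithmetic.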

--------------------------------

### Example SEC Search Results Format

Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/index/index.md

Illustrates the structure of a single search result dictionary, including metadata like index, ID, score, and source details.

```json
[
{
    "_index": "edgar_file",
    "_id": "0001628280-24-002390:tsla-2023x12x31xex211.htm",
    "_score": 10.79173,
    "_source": {
        "ciks": ["0001318605"],
        "period_ending": "2023-12-31",
        "file_num": ["001-34756"],
        "display_names": ["Tesla, Inc.  (TSLA)  (CIK 0001318605)"],
        "root_forms": ["10-K"],
        "file_date": "2024-01-29",
        "form": "10-K",
        "adsh": "0001628280-24-002390",
        "file_type": "EX-21.1",
        "file_description": "EX-21.1",
        # Additional fields omitted for brevity
    }
},
... 
]
```
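A result shaped like the sample can be reduced to a few fields with ordinary dictionary access. This is a sketch under assumptions: the `summarize_hit` helper is hypothetical, and it assumes `_id` always joins the accession number and file name with a colon, as it does in the one sample shown.

```python
# One hit shaped like the sample above, trimmed to the fields used here.
hit = {
    "_id": "0001628280-24-002390:tsla-2023x12x31xex211.htm",
    "_source": {
        "ciks": ["0001318605"],
        "adsh": "0001628280-24-002390",
        "file_type": "EX-21.1",
        "file_date": "2024-01-29",
    },
}


def summarize_hit(hit):
    """Pull the accession number, file name, first CIK, and file type from a hit."""
    # Assumption: `_id` has the form "<accession>:<filename>".
    accession, _, filename = hit["_id"].partition(":")
    source = hit["_source"]
    return {
        "accession": accession,
        "filename": filename,
        "cik": int(source["ciks"][0]),  # int() drops the zero padding
        "file_type": source["file_type"],
        "file_date": source["file_date"],
    }


summary = summarize_hit(hit)
print(summary)
```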

--------------------------------

### Download SEC Filings with Datamule

Source: https://github.com/john-friedman/datamule-python/blob/main/examples/parse_xbrl.ipynb

Use this snippet to download specific SEC filings (e.g., 10-K) for a given ticker symbol. Ensure the datamule library is installed and configured.

```python
from datamule import Portfolio
import pandas as pd

# get data
portfolio = Portfolio("adt")
portfolio.download_submissions(submission_type='10-K',document_type='10-K',ticker='ADT')
```

--------------------------------

### Get Table for XBRL Data

Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/sheet/sheet.md

Queries the XBRL data from the SEC filings database using SQL and saves the results as Parquet files. Ensure the 'datamule api_key' is configured.

```python
from datamule import Sheet

sheet = Sheet("xbrl-results")

files = sheet.get_table("""
    SELECT accessionNumber, taxonomy, name, value
    FROM simple_xbrl
    WHERE taxonomy = 'us-gaap'
      AND name = 'NetIncomeLoss'
    LIMIT 100
""")
```

--------------------------------

### Get Table with SQL Query

Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/sheet/sheet.md

Executes a SQL query against the SEC filings database and saves results as Parquet files. Specify an output directory or let it default to the Sheet path. Requires a datamule api_key.

```python
from datamule import Sheet

sheet = Sheet("query-results")

files = sheet.get_table("""
    SELECT accessionNumber, submissionType, filingDate
    FROM submissions_metadata
    WHERE submissionType = '10-K'
    LIMIT 100
""")

print(files)
```

--------------------------------

### Extract Covenant Breach Risk Contexts

Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/examples/ge_covenant_breach_risk.md

Downloads GE's 10-Q filings, extracts text related to financial covenants using predefined keywords, and saves the contexts to a CSV file. Ensure the 'budgetnlp', 'datamule', 'txt2dataset', and 'pydantic' libraries are installed, since the script imports all four.

```python
from datamule import Portfolio
from txt2dataset import DatasetBuilder
from budgetnlp import extract_contexts
import csv
from pydantic import BaseModel
from typing import List, Literal
from collections import defaultdict

portfolio = Portfolio('ge_near_threshold_risk')
portfolio.download_submissions(ticker='GE', submission_type=['10-Q'], document_type=['10-Q'])

data = []
for sub in portfolio:
    for doc in sub:
        if doc.type in ['10-Q']:
            if doc.extension in ['.htm', '.html']:
                try:
                    # Define keywords that are associated with threshold risk.
                    keywords = [
                        "leverage",
                        "covenant",
                        "headroom",
                        "waiver",
                        "compliance",
                        "interest coverage",
                        "debt-to-EBITDA",
                        "breach",
                        "amendment",
                        "cushion",
                        "violation"
                    ]
                    # Get context around keyword matches
                    contexts = extract_contexts(doc.text, keywords, context_sentences=2)
      
                    for idx,context in enumerate(contexts):
                        data.append({'accession_id': f"{sub.accession}_{idx}", 'document_type': doc.type, 'filing_date': doc.filing_date,
                                    "context": context, 'filer_cik': sub._filer_cik})
                except Exception:
                    # Skip documents whose text cannot be extracted or searched.
                    pass

with open('ge_near_threshold_risk_excerpts.csv', 'w', encoding='utf-8', newline='') as csvfile:
    fieldnames = ['accession_id','document_type', 'filing_date', 'context', 'filer_cik']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(data)
```
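The idea of "context around keyword matches" can be sketched with the standard library alone. This is not budgetnlp's implementation: `extract_contexts_sketch` is a hypothetical stand-in, its sentence splitting is deliberately naive, and it assumes `context_sentences` means the number of sentences kept on each side of a match.

```python
import re


def extract_contexts_sketch(text, keywords, context_sentences=2):
    """Return each keyword-bearing sentence with its neighbouring sentences."""
    # Naive sentence split: break after ., ! or ? when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    lowered = [kw.lower() for kw in keywords]
    contexts = []
    for i, sentence in enumerate(sentences):
        if any(kw in sentence.lower() for kw in lowered):
            start = max(0, i - context_sentences)
            end = min(len(sentences), i + context_sentences + 1)
            contexts.append(" ".join(sentences[start:end]))
    return contexts


sample = (
    "Revenue grew this quarter. We remain in compliance with our credit agreement. "
    "Liquidity is adequate. Capital spending was flat. Headcount was unchanged."
)
contexts = extract_contexts_sketch(sample, ["compliance"], context_sentences=1)
print(contexts)
```

A window of whole sentences keeps each excerpt readable on its own, which matters when the excerpts are later passed to a model or reviewed by hand.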

--------------------------------

### Query SEC Proxy Voting Records with SQL

Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/examples/products.md

Query proxy voting records using SQL. This example demonstrates filtering by CUSIP and limiting results. Requires the 'datamule' library and a 'proxy' sheet.

```python
from datamule import Sheet

sheet = Sheet('proxy')
files = sheet.get_table("""
    SELECT *
    FROM proxy_voting_record
    WHERE cusip = '037833100'
    LIMIT 1000
""")
print(files)
```

--------------------------------

### Filter Filings by Document Type

Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/examples/monitor_new_filings.md

This example shows how to filter SEC filings by document type after retrieving them using DataMule's polling mechanism. It iterates through documents within a submission to check their types.

```python
from datamule import Portfolio, Submission
from datamule.utils.convenience import construct_sgml_url


portfolio = Portfolio('monitor')

def data_callback(hits):
    for hit in hits:
        sgml_url = construct_sgml_url(hit['accession'],hit['ciks'][0])

        sub = Submission(url=sgml_url)
        for doc in sub:
            if doc.type == "EX-99.1":
                print(f"Length: {len(doc.text)}")

portfolio.monitor_submissions(data_callback=data_callback, interval_callback=None,
                            polling_interval=1000, quiet=True, start_date=None,
                            validation_interval=60000)
```

--------------------------------

### Download and Process SEC Filings with Tag Extraction

Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/portfolio/document.md

Example of downloading 13F-HR submissions for a specific date range and iterating through documents to extract and print CUSIP tags from their text content. Requires setting up tag dictionaries.

```python
from datamule import Portfolio
from time import time
from datamule.tags.config import set_dictionaries

set_dictionaries(['13fhr_information_table_cusips'])

portfolio = Portfolio('13fhr')
portfolio.download_submissions(submission_type=['13F-HR'],filing_date=('2008-09-01','2008-09-30'))

for sub in portfolio:
    for doc in sub:
        results = doc.text.tags.cusips
        if results is not None:
            print(results)
```

--------------------------------

### Initialize and Build Dataset with txt2dataset

Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/examples/gm_impairment_event.md

Prepare entries from the CSV data and initialize DatasetBuilder with a prompt, schema, and model. The build() method processes the entries to extract structured data.

```python
# Prepare entries from the CSV
entries = []
for item in data:
    if item['item206']:  # Only process if there's Item 2.06 text
        identifier = item['accession']
        text = item['item206']
        entries.append((identifier, text))

# Define the prompt
prompt = "Extract the impairment amount (as an average if a range is given), asset class (such as equity investment, goodwill, PPE, or intangibles), and business segment from this text."

# Initialize the DatasetBuilder
builder = DatasetBuilder(
    prompt=prompt,  # use the prompt defined above
    schema=ImpairmentExtraction,
    model="gemini-2.5-flash-lite",
    entries=entries,
    rpm=1000
)
print(builder.api_key)

# Build the dataset
builder.build()

# Save the structured output
builder.save('gm_impairments_structured.csv')
```

--------------------------------

### Get Tables by Contains Regex

Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/portfolio/document.md

Filters tables to include only those that contain all specified regex patterns within their data.

```python
contains_regex=[r'Director', r'20\d{2}']
```
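The "contains all patterns" rule can be illustrated with a short standard-library sketch. The `tables` data and the `contains_all` helper are hypothetical, and the sketch assumes a pattern counts as present when it matches anywhere in at least one cell.

```python
import re

# Hypothetical tables, each a list of rows of cell strings.
tables = [
    [["Name", "Role", "Year"], ["A. Smith", "Director", "2011"]],
    [["Name", "Role"], ["B. Jones", "Director"]],
    [["Fiscal year", "Revenue"], ["2011", "100"]],
]


def contains_all(table, patterns):
    """True when every pattern matches at least one cell of the table."""
    cells = [cell for row in table for cell in row]
    return all(any(re.search(p, cell) for cell in cells) for p in patterns)


contains_regex = [r'Director', r'20\d{2}']
kept = [table for table in tables if contains_all(table, contains_regex)]
print(len(kept))
```

Only the first table survives: the second has no four-digit year and the third never mentions "Director".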

--------------------------------

### Iterate and Print Fundamentals

Source: https://github.com/john-friedman/datamule-python/blob/main/examples/fundamentals.ipynb

Iterate through downloaded submissions and print the fundamentals for each. This is useful for inspecting the retrieved financial data.

```python
for sub in portfolio:
    print(sub.fundamentals)
```

--------------------------------

### Initialize Datamule Book for SEC XBRL

Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/examples/products.md

Initializes a Datamule Book object for accessing SEC XBRL data. Contact support for access details.

```python
from datamule import Book

book = Book()
# Contact support for access details
```

--------------------------------

### Portfolio Class and Initialization

Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/portfolio/portfolio.md

The Portfolio class is used to interact with SEC Submissions. It requires a path to a folder containing submission subfolders.

```APIDOC
## Portfolio Class

The `Portfolio` class lets you interact with SEC Submissions. A portfolio consists of a folder that contains subfolders named after SEC Submission accession numbers.

### Attributes
* `portfolio.path` - path to folder
```

--------------------------------

### Initialize DatasetBuilder for Delay Extraction

Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/examples/late_filing_reasons.md

Prepares data for txt2dataset by creating entries from the extracted narratives. Defines a Pydantic schema for the expected output and a prompt for the Gemini model to extract delay reasons.

```python
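from typing import List, Literal

from pydantic import BaseModel
from txt2dataset import DatasetBuilder

# Define the schema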
class DelayInfo(BaseModel):
    type_of_delay: Literal[
        "accounting issues",
        "auditor delay", 
        "internal control weaknesses",
        "acquisition integration",
        "internal review delays",
        "other reasons"
    ]  
    explanation: str 

class DelayExtraction(BaseModel):
    info_found: bool
    data: List[DelayInfo] = []

# Prepare entries from the CSV
entries = []
for item in data:
    if item['narrative']:
        identifier = item['accession']
        text = item['narrative']
        entries.append((identifier, text))

# Define the prompt
prompt = "Extract the type of delay (such as auditor delay, accounting issues, internal control weaknesses, system failure, or other reasons) and the specific explanation for why the 10-K filing was delayed from this text."

# Initialize the DatasetBuilder
builder = DatasetBuilder(
    prompt=prompt,
    schema=DelayExtraction,
    model="gemini-2.5-flash-lite",
    entries=entries,
    rpm=1000
)
print(builder.api_key)
```

--------------------------------

### Get Sections by Title Regex and Class

Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/portfolio/document.md

Retrieves sections where the title matches a given regex pattern and the class is restricted to 'item'.

```python
get_section(title_regex= r"item1.*", format='dict',title_class='item')
```

--------------------------------

### Download and Parse XBRL Data with Datamule

Source: https://github.com/john-friedman/datamule-python/blob/main/examples/parse_fiscal_year_inline_xbrl.ipynb

Use this snippet to download SEC filings (10-Q and 10-K) for a given CIK and date range. It then parses the XBRL data to extract fiscal period information.

```python
# made for https://www.reddit.com/r/algotrading/comments/1lxf7ga/xbrl_deidocumentfiscalperiodfocus_help_needed/

from datamule import Portfolio
import pandas as pd

# get data
portfolio = Portfolio("aes")
portfolio.download_submissions(submission_type=['10-Q','10-K'],document_type=['10-Q','10-K'],
                               cik=['874761'], filing_date=('2022-01-01','2024-12-31'))


fiscal_period_focus_list = []
for sub in portfolio:
    filing_date = sub.metadata.content['filing-date']
    for doc in sub.document_type(['10-Q','10-K']):
        doc.parse_xbrl()
        basic = [{'quarter':item['_val'],'period_start':item['_context']['context_period_startdate'], 'period_end':item['_context']['context_period_enddate']
                  } for item in doc.xbrl if item['_attributes']['name']=='dei:DocumentFiscalPeriodFocus']
        
        
        # Collect this document's rows before moving on to the next document.
        fiscal_period_focus_list.append([{**item, 'type': doc.type, 'filing_date': filing_date} for item in basic])


# flatten the per-document lists and sort by filing date
fiscal_period_focus_list = sorted([item for sublist in fiscal_period_focus_list for item in sublist], key=lambda x: x['filing_date'])

for item in fiscal_period_focus_list:
    print(f"{item['type']} filed on {item['filing_date']}: Quarter {item['quarter']} ({item['period_start']} to {item['period_end']})")
```

--------------------------------

### Initialize Matplotlib Figure and Lines for Animation

Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/examples/use_index_to_search_filings_by_email_usage.md

Sets up the Matplotlib figure, axes, color map, and initial empty line objects for each domain. This prepares the plotting environment for the animation function.

```python
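import matplotlib.pyplot as plt
import numpy as np

# Prepare figure (domain_year_counts is built earlier in the example)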
fig, ax = plt.subplots(figsize=(14, 8))

# Color map for domains
all_domains = list(domain_year_counts.keys())
colors = plt.cm.tab10(np.linspace(0, 1, len(all_domains)))
domain_colors = {domain: colors[i] for i, domain in enumerate(all_domains)}

# Store line objects for each domain
lines = {}
for domain in all_domains:
    line, = ax.plot([], [], marker='o', linewidth=2.5, markersize=6, 
                    label=domain, color=domain_colors[domain], alpha=0)
    lines[domain] = line
```

--------------------------------

### Get Business Street Address from CIKs

Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/utils/convenience.md

Retrieves the primary street address for a company using its CIK. Handles single or multiple CIKs.

```python
from datamule.utils.convenience import get_business_street1_from_ciks
print(get_business_street1_from_ciks([1318605, 51143]))
```

```python
['1 TESLA ROAD', '1 NEW ORCHARD ROAD']
```

--------------------------------

### Download and Parse SEC Filings with Portfolio

Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/examples/quickstart.md

Use the Portfolio class to download SEC filings by filing date and submission type. It allows iterating through documents, parsing them, and accessing specific data fields. Supports threaded operations with a callback function for faster processing.

```python
from datamule import Portfolio

# Create a Portfolio object
portfolio = Portfolio('output_dir') # can be an existing directory or a new one

# Download submissions
portfolio.download_submissions(
   filing_date=('2023-01-01','2023-01-03'),
   submission_type=['10-K']
)

# Iterate through documents by document type
for ten_k in portfolio.document_type('10-K'):
   ten_k.parse()
   print(ten_k.data['document']['part2']['item7'])

# For faster operations, you can take advantage of built in threading with callback function
def callback(submission):
   print(submission.path)

submission_results = portfolio.process_submissions(callback)
```
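The callback pattern above can be pictured with the standard library's thread pool. This is a generic sketch, not how `process_submissions` is implemented: the `submission_paths` list stands in for real submission objects, and the worker count is an arbitrary assumption.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for submission objects: plain path strings.
submission_paths = ["output_dir/sub-1", "output_dir/sub-2", "output_dir/sub-3"]


def callback(path):
    """Work done per submission; here it only measures the path length."""
    return len(path)


# map() runs the callback on worker threads and yields results in input order.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(callback, submission_paths))

print(results)
```

Threads help most when the callback spends its time waiting on disk or network, which is typical when reading filings.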

--------------------------------

### Download Insider Ownership Metadata Table

Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/examples/products.md

Download filing-level metadata for insider ownership reports. Initializes a 'Book' object.

```python
from datamule import Book

book = Book()
book.download_dataset(
    dataset='metadata_ownership_table'
)
```

--------------------------------

### Get US Zip Codes from CIKs

Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/utils/convenience.md

Returns the ZIP code for US-based companies using their CIK. Returns None for non-US companies.

```python
from datamule.utils.convenience import get_us_zipcodes_from_ciks
print(get_us_zipcodes_from_ciks([1318605, 51143]))
```

```python
['78725', '10504']
```

--------------------------------

### Get Tickers from CIKs

Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/utils/convenience.md

Retrieves the stock ticker symbol for a given Central Index Key (CIK). Handles single or multiple CIKs.

```python
from datamule.utils.convenience import get_tickers_from_ciks
print(get_tickers_from_ciks([1318605]))
```

```python
['TSLA']
```

```python
from datamule.utils.convenience import get_tickers_from_ciks
print(get_tickers_from_ciks([1318605, 51143]))
```

```python
['TSLA', 'IBM']
```

--------------------------------

### Get CIKs from Tickers

Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/utils/convenience.md

Retrieves the Central Index Key (CIK) for a given stock ticker symbol. Handles single or multiple tickers.

```python
from datamule.utils.convenience import get_ciks_from_tickers
print(get_ciks_from_tickers(['IBM']))
```

```python
[51143]
```

```python
from datamule.utils.convenience import get_ciks_from_tickers
print(get_ciks_from_tickers(['TSLA','IBM']))
```

```python
[1318605, 51143]
```
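
The outputs above suggest that the lookup helpers return one result per input, in input order. Under that assumption (the source does not state it explicitly), inputs and results can be paired into a dictionary. The `pair_lookup` helper below is a hypothetical name, and the sample values are copied from the example outputs above rather than fetched live.

```python
def pair_lookup(keys, values):
    """Pair lookup inputs with results, assuming a one-to-one, order-preserving result list."""
    keys, values = list(keys), list(values)
    if len(keys) != len(values):
        # A length mismatch means the one-result-per-input assumption does not hold
        raise ValueError(f"expected {len(keys)} results, got {len(values)}")
    return dict(zip(keys, values))


# Sample values copied from the example outputs above; in practice the second
# argument would be the list returned by get_ciks_from_tickers(tickers).
tickers = ["TSLA", "IBM"]
ciks = [1318605, 51143]
ticker_to_cik = pair_lookup(tickers, ciks)
print(ticker_to_cik)
```

The explicit length check is there so that a dropped or missing result raises an error instead of silently shifting the pairs.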

--------------------------------

### Fetch and Reconstruct SEC Documents at Scale

Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/examples/construct_sec_documents.md

This script looks up documents of a given type in two parquet tables, downloads each one from the SEC, reconstructs it with secfiler, and saves both the original and the reconstructed XML. It requires an API key and lets you choose the document type to process.

```python
DOCUMENT_TYPE = "SH-ER" # change to whatever document you would like.

import urllib.request
import polars as pl
from datamule import Document
from datamule.utils.convenience import construct_document_url
from secfiler import construct_document


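def fetch_url(url):
    # The SEC expects a descriptive User-Agent; replace with your own name and email
    req = urllib.request.Request(url, headers={"User-Agent": "John Smith johnsmith@company.com"})
    with urllib.request.urlopen(req) as response:
        return response.read()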

def get_samples(document_type, n=1):
    docs = (
        pl.scan_parquet("complete_sec_documents_table.parquet")
        .filter(
            (pl.col("documentType") == document_type) &
            (pl.col("filename").str.ends_with(".xml"))
        )
        .select("accessionNumber", "filename")
        .limit(n)
    )

    rows = (
        docs.join(
            pl.scan_parquet("complete_sec_accession_cik_table.parquet")
            .select("accessionNumber", "cik"),
            on="accessionNumber",
            how="left",
        )
        .collect()
        .to_dicts()
    )

    return [
        construct_document_url(row["accessionNumber"], row["cik"], row["filename"]) 
        for row in rows
    ]


for url in get_samples(DOCUMENT_TYPE):
    content = fetch_url(url)
    doc = Document(type=DOCUMENT_TYPE, content=content, filename="placeholder.xml", accession="0000000000-00-000000", filing_date="2000-01-01")
    print(type(doc.tables))

    with open("original.xml", "wb") as f:
        f.write(content)

    xml = construct_document(doc.tables, DOCUMENT_TYPE)
    with open("reconstructed.xml", "wb") as f:
        f.write(xml)
```
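
Once both files are saved, one way to check the round trip is to compare them after XML canonicalization, so that indentation differences do not count as mismatches. The sketch below uses only the standard library; `xml_equivalent` is a hypothetical helper, and whether secfiler aims for canonical equality with the original is an assumption, not something the source states.

```python
import xml.etree.ElementTree as ET


def xml_equivalent(original, reconstructed):
    """Return True if two XML documents match after C14N canonicalization.

    strip_text=True drops leading/trailing whitespace in text nodes, so
    indentation differences alone do not count as a mismatch.
    """
    canon_a = ET.canonicalize(xml_data=original, strip_text=True)
    canon_b = ET.canonicalize(xml_data=reconstructed, strip_text=True)
    return canon_a == canon_b


# Illustrative inputs, not real filing content.
compact = "<doc><name>Example</name><value>1</value></doc>"
indented = "<doc>\n  <name>Example</name>\n  <value>1</value>\n</doc>"
changed = "<doc><name>Example</name><value>2</value></doc>"
print(xml_equivalent(compact, indented))
print(xml_equivalent(compact, changed))
```

To apply it to the files written above, read `original.xml` and `reconstructed.xml` back and pass their contents to `xml_equivalent`.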

--------------------------------

### Get Company Names from CIKs

Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/utils/convenience.md

Retrieves the official company name for a given Central Index Key (CIK). Handles single or multiple CIKs.

```python
from datamule.utils.convenience import get_company_names_from_ciks
print(get_company_names_from_ciks([1318605, 51143]))
```

```python
['Tesla, Inc.', 'INTERNATIONAL BUSINESS MACHINES CORP']
```

--------------------------------

### Set Default Data Source

Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/config/config.md

Use Config to set the default data source to 'datamule' or 'sec'. An API key is required for the 'datamule' source.

```python
from datamule import Config

config = Config()
config.set_default_source("datamule")  # Options: "datamule", "sec"

# Verify your settings
print(f"Default source: {config.get_default_source()}")
```

--------------------------------

### Get US State from CIKs

Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/utils/convenience.md

Returns the two-letter US state code for US-based companies using their CIK. Returns an empty string for non-US companies.

```python
from datamule.utils.convenience import get_us_state_from_ciks
print(get_us_state_from_ciks([1318605, 51143]))
```

```python
['TX', 'NY']
```

--------------------------------

### Download Insider Derivative Holdings Table

Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/examples/products.md

Download derivative security holdings data. Initializes a 'Book' object.

```python
from datamule import Book

book = Book()
book.download_dataset(
    dataset='derivative_holding_ownership_table'
)
```

--------------------------------

### Download Insider Derivative Transactions Table

Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/examples/products.md

Download derivative security transactions, such as options and warrants. Initializes a 'Book' object.

```python
from datamule import Book

book = Book()
book.download_dataset(
    dataset='derivative_transaction_ownership_table'
)
```

--------------------------------

### Get SICs from CIKs

Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/utils/convenience.md

Retrieves the Standard Industrial Classification (SIC) code for a given Central Index Key (CIK). Handles single or multiple CIKs.

```python
from datamule.utils.convenience import get_sics_from_ciks
print(get_sics_from_ciks([1318605, 51143]))
```

```python
['3711', '7372']
```

--------------------------------

### Download Simple XBRL Table Dataset

Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/examples/products.md

Downloads the 'simple_xbrl_table' dataset, which contains parsed XBRL facts from SEC filings. Requires a Datamule Book object.

```python
from datamule import Book

book = Book()
book.download_dataset(
    dataset='simple_xbrl_table'
)
```

--------------------------------

### Get Country of Business Address from CIKs

Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/utils/convenience.md

Returns the country of the company's business address using its CIK. For US-based companies, it returns 'United States of America'.

```python
from datamule.utils.convenience import get_adm0_from_ciks
print(get_adm0_from_ciks([1318605, 51143]))
```

```python
['United States of America', 'United States of America']
```

--------------------------------

### Download Insider Owner Signatures Table

Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/examples/products.md

Download signature information for insider ownership filings. Initializes a 'Book' object.

```python
from datamule import Book

book = Book()
book.download_dataset(
    dataset='owner_signature_ownership_table'
)
```

--------------------------------

### Download Financial Submissions

Source: https://github.com/john-friedman/datamule-python/blob/main/examples/fundamentals.ipynb

Use this snippet to download specific financial filings (e.g., 10-Q, 10-K) for a given CIK within a date range. Requires initializing a Portfolio object.

```python
from datamule import Portfolio

portfolio = Portfolio("aes")
portfolio.download_submissions(submission_type=['10-Q','10-K'],
                               cik=['874761'], filing_date=("2022-01-01","2024-12-31"))
```

--------------------------------

### Download Insider Reporting Owner Table

Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/examples/products.md

Download details for reporting owners (insiders). Initializes a 'Book' object.

```python
from datamule import Book

book = Book()
book.download_dataset(
    dataset='reporting_owner_ownership_table'
)
```

--------------------------------

### Real-time SEC Filings Notification

Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/examples/products.md

Get notified of new SEC filings in real time over a websocket. Uses the Portfolio class and requires a callback function to process incoming data.

```python
from datamule import Portfolio

portfolio = Portfolio('newfilings')

def data_callback(data):
    for item in data:
        print(item)

portfolio.stream_submissions(data_callback=data_callback)
```

--------------------------------

### Download and Parse SEC XML Files

Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/examples/parse_any_sec_xml_file.md

Use this code to download SEC filings of a specific document type and extract tabular data into CSV files. Requires the 'datamule' package; 'csv' is part of the Python standard library.

```python
from datamule import Portfolio
import csv

portfolio = Portfolio('1z')
portfolio.download_submissions(document_type='1-Z') # downloads only document_type = '1-Z'

all_1z_tables = {}

for sub in portfolio:
    for doc in sub:
        if doc.extension == '.xml':
            for table in doc.tables:
                if table.name not in all_1z_tables:
                    all_1z_tables[table.name] = []
                all_1z_tables[table.name].extend(table.data)

for table_name, rows in all_1z_tables.items():
    with open(f'{table_name}.csv', 'w', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=rows[0].keys())
        writer.writeheader()
        writer.writerows(rows)
```
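
The writer above takes its header from `rows[0]`, which assumes every row of a table has the same keys and that no table is empty. If either assumption fails, a more defensive variant can build the header from the union of keys. This is a minimal sketch using only the standard library; `write_rows` is a hypothetical helper and the sample rows are made up for illustration.

```python
import csv
import io


def write_rows(rows, out):
    """Write dict rows to `out` as CSV, using the union of keys as the header."""
    if not rows:
        return []  # nothing to write; avoids IndexError on rows[0]
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)  # keep first-seen column order
    # restval fills columns that a given row does not have
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return fieldnames


# Illustrative rows with different keys, not real filing data.
sample = [{"a": 1, "b": 2}, {"a": 3, "c": 4}]
buffer = io.StringIO()
header = write_rows(sample, buffer)
print(buffer.getvalue())
```

In the loop above, `out` would be the file opened with `open(f'{table_name}.csv', 'w', newline='')`.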

--------------------------------

### Download Institutional Holdings Table

Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/examples/products.md

Download the complete 13F institutional holdings dataset. Initializes a 'Book' object.

```python
from datamule import Book

book = Book()
book.download_dataset(
    dataset='institutional_holdings_table'
)
```

--------------------------------

### Define Pydantic Schemas for Impairment Data

Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/examples/gm_impairment_event.md

Define Pydantic models to structure the extracted impairment information, including expected amount, asset class, and business segment. This schema guides the txt2dataset processing.

```python
from typing import List, Optional

from pydantic import BaseModel


class ImpairmentInfo(BaseModel):
    expected_impairment_amount: Optional[float] = None  # Average amount in dollars
    asset_class: Optional[str] = None  # e.g., "Equity investment", "Goodwill", "PPE", "Intangibles"
    segment: Optional[str] = None  # e.g., "China", "North America"

class ImpairmentExtraction(BaseModel):
    info_found: bool
    data: List[ImpairmentInfo] = []
```

--------------------------------

### Build Covenant Breach Risk Dataset with Gemini

Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/examples/ge_covenant_breach_risk.md

Initializes and builds a dataset using txt2dataset and a Gemini model for classifying covenant breach risks. Requires the 'ge_near_threshold_risk_excerpts.csv' file generated previously. Ensure your GEMINI_API_KEY is set.

```python
import csv
from typing import List, Literal

from pydantic import BaseModel
from txt2dataset import DatasetBuilder

# Define the schema
class CovenantBreachRiskInfo(BaseModel):
    explanation: str
    classification: Literal["FINE", "WARNING", "WAIVER", "BREACH", "UNRELATED"]

class CovenantBreachRisk(BaseModel):
    info_found: bool
    data: List[CovenantBreachRiskInfo] = []

# Load the excerpts generated in the previous step
# (assumed to contain the 'accession_id' and 'context' columns used below)
with open('ge_near_threshold_risk_excerpts.csv', newline='', encoding='utf-8') as f:
    data = list(csv.DictReader(f))

# Prepare entries from the CSV, skipping rows with no context text
entries = []
for item in data:
    if item['context']:
        identifier = item['accession_id']
        text = item['context']
        entries.append((identifier, text))

# Define the prompt
prompt = "Classify this text as FINE, WARNING, WAIVER, BREACH, or UNRELATED - where FINE means covenants comfortably met, WARNING means approaching covenant limits, WAIVER means covenant waiver/amendment obtained, BREACH means covenant violated, and UNRELATED means no covenant risk discussed."

# Initialize the DatasetBuilder
builder = DatasetBuilder(
    prompt=prompt,
    schema=CovenantBreachRisk,
    model="gemini-2.5-flash-lite",
    entries=entries,
    rpm=1000
)

# Build the dataset
builder.build()

# Save the structured output
builder.save('ge-convenant-breach-risk.csv')
```

--------------------------------

### Download Insider Non-Derivative Holdings Table

Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/examples/products.md

Download non-derivative security holdings data. Initializes a 'Book' object.

```python
from datamule import Book

book = Book()
book.download_dataset(
    dataset='non_derivative_holding_ownership_table'
)
```
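
The insider ownership tables are all downloaded with the same `download_dataset` call, so they can be fetched in a loop. The sketch below takes the book object as a parameter and uses a hypothetical `DryRunBook` stand-in so it runs without network access or an API key. The dataset names are the ones that appear in the snippets in this document, and `download_datasets` is an illustrative helper, not part of datamule.

```python
def download_datasets(book, dataset_names):
    """Call book.download_dataset once per name and return the names requested."""
    requested = []
    for name in dataset_names:
        book.download_dataset(dataset=name)
        requested.append(name)
    return requested


# Dataset names as they appear in the snippets in this document.
OWNERSHIP_DATASETS = [
    "metadata_ownership_table",
    "reporting_owner_ownership_table",
    "owner_signature_ownership_table",
    "non_derivative_holding_ownership_table",
    "derivative_holding_ownership_table",
    "derivative_transaction_ownership_table",
]


class DryRunBook:
    """Hypothetical stand-in for datamule.Book that only records the calls."""

    def __init__(self):
        self.calls = []

    def download_dataset(self, dataset):
        self.calls.append(dataset)


dry_run = DryRunBook()
download_datasets(dry_run, OWNERSHIP_DATASETS)
print(dry_run.calls)
```

For a real run, pass `Book()` from `datamule` in place of `DryRunBook()`.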

--------------------------------

### File Download Progress

Source: https://github.com/john-friedman/datamule-python/blob/main/examples/fundamentals.ipynb

This output shows the progress of downloading files, indicating the percentage complete and the number of files downloaded.

```text
Downloading files: 100%|██████████| 12/12 [00:03<00:00,  3.60it/s]
```

--------------------------------

### Retrieve Friday Night Dump 8-K Filings

Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/examples/get_friday_night_dump.md

Fetches 8-K filings submitted after 4 PM ET on a specific date. Requires setting start and end times in ET, converting them to UTC for the API, and querying the 'submissions_metadata' table.

```python
from datamule import Sheet
from datetime import datetime
from zoneinfo import ZoneInfo

# Set your times in ET
start_et = datetime(2026, 3, 27, 16, 0, 0, tzinfo=ZoneInfo("America/New_York"))
end_et = datetime(2026, 3, 27, 23, 59, 59, tzinfo=ZoneInfo("America/New_York"))

# Convert to UTC strings for the API
start_utc = start_et.astimezone(ZoneInfo("UTC")).strftime('%Y-%m-%d %H:%M:%S')
end_utc = end_et.astimezone(ZoneInfo("UTC")).strftime('%Y-%m-%d %H:%M:%S')

sheet = Sheet('')

files = sheet.get_table(f"""
    SELECT accessionNumber, submissionType, filingDate, detectedTime
    FROM submissions_metadata
    WHERE submissionType = '8-K'
      AND filingDate = DATE '2026-03-27'
      AND detectedTime BETWEEN TIMESTAMP '{start_utc}' AND TIMESTAMP '{end_utc}'
""")

print(files)
```
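
The ET to UTC conversion is the step most likely to go wrong, because the offset changes with daylight saving time and the end of the window lands on the next UTC calendar day. The standard-library sketch below isolates that step so it can be checked without calling the API. `after_hours_window_utc` is a hypothetical helper name, and the default window mirrors the 4 PM to 11:59:59 PM ET range used above.

```python
from datetime import date, datetime, time
from zoneinfo import ZoneInfo

ET = ZoneInfo("America/New_York")
UTC = ZoneInfo("UTC")
FMT = "%Y-%m-%d %H:%M:%S"


def after_hours_window_utc(day, start=time(16, 0, 0), end=time(23, 59, 59)):
    """Return (start_utc, end_utc) strings for an ET window on `day`."""
    start_et = datetime.combine(day, start, tzinfo=ET)
    end_et = datetime.combine(day, end, tzinfo=ET)
    return (
        start_et.astimezone(UTC).strftime(FMT),
        end_et.astimezone(UTC).strftime(FMT),
    )


start_utc, end_utc = after_hours_window_utc(date(2026, 3, 27))
print(start_utc, end_utc)  # 2026-03-27 20:00:00 2026-03-28 03:59:59
```

On 2026-03-27 New York is on daylight time (UTC-4), so 4 PM ET becomes 20:00 UTC. In winter the same window starts at 21:00 UTC, which is why the conversion uses `zoneinfo` rather than a fixed offset.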

--------------------------------

### get_table

Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/sheet/sheet.md

Runs a SQL query against Datamule's Athena-backed SEC filings database and writes the result as Parquet files to disk. The query results are saved in the specified output directory or the Sheet path if no output directory is provided.

```APIDOC
## get_table

### Description
Runs a SQL query against Datamule's Athena-backed SEC filings database and writes the result as Parquet files to disk. The query results are saved in the specified output directory or the Sheet path if no output directory is provided.

### Method Signature
`get_table(self, query, output_dir=None, wait_seconds=None)`

### Parameters
- **query** (string) - Required - The SQL query to execute.
- **output_dir** (string, optional) - The directory to save the Parquet files. If omitted, results are saved in the Sheet path.
- **wait_seconds** (integer, optional) - The number of seconds to wait for the query to complete.

### Usage Example
```python
from datamule import Sheet

sheet = Sheet("query-results")

files = sheet.get_table("""
    SELECT accessionNumber, submissionType, filingDate
    FROM submissions_metadata
    WHERE submissionType = '10-K'
    LIMIT 100
""")

print(files)
```

### Example Result
```python
[
    PosixPath("query-results/part-00000.parquet")
]
```
```

--------------------------------

### Calculate Tesla Risk Factor Negative Ratio

Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/examples/tesla_risk_factor_language_score.md

This script downloads Tesla's 10-K filings, extracts the 'Risk Factors' section, calculates the negative ratio using budgetnlp, and saves the results to a CSV file. Ensure budgetnlp is installed and accessible.

```python
from datamule import Portfolio
from budgetnlp import negative_ratio
import csv


portfolio = Portfolio('tesla_risk_factor_language_score')
portfolio.download_submissions(ticker='TSLA',submission_type=['10-K'],document_type=['10-K'])


data = []
for sub in portfolio:
    for doc in sub:
        if doc.type in ['10-K']:
            if doc.extension in ['.htm','.html']:
                item1a_risk_factors = doc.get_section(title='item1a', format='text')[0]
                neg_ratio = negative_ratio(item1a_risk_factors, negation_reversals=True)

                data.append({'accession':sub.accession,'document_type':doc.type,'filing_date':doc.filing_date,
                            "negative_ratio":neg_ratio, 'filer_cik' : sub._filer_cik})

with open('risk_factors_negative_ratio.csv', 'w', encoding='utf-8',newline='') as csvfile:
    fieldnames = ['accession', 'document_type', 'filing_date', 'negative_ratio', 'filer_cik']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(data)
```
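
With the CSV written, a natural next step is to order the filings by date and look at how the ratio moves from one filing to the next. The sketch below uses only the standard library and the column names from the script above. It assumes `filing_date` is an ISO-style `YYYY-MM-DD` string (so sorting as text is chronological) and that `negative_ratio` is numeric. The helper name and the sample values are placeholders for illustration, not real results.

```python
import csv
import io


def negative_ratio_changes(csv_file):
    """Return (filing_date, negative_ratio, change_vs_previous) tuples in date order."""
    rows = sorted(csv.DictReader(csv_file), key=lambda r: r["filing_date"])
    results = []
    previous = None
    for row in rows:
        ratio = float(row["negative_ratio"])
        # The first filing has no predecessor, so its change is None
        change = None if previous is None else ratio - previous
        results.append((row["filing_date"], ratio, change))
        previous = ratio
    return results


# Placeholder values for illustration only, not real results.
sample_csv = io.StringIO(
    "accession,document_type,filing_date,negative_ratio,filer_cik\n"
    "acc-2,10-K,2021-02-01,0.25,1318605\n"
    "acc-1,10-K,2020-02-01,0.5,1318605\n"
)
changes = negative_ratio_changes(sample_csv)
for filing_date, ratio, change in changes:
    print(filing_date, ratio, change)
```

For the real file, pass `open('risk_factors_negative_ratio.csv', newline='', encoding='utf-8')` instead of the in-memory sample.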

--------------------------------

### Calculate Audit Report Complexity Score

Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/examples/costco_audit_complexity.md

This Python script downloads Costco's 10-K filings, extracts the audit report section, calculates a complexity score using naive_complexity_ratio, and saves the results to a CSV file. Ensure the budgetnlp library is installed and accessible.

```python
from datamule import Portfolio
from budgetnlp import naive_complexity_ratio
import csv

portfolio = Portfolio('costco_audit_complexity')
portfolio.download_submissions(ticker='COST', submission_type=['10-K'], document_type=['10-K'])

data = []
for sub in portfolio:
    for doc in sub:
        if doc.type in ['10-K']:
            if doc.extension in ['.htm', '.html']:
                try:
                    audit_report = doc.get_section(title_regex='(?i)report of independent registered public accounting firm', format='text')[0]
                    complexity_score = naive_complexity_ratio(audit_report, complexity_weight=50, uncertainty_weight=30, sentence_weight=20,normal_sentence_length=20)

                    data.append({'accession': sub.accession, 'document_type': doc.type, 'filing_date': doc.filing_date,
                                "complexity_score": complexity_score, 'filer_cik': sub._filer_cik})
                except Exception:
                    # Skip filings where the audit report section is missing or cannot be scored
                    pass

with open('costco_audit_complexity.csv', 'w', encoding='utf-8', newline='') as csvfile:
    fieldnames = ['accession', 'document_type', 'filing_date', 'complexity_score', 'filer_cik']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(data)
```