### Install budgetnlp Package
Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/examples/costco_audit_complexity.md

Install the budgetnlp library with pip. It is required to run the analysis script.

```bash
pip install budgetnlp
```

--------------------------------

### Install secfiler Package
Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/examples/construct_sec_documents.md

Install the secfiler package with pip before using any of its functions.

```bash
pip install secfiler
```

--------------------------------

### Example: Download IBM 10-K Submissions
Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/portfolio/portfolio.md

Downloads all IBM 10-K filings filed between January 1, 2019, and January 1, 2024, using the 'sec' provider.

```python
from datamule import Portfolio

portfolio = Portfolio('ibm')
portfolio.download_submissions(
    ticker='IBM',  # restrict the download to IBM
    filing_date=('2019-01-01', '2024-01-01'),
    submission_type='10-K',
    provider='sec',
)
```

--------------------------------

### Install txt2dataset
Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/examples/gm_impairment_event.md

Install the txt2dataset library with pip. It is required to run the data extraction and structuring scripts.

```bash
pip install txt2dataset
```

--------------------------------

### Example: Download Apple Graphics with Datamule
Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/portfolio/portfolio.md

Downloads only the 'GRAPHIC' documents from Apple's submissions using the 'datamule-tar' provider.
```python
from datamule import Portfolio

portfolio = Portfolio('apple_graphics')
portfolio.download_submissions(ticker='AAPL', document_type='GRAPHIC', provider='datamule-tar')
```

--------------------------------

### Example: Download IBM Form 4s with Datamule
Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/portfolio/portfolio.md

Downloads all IBM Form 4 and 4/A submissions since 1994 using the 'datamule-sgml' provider.

```python
from datamule import Portfolio

portfolio = Portfolio('ibm')
portfolio.download_submissions(ticker='IBM', submission_type=['4', '4/A'], provider='datamule-sgml')
```

--------------------------------

### Install Datamule Python Package
Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/examples/quickstart.md

Install the datamule package with pip before using any of its functions.

```bash
pip install datamule
```

--------------------------------

### Quickstart: Initialize Portfolio and Download Filings
Source: https://github.com/john-friedman/datamule-python/blob/main/readme.md

Initialize a Portfolio object for a given ticker and download SEC submissions of a specific type.
```python
from datamule import Portfolio

portfolio = Portfolio('amzn')
portfolio.download_submissions(ticker='AMZN', submission_type='10-K')
```

--------------------------------

### Datamule Submission Loading Start
Source: https://github.com/john-friedman/datamule-python/blob/main/examples/parse_fiscal_year_inline_xbrl.ipynb

This output indicates that submission data has started loading.

```text
Loading submissions
```

--------------------------------

### Example: Chained Text Filtering
Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/portfolio/portfolio.md

Chains multiple `filter_text` calls to progressively narrow the submissions before downloading. This example keeps only 10-K filings from January 2019 that match both 'climate change' and 'drought'.

```python
from datamule import Portfolio

portfolio = Portfolio('climate_drought')  # any output folder name

# Each filter_text call narrows the set of submissions further
portfolio.filter_text("climate change", filing_date=('2019-01-01', '2019-01-31'), submission_type='10-K')
portfolio.filter_text('drought', filing_date=('2019-01-01', '2019-01-31'), submission_type='10-K')

# Download only the submissions that passed both filters
portfolio.download_submissions(filing_date=('2019-01-01', '2019-01-31'), submission_type='10-K')
```

--------------------------------

### Example: Delete and Download Submissions
Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/portfolio/portfolio.md

Creates a Portfolio object, deletes its folder, downloads submissions, then deletes and downloads them again. This is a common workflow for refreshing portfolio data.
```python
from datamule import Portfolio

port = Portfolio('deletetest')
port.delete()
port.download_submissions(ticker='MSFT', submission_type='10-K')
port.delete()
port.download_submissions(ticker='MSFT', submission_type='10-K')
```

--------------------------------

### Get Sections by Title Regex
Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/portfolio/document.md

Retrieves all sections whose titles start with the specified regex pattern.

```python
get_section(title_regex=r"income.*", format='dict')
```

--------------------------------

### Iterate Documents by Type
Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/portfolio/portfolio.md

Iterates through documents in a portfolio based on their type. The example prints the path of each '10-K' document.

```python
for document in portfolio.document_type('10-K'):
    print(document.path)
```

--------------------------------

### Get Tables by Description Regex
Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/portfolio/document.md

Fetches tables from a document whose description matches a given regex pattern. Requires downloading submissions first.
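The pattern passed to `get_tables` in this section's example starts with `(?i)`, Python's inline flag for case-insensitive matching. A standalone sketch of what that pattern accepts, run against hypothetical description strings instead of real tables; it does not call datamule, and it assumes search-anywhere (`re.search`) semantics, which the source does not confirm for `get_tables`:

```python
import re

# Hypothetical table descriptions standing in for tables in a DEFM14A filing
descriptions = [
    "Golden Parachute Compensation",
    "Summary of golden parachute payments",
    "Selected Historical Financial Data",
]

pattern = r'(?i)golden parachute'  # (?i) makes the whole pattern case-insensitive

# Keep descriptions where the pattern occurs anywhere in the string
matches = [d for d in descriptions if re.search(pattern, d)]
print(matches)
```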
```python
from datamule import Portfolio

portfolio = Portfolio('DEFM14A')
portfolio.download_submissions(
    cik='943324',
    submission_type='DEFM14A',
    document_type='DEFM14A',
    filing_date=('2011-10-25', '2011-10-25'),
)

for doc in portfolio.document_type('DEFM14A'):
    tables = doc.get_tables(description_regex=r'(?i)golden parachute')
    if tables:  # skip documents with no matching table instead of raising IndexError
        print(str(tables[0]))
```

--------------------------------

### Test SEC Document Reconstruction with datamule
Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/examples/construct_sec_documents.md

Fetches an SEC submission with datamule's `Submission` class, saves the original Form 4 XML, then rebuilds the document from its parsed tables with secfiler's `construct_document` function and saves the result. Comparing the two files shows how faithfully the reconstruction reproduces the original.

```python
from datamule import Submission
from secfiler import construct_document

sub = Submission(url='https://www.sec.gov/Archives/edgar/data/789019/000078901926000028/0000789019-26-000028.txt')
for doc in sub:
    if doc.type == '4':
        # Save the Form 4 XML as filed
        with open('original.xml', 'wb') as f:
            f.write(doc.content)
        # Rebuild the XML from the parsed tables and save it for comparison
        xml = construct_document(doc.tables, '4')
        with open('reconstructed.xml', 'wb') as f:
            f.write(xml)
```

--------------------------------

### Load and Display cik_cusip_crosswalk Dataset
Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/datasets.md

Imports the cik_cusip_crosswalk dataset and displays the first few rows with pandas. Ensure pandas is installed.

```python
import pandas as pd
from datamule.datasets import cik_cusip_crosswalk

print(pd.DataFrame(cik_cusip_crosswalk).head())
```

--------------------------------

### Iterate Submissions in Portfolio
Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/portfolio/portfolio.md

Iterates through all submissions in a portfolio. The example prints the path of each submission.
```python
for submission in portfolio:
    print(submission.path)
```

--------------------------------

### Import Portfolio from Datamule
Source: https://github.com/john-friedman/datamule-python/blob/main/examples/fundamentals.ipynb

Imports the Portfolio class from the datamule library, the usual starting point for using its features.

```python
from datamule import Portfolio
```

--------------------------------

### Financial Data Structure Example
Source: https://github.com/john-friedman/datamule-python/blob/main/examples/fundamentals.ipynb

A sample of the fundamentals data structure: balance sheet and income statement items, each a list of values with their periods. In the balance sheet items only `period_start_date` is set (`period_end_date` is `None`). The output is truncated.

```text
{'balanceSheet': {'cashAndEquivalents': [{'value': '1056000000', 'period_start_date': '2022-03-31', 'period_end_date': None}, {'value': '943000000', 'period_start_date': '2021-12-31', 'period_end_date': None}, {'value': '1056000000', 'period_start_date': '2022-03-31', 'period_end_date': None}, {'value': '943000000', 'period_start_date': '2021-12-31', 'period_end_date': None}], 'accountsReceivableNet': [{'value': '1523000000', 'period_start_date': '2022-03-31', 'period_end_date': None}, {'value': '1418000000', 'period_start_date': '2021-12-31', 'period_end_date': None}], 'inventoryNet': [{'value': '688000000', 'period_start_date': '2022-03-31', 'period_end_date': None}, {'value': '604000000', 'period_start_date': '2021-12-31', 'period_end_date': None}, {'value': '688000000', 'period_start_date': '2022-03-31', 'period_end_date': None}, {'value': '604000000', 'period_start_date': '2021-12-31', 'period_end_date': None}], 'totalAssets': [{'value': '34648000000', 'period_start_date': '2022-03-31', 'period_end_date': None}, {'value': '32963000000', 'period_start_date': '2021-12-31', 'period_end_date': None}, {'value': '17305000000', 'period_start_date': '2022-03-31', 'period_end_date': None}, {'value': '16512000000', 'period_start_date':
'2021-12-31', 'period_end_date': None}, {'value': '8544000000', 'period_start_date': '2022-03-31', 'period_end_date': None}, {'value': '7728000000', 'period_start_date': '2021-12-31', 'period_end_date': None}, {'value': '4609000000', 'period_start_date': '2022-03-31', 'period_end_date': None}, {'value': '4545000000', 'period_start_date': '2021-12-31', 'period_end_date': None}, {'value': '3495000000', 'period_start_date': '2022-03-31', 'period_end_date': None}, {'value': '3466000000', 'period_start_date': '2021-12-31', 'period_end_date': None}, {'value': '695000000', 'period_start_date': '2022-03-31', 'period_end_date': None}, {'value': '712000000', 'period_start_date': '2021-12-31', 'period_end_date': None}, {'value': '34648000000', 'period_start_date': '2022-03-31', 'period_end_date': None}, {'value': '32963000000', 'period_start_date': '2021-12-31', 'period_end_date': None}], 'goodwill': [{'value': '1182000000', 'period_start_date': '2022-03-31', 'period_end_date': None}, {'value': '1177000000', 'period_start_date': '2021-12-31', 'period_end_date': None}], 'accumulatedDepreciationPpe': [{'value': '8734000000', 'period_start_date': '2022-03-31', 'period_end_date': None}, {'value': '8486000000', 'period_start_date': '2021-12-31', 'period_end_date': None}], 'currentLiabilities': [{'value': '5753000000', 'period_start_date': '2022-03-31', 'period_end_date': None}, {'value': '4732000000', 'period_start_date': '2021-12-31', 'period_end_date': None}], 'accountsPayable': [{'value': '1288000000', 'period_start_date': '2022-03-31', 'period_end_date': None}, {'value': '1153000000', 'period_start_date': '2021-12-31', 'period_end_date': None}], 'stockholdersEquity': [{'value': '3044000000', 'period_start_date': '2022-03-31', 'period_end_date': None}, {'value': '2798000000', 'period_start_date': '2021-12-31', 'period_end_date': None}], 'currentAssets': [{'value': '6142000000', 'period_start_date': '2022-03-31', 'period_end_date': None}, {'value': '5356000000', 
'period_start_date': '2021-12-31', 'period_end_date': None}], 'currentRatio': [{'value': 1.0676168955327656, 'period_start_date': '2022-03-31', 'period_end_date': None}, {'value': 1.1318681318681318, 'period_start_date': '2021-12-31', 'period_end_date': None}], 'bookValuePerShare': [{'value': 4.557843886495044, 'period_start_date': '2022-03-31', 'period_end_date': None}, {'value': 4.196200885993774, 'period_start_date': '2021-12-31', 'period_end_date': None}]}, 'incomeStatement': {'totalRevenues': [{'value': '835000000', 'period_start_date': '2022-01-01', 'period_end_date': '2022-03-31'}, {'value': '707000000', 'period_start_date': '2021-01-01', 'period_end_date': '2021-03-31'}, {'value': '20170000
```

--------------------------------

### Example SEC Search Results Format
Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/index/index.md

Illustrates the structure of the search results: a list of hit dictionaries, each carrying the index name, document ID, relevance score, and the filing metadata under `_source`.

```json
[
  {
    "_index": "edgar_file",
    "_id": "0001628280-24-002390:tsla-2023x12x31xex211.htm",
    "_score": 10.79173,
    "_source": {
      "ciks": ["0001318605"],
      "period_ending": "2023-12-31",
      "file_num": ["001-34756"],
      "display_names": ["Tesla, Inc. (TSLA) (CIK 0001318605)"],
      "root_forms": ["10-K"],
      "file_date": "2024-01-29",
      "form": "10-K",
      "adsh": "0001628280-24-002390",
      "file_type": "EX-21.1",
      "file_description": "EX-21.1",
      # Additional fields omitted for brevity
    }
  },
  ...
]
```

--------------------------------

### Download SEC Filings with Datamule
Source: https://github.com/john-friedman/datamule-python/blob/main/examples/parse_xbrl.ipynb

Downloads specific SEC filings (here, 10-K filings) for a given ticker symbol. Requires the datamule library to be installed and configured.
```python
from datamule import Portfolio

# get data: ADT's 10-K submissions, keeping only the main 10-K documents
portfolio = Portfolio("adt")
portfolio.download_submissions(submission_type='10-K', document_type='10-K', ticker='ADT')
```

--------------------------------

### Get Table for XBRL Data
Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/sheet/sheet.md

Queries XBRL data from the SEC filings database with SQL and saves the results as Parquet files. Requires a configured datamule api_key.

```python
from datamule import Sheet

sheet = Sheet("xbrl-results")
files = sheet.get_table("""
    SELECT accessionNumber, taxonomy, name, value
    FROM simple_xbrl
    WHERE taxonomy = 'us-gaap' AND name = 'NetIncomeLoss'
    LIMIT 100
""")
```

--------------------------------

### Get Table with SQL Query
Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/sheet/sheet.md

Executes a SQL query against the SEC filings database and saves the results as Parquet files, either in a specified output directory or, by default, in the Sheet's path. Requires a datamule api_key.

```python
from datamule import Sheet

sheet = Sheet("query-results")
files = sheet.get_table("""
    SELECT accessionNumber, submissionType, filingDate
    FROM submissions_metadata
    WHERE submissionType = '10-K'
    LIMIT 100
""")
print(files)
```

--------------------------------

### Extract Covenant Breach Risk Contexts
Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/examples/ge_covenant_breach_risk.md

Downloads GE's 10-Q filings, extracts the text surrounding keywords associated with financial covenants, and saves those contexts to a CSV file. Requires the budgetnlp and datamule libraries.
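`extract_contexts` comes from budgetnlp, and the source does not describe how it works. The sketch below is a hypothetical, simplified stand-in named `keyword_contexts`: it splits text into sentences with a naive regex, finds sentences containing any keyword (case-insensitive), and returns each hit with up to `context_sentences` neighbors on either side. Merging overlapping windows is a choice made for this sketch, not necessarily budgetnlp's behavior.

```python
import re


def keyword_contexts(text, keywords, context_sentences=2):
    """Hypothetical stand-in for a keyword-context extractor (not budgetnlp's code)."""
    # Naive sentence split: break after ., ! or ? followed by whitespace
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    lowered_keywords = [k.lower() for k in keywords]
    hits = [i for i, s in enumerate(sentences)
            if any(k in s.lower() for k in lowered_keywords)]

    windows = []  # (start, end) sentence index ranges, end exclusive
    for i in hits:
        start = max(0, i - context_sentences)
        end = min(len(sentences), i + context_sentences + 1)
        if windows and start <= windows[-1][1]:
            # Merge with the previous window when they overlap or touch
            windows[-1] = (windows[-1][0], max(windows[-1][1], end))
        else:
            windows.append((start, end))
    return [' '.join(sentences[s:e]) for s, e in windows]


sample = ("Revenue grew. We remain in compliance with all covenants. Margins fell. "
          "Cash rose. Debt was flat. We obtained a waiver.")
contexts = keyword_contexts(sample, ["covenant", "waiver"], context_sentences=1)
for c in contexts:
    print(c)
```

Real filings contain abbreviations such as "Inc." that a regex splitter breaks on, which is one reason to rely on a dedicated library like budgetnlp in practice.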
```python
import csv

from budgetnlp import extract_contexts
from datamule import Portfolio

# Keywords associated with covenant threshold risk
keywords = [
    "leverage", "covenant", "headroom", "waiver", "compliance",
    "interest coverage", "debt-to-EBITDA", "breach", "amendment",
    "cushion", "violation",
]

portfolio = Portfolio('ge_near_threshold_risk')
portfolio.download_submissions(ticker='GE', submission_type=['10-Q'], document_type=['10-Q'])

data = []
for sub in portfolio:
    for doc in sub:
        if doc.type == '10-Q' and doc.extension in ['.htm', '.html']:
            try:
                # Get context around keyword matches
                contexts = extract_contexts(doc.text, keywords, context_sentences=2)
                for idx, context in enumerate(contexts):
                    data.append({
                        'accession_id': f"{sub.accession}_{idx}",
                        'document_type': doc.type,
                        'filing_date': doc.filing_date,
                        'context': context,
                        'filer_cik': sub._filer_cik,
                    })
            except Exception:
                # Skip documents whose text cannot be processed
                pass

with open('ge_near_threshold_risk_excerpts.csv', 'w', encoding='utf-8', newline='') as csvfile:
    fieldnames = ['accession_id', 'document_type', 'filing_date', 'context', 'filer_cik']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(data)
```

--------------------------------

### Query SEC Proxy Voting Records with SQL
Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/examples/products.md

Queries proxy voting records with SQL. The example filters on a single CUSIP (037833100, Apple Inc.) and limits the result to 1,000 rows. Requires the datamule library and a configured datamule api_key; results are saved under the 'proxy' Sheet.
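The query in this section filters on the CUSIP `037833100`. Before putting a CUSIP into a SQL string, it can be worth checking its ninth character, which is a check digit. The helpers below implement the standard CUSIP modulus-10 "double-add-double" check; they are a standalone sketch, not part of datamule:

```python
def cusip_check_digit(base8: str) -> int:
    """Return the check digit for the first 8 characters of a CUSIP."""
    if len(base8) != 8:
        raise ValueError("expected 8 characters")
    total = 0
    for i, ch in enumerate(base8.upper()):
        if ch in '0123456789':
            v = int(ch)
        elif 'A' <= ch <= 'Z':
            v = ord(ch) - ord('A') + 10  # A=10, B=11, ..., Z=35
        elif ch in '*@#':
            v = 36 + '*@#'.index(ch)  # * = 36, @ = 37, # = 38
        else:
            raise ValueError(f"invalid CUSIP character: {ch!r}")
        if i % 2 == 1:  # double every second character (2nd, 4th, 6th, 8th)
            v *= 2
        total += v // 10 + v % 10  # add the digits of v
    return (10 - total % 10) % 10


def is_valid_cusip(cusip: str) -> bool:
    """True if `cusip` has 9 characters and its last digit matches the check digit."""
    if len(cusip) != 9 or cusip[8] not in '0123456789':
        return False
    try:
        return cusip_check_digit(cusip[:8]) == int(cusip[8])
    except ValueError:
        return False


print(is_valid_cusip('037833100'))  # True
```

Because letters map to 10 through 35, the same function also covers alphanumeric CUSIPs.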
```python
from datamule import Sheet

sheet = Sheet('proxy')
files = sheet.get_table("""
    SELECT *
    FROM proxy_voting_record
    WHERE cusip = '037833100'
    LIMIT 1000
""")
print(files)
```

--------------------------------

### Filter Filings by Document Type
Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/examples/monitor_new_filings.md

Monitors new SEC filings with datamule's polling mechanism and filters them by document type: for each new submission, the callback iterates through its documents and prints the text length of every EX-99.1 exhibit.

```python
from datamule import Portfolio, Submission
from datamule.utils.convenience import construct_sgml_url

portfolio = Portfolio('monitor')

def data_callback(hits):
    for hit in hits:
        # Build the submission's SGML URL from its accession number and first CIK
        sgml_url = construct_sgml_url(hit['accession'], hit['ciks'][0])
        sub = Submission(url=sgml_url)
        for doc in sub:
            if doc.type == "EX-99.1":
                print(f"Length: {len(doc.text)}")

portfolio.monitor_submissions(
    data_callback=data_callback,
    interval_callback=None,
    polling_interval=1000,
    quiet=True,
    start_date=None,
    validation_interval=60000,
)
```

--------------------------------

### Download and Process SEC Filings with Tag Extraction
Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/portfolio/document.md

Downloads 13F-HR submissions filed in September 2008 and iterates through their documents to extract and print the CUSIP tags found in each document's text. The tag dictionary must be set up first with `set_dictionaries`.
```python
from datamule import Portfolio
from datamule.tags.config import set_dictionaries

# Load the tag dictionary before accessing doc.text.tags
set_dictionaries(['13fhr_information_table_cusips'])

portfolio = Portfolio('13fhr')
portfolio.download_submissions(submission_type=['13F-HR'], filing_date=('2008-09-01', '2008-09-30'))

for sub in portfolio:
    for doc in sub:
        results = doc.text.tags.cusips
        if results is not None:
            print(results)
```

--------------------------------

### Initialize and Build Dataset with txt2dataset
Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/examples/gm_impairment_event.md

Prepares entries from the CSV data, initializes `DatasetBuilder` with a prompt, schema, and model, and calls `build()` to extract structured data from the entries.

```python
from txt2dataset import DatasetBuilder

# `data` (rows read from the CSV) and the `ImpairmentExtraction` schema
# are assumed to be defined in earlier steps of this example.

# Prepare entries from the CSV
entries = []
for item in data:
    if item['item206']:  # only process rows that contain Item 2.06 text
        identifier = item['accession']
        text = item['item206']
        entries.append((identifier, text))

# Define the prompt
prompt = "Extract the impairment amount (as an average if a range is given), asset class (such as equity investment, goodwill, PPE, or intangibles), and business segment from this text."

# Initialize the DatasetBuilder with the prompt defined above
builder = DatasetBuilder(
    prompt=prompt,
    schema=ImpairmentExtraction,
    model="gemini-2.5-flash-lite",
    entries=entries,
    rpm=1000,
)
print(builder.api_key)

# Build the dataset
builder.build()

# Save the structured output
builder.save('gm_impairments_structured.csv')
```

--------------------------------

### Get Tables by Contains Regex
Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/portfolio/document.md

Filters tables to only those whose data matches every one of the specified regex patterns.
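A minimal sketch of the "all patterns must match" rule, assuming a table can be treated as a list of rows of cell values and that each pattern only needs to match somewhere in one cell. How datamule represents tables and applies `contains_regex` internally is not shown in the source, so `table_contains_all` is hypothetical:

```python
import re


def table_contains_all(rows, patterns):
    """Hypothetical helper: True if every pattern matches at least one cell."""
    cells = [str(cell) for row in rows for cell in row]
    return all(any(re.search(p, cell) for cell in cells) for p in patterns)


# Hypothetical tables, each a list of rows
director_table = [["Name", "Year", "Fees Paid"], ["Director A", "2023", "100,000"]]
officer_table = [["Name", "Year", "Salary"], ["Officer B", "2023", "500,000"]]

patterns = [r'Director', r'20\d{2}']
print(table_contains_all(director_table, patterns))  # True
print(table_contains_all(officer_table, patterns))   # False: no cell matches 'Director'
```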
```python
contains_regex=[r'Director', r'20\d{2}']
```

--------------------------------

### Iterate and Print Fundamentals
Source: https://github.com/john-friedman/datamule-python/blob/main/examples/fundamentals.ipynb

Iterates through the downloaded submissions and prints the fundamentals of each, which is useful for inspecting the retrieved financial data.

```python
for sub in portfolio:
    print(sub.fundamentals)
```

--------------------------------

### Initialize Datamule Book for SEC XBRL
Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/examples/products.md

Initializes a Datamule Book object for accessing SEC XBRL data. Contact support for access details.

```python
from datamule import Book

book = Book()  # Contact support for access details
```

--------------------------------

### Portfolio Class and Initialization
Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/portfolio/portfolio.md

The Portfolio class is used to interact with SEC submissions. It takes the path to a folder that holds one subfolder per submission.

```APIDOC
## Portfolio Class

The `Portfolio` class lets you interact with SEC submissions. A portfolio consists of a folder that contains subfolders named after SEC submission accession numbers.

### Attributes

* `portfolio.path` - path to the folder
```

--------------------------------

### Initialize DatasetBuilder for Delay Extraction
Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/examples/late_filing_reasons.md

Prepares data for txt2dataset by creating entries from the extracted narratives, defines a Pydantic schema for the expected output, and writes a prompt that asks the Gemini model to extract the reasons for the delay.
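In this section's schema, the `Literal` on `type_of_delay` limits the value to six labels, while the prompt also mentions "system failure", which is not one of them. A stdlib-only sketch (no Pydantic) of checking extracted rows against the same labels; `check_delay_rows` and the sample rows are hypothetical, and this does not reproduce how txt2dataset or Pydantic validate model output:

```python
from typing import Literal, get_args

# Same six labels as the DelayInfo.type_of_delay Literal in this example
DelayType = Literal[
    "accounting issues",
    "auditor delay",
    "internal control weaknesses",
    "acquisition integration",
    "internal review delays",
    "other reasons",
]
ALLOWED_DELAY_TYPES = set(get_args(DelayType))


def check_delay_rows(rows):
    """Hypothetical helper: split rows into (valid, invalid) by their type_of_delay label."""
    valid, invalid = [], []
    for row in rows:
        target = valid if row.get("type_of_delay") in ALLOWED_DELAY_TYPES else invalid
        target.append(row)
    return valid, invalid


# Hypothetical extracted rows
rows = [
    {"type_of_delay": "auditor delay", "explanation": "The auditor needed more time."},
    {"type_of_delay": "system failure", "explanation": "An ERP outage delayed the close."},
]
valid, invalid = check_delay_rows(rows)
print(len(valid), len(invalid))  # 1 1
```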
```python
from typing import List, Literal

from pydantic import BaseModel
from txt2dataset import DatasetBuilder

# Define the schema
class DelayInfo(BaseModel):
    type_of_delay: Literal[
        "accounting issues",
        "auditor delay",
        "internal control weaknesses",
        "acquisition integration",
        "internal review delays",
        "other reasons",
    ]
    explanation: str

class DelayExtraction(BaseModel):
    info_found: bool
    data: List[DelayInfo] = []

# Prepare entries from the CSV (`data` holds the rows read in an earlier step)
entries = []
for item in data:
    if item['narrative']:
        identifier = item['accession']
        text = item['narrative']
        entries.append((identifier, text))

# Define the prompt
prompt = "Extract the type of delay (such as auditor delay, accounting issues, internal control weaknesses, system failure, or other reasons) and the specific explanation for why the 10-K filing was delayed from this text."

# Initialize the DatasetBuilder
builder = DatasetBuilder(
    prompt=prompt,
    schema=DelayExtraction,
    model="gemini-2.5-flash-lite",
    entries=entries,
    rpm=1000,
)
print(builder.api_key)
```

--------------------------------

### Download and Parse XBRL Data with Datamule
Source: https://github.com/john-friedman/datamule-python/blob/main/examples/parse_fiscal_year_inline_xbrl.ipynb

Downloads the 10-Q and 10-K filings for a given CIK and date range, then parses their XBRL data to extract fiscal period information.
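The script in this section builds one list of `dei:DocumentFiscalPeriodFocus` records per document, then flattens the lists and sorts the records by filing date. That last step in isolation, using `itertools.chain.from_iterable` in place of the nested comprehension, on hypothetical records with the same keys (the dates and values are invented, not AES data):

```python
from itertools import chain

# Hypothetical per-document results, in the shape the script builds
fiscal_period_focus_list = [
    [{'quarter': 'FY', 'period_start': '2023-01-01', 'period_end': '2023-12-31',
      'type': '10-K', 'filing_date': '2024-02-26'}],
    [{'quarter': 'Q1', 'period_start': '2023-01-01', 'period_end': '2023-03-31',
      'type': '10-Q', 'filing_date': '2023-05-04'}],
    [],  # a document without a DocumentFiscalPeriodFocus fact
]

# Flatten the per-document lists, then sort by filing date
flat = sorted(chain.from_iterable(fiscal_period_focus_list),
              key=lambda x: x['filing_date'])

for item in flat:
    print(f"{item['type']} filed on {item['filing_date']}: Quarter {item['quarter']} "
          f"({item['period_start']} to {item['period_end']})")
```

Sorting on the raw strings works because ISO 8601 dates (`YYYY-MM-DD`) order the same way lexically and chronologically.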
```python
# made for https://www.reddit.com/r/algotrading/comments/1lxf7ga/xbrl_deidocumentfiscalperiodfocus_help_needed/
from datamule import Portfolio

# get data
portfolio = Portfolio("aes")
portfolio.download_submissions(
    submission_type=['10-Q', '10-K'],
    document_type=['10-Q', '10-K'],
    cik=['874761'],
    filing_date=('2022-01-01', '2024-12-31'),
)

fiscal_period_focus_list = []
for sub in portfolio:
    filing_date = sub.metadata.content['filing-date']
    for doc in sub.document_type(['10-Q', '10-K']):
        doc.parse_xbrl()
        # Keep only the dei:DocumentFiscalPeriodFocus facts
        basic = [
            {
                'quarter': item['_val'],
                'period_start': item['_context']['context_period_startdate'],
                'period_end': item['_context']['context_period_enddate'],
            }
            for item in doc.xbrl
            if item['_attributes']['name'] == 'dei:DocumentFiscalPeriodFocus'
        ]
        fiscal_period_focus_list.append([{**item, 'type': doc.type, 'filing_date': filing_date} for item in basic])

# flatten the per-document lists and sort by filing date
fiscal_period_focus_list = sorted(
    [item for sublist in fiscal_period_focus_list for item in sublist],
    key=lambda x: x['filing_date'],
)
for item in fiscal_period_focus_list:
    print(f"{item['type']} filed on {item['filing_date']}: Quarter {item['quarter']} ({item['period_start']} to {item['period_end']})")
```

--------------------------------

### Initialize Matplotlib Figure and Lines for Animation
Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/examples/use_index_to_search_filings_by_email_usage.md

Sets up the Matplotlib figure, axes, and color map, plus an empty, fully transparent line object for each domain, preparing the plotting environment for the animation function.
```python
import matplotlib.pyplot as plt
import numpy as np

# `domain_year_counts` is assumed to be built in an earlier step of this example

# Prepare figure
fig, ax = plt.subplots(figsize=(14, 8))

# Color map for domains
all_domains = list(domain_year_counts.keys())
colors = plt.cm.tab10(np.linspace(0, 1, len(all_domains)))
domain_colors = {domain: colors[i] for i, domain in enumerate(all_domains)}

# Store line objects for each domain (empty and transparent until animated)
lines = {}
for domain in all_domains:
    line, = ax.plot([], [], marker='o', linewidth=2.5, markersize=6,
                    label=domain, color=domain_colors[domain], alpha=0)
    lines[domain] = line
```

--------------------------------

### Get Business Street Address from CIKs
Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/utils/convenience.md

Retrieves the first line of a company's business street address using its CIK. Handles single or multiple CIKs.

```python
from datamule.utils.convenience import get_business_street1_from_ciks

print(get_business_street1_from_ciks([1318605, 51143]))
```

```python
['1 TESLA ROAD', '1 NEW ORCHARD ROAD']
```

--------------------------------

### Get US Zip Codes from CIKs
Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/utils/convenience.md

Returns the ZIP code for US-based companies using their CIK. Returns None for non-US companies.

```python
from datamule.utils.convenience import get_us_zipcodes_from_ciks

print(get_us_zipcodes_from_ciks([1318605, 51143]))
```

```python
['78725', '10504']
```

--------------------------------

### Download and Parse SEC Filings with Portfolio
Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/examples/quickstart.md

Uses the Portfolio class to download SEC filings by filing date and submission type, iterate through documents, parse them, and access specific data fields. For faster processing, `process_submissions` runs a callback function over the submissions with built-in threading.
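`process_submissions(callback)` applies the callback to submissions with built-in threading; the source does not describe how datamule schedules that work. The sketch below shows the general callback-on-a-thread-pool pattern with `concurrent.futures.ThreadPoolExecutor`, using hypothetical path strings instead of `Submission` objects:

```python
from concurrent.futures import ThreadPoolExecutor


def process_items(items, callback, max_workers=4):
    """Run `callback` on every item using a thread pool; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(callback, items))


# Hypothetical stand-ins for submission folder paths
paths = ['output_dir/0000000000-23-000001', 'output_dir/0000000000-23-000002']


def folder_name(path):
    return path.rsplit('/', 1)[-1]  # keep only the last path component


results = process_items(paths, folder_name)
print(results)
```

Threads help mainly when the callback is I/O-bound, such as reading files from disk; `pool.map` returns results in input order regardless of which thread finishes first.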
```python
from datamule import Portfolio

# Create a Portfolio object
portfolio = Portfolio('output_dir')  # can be an existing directory or a new one

# Download submissions
portfolio.download_submissions(
    filing_date=('2023-01-01', '2023-01-03'),
    submission_type=['10-K'],
)

# Iterate through documents by document type
for ten_k in portfolio.document_type('10-K'):
    ten_k.parse()
    print(ten_k.data['document']['part2']['item7'])

# For faster operations, you can take advantage of built-in threading with a callback function
def callback(submission):
    print(submission.path)

submission_results = portfolio.process_submissions(callback)
```

--------------------------------

### Download Insider Ownership Metadata Table
Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/examples/products.md

Download filing-level metadata for insider ownership reports. Initializes a 'Book' object.

```python
from datamule import Book

book = Book()
book.download_dataset(
    dataset='metadata_ownership_table'
)
```

--------------------------------

### Get Tickers from CIKs
Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/utils/convenience.md

Retrieves the stock ticker symbol for a given Central Index Key (CIK). Handles single or multiple CIKs.
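The convenience functions take CIKs as integers (for example `1318605`), while EDGAR metadata such as the `ciks` field in the search results shown earlier stores them as 10-digit zero-padded strings (`"0001318605"`). A pair of hypothetical helpers for converting between the two forms:

```python
def pad_cik(cik) -> str:
    """Hypothetical helper: 10-digit zero-padded CIK string, as in EDGAR metadata."""
    return str(int(cik)).zfill(10)


def unpad_cik(cik: str) -> int:
    """Hypothetical helper: integer CIK, as passed to the convenience functions."""
    return int(cik)


print([pad_cik(c) for c in [1318605, 51143]])  # ['0001318605', '0000051143']
print(unpad_cik("0001318605"))  # 1318605
```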
```python
from datamule.utils.convenience import get_tickers_from_ciks

print(get_tickers_from_ciks([1318605]))
```

```python
['TSLA']
```

```python
from datamule.utils.convenience import get_tickers_from_ciks

print(get_tickers_from_ciks([1318605, 51143]))
```

```python
['TSLA', 'IBM']
```

--------------------------------

### Get CIKs from Tickers
Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/utils/convenience.md

Retrieves the Central Index Key (CIK) for a given stock ticker symbol. Handles single or multiple tickers.

```python
from datamule.utils.convenience import get_ciks_from_tickers

print(get_ciks_from_tickers(['IBM']))
```

```python
[51143]
```

```python
from datamule.utils.convenience import get_ciks_from_tickers

print(get_ciks_from_tickers(['TSLA', 'IBM']))
```

```python
[1318605, 51143]
```

--------------------------------

### Fetch and Reconstruct SEC Documents at Scale
Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/examples/construct_sec_documents.md

Reads document metadata from two local Parquet tables (`complete_sec_documents_table.parquet` and `complete_sec_accession_cik_table.parquet`), builds SEC document URLs, fetches the XML documents, reconstructs them with secfiler, and saves the original and reconstructed versions. It requires an API key; set `DOCUMENT_TYPE` to the document type you want to process.

```python
DOCUMENT_TYPE = "SH-ER"  # set to the document type you want to reconstruct

import urllib.request

import polars as pl
from datamule import Document
from datamule.utils.convenience import construct_document_url
from secfiler import construct_document


def fetch_url(url):
    # The SEC requires a User-Agent that identifies you; replace with your own name and email
    req = urllib.request.Request(url, headers={"User-Agent": "John Smith johnsmith@company.com"})
    return urllib.request.urlopen(req).read()


def get_samples(document_type, n=1):
    # Find up to n XML documents of the requested type
    docs = (
        pl.scan_parquet("complete_sec_documents_table.parquet")
        .filter(
            (pl.col("documentType") == document_type)
            & (pl.col("filename").str.ends_with(".xml"))
        )
        .select("accessionNumber", "filename")
        .limit(n)
    )
    # Attach the filer CIK, which is needed to build each document URL
    rows = (
        docs.join(
            pl.scan_parquet("complete_sec_accession_cik_table.parquet")
            .select("accessionNumber", "cik"),
            on="accessionNumber",
            how="left",
        )
        .collect()
        .to_dicts()
    )
    return [
        construct_document_url(row["accessionNumber"], row["cik"], row["filename"])
        for row in rows
    ]


for url in get_samples(DOCUMENT_TYPE):
    content = fetch_url(url)
    # filename, accession, and filing_date are placeholder values
    doc = Document(type=DOCUMENT_TYPE, content=content, filename="placeholder.xml",
                   accession="0000000000-00-000000", filing_date="2000-01-01")
    print(type(doc.tables))
    with open("original.xml", "wb") as f:
        f.write(content)
    xml = construct_document(doc.tables, DOCUMENT_TYPE)
    with open("reconstructed.xml", "wb") as f:
        f.write(xml)
```

--------------------------------

### Get Company Names from CIKs
Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/utils/convenience.md

Retrieves the official company name for a given Central Index Key (CIK). Handles single or multiple CIKs.
```python
from datamule.utils.convenience import get_company_names_from_ciks

print(get_company_names_from_ciks([1318605, 51143]))
```

```python
['Tesla, Inc.', 'INTERNATIONAL BUSINESS MACHINES CORP']
```

--------------------------------

### Set Default Data Source

Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/config/config.md

Use `Config` to set the default data source to `"datamule"` or `"sec"`. The `"datamule"` source requires an API key.

```python
from datamule import Config

config = Config()
config.set_default_source("datamule")  # Options: "datamule", "sec"

# Verify your settings
print(f"Default source: {config.get_default_source()}")
```

--------------------------------

### Get US State from CIKs

Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/utils/convenience.md

Returns the two-letter US state code for each US-based company, looked up by CIK. Returns an empty string for non-US companies.

```python
from datamule.utils.convenience import get_us_state_from_ciks

print(get_us_state_from_ciks([1318605, 51143]))
```

```python
['TX', 'NY']
```

--------------------------------

### Download Insider Derivative Holdings Table

Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/examples/products.md

Downloads derivative security holdings data using a `Book` object.

```python
from datamule import Book

book = Book()
book.download_dataset(
    dataset='derivative_holding_ownership_table'
)
```

--------------------------------

### Download Insider Derivative Transactions Table

Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/examples/products.md

Downloads derivative security transactions, such as options and warrants, using a `Book` object.
```python
from datamule import Book

book = Book()
book.download_dataset(
    dataset='derivative_transaction_ownership_table'
)
```

--------------------------------

### Get SICs from CIKs

Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/utils/convenience.md

Retrieves the Standard Industrial Classification (SIC) code for each Central Index Key (CIK). Accepts a list of one or more CIKs.

```python
from datamule.utils.convenience import get_sics_from_ciks

print(get_sics_from_ciks([1318605, 51143]))
```

```python
['3711', '7372']
```

--------------------------------

### Download Simple XBRL Table Dataset

Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/examples/products.md

Downloads the `simple_xbrl_table` dataset, which contains parsed XBRL facts from SEC filings. Requires a Datamule `Book` object.

```python
from datamule import Book

book = Book()
book.download_dataset(
    dataset='simple_xbrl_table'
)
```

--------------------------------

### Get Country of Business Address from CIKs

Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/utils/convenience.md

Returns the country of each company's business address, looked up by CIK. For US-based companies, it returns `'United States of America'`.

```python
from datamule.utils.convenience import get_adm0_from_ciks

print(get_adm0_from_ciks([1318605, 51143]))
```

```python
['United States of America', 'United States of America']
```

--------------------------------

### Download Insider Owner Signatures Table

Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/examples/products.md

Downloads signature information for insider ownership filings using a `Book` object.
```python
from datamule import Book

book = Book()
book.download_dataset(
    dataset='owner_signature_ownership_table'
)
```

--------------------------------

### Download Financial Submissions

Source: https://github.com/john-friedman/datamule-python/blob/main/examples/fundamentals.ipynb

Downloads specific financial filings (here, 10-Q and 10-K) for a given CIK within a filing-date range using a `Portfolio` object.

```python
from datamule import Portfolio

portfolio = Portfolio("aes")
portfolio.download_submissions(
    submission_type=['10-Q', '10-K'],
    cik=['874761'],
    filing_date=("2022-01-01", "2024-12-31"),
)
```

--------------------------------

### Download Insider Reporting Owner Table

Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/examples/products.md

Downloads details for reporting owners (insiders) using a `Book` object.

```python
from datamule import Book

book = Book()
book.download_dataset(
    dataset='reporting_owner_ownership_table'
)
```

--------------------------------

### Real-time SEC Filings Notification

Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/examples/products.md

Receives new SEC filings in real time over a websocket. Uses the `Portfolio` class and a callback function that processes incoming data.

```python
from datamule import Portfolio

portfolio = Portfolio('newfilings')

def data_callback(data):
    # Called with each batch of new filings received over the websocket
    for item in data:
        print(item)

portfolio.stream_submissions(data_callback=data_callback)
```

--------------------------------

### Download and Parse SEC XML Files

Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/examples/parse_any_sec_xml_file.md

Downloads SEC filings of a specific document type and extracts their tabular data into CSV files, one per table. Requires the `datamule` package; `csv` is part of the Python standard library.
```python
import csv

from datamule import Portfolio

portfolio = Portfolio('1z')
portfolio.download_submissions(document_type='1-Z')  # downloads only document_type = '1-Z'

# Collect rows from every XML table, grouped by table name
all_1z_tables = {}
for sub in portfolio:
    for doc in sub:
        if doc.extension == '.xml':
            for table in doc.tables:
                all_1z_tables.setdefault(table.name, []).extend(table.data)

for table_name, rows in all_1z_tables.items():
    if not rows:
        continue
    # Use the union of keys across all rows (first-seen order); rows[0].keys() alone
    # makes DictWriter raise ValueError when a later row has an extra field
    fieldnames = list(dict.fromkeys(key for row in rows for key in row))
    with open(f'{table_name}.csv', 'w', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```

--------------------------------

### Download Institutional Holdings Table

Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/examples/products.md

Downloads the complete 13F institutional holdings dataset using a `Book` object.

```python
from datamule import Book

book = Book()
book.download_dataset(
    dataset='institutional_holdings_table'
)
```

--------------------------------

### Define Pydantic Schemas for Impairment Data

Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/examples/gm_impairment_event.md

Define Pydantic models to structure the extracted impairment information, including the expected amount, asset class, and business segment. This schema guides txt2dataset's extraction.
```python
from typing import List, Optional

from pydantic import BaseModel


class ImpairmentInfo(BaseModel):
    expected_impairment_amount: Optional[float] = None  # Average amount in dollars
    asset_class: Optional[str] = None  # e.g., "Equity investment", "Goodwill", "PPE", "Intangibles"
    segment: Optional[str] = None  # e.g., "China", "North America"


class ImpairmentExtraction(BaseModel):
    info_found: bool
    data: List[ImpairmentInfo] = []
```

--------------------------------

### Build Covenant Breach Risk Dataset with Gemini

Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/examples/ge_covenant_breach_risk.md

Initializes a txt2dataset `DatasetBuilder` with a Gemini model to classify covenant breach risk, then builds and saves the dataset. Requires the `ge_near_threshold_risk_excerpts.csv` file generated in an earlier step. Ensure your `GEMINI_API_KEY` is set.

```python
import csv
from typing import List, Literal

from pydantic import BaseModel
from txt2dataset import DatasetBuilder


# Define the schema
class CovenantBreachRiskInfo(BaseModel):
    explanation: str
    classification: Literal["FINE", "WARNING", "WAIVER", "BREACH", "UNRELATED"]


class CovenantBreachRisk(BaseModel):
    info_found: bool
    data: List[CovenantBreachRiskInfo] = []


# Load the excerpts produced earlier (assumes a header row with
# 'accession_id' and 'context' columns)
with open('ge_near_threshold_risk_excerpts.csv', newline='', encoding='utf-8') as f:
    data = list(csv.DictReader(f))

# Prepare (identifier, text) entries, skipping rows without context
entries = []
for item in data:
    if item['context']:
        entries.append((item['accession_id'], item['context']))

# Define the prompt
prompt = "Classify this text as FINE, WARNING, WAIVER, BREACH, or UNRELATED - where FINE means covenants comfortably met, WARNING means approaching covenant limits, WAIVER means covenant waiver/amendment obtained, BREACH means covenant violated, and UNRELATED means no covenant risk discussed."

# Initialize the DatasetBuilder
builder = DatasetBuilder(
    prompt=prompt,
    schema=CovenantBreachRisk,
    model="gemini-2.5-flash-lite",
    entries=entries,
    rpm=1000,
)

# Build the dataset
builder.build()

# Save the structured output
builder.save('ge-convenant-breach-risk.csv')
```

--------------------------------

### Download Insider Non-Derivative Holdings Table

Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/examples/products.md

Downloads non-derivative security holdings data using a `Book` object.

```python
from datamule import Book

book = Book()
book.download_dataset(
    dataset='non_derivative_holding_ownership_table'
)
```

--------------------------------

### File Download Progress

Source: https://github.com/john-friedman/datamule-python/blob/main/examples/fundamentals.ipynb

This output shows download progress: the percentage complete and the number of files downloaded.

```text
Downloading files: 100%|██████████| 12/12 [00:03<00:00, 3.60it/s]
```

--------------------------------

### Retrieve Friday Night Dump 8-K Filings

Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/examples/get_friday_night_dump.md

Fetches 8-K filings submitted after 4 PM ET on a specific date. Set the start and end times in ET, convert them to UTC for the API, and query the `submissions_metadata` table.
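Because the window ends at 23:59:59 ET, its UTC end falls on the next calendar day, and the UTC offset depends on whether daylight saving time is in effect. The conversion step can be isolated as a small helper to check this. This is a minimal sketch, not part of datamule: `et_window_to_utc` is a hypothetical name, and the 16:00 default mirrors the 4 PM ET cutoff described above.

```python
from datetime import date, datetime, timezone
from zoneinfo import ZoneInfo

EASTERN = ZoneInfo("America/New_York")
FMT = "%Y-%m-%d %H:%M:%S"


def et_window_to_utc(day: date, start_hour: int = 16) -> tuple[str, str]:
    """Hypothetical helper: convert start_hour:00:00 to 23:59:59 ET on `day` to UTC strings."""
    start_et = datetime(day.year, day.month, day.day, start_hour, 0, 0, tzinfo=EASTERN)
    end_et = datetime(day.year, day.month, day.day, 23, 59, 59, tzinfo=EASTERN)
    # astimezone applies whichever offset (EST or EDT) is in effect on that date
    return (
        start_et.astimezone(timezone.utc).strftime(FMT),
        end_et.astimezone(timezone.utc).strftime(FMT),
    )


print(et_window_to_utc(date(2026, 3, 27)))
# ('2026-03-27 20:00:00', '2026-03-28 03:59:59')
```

On 2026-03-27, after the March 8 switch to EDT, the window maps to 20:00:00 UTC through 03:59:59 UTC the next day, so the `detectedTime` bounds span two UTC dates even though `filingDate` is a single day. In January (EST) the same window shifts one hour later in UTC.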
```python
from datetime import datetime
from zoneinfo import ZoneInfo

from datamule import Sheet

# Set your times in ET
start_et = datetime(2026, 3, 27, 16, 0, 0, tzinfo=ZoneInfo("America/New_York"))
end_et = datetime(2026, 3, 27, 23, 59, 59, tzinfo=ZoneInfo("America/New_York"))

# Convert to UTC strings for the API
start_utc = start_et.astimezone(ZoneInfo("UTC")).strftime('%Y-%m-%d %H:%M:%S')
end_utc = end_et.astimezone(ZoneInfo("UTC")).strftime('%Y-%m-%d %H:%M:%S')

sheet = Sheet('')
files = sheet.get_table(f"""
    SELECT accessionNumber, submissionType, filingDate, detectedTime
    FROM submissions_metadata
    WHERE submissionType = '8-K'
      AND filingDate = DATE '2026-03-27'
      AND detectedTime BETWEEN TIMESTAMP '{start_utc}' AND TIMESTAMP '{end_utc}'
""")
print(files)
```

--------------------------------

### get_table

Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/sheet/sheet.md

Runs a SQL query against Datamule's Athena-backed SEC filings database and writes the result to disk as Parquet files, in the specified output directory or, if none is given, in the Sheet path.

```APIDOC
## get_table

### Description
Runs a SQL query against Datamule's Athena-backed SEC filings database and writes the result to disk as Parquet files. The results are saved in the specified output directory, or in the Sheet path if no output directory is provided.

### Method Signature
`get_table(self, query, output_dir=None, wait_seconds=None)`

### Parameters
- **query** (string) - Required - The SQL query to execute.
- **output_dir** (string, optional) - The directory to save the Parquet files. If omitted, results are saved in the Sheet path.
- **wait_seconds** (integer, optional) - The number of seconds to wait for the query to complete.
### Usage Example
```python
from datamule import Sheet

sheet = Sheet("query-results")
files = sheet.get_table("""
    SELECT accessionNumber, submissionType, filingDate
    FROM submissions_metadata
    WHERE submissionType = '10-K'
    LIMIT 100
""")
print(files)
```

### Example Result
```python
[
    PosixPath("query-results/part-00000.parquet")
]
```
```

--------------------------------

### Calculate Tesla Risk Factor Negative Ratio

Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/examples/tesla_risk_factor_language_score.md

This script downloads Tesla's 10-K filings, extracts the Risk Factors section (Item 1A), calculates its negative ratio with budgetnlp, and saves the results to a CSV file. Ensure budgetnlp is installed and accessible.

```python
import csv

from budgetnlp import negative_ratio
from datamule import Portfolio

portfolio = Portfolio('tesla_risk_factor_language_score')
portfolio.download_submissions(ticker='TSLA', submission_type=['10-K'], document_type=['10-K'])

data = []
for sub in portfolio:
    for doc in sub:
        if doc.type in ['10-K'] and doc.extension in ['.htm', '.html']:
            sections = doc.get_section(title='item1a', format='text')
            if not sections:
                continue  # skip filings where Item 1A was not found
            neg_ratio = negative_ratio(sections[0], negation_reversals=True)
            data.append({'accession': sub.accession, 'document_type': doc.type,
                         'filing_date': doc.filing_date, 'negative_ratio': neg_ratio,
                         'filer_cik': sub._filer_cik})

with open('risk_factors_negative_ratio.csv', 'w', encoding='utf-8', newline='') as csvfile:
    fieldnames = ['accession', 'document_type', 'filing_date', 'negative_ratio', 'filer_cik']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(data)
```

--------------------------------

### Calculate Audit Report Complexity Score

Source: https://github.com/john-friedman/datamule-python/blob/main/datamule/docs-rewrite/docs/datamule-python/examples/costco_audit_complexity.md

This script downloads Costco's 10-K filings, extracts the independent auditor's report, calculates a complexity score with `naive_complexity_ratio`, and saves the results to a CSV file. Ensure the budgetnlp library is installed and accessible.

```python
import csv

from budgetnlp import naive_complexity_ratio
from datamule import Portfolio

portfolio = Portfolio('costco_audit_complexity')
portfolio.download_submissions(ticker='COST', submission_type=['10-K'], document_type=['10-K'])

data = []
for sub in portfolio:
    for doc in sub:
        if doc.type in ['10-K'] and doc.extension in ['.htm', '.html']:
            try:
                # Locate the auditor's report by a case-insensitive title match
                audit_report = doc.get_section(
                    title_regex='(?i)report of independent registered public accounting firm',
                    format='text',
                )[0]
                complexity_score = naive_complexity_ratio(
                    audit_report,
                    complexity_weight=50,
                    uncertainty_weight=30,
                    sentence_weight=20,
                    normal_sentence_length=20,
                )
                data.append({'accession': sub.accession, 'document_type': doc.type,
                             'filing_date': doc.filing_date, 'complexity_score': complexity_score,
                             'filer_cik': sub._filer_cik})
            except Exception as exc:
                # Skip filings where the report is missing or cannot be scored, but say so
                print(f"Skipping {sub.accession}: {exc}")

with open('costco_audit_complexity.csv', 'w', encoding='utf-8', newline='') as csvfile:
    fieldnames = ['accession', 'document_type', 'filing_date', 'complexity_score', 'filer_cik']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(data)
```
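Once `costco_audit_complexity.csv` exists, a short follow-up can show how the score moves from one filing to the next. This is a stdlib-only sketch that is not part of the original example: `load_scores` and `score_changes` are hypothetical helper names, and it assumes `filing_date` was written as a sortable string such as `YYYY-MM-DD`.

```python
import csv


def load_scores(path):
    """Hypothetical helper: read the CSV into (filing_date, complexity_score) pairs."""
    with open(path, newline="", encoding="utf-8") as f:
        return [(row["filing_date"], float(row["complexity_score"]))
                for row in csv.DictReader(f)]


def score_changes(scores):
    """Hypothetical helper: sort by filing date and attach the change vs. the previous filing."""
    changes = []
    previous = None
    # Tuples sort by filing_date first; ISO-style dates sort chronologically as strings
    for filing_date, score in sorted(scores):
        delta = None if previous is None else score - previous
        changes.append((filing_date, score, delta))
        previous = score
    return changes


if __name__ == "__main__":
    for filing_date, score, delta in score_changes(load_scores("costco_audit_complexity.csv")):
        print(filing_date, score, delta)
```

The first filing has no predecessor, so its change is `None` rather than 0, which keeps "no prior data" distinct from "no change".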