### Install Development Dependencies

Source: https://github.com/cre-dev/xml2db/blob/main/README.md

Install additional development dependencies for testing and documentation, after cloning the repository.

```bash
pip install -e .[tests,docs]
```

--------------------------------

### Install xml2db Package

Source: https://github.com/cre-dev/xml2db/blob/main/README.md

Install the xml2db package using pip. It is recommended to do this within a virtual environment.

```bash
pip install xml2db
```

--------------------------------

### Install Project with Dev Dependencies

Source: https://github.com/cre-dev/xml2db/blob/main/CLAUDE.md

Installs the project in editable mode along with development and documentation dependencies, including DuckDB and pytz.

```bash
pip install -e .[tests,docs] duckdb_engine pytz
```

--------------------------------

### Install xml2db in Editable Mode

Source: https://github.com/cre-dev/xml2db/blob/main/docs/getting_started.md

For development, clone the repository and install xml2db in editable mode with development dependencies.

```bash
pip install -e .[docs,tests]
```

--------------------------------

### Load XML into Database

Source: https://github.com/cre-dev/xml2db/blob/main/README.md

Use this snippet to create a data model from an XSD, parse an XML file, and insert its content into a relational database. Ensure you have the necessary database driver installed.

```python
from xml2db import DataModel

# Create a data model of tables with relations based on the XSD file
data_model = DataModel(
    xsd_file="path/to/file.xsd",
    connection_string="postgresql+psycopg2://testuser:testuser@localhost:5432/testdb",
)

# Parse an XML file based on this XSD
document = data_model.parse_xml(
    xml_file="path/to/file.xml"
)

# Insert the document content into the database
document.insert_into_target_tables()
```

--------------------------------

### Add Custom SQLAlchemy Index

Source: https://github.com/cre-dev/xml2db/blob/main/docs/configuring.md

Pass extra arguments to SQLAlchemy's Table constructor to customize indexes. This example adds a custom index on multiple columns for a specific table.

```python
import sqlalchemy

model_config = {
    "tables": {
        "my_table": {
            # Extra argument forwarded to the SQLAlchemy Table constructor
            "extra_args": sqlalchemy.Index("my_index", "my_column1", "my_column2"),
        }
    }
}
```

--------------------------------

### Run a Specific Test by Name

Source: https://github.com/cre-dev/xml2db/blob/main/CLAUDE.md

Executes a specific test identified by its name, for example, 'test_iterative_recursive_parsing'.

```bash
pytest -k "test_iterative_recursive_parsing"
```

--------------------------------

### Serve Documentation Locally

Source: https://github.com/cre-dev/xml2db/blob/main/CLAUDE.md

Builds and serves the project's documentation locally using MkDocs, so that changes can be previewed.

```bash
mkdocs serve
```

--------------------------------

### Loading XML into a Database

Source: https://github.com/cre-dev/xml2db/blob/main/docs/index.md

This snippet demonstrates how to initialize a DataModel from an XSD file and a database connection string, parse an XML file, and load the data into the database tables.

```python
from xml2db import DataModel

# Create a DataModel object from an XSD file
data_model = DataModel(
    xsd_file="path/to/file.xsd",
    connection_string="postgresql+psycopg2://testuser:testuser@localhost:5432/testdb",
)

# Parse an XML file based on this XSD schema
document = data_model.parse_xml(xml_file="path/to/file.xml")

# Load data into the database, creating target tables if need be
document.insert_into_target_tables()
```

--------------------------------

### Create a DataModel object

Source: https://github.com/cre-dev/xml2db/blob/main/docs/getting_started.md

Create a DataModel object by providing the path to an XSD file, the target database schema name, and a SQLAlchemy connection string. An optional model configuration can also be provided.

```python
from xml2db import DataModel

data_model = DataModel(
    xsd_file="path/to/file.xsd",
    db_schema="source_data",  # the name of the database target schema
    connection_string="postgresql+psycopg2://testuser:testuser@localhost:5432/testdb",
    model_config={},
)
```

--------------------------------

### Regenerate Snapshot Tests

Source: https://github.com/cre-dev/xml2db/blob/main/CLAUDE.md

Navigates to the sample models directory and runs the 'models.py' script to regenerate snapshot files for model outputs.

```bash
cd tests/sample_models && python models.py
```

--------------------------------

### Run Tests Against a Real Database

Source: https://github.com/cre-dev/xml2db/blob/main/CLAUDE.md

Executes tests against a persistent database using a provided connection string, here shown with PostgreSQL and psycopg2.

```bash
DB_STRING="postgresql+psycopg2://user:pass@localhost/testdb" pytest
```

--------------------------------

### Run Conversion Tests Only

Source: https://github.com/cre-dev/xml2db/blob/main/README.md

Run only the conversion tests that do not require a database connection. This is useful for quick checks.

```bash
pytest -m "not dbtest"
```

--------------------------------

### Run a Specific Test File

Source: https://github.com/cre-dev/xml2db/blob/main/CLAUDE.md

Executes tests located within a particular file, such as 'tests/test_conversions.py'.

```bash
pytest tests/test_conversions.py
```

--------------------------------

### n-n Relationship Modeling with Third Table

Source: https://github.com/cre-dev/xml2db/blob/main/docs/how_it_works.md

Demonstrates how n-n relationships are represented in a SQL model using an additional table to hold the relationship.

```mermaid
erDiagram
    CONTRACT ||--|{ CONTRACT_DELIVERY_PROFILE : is_in
    CONTRACT_DELIVERY_PROFILE }|--|| DELIVERY_PROFILE : involves
```

--------------------------------

### Run All Tests

Source: https://github.com/cre-dev/xml2db/blob/main/README.md

Execute all tests for the xml2db package using the pytest command.

```bash
python -m pytest
```

--------------------------------

### Data Loading Flowchart

Source: https://github.com/cre-dev/xml2db/blob/main/docs/api/overview.md

Visual representation of the data loading process from an XML file into database tables, detailing the functions involved in lower-level steps. Useful for understanding advanced data transformation and loading scenarios.

```mermaid
flowchart TB
    subgraph "DataModel.parse_xml"
        direction TB
        A[XML file]-- "XMLConverter.parse_xml" -->B[Document tree]
        B-- "Document.doc_tree_to_flat_data" -->C[Flat data model]
    end
    C -.- D
    subgraph "Document.insert_into_target_tables"
        direction TB
        D[Flat data model]-- "Document.insert_into_temp_tables" -->E[Temporary tables]
        E-- "Document.merge_into_target_tables" -->F[Target tables]
    end
```

--------------------------------

### Data Model Visualization

Source: https://github.com/cre-dev/xml2db/blob/main/docs/index.md

This Mermaid diagram illustrates the structure of a data model extracted from an XSD file, showing tables and their relationships.

```mermaid
erDiagram
    Unavailability_MarketDocument ||--o{ TimeSeries : "TimeSeries*"
    Unavailability_MarketDocument ||--|{ Reason : "Reason*"
    Unavailability_MarketDocument {
        string mRID
        string revisionNumber
        NMTOKEN type
        NMTOKEN process_processType
        dateTime createdDateTime
        string sender_MarketParticipant_mRID
        NMTOKEN sender_MarketParticipant_marketRole_type
        string receiver_MarketParticipant_mRID
        NMTOKEN receiver_MarketParticipant_marketRole_type
        string unavailability_Time_Period_timeInterval_start
        string unavailability_Time_Period_timeInterval_end
        NMTOKEN docStatus_value
    }
    TimeSeries ||--o{ Available_Period : "Available_Period*"
    TimeSeries ||--o{ Available_Period : "WindPowerFeedin_Period*"
    TimeSeries ||--o{ Asset_RegisteredResource : "Asset_RegisteredResource*"
    TimeSeries ||--o{ Reason : "Reason*"
    TimeSeries {
        string mRID
        NMTOKEN businessType
        string biddingZone_Domain_mRID
        string in_Domain_mRID
        string out_Domain_mRID
        date start_DateAndOrTime_date
        time start_DateAndOrTime_time
        date end_DateAndOrTime_date
        time end_DateAndOrTime_time
        NMTOKEN quantity_Measure_Unit_name
        NMTOKEN curveType
        string production_RegisteredResource_mRID
        string production_RegisteredResource_name
        string production_RegisteredResource_location_name
        NMTOKEN production_RegisteredResource_pSRType_psrType
        string production_RegisteredResource_pSRType_powerSystemResources_mRID
        string production_RegisteredResource_pSRType_powerSystemResources_name
        float production_RegisteredResource_pSRType_powerSystemResources_nominalP
    }
    Available_Period ||--|{ Point : "Point*"
    Available_Period {
        string timeInterval_start
        string timeInterval_end
        duration resolution
    }
    Point {
        integer position
        decimal quantity
    }
    Asset_RegisteredResource {
        string mRID
        string name
        NMTOKEN asset_PSRType_psrType
        string location_name
    }
    Reason {
        NMTOKEN code
        string text
    }
```

--------------------------------

### Model Configuration General Structure

Source: https://github.com/cre-dev/xml2db/blob/main/docs/configuring.md

This dictionary structure outlines the general configuration options available for the xml2db data model. It shows top-level model settings and nested table-specific configurations.

```python
{
    "document_tree_hook": None,
    "document_tree_node_hook": None,
    "row_numbers": False,
    "as_columnstore": False,
    "metadata_columns": None,
    "tables": {
        "table1": {
            "reuse": True,
            "choice_transform": False,
            "as_columnstore": False,
            "fields": {
                "my_column": {
                    "type": None  # default type
                }
            },
            "extra_args": [],
        }
    }
}
```

--------------------------------

### Configure Joining for Simple Types

Source: https://github.com/cre-dev/xml2db/blob/main/docs/configuring.md

Configure how simple type elements with specific XSD types and maximum occurrences are joined into a single column. This setting is currently always applied and cannot be opted out of.

```python
model_config = {
    "tables": {
        "my_table_name": {
            "fields": {
                "my_field_name": {
                    "transform": "join"
                }
            }
        }
    }
}
```

--------------------------------

### Write source tree and target tree to a file

Source: https://github.com/cre-dev/xml2db/blob/main/docs/getting_started.md

Generate text-based tree representations of the raw XML schema (source tree) and the simplified schema (target tree). These show element names, data types, and cardinality.

```python
# Raw schema as read from the XSD
with open("source_tree.txt", "w") as f:
    f.write(data_model.source_tree)

# Simplified schema used to build the tables
with open("target_tree.txt", "w") as f:
    f.write(data_model.target_tree)
```

--------------------------------

### Parse an XML file

Source: https://github.com/cre-dev/xml2db/blob/main/docs/getting_started.md

Parses a single XML file and loads its content into the database. Create the data model before parsing.

```python
document = data_model.parse_xml(
    xml_file="path/to/file.xml",
)
document.insert_into_target_tables()
```

--------------------------------

### XML Schema Data Types and Database Mapping

Source: https://github.com/cre-dev/xml2db/blob/main/tests/sample_models/table1/table1_erd_version1.md

Illustrates the mapping of various XML data types to their database schema equivalents. This is useful for understanding how to represent complex XML structures in a relational database.

```plaintext
decimal undisclosedVolume_value
string undisclosedVolume_unit
string orderDuration_duration
dateTime orderDuration_expirationDateTime
priceIntervalQuantityDetails {
    date intervalStartDate
    date intervalEndDate
    string daysOfTheWeek
    time-N intervalStartTime
    time-N intervalEndTime
    decimal quantity
    string unit
    decimal priceTimeIntervalQuantity_value
    string priceTimeIntervalQuantity_currency
}
optionDetails {
    string optionStyle
    string optionType
    date-N optionExerciseDate
    decimal optionStrikePrice_value
    string optionStrikePrice_currency
}
fixingIndex {
    string indexName
    decimal indexValue
}
deliveryProfile {
    date loadDeliveryStartDate
    date loadDeliveryEndDate
    string-N daysOfTheWeek
    time-N loadDeliveryStartTime
    time-N loadDeliveryEndTime
}
contractTradingHours {
    time startTime
    time endTime
    date date
}
```

--------------------------------

### Load multiple XML files in one database operation

Source: https://github.com/cre-dev/xml2db/blob/main/docs/getting_started.md

Accumulates data from multiple XML files in memory before inserting into the database in a single batch. This optimizes performance for numerous small files. Metadata can be passed for each file.

```python
# `files` is an iterable of XML file paths
flat_data = None
for xml_file in files:
    document = data_model.parse_xml(
        xml_file=xml_file,
        metadata={"input_file_path": xml_file},
        flat_data=flat_data,
    )
    # Carry the accumulated data over to the next file
    flat_data = document.data

# Insert all accumulated data in one database operation
document.insert_into_target_tables()
```

--------------------------------

### Run All Tests with DB Integration

Source: https://github.com/cre-dev/xml2db/blob/main/CLAUDE.md

Executes all tests, including those requiring database integration, using in-memory DuckDB. Sets the timezone to Europe/Paris.

```bash
TZ="Europe/Paris" DB_STRING="duckdb:///:memory:" python -m pytest
```

--------------------------------

### 1-n Relationship Conversion

Source: https://github.com/cre-dev/xml2db/blob/main/docs/how_it_works.md

Shows how 1-n relationships are handled, allowing a child node to have multiple parents if used under different parent nodes.

```mermaid
erDiagram
    CONTRACT ||--|{ DELIVERY_PROFILE : delivers
    UNIQUE_CONTRACT }|--|{ UNIQUE_DELIVERY_PROFILE : delivers
```

--------------------------------

### Multiprocessing XML Loading with Database Lock

Source: https://github.com/cre-dev/xml2db/blob/main/docs/api/overview.md

Demonstrates parallel XML parsing across multiple processes, with database I/O serialised using a multiprocessing lock. This approach ensures data integrity for various database backends.

```python
import multiprocessing

from xml2db import DataModel


def load_one_file(xml_path, xsd_path, connection_string, lock):
    # Each process creates its own DataModel with a unique temp_prefix.
    model = DataModel(
        xsd_file=xsd_path,
        connection_string=connection_string,
    )
    # XML parsing is CPU-bound and runs in parallel across all processes.
    doc = model.parse_xml(xml_path)
    # Serialise all database I/O across processes.
    with lock:
        doc.insert_into_target_tables()
    model.engine.dispose()


if __name__ == "__main__":
    xsd_path = "schema.xsd"
    connection_string = "duckdb:///data.duckdb"
    xml_files = ["file1.xml", "file2.xml", "file3.xml"]

    lock = multiprocessing.Lock()
    processes = [
        multiprocessing.Process(
            target=load_one_file,
            args=(xml_path, xsd_path, connection_string, lock),
        )
        for xml_path in xml_files
    ]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
        if p.exitcode != 0:
            raise RuntimeError(f"Worker failed with exit code {p.exitcode}")
```

--------------------------------

### 1-1 Relationship Conversion to n-1

Source: https://github.com/cre-dev/xml2db/blob/main/docs/how_it_works.md

Illustrates a 1-1 relationship converted to n-1 after node reuse, where UNIQUE_TRADE holds a foreign key to UNIQUE_CONTRACT.

```mermaid
erDiagram
    TRADE ||--|| CONTRACT : concerns
    UNIQUE_TRADE }|--|| UNIQUE_CONTRACT : concerns
```

--------------------------------

### Write Entity Relationship Diagram to a file

Source: https://github.com/cre-dev/xml2db/blob/main/docs/getting_started.md

Generate a Mermaid-compatible Entity Relationship Diagram (ERD) for the data model and write it to a markdown file. This helps visualize tables and relationships.

```python
with open("target_data_model_erd.md", "w") as f:
    f.write(data_model.get_entity_rel_diagram())
```

--------------------------------

### Extract data back to XML

Source: https://github.com/cre-dev/xml2db/blob/main/docs/getting_started.md

Extracts data from the database based on a WHERE clause and saves it to an XML file. Primarily used for testing and round-trip validation.

```python
document = data_model.extract_from_database(
    root_select_where="xml2db_input_file_path='path/to/file.xml'",
)
document.to_xml("extracted_file.xml")
```

--------------------------------

### Disable Choice Group Simplification

Source: https://github.com/cre-dev/xml2db/blob/main/docs/configuring.md

Use this configuration to prevent xml2db from simplifying choice groups with more than two options of the same data type. This is useful when you want to retain the original structure of choice groups.

```python
model_config = {
    "tables": {
        "my_table_name": {
            "choice_transform": False
        }
    }
}
```

--------------------------------

### Force Elevation of Complex Child Elements

Source: https://github.com/cre-dev/xml2db/blob/main/docs/configuring.md

Force the elevation of a complex child element to its parent, even if it has more than 5 fields. This can help simplify the data model by pulling child fields up to the parent level.

```python
model_config = {
    "tables": {
        "contract": {
            "fields": {
                "docStatus": {
                    "transform": "elevate"
                }
            }
        }
    }
}
```

--------------------------------

### Disable Deduplication for a Table

Source: https://github.com/cre-dev/xml2db/blob/main/docs/configuring.md

Opt out of the default element deduplication behavior for a specific table. This can simplify the data model and potentially speed up queries if elements are mostly unique, at the cost of increased storage.

```python
model_config = {
    "tables": {
        "my_table": {"reuse": False}
    }
}
```

--------------------------------

### Override Default Column Type Mapping

Source: https://github.com/cre-dev/xml2db/blob/main/docs/configuring.md

Customize the SQLAlchemy data type for a specific column in your model configuration. This is useful when the default mapping does not meet your database requirements.

```python
import xml2db
from sqlalchemy.dialects import mssql

model_config = {
    "tables": {
        "my_table": {
            "fields": {
                "my_column": {
                    "type": mssql.BIGINT
                }
            }
        },
    },
}

data_model = xml2db.DataModel(
    xsd_file="path/to/file.xsd",
    db_schema="my_schema",
    model_config=model_config
)
```
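
--------------------------------

### Combine Several Configuration Fragments (illustrative sketch)

The configuration examples in this section each show one option in its own `model_config` dict. A minimal sketch of how several of them could be combined into a single dict, assuming you want later fragments to win on conflicts: the helper `merge_configs` is hypothetical (it is not part of xml2db) and uses only plain Python dicts.

```python
def merge_configs(*fragments):
    """Recursively merge config fragments; later fragments win on conflicts.

    Hypothetical helper for illustration, not an xml2db function.
    """
    merged = {}
    for fragment in fragments:
        _merge_into(merged, fragment)
    return merged


def _merge_into(target, source):
    for key, value in source.items():
        if isinstance(value, dict):
            # Nested dicts are rebuilt, so the input fragments are not mutated.
            existing = target.get(key)
            if not isinstance(existing, dict):
                existing = target[key] = {}
            _merge_into(existing, value)
        else:
            # Leaf values (booleans, strings, type objects) are kept as-is.
            target[key] = value


# Fragments copied from the examples in this section.
no_reuse = {"tables": {"my_table": {"reuse": False}}}
no_choice = {"tables": {"my_table": {"choice_transform": False}}}
elevate = {
    "tables": {"contract": {"fields": {"docStatus": {"transform": "elevate"}}}}
}

model_config = merge_configs(no_reuse, no_choice, elevate)
```

The merged dict follows the general structure shown earlier (a top-level `"tables"` key with one entry per table), so it could be passed as `model_config=` when creating the `DataModel`.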