### Install Development Dependencies
Source: https://github.com/cre-dev/xml2db/blob/main/README.md
Install additional development dependencies for testing and documentation, after cloning the repository.
```bash
pip install -e .[tests,docs]
```
--------------------------------
### Install xml2db Package
Source: https://github.com/cre-dev/xml2db/blob/main/README.md
Install the xml2db package using pip. It is recommended to do this within a virtual environment.
```bash
pip install xml2db
```
--------------------------------
### Install Project with Dev Dependencies
Source: https://github.com/cre-dev/xml2db/blob/main/CLAUDE.md
Installs the project in editable mode along with development and documentation dependencies, including DuckDB and pytz.
```bash
pip install -e .[tests,docs] duckdb_engine pytz
```
--------------------------------
### Install xml2db in Editable Mode
Source: https://github.com/cre-dev/xml2db/blob/main/docs/getting_started.md
For development, clone the repository and install xml2db in editable mode with development dependencies.
```bash
pip install -e .[docs,tests]
```
--------------------------------
### Load XML into Database
Source: https://github.com/cre-dev/xml2db/blob/main/README.md
Use this snippet to create a data model from an XSD, parse an XML file, and insert its content into a relational database. Ensure you have the necessary database driver installed.
```python
from xml2db import DataModel
# Create a data model of tables with relations based on the XSD file
data_model = DataModel(
    xsd_file="path/to/file.xsd",
    connection_string="postgresql+psycopg2://testuser:testuser@localhost:5432/testdb",
)
# Parse an XML file based on this XSD
document = data_model.parse_xml(
    xml_file="path/to/file.xml"
)
# Insert the document content into the database
document.insert_into_target_tables()
```
--------------------------------
### Add Custom SQLAlchemy Index
Source: https://github.com/cre-dev/xml2db/blob/main/docs/configuring.md
Pass extra arguments to SQLAlchemy's Table constructor to customize indexes. This example demonstrates adding a custom index on multiple columns for a specific table.
```python
import sqlalchemy

model_config = {
    "tables": {
        "my_table": {
            # Extra positional arguments passed to the sqlalchemy.Table constructor
            "extra_args": sqlalchemy.Index("my_index", "my_column1", "my_column2"),
        }
    }
}
```
--------------------------------
### Run a Specific Test by Name
Source: https://github.com/cre-dev/xml2db/blob/main/CLAUDE.md
Executes a specific test identified by its name, for example, 'test_iterative_recursive_parsing'.
```bash
pytest -k "test_iterative_recursive_parsing"
```
--------------------------------
### Serve Documentation Locally
Source: https://github.com/cre-dev/xml2db/blob/main/CLAUDE.md
Builds and serves the project's documentation locally using MkDocs, allowing for previewing changes.
```bash
mkdocs serve
```
--------------------------------
### Loading XML into a Database
Source: https://github.com/cre-dev/xml2db/blob/main/docs/index.md
This snippet demonstrates how to initialize a DataModel from an XSD file and a database connection string, parse an XML file, and load the data into the database tables.
```python
from xml2db import DataModel
# Create a DataModel object from an XSD file
data_model = DataModel(
    xsd_file="path/to/file.xsd",
    connection_string="postgresql+psycopg2://testuser:testuser@localhost:5432/testdb",
)
# Parse an XML file based on this XSD schema
document = data_model.parse_xml(xml_file="path/to/file.xml")
# Load data into the database, creating target tables if need be
document.insert_into_target_tables()
```
--------------------------------
### Create a DataModel object
Source: https://github.com/cre-dev/xml2db/blob/main/docs/getting_started.md
Create a DataModel object by providing the path to an XSD file, the target database schema name, and a SQLAlchemy connection string. An optional model configuration can also be provided.
```python
from xml2db import DataModel
data_model = DataModel(
    xsd_file="path/to/file.xsd",
    db_schema="source_data",  # the name of the database target schema
    connection_string="postgresql+psycopg2://testuser:testuser@localhost:5432/testdb",
    model_config={},
)
```
--------------------------------
### Regenerate Snapshot Tests
Source: https://github.com/cre-dev/xml2db/blob/main/CLAUDE.md
Navigates to the sample models directory and runs the 'models.py' script to regenerate snapshot files for model outputs.
```bash
cd tests/sample_models && python models.py
```
--------------------------------
### Run Tests Against a Real Database
Source: https://github.com/cre-dev/xml2db/blob/main/CLAUDE.md
Executes tests against a persistent database using a provided connection string, here shown with PostgreSQL and psycopg2.
```bash
DB_STRING="postgresql+psycopg2://user:pass@localhost/testdb" pytest
```
--------------------------------
### Run Conversion Tests Only
Source: https://github.com/cre-dev/xml2db/blob/main/README.md
Run only the conversion tests that do not require a database connection. This is useful for quick checks.
```bash
pytest -m "not dbtest"
```
--------------------------------
### Run a Specific Test File
Source: https://github.com/cre-dev/xml2db/blob/main/CLAUDE.md
Executes tests located within a particular file, such as 'tests/test_conversions.py'.
```bash
pytest tests/test_conversions.py
```
--------------------------------
### n-n Relationship Modeling with Third Table
Source: https://github.com/cre-dev/xml2db/blob/main/docs/how_it_works.md
Demonstrates how n-n relationships are represented in a SQL model using an additional table to hold the relationship.
```mermaid
erDiagram
CONTRACT ||--|{ CONTRACT_DELIVERY_PROFILE : is_in
CONTRACT_DELIVERY_PROFILE }|--|| DELIVERY_PROFILE : involves
```
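The diagram can be made concrete with a small, self-contained sketch using Python's built-in `sqlite3` module. Table and column names here (`pk`, `fk_contract`, `fk_delivery_profile`) are illustrative assumptions, not the names xml2db generates.

```python
import sqlite3

# In-memory database; all table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE contract (pk INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE delivery_profile (pk INTEGER PRIMARY KEY, days TEXT);
    -- The third table holds one row per (contract, delivery profile) pair.
    CREATE TABLE contract_delivery_profile (
        fk_contract INTEGER REFERENCES contract (pk),
        fk_delivery_profile INTEGER REFERENCES delivery_profile (pk)
    );
    INSERT INTO contract VALUES (1, 'A'), (2, 'B');
    INSERT INTO delivery_profile VALUES (10, 'MO-FR'), (11, 'SA-SU');
    INSERT INTO contract_delivery_profile VALUES (1, 10), (1, 11), (2, 10);
    """
)

# Resolve the n-n relationship by joining through the third table.
rows = conn.execute(
    """
    SELECT c.name, d.days
    FROM contract AS c
    JOIN contract_delivery_profile AS cd ON cd.fk_contract = c.pk
    JOIN delivery_profile AS d ON d.pk = cd.fk_delivery_profile
    ORDER BY c.name, d.days
    """
).fetchall()
print(rows)
```

Contract `A` is linked to two delivery profiles, and the `MO-FR` profile is shared by both contracts, which is exactly what a single foreign key column could not express.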
--------------------------------
### Run All Tests
Source: https://github.com/cre-dev/xml2db/blob/main/README.md
Execute all tests for the xml2db package using the pytest command.
```bash
python -m pytest
```
--------------------------------
### Data Loading Flowchart
Source: https://github.com/cre-dev/xml2db/blob/main/docs/api/overview.md
Visual representation of the data loading process from an XML file into database tables, detailing the functions involved in lower-level steps. Useful for understanding advanced data transformation and loading scenarios.
```mermaid
flowchart TB
subgraph "DataModel.parse_xml"
direction TB
A[XML file]-- "XMLConverter.parse_xml" -->B[Document tree]
B-- "Document.doc_tree_to_flat_data" -->C[Flat data model]
end
C -.- D
subgraph "Document.insert_into_target_tables"
direction TB
D[Flat data model]-- "Document.insert_into_temp_tables" -->E[Temporary tables]
E-- "Document.merge_into_target_tables" -->F[Target tables]
end
```
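The second half of the flowchart (temporary tables, then a merge into target tables) follows a common staging pattern. The sketch below illustrates that general pattern with `sqlite3`; it is not the SQL that xml2db issues, and the table names and the `record_hash` column are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE target_reason (record_hash TEXT PRIMARY KEY, code TEXT);
    CREATE TABLE temp_reason (record_hash TEXT, code TEXT);
    -- The target table already holds one record from an earlier load.
    INSERT INTO target_reason VALUES ('h1', 'A95');
    """
)

# Step 1: bulk-insert the flat rows into the staging (temporary) table.
flat_rows = [("h1", "A95"), ("h2", "B11")]
conn.executemany("INSERT INTO temp_reason VALUES (?, ?)", flat_rows)

# Step 2: merge only the rows that the target table does not hold yet.
conn.execute(
    """
    INSERT INTO target_reason (record_hash, code)
    SELECT t.record_hash, t.code
    FROM temp_reason AS t
    WHERE NOT EXISTS (
        SELECT 1 FROM target_reason AS r WHERE r.record_hash = t.record_hash
    )
    """
)

# Step 3: the staging table is no longer needed.
conn.execute("DROP TABLE temp_reason")

merged = conn.execute(
    "SELECT record_hash, code FROM target_reason ORDER BY record_hash"
).fetchall()
print(merged)
```

Staging first keeps the slow row-by-row work away from the target tables, which only see one set-based insert per load.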
--------------------------------
### Data Model Visualization
Source: https://github.com/cre-dev/xml2db/blob/main/docs/index.md
This Mermaid diagram illustrates the structure of a data model extracted from an XSD file, showing tables and their relationships.
```mermaid
erDiagram
Unavailability_MarketDocument ||--o{ TimeSeries : "TimeSeries*"
Unavailability_MarketDocument ||--|{ Reason : "Reason*"
Unavailability_MarketDocument {
string mRID
string revisionNumber
NMTOKEN type
NMTOKEN process_processType
dateTime createdDateTime
string sender_MarketParticipant_mRID
NMTOKEN sender_MarketParticipant_marketRole_type
string receiver_MarketParticipant_mRID
NMTOKEN receiver_MarketParticipant_marketRole_type
string unavailability_Time_Period_timeInterval_start
string unavailability_Time_Period_timeInterval_end
NMTOKEN docStatus_value
}
TimeSeries ||--o{ Available_Period : "Available_Period*"
TimeSeries ||--o{ Available_Period : "WindPowerFeedin_Period*"
TimeSeries ||--o{ Asset_RegisteredResource : "Asset_RegisteredResource*"
TimeSeries ||--o{ Reason : "Reason*"
TimeSeries {
string mRID
NMTOKEN businessType
string biddingZone_Domain_mRID
string in_Domain_mRID
string out_Domain_mRID
date start_DateAndOrTime_date
time start_DateAndOrTime_time
date end_DateAndOrTime_date
time end_DateAndOrTime_time
NMTOKEN quantity_Measure_Unit_name
NMTOKEN curveType
string production_RegisteredResource_mRID
string production_RegisteredResource_name
string production_RegisteredResource_location_name
NMTOKEN production_RegisteredResource_pSRType_psrType
string production_RegisteredResource_pSRType_powerSystemResources_mRID
string production_RegisteredResource_pSRType_powerSystemResources_name
float production_RegisteredResource_pSRType_powerSystemResources_nominalP
}
Available_Period ||--|{ Point : "Point*"
Available_Period {
string timeInterval_start
string timeInterval_end
duration resolution
}
Point {
integer position
decimal quantity
}
Asset_RegisteredResource {
string mRID
string name
NMTOKEN asset_PSRType_psrType
string location_name
}
Reason {
NMTOKEN code
string text
}
```
--------------------------------
### Model Configuration General Structure
Source: https://github.com/cre-dev/xml2db/blob/main/docs/configuring.md
This dictionary structure outlines the general configuration options available for the xml2db data model. It shows top-level model settings and nested table-specific configurations.
```python
{
    "document_tree_hook": None,
    "document_tree_node_hook": None,
    "row_numbers": False,
    "as_columnstore": False,
    "metadata_columns": None,
    "tables": {
        "table1": {
            "reuse": True,
            "choice_transform": False,
            "as_columnstore": False,
            "fields": {
                "my_column": {
                    "type": None  # default type
                }
            },
            "extra_args": [],
        }
    }
}
```
--------------------------------
### Configure Joining for Simple Types
Source: https://github.com/cre-dev/xml2db/blob/main/docs/configuring.md
Configure how simple-type elements with specific XSD types and maximum occurrences are joined into a single column. This transform is currently always applied; there is no way to opt out of it.
```python
model_config = {
    "tables": {
        "my_table_name": {
            "fields": {
                "my_field_name": {
                    "transform": "join"
                }
            }
        }
    }
}
```
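To picture what joining into a single column means, here is a minimal sketch in plain Python. The comma separator and the helper names `join_values` and `split_values` are assumptions for illustration; they are not part of the xml2db API.

```python
from typing import List, Optional


def join_values(values: List[str], separator: str = ",") -> Optional[str]:
    """Collapse repeated simple values into one string, or None when empty."""
    if not values:
        return None
    return separator.join(values)


def split_values(joined: Optional[str], separator: str = ",") -> List[str]:
    """Reverse of join_values, for reading the column back."""
    if joined is None:
        return []
    return joined.split(separator)


days = ["MO", "TU", "WE"]
stored = join_values(days)      # one column value instead of three child rows
restored = split_values(stored)
print(stored, restored)
```

The trade-off is that individual values can no longer be filtered with a plain equality test in SQL; they have to be split or pattern-matched.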
--------------------------------
### Write source tree and target tree to a file
Source: https://github.com/cre-dev/xml2db/blob/main/docs/getting_started.md
Generate text-based tree representations of the raw XML schema (source tree) and the simplified schema (target tree). This shows element names, data types, and cardinality.
```python
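# Write the raw XSD structure (source tree)
with open("source_tree.txt", "w") as f:
    f.write(data_model.source_tree)
# Write the simplified structure used for the database (target tree)
with open("target_tree.txt", "w") as f:
    f.write(data_model.target_tree)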
```
--------------------------------
### Parse an XML file
Source: https://github.com/cre-dev/xml2db/blob/main/docs/getting_started.md
Parses a single XML file and prepares its content for database insertion. Ensure the data model is defined before parsing.
```python
document = data_model.parse_xml(
    xml_file="path/to/file.xml",
)
document.insert_into_target_tables()
```
--------------------------------
### XML Schema Data Types and Database Mapping
Source: https://github.com/cre-dev/xml2db/blob/main/tests/sample_models/table1/table1_erd_version1.md
Illustrates the mapping of various XML data types to their database schema equivalents. This is useful for understanding how to represent complex XML structures in a relational database.
```plaintext
decimal undisclosedVolume_value
string undisclosedVolume_unit
string orderDuration_duration
dateTime orderDuration_expirationDateTime
priceIntervalQuantityDetails {
date intervalStartDate
date intervalEndDate
string daysOfTheWeek
time-N intervalStartTime
time-N intervalEndTime
decimal quantity
string unit
decimal priceTimeIntervalQuantity_value
string priceTimeIntervalQuantity_currency
}
optionDetails {
string optionStyle
string optionType
date-N optionExerciseDate
decimal optionStrikePrice_value
string optionStrikePrice_currency
}
fixingIndex {
string indexName
decimal indexValue
}
deliveryProfile {
date loadDeliveryStartDate
date loadDeliveryEndDate
string-N daysOfTheWeek
time-N loadDeliveryStartTime
time-N loadDeliveryEndTime
}
contractTradingHours {
time startTime
time endTime
date date
}
```
--------------------------------
### Load multiple XML files in one database operation
Source: https://github.com/cre-dev/xml2db/blob/main/docs/getting_started.md
Accumulates data from multiple XML files in memory before inserting into the database in a single batch. This optimizes performance for numerous small files. Metadata can be passed for each file.
```python
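# `files` is an iterable of XML file paths
flat_data = None
for xml_file in files:
    document = data_model.parse_xml(
        xml_file=xml_file,
        metadata={"input_file_path": xml_file},
        flat_data=flat_data,
    )
    # Carry the accumulated data over to the next iteration
    flat_data = document.data
# Insert everything in a single database operation
document.insert_into_target_tables()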
```
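Because the data accumulates in memory, a very large file list may need to be split into bounded batches, with one insert per batch. The helper below is a hypothetical sketch using only the standard library; `batched` and the batch size are assumptions, not part of xml2db.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def batched(paths: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield successive lists of at most `size` paths."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(paths)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


files = [f"file{i}.xml" for i in range(5)]
batches = list(batched(files, 2))
print(batches)
# Each batch would then go through the parse_xml / flat_data loop,
# followed by one insert_into_target_tables() call per batch.
```

A suitable batch size depends on file sizes and available memory, so it is best found by measurement.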
--------------------------------
### Run All Tests with DB Integration
Source: https://github.com/cre-dev/xml2db/blob/main/CLAUDE.md
Executes all tests, including those requiring database integration, using in-memory DuckDB. Sets the timezone to Europe/Paris.
```bash
TZ="Europe/Paris" DB_STRING="duckdb:///:memory:" python -m pytest
```
--------------------------------
### 1-n Relationship Conversion
Source: https://github.com/cre-dev/xml2db/blob/main/docs/how_it_works.md
Shows how 1-n relationships are handled, allowing a child node to have multiple parents if used under different parent nodes.
```mermaid
erDiagram
CONTRACT ||--|{ DELIVERY_PROFILE : delivers
UNIQUE_CONTRACT }|--|{ UNIQUE_DELIVERY_PROFILE : delivers
```
--------------------------------
### Multiprocessing XML Loading with Database Lock
Source: https://github.com/cre-dev/xml2db/blob/main/docs/api/overview.md
Demonstrates parallel XML parsing across multiple processes, with database I/O serialised using a multiprocessing lock. This approach ensures data integrity for various database backends.
```python
import multiprocessing

from xml2db import DataModel


def load_one_file(xml_path, xsd_path, connection_string, lock):
    # Each process creates its own DataModel with a unique temp_prefix.
    model = DataModel(
        xsd_file=xsd_path,
        connection_string=connection_string,
    )
    # XML parsing is CPU-bound and runs in parallel across all processes.
    doc = model.parse_xml(xml_path)
    # Serialise all database I/O across processes.
    with lock:
        doc.insert_into_target_tables()
    model.engine.dispose()


if __name__ == "__main__":
    xsd_path = "schema.xsd"
    connection_string = "duckdb:///data.duckdb"
    xml_files = ["file1.xml", "file2.xml", "file3.xml"]
    lock = multiprocessing.Lock()
    processes = [
        multiprocessing.Process(
            target=load_one_file,
            args=(xml_path, xsd_path, connection_string, lock),
        )
        for xml_path in xml_files
    ]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
        if p.exitcode != 0:
            raise RuntimeError(f"Worker failed with exit code {p.exitcode}")
```
--------------------------------
### 1-1 Relationship Conversion to n-1
Source: https://github.com/cre-dev/xml2db/blob/main/docs/how_it_works.md
Illustrates a 1-1 relationship converted to n-1 after node reuse, where UNIQUE_TRADE holds a foreign key to UNIQUE_CONTRACT.
```mermaid
erDiagram
TRADE ||--|| CONTRACT : concerns
UNIQUE_TRADE }|--|| UNIQUE_CONTRACT : concerns
```
--------------------------------
### Write Entity Relationship Diagram to a file
Source: https://github.com/cre-dev/xml2db/blob/main/docs/getting_started.md
Generate a Mermaid-compatible Entity Relationship Diagram (ERD) for the data model and write it to a markdown file. This helps visualize tables and relationships.
```python
with open("target_data_model_erd.md", "w") as f:
    f.write(data_model.get_entity_rel_diagram())
```
--------------------------------
### Extract data back to XML
Source: https://github.com/cre-dev/xml2db/blob/main/docs/getting_started.md
Extracts data from the database based on a WHERE clause and saves it to an XML file. Primarily used for testing and round-trip validation.
```python
document = data_model.extract_from_database(
    root_select_where="xml2db_input_file_path='path/to/file.xml'",
)
document.to_xml("extracted_file.xml")
```
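For round-trip validation, the extracted file has to be compared with the original while ignoring cosmetic differences such as attribute order and indentation. One possible approach, sketched here with the standard library's `xml.etree.ElementTree.canonicalize` (Python 3.8+), is to compare canonical forms; `same_xml` is a hypothetical helper, not an xml2db function.

```python
from xml.etree.ElementTree import canonicalize


def same_xml(original: str, extracted: str) -> bool:
    """Compare two XML strings after C14N canonicalization."""
    return canonicalize(original, strip_text=True) == canonicalize(
        extracted, strip_text=True
    )


original_xml = '<doc b="2" a="1"><item>1</item></doc>'
extracted_xml = '<doc a="1" b="2">\n  <item>1</item>\n</doc>'

round_trip_ok = same_xml(original_xml, extracted_xml)
content_changed = same_xml("<a>1</a>", "<a>2</a>")
print(round_trip_ok, content_changed)
```

This comparison is sensitive to element order, so it only suits cases where the extraction is expected to preserve the order of sibling elements.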
--------------------------------
### Disable Choice Group Simplification
Source: https://github.com/cre-dev/xml2db/blob/main/docs/configuring.md
Use this configuration to prevent xml2db from simplifying choice groups with more than two options of the same data type. This is useful when you want to retain the original structure of choice groups.
```python
model_config = {
    "tables": {
        "my_table_name": {
            "choice_transform": False
        }
    }
}
```
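As a rough picture of what simplifying a choice group can look like, the sketch below collapses several mutually exclusive options of the same type into a (name, value) pair. This assumes the simplification stores the chosen element's name alongside its value; the function and the option names `idA`, `idB`, `idC` are hypothetical.

```python
from typing import Dict, Optional, Tuple


def collapse_choice(
    options: Dict[str, Optional[str]],
) -> Tuple[Optional[str], Optional[str]]:
    """Return (chosen_option_name, value) for a choice group.

    At most one option may be set, as in an XSD choice.
    """
    chosen = [(name, value) for name, value in options.items() if value is not None]
    if len(chosen) > 1:
        raise ValueError("a choice group allows only one option at a time")
    return chosen[0] if chosen else (None, None)


# Three columns, of which only one is ever filled in...
row = {"idA": None, "idB": "X1", "idC": None}
# ...become two columns: which option was chosen, and its value.
choice_type, choice_value = collapse_choice(row)
print(choice_type, choice_value)
```

Setting `"choice_transform": False` keeps one column per option instead, which is closer to the XSD but leaves most of those columns empty.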
--------------------------------
### Force Elevation of Complex Child Elements
Source: https://github.com/cre-dev/xml2db/blob/main/docs/configuring.md
Force the elevation of a complex child element to its parent, even if it has more than 5 fields. This can help simplify the data model by pulling child fields up to the parent level.
```python
model_config = {
    "tables": {
        "contract": {
            "fields": {
                "docStatus": {
                    "transform": "elevate"
                }
            }
        }
    }
}
```
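Conceptually, elevation replaces a nested child with prefixed columns on the parent, like the `docStatus_value` column in the Data Model Visualization diagram. The following is a minimal sketch on plain dictionaries; the `elevate` helper is hypothetical and only mimics the resulting column naming.

```python
from typing import Any, Dict


def elevate(parent: Dict[str, Any], field: str) -> Dict[str, Any]:
    """Return a copy of `parent` where the nested dict under `field`
    is replaced by prefixed columns such as `docStatus_value`."""
    flattened = {key: value for key, value in parent.items() if key != field}
    child = parent.get(field) or {}
    for name, value in child.items():
        flattened[f"{field}_{name}"] = value
    return flattened


contract = {"mRID": "C-1", "docStatus": {"value": "A05"}}
flat_contract = elevate(contract, "docStatus")
print(flat_contract)
```

The child no longer needs its own table or a join, at the price of a wider parent table.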
--------------------------------
### Disable Deduplication for a Table
Source: https://github.com/cre-dev/xml2db/blob/main/docs/configuring.md
Opt-out of the default element deduplication behavior for a specific table. This can simplify the data model and potentially speed up queries if elements are mostly unique, at the cost of increased storage.
```python
model_config = {
    "tables": {
        "my_table": {"reuse": False}
    }
}
```
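To weigh this trade-off, it helps to see what deduplication does in principle: identical elements are stored once and referenced by every parent that uses them. The sketch below illustrates the idea with a content hash. It is not xml2db's implementation; the hashing scheme and function names are assumptions.

```python
import hashlib
import json
from typing import Any, Dict, List, Tuple


def record_hash(record: Dict[str, Any]) -> str:
    """Hash a record's content independently of key order."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def deduplicate(
    records: List[Dict[str, Any]],
) -> Tuple[Dict[str, Dict[str, Any]], List[str]]:
    """Return the unique rows keyed by hash and, for each input record,
    the hash that a parent row would reference."""
    unique: Dict[str, Dict[str, Any]] = {}
    references: List[str] = []
    for record in records:
        key = record_hash(record)
        unique.setdefault(key, record)
        references.append(key)
    return unique, references


reasons = [
    {"code": "A95", "text": "x"},
    {"text": "x", "code": "A95"},  # same content, different key order
    {"code": "B11", "text": None},
]
unique_rows, refs = deduplicate(reasons)
print(len(unique_rows), refs[0] == refs[1])
```

With `"reuse": False`, every occurrence would be stored as its own row, so the table grows but each row belongs to exactly one parent.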
--------------------------------
### Override Default Column Type Mapping
Source: https://github.com/cre-dev/xml2db/blob/main/docs/configuring.md
Customize the SQLAlchemy data type for a specific column in your model configuration. This is useful when the default mapping does not meet your database requirements.
```python
import xml2db
from sqlalchemy.dialects import mssql

model_config = {
    "tables": {
        "my_table": {
            "fields": {
                "my_column": {
                    "type": mssql.BIGINT
                }
            }
        },
    },
}
data_model = xml2db.DataModel(
    xsd_file="path/to/file.xsd", db_schema="my_schema", model_config=model_config
)
```