### Install xml2db Package

Source: https://cre-dev.github.io/xml2db/getting_started

Install the xml2db package using pip. It is recommended to do this within a virtual environment.

```bash
pip install xml2db

```

--------------------------------

### Install xml2db in Editable Mode with Extras

Source: https://cre-dev.github.io/xml2db/getting_started

For development, clone the git repository and install the package in editable mode with documentation and test dependencies.

```bash
pip install -e .[docs,tests]

```

--------------------------------

### Instantiate DataModel

Source: https://cre-dev.github.io/xml2db/api/data_model

Example of creating a DataModel instance with an XSD file path and a database connection string.

```python
>>> data_model = DataModel(
>>>     xsd_file="path/to/file.xsd",
>>>     connection_string="postgresql+psycopg2://testuser:testuser@localhost:5432/testdb",
>>> )


```

--------------------------------

### Get All Create Index Statements

Source: https://cre-dev.github.io/xml2db/api/data_model

Yields SQL CREATE INDEX statements for all tables in the data model.

```python
def get_all_create_index_statements(self) -> Iterable[CreateIndex]:
    """Yield create index statements for all tables"""
    for tb in self.fk_ordered_tables:
        yield from tb.get_create_index_statements()

```

--------------------------------

### Multiprocessing Example for Loading XML Data

Source: https://cre-dev.github.io/xml2db/api/overview

Demonstrates how to load multiple XML files into a database concurrently using multiprocessing. It serializes database I/O operations using a lock to ensure data integrity, while XML parsing is performed in parallel.

```python
import multiprocessing
from xml2db import DataModel


def load_one_file(xml_path, xsd_path, connection_string, lock):
    # Each process creates its own DataModel with a unique temp_prefix.
    model = DataModel(
        xsd_file=xsd_path,
        connection_string=connection_string,
    )
    # XML parsing is CPU-bound and runs in parallel across all processes.
    doc = model.parse_xml(xml_path)

    # Serialise all database I/O across processes.
    with lock:
        doc.insert_into_target_tables()
        model.engine.dispose()


if __name__ == "__main__":
    xsd_path = "schema.xsd"
    connection_string = "duckdb:///data.duckdb"
    xml_files = ["file1.xml", "file2.xml", "file3.xml"]

    lock = multiprocessing.Lock()
    processes = [
        multiprocessing.Process(
            target=load_one_file,
            args=(xml_path, xsd_path, connection_string, lock),
        )
        for xml_path in xml_files
    ]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
        if p.exitcode != 0:
            raise RuntimeError(f"Worker failed with exit code {p.exitcode}")

```

--------------------------------

### Get All Create Table Statements

Source: https://cre-dev.github.io/xml2db/api/data_model

Yields SQLAlchemy CREATE TABLE statements for all tables. Can generate statements for temporary tables if the 'temp' parameter is set to True.

```python
def get_all_create_table_statements(
    self,
    temp: bool = False
) -> Iterable[CreateTable]:
    """Yield sqlalchemy `create table` statements for all tables

    Args:
        temp: If `False`, yield create table statements for target tables (unprefixed). If True, yield create
            table statements for temporary tables (prefixed).
    """
    for tb in self.fk_ordered_tables:
        yield from tb.get_create_table_statements(temp)

```

--------------------------------

### Extract Data from Database

Source: https://cre-dev.github.io/xml2db/api/document

This function extracts data starting from a specified root table with a given WHERE clause. It recursively fetches related data through defined relationships (one-to-many and many-to-one).

```python
def extract_from_database(self, root_table_name, root_select_where, force_tz=None):

        """Extract data from database.

        Args:
            root_table_name (str): The name of the root table to start extraction from.
            root_select_where (str): The SQL WHERE clause to filter the root table records.
            force_tz (str, optional): Timezone to force for datetime objects. Defaults to None.

        Returns:
            dict: A dictionary containing the extracted data, structured by tables and relations.
        """
        flat_tables = {}

        with self.model.engine.connect() as conn:
            _do_extract_table(
                self.model.tables[root_table_name],
                text(root_select_where),
                None,
                [],
                flat_tables,
                conn,
            )

        self.data = flat_tables
        return flat_tables
```

--------------------------------

### Extract Subtree from Database

Source: https://cre-dev.github.io/xml2db/api/document

This function extracts a subtree from a database, starting from a specified root table and applying a WHERE clause. It can optionally enforce a timezone for datetime objects. The extracted data is returned as a dictionary of flat data tables.

```python
def extract_from_database(
    self,
    root_table_name: str,
    root_select_where: str,
    force_tz: Union[str, None] = None,
) -> dict:
    """Extract a subtree from the database and store it in a flat format

    Args:
        root_table_name: The root table name to start from
        root_select_where: A where clause to apply to this root table
        force_tz: Apply this timezone if database returns timezone-naïve datetime

    Returns:
        A shallow dict of flat data tables
    """

    if force_tz:
        force_tz = ZoneInfo(force_tz)

    def _fetch_data(
        sqla_table: Table,
        key_column: Column,
        join_sequence: list[tuple[Column, Table, Column]],
        top_where_clause: TextClause,
        order_by: Union[None, tuple[Column]],
        append_to: list,
        conn: Connection,
    ):
        """Fetch data from a specific table and write fetched rows in a dict keyed by the first row column"""
        quer = select(*(sqla_table.columns.values()))

        join_sequence = join_sequence.copy()
        if len(join_sequence) > 0:
            left_col, join_tb, right_col = join_sequence.pop()
            sub_quer = select(right_col)
            prev_join_col = left_col
            for left_col, join_tb, right_col in reversed(join_sequence):
                sub_quer = sub_quer.join(join_tb, right_col == prev_join_col)
                prev_join_col = left_col
            sub_quer = sub_quer.where(top_where_clause)
            quer = quer.where(key_column.in_(sub_quer))
        else:
            quer = quer.where(top_where_clause)

        if order_by:
            quer = quer.order_by(*order_by)

        def add_tz(x):
            if (
                force_tz
                and isinstance(x, datetime.datetime)
                and (x.tzinfo is None or x.tzinfo.utcoffset(x) is None)
            ):
                x = x.replace(tzinfo=force_tz)
            return x

        col_names = sqla_table.columns.keys()
        for row in conn.execute(quer):
            append_to.append({key: add_tz(val) for key, val in zip(col_names, row)})

    def _do_extract_table(
        tb,
        top_where_clause,
        parent_table,
        join_sequence,
        res_dict,
        conn,
    ):
        """Fetch tables and relationship tables recursively"""
        if tb.type_name not in res_dict:
            res_dict[tb.type_name] = {"records": []}
        _fetch_data(
            tb.table,
            (
                getattr(tb.table.c, f"pk_{tb.name}")
                if tb.is_reused
                else getattr(tb.table.c, f"fk_parent_{parent_table.name}")
            ),
            join_sequence,
            top_where_clause,
            (
                None
                if tb.is_reused or not tb.data_model.model_config["row_numbers"]
                else (
                    getattr(tb.table.c, f"fk_parent_{parent_table.name}"),
                    tb.table.c.xml2db_row_number,
                )
            ),
            res_dict[tb.type_name]["records"],
            conn,
        )
        join_root = (
            [(None, tb.table, getattr(tb.table.c, f"pk_{tb.name}"))]
            if parent_table is None
            else []
        )
        if len(tb.relations_n) > 0:
            if "relations_n" not in res_dict[tb.type_name]:
                res_dict[tb.type_name]["relations_n"] = {}
            for rel in tb.relations_n.values():
                if rel.rel_table_name not in res_dict[tb.type_name]["relations_n"]:
                    res_dict[tb.type_name]["relations_n"][rel.rel_table_name] = {
```

--------------------------------

### extract_from_database

Source: https://cre-dev.github.io/xml2db/api/document

Extracts a subtree from the database and stores it in a flat format. This function is useful for retrieving related data across multiple tables based on a starting point and filtering conditions.

```APIDOC
## extract_from_database

### Description
Extract a subtree from the database and store it in a flat format. This function is useful for retrieving related data across multiple tables based on a starting point and filtering conditions.

### Method
Python Function

### Parameters
#### Path Parameters
- **root_table_name** (str) - Required - The root table name to start from
- **root_select_where** (str) - Required - A where clause to apply to this root table
- **force_tz** (Union[str, None]) - Optional - Apply this timezone if database returns timezone-naïve datetime. Defaults to None.

### Returns
- **dict** - A shallow dict of flat data tables

### Example
```python
# Assuming 'db_extractor' is an instance of the class containing this method
result = db_extractor.extract_from_database(
    root_table_name='users',
    root_select_where="user_id = 1",
    force_tz='UTC'
)
```
```

--------------------------------

### Create DataModel Object from XSD

Source: https://cre-dev.github.io/xml2db/getting_started

Initialize a DataModel object by providing the path to an XSD file, the target database schema name, and a SQLAlchemy connection string. An optional model configuration can also be provided.

```python
from xml2db import DataModel

data_model = DataModel(
    xsd_file="path/to/file.xsd",
    db_schema="source_data", # the name of the database target schema
    connection_string="postgresql+psycopg2://testuser:testuser@localhost:5432/testdb",
    model_config={},
)

```

--------------------------------

### DataModel Class Initialization

Source: https://cre-dev.github.io/xml2db/api/data_model

The constructor for the DataModel class, responsible for parsing XSD files, setting up database connections, and configuring the data model.

```python
def __init__(
    self,
    xsd_file: str,
    short_name: str = "DocumentRoot",
    long_name: str = None,
    base_url: str = None,
    model_config: dict = None,
    connection_string: str = None,
    db_engine: sqlalchemy.Engine = None,
    db_type: str = None,
    db_schema: str = None,
    temp_prefix: str = None,
):
    self.model_config = self._validate_config(model_config)
    self.tables_config = model_config.get("tables", {}) if model_config else {}

    xsd_file_name = xsd_file
    if base_url is None:
        base_url = os.path.normpath(os.path.dirname(xsd_file))
        xsd_file_name = os.path.basename(xsd_file)

    self.xml_schema = xmlschema.XMLSchema(xsd_file_name, base_url=base_url)
    self.lxml_schema = etree.XMLSchema(etree.parse(xsd_file))

    self.xml_converter = XMLConverter(data_model=self)
    self.data_flow_name = short_name
    self.data_flow_long_name = long_name

    if connection_string is None and db_engine is None:
        logger.warning(
            "DataModel created without connection string cannot do actual imports"
        )
        self.engine = None
        self.db_type = db_type
    else:
        if db_engine:
            self.engine = db_engine
        else:
            engine_options = {}
            if "mssql" in connection_string:
                engine_options = {
                    "fast_executemany": True,
                    "isolation_level": "SERIALIZABLE",
                }
            self.engine = create_engine(
                connection_string,
                **engine_options,
            )
        self.db_type = self.engine.dialect.name

    self.dialect = get_dialect(self.db_type)
    self.model_config = self.dialect.validate_model_config(self.model_config)
    self.db_schema = db_schema
    self.temp_prefix = str(uuid4())[:8] if temp_prefix is None else temp_prefix

    self.tables = {}
    self.names_types_map = {}
    self.root_table = None

    self.types_transforms = {}
    self.fields_transforms = {}
    self.ordered_tables_keys = []
    self.transaction_groups = []
    self.source_tree = ""

```

--------------------------------

### Create All Tables

Source: https://cre-dev.github.io/xml2db/api/data_model

Creates all tables for the data model. Can create either target tables or temporary tables for data import. This method is often called implicitly by `Document.insert_into_target_tables()`.

```python
def create_all_tables(self, temp: bool = False) -> None:
    """Create tables for the data model, either target tables or temp tables used to import data.

    You do not have to call this method explicitly when using
        [`Document.insert_into_target_tables()`](document.md#xml2db.document.Document.insert_into_target_tables),
        which will create tables if they do not exist.

    Args:
        temp: If `False`, create target tables (unprefixed). If `True`, create temporary (prefixed) tables.
    """
    for tb in self.fk_ordered_tables:
        tb.create_tables(self.engine, temp)
```

--------------------------------

### Initialize Document Object

Source: https://cre-dev.github.io/xml2db/api/document

Initializes a Document object with a DataModel. The DataModel defines the structure and transformations for the XML data. This constructor sets up the model, initializes an empty data dictionary, and prepares for XML file path assignment.

```python
def __init__(self, model: "DataModel"):
    self.model = model
    self.data = {}
    self.xml_file_path = None
```

--------------------------------

### Write Source and Target Tree Structures to Files

Source: https://cre-dev.github.io/xml2db/getting_started

Generate text files representing the raw XML schema tree and the simplified target model tree using the source_tree and target_tree attributes.

```python
with open(f"source_tree.txt", "w") as f:
    f.write(data_model.source_tree)

with open(f"target_tree.txt", "w") as f:
    f.write(data_model.target_tree)

```

--------------------------------

### create_all_tables

Source: https://cre-dev.github.io/xml2db/api/data_model

Creates all tables for the data model. This can be used for either target tables or temporary tables used for data import. It is not necessary to call this method explicitly when using `Document.insert_into_target_tables()`, as that method will create tables if they do not exist.

```APIDOC
## create_all_tables(temp: bool = False)

### Description
Create tables for the data model, either target tables or temp tables used to import data. You do not have to call this method explicitly when using `Document.insert_into_target_tables()`, which will create tables if they do not exist.

### Method
POST

### Endpoint
/create_all_tables

### Parameters
#### Query Parameters
- **temp** (bool) - Optional - If `False`, create target tables (unprefixed). If `True`, create temporary (prefixed) tables. Default: `False`
```

--------------------------------

### DataModel Initialization

Source: https://cre-dev.github.io/xml2db/api/data_model

Initializes the DataModel with attributes for target tree, metadata, and processing timestamp. It also triggers the internal model building process.

```python
self.target_tree = ""
self.metadata = MetaData()
self.processed_at = datetime.now()

self._build_model()
```

--------------------------------

### Load XML into Database

Source: https://cre-dev.github.io/xml2db

This snippet demonstrates the basic usage of xml2db to load an XML file into a relational database. It requires an XSD file for schema definition and a database connection string. The `insert_into_target_tables()` method creates tables if they don't exist.

```python
from xml2db import DataModel

# Create a DataModel object from an XSD file
data_model = DataModel(
    xsd_file="path/to/file.xsd", 
    connection_string="postgresql+psycopg2://testuser:testuser@localhost:5432/testdb",
)

# Parse an XML file based on this XSD schema
document = data_model.parse_xml(xml_file="path/to/file.xml")

# Load data into the database, creating target tables if need be
document.insert_into_target_tables()

```

--------------------------------

### Model Configuration Structure

Source: https://cre-dev.github.io/xml2db/configuring

This dictionary structure outlines the general configuration options available for the data model. It includes settings for document tree hooks, row numbers, columnstore indexes, metadata columns, and table-specific configurations.

```python
{
    "document_tree_hook": None,
    "document_tree_node_hook": None,
    "row_numbers": False,
    "as_columnstore": False,
    "metadata_columns": None,
    "tables": {
        "table1": {
            "reuse": True,
            "choice_transform": False,
            "as_columnstore": False,
            "fields": {
                "my_column": {
                    "type": None #default type
                }
            },
            "extra_args": []
        }
    }
}
```

--------------------------------

### Load Multiple XML Files in Batch

Source: https://cre-dev.github.io/xml2db/getting_started

Accumulates data from multiple XML files in memory before inserting them into the database in a single batch operation. This is efficient for loading many small files.

```python
flat_data = None
for xml_file in files:
    document = data_model.parse_xml(
        xml_file=xml_file,
        metadata={"input_file_path": xml_file},
        flat_data=flat_data,
    )
    flat_data = document.data
document.insert_into_target_tables()
```

--------------------------------

### get_all_create_table_statements

Source: https://cre-dev.github.io/xml2db/api/data_model

Yields SQL statements for creating all tables, with an option to generate statements for temporary tables.

```APIDOC
## get_all_create_table_statements(temp=False)

### Description
Yield sqlalchemy `create table` statements for all tables.

### Parameters
#### Path Parameters
- None

#### Query Parameters
- None

#### Request Body
- None

### Parameters
- **temp** (`bool`) - Optional - If `False`, yield create table statements for target tables (unprefixed). If True, yield create table statements for temporary tables (prefixed). Default: `False`

### Returns
- **Iterable[CreateTable]** - Yields sqlalchemy `create table` statements for all tables.

### Request Example
```python
# No example provided in source
```

### Response
#### Success Response (200)
- **Iterable[CreateTable]** - Yields sqlalchemy `create table` statements for all tables.

#### Response Example
```json
# No example provided in source
```
```

--------------------------------

### XMLConverter Constructor

Source: https://cre-dev.github.io/xml2db/api/xml_converter

Initializes the XMLConverter with a DataModel and an optional document tree.

```APIDOC
## XMLConverter Constructor

### Description
Initializes the XMLConverter with a DataModel and an optional document tree.

### Parameters
#### Path Parameters
- **data_model** (DataModel) - Required - The DataModel object used to parse XML files
- **document_tree** (dict) - Optional - Data in the document tree format (can be built later by the parse_xml method)
```

--------------------------------

### Create Database Schema

Source: https://cre-dev.github.io/xml2db/api/data_model

Creates the database schema if it does not already exist. This method is typically managed automatically by `Document.insert_into_target_tables()`.

```python
def create_db_schema(self) -> None:
    """Create database schema if it does not already exist.

    You do not have to call this method explicitly when using
        [`Document.insert_into_target_tables()`](document.md#xml2db.document.Document.insert_into_target_tables).
    """
    if self.db_schema is not None:
        self.dialect.create_schema(self.engine, self.db_schema)
        logger.info(f"Created schema: {self.db_schema}")
```

--------------------------------

### get_all_create_index_statements

Source: https://cre-dev.github.io/xml2db/api/data_model

Yields SQL statements for creating indexes on all tables in the data model.

```APIDOC
## get_all_create_index_statements()

### Description
Yield create index statements for all tables.

### Parameters
#### Path Parameters
- None

#### Query Parameters
- None

#### Request Body
- None

### Returns
- **Iterable[CreateIndex]** - Yields create index statements for all tables.

### Request Example
```python
# No example provided in source
```

### Response
#### Success Response (200)
- **Iterable[CreateIndex]** - Yields create index statements for all tables.

#### Response Example
```json
# No example provided in source
```
```

--------------------------------

### Configure Joining Simple Type Values

Source: https://cre-dev.github.io/xml2db/configuring

This configuration enables joining simple type elements into a single column, separated by commas. It's the default behavior and cannot be opted out.

```Python
model_config = {
   "tables": {
       "my_table_name": {
           "fields": {
               "my_field_name": {
                   "transform": "join"
               }
           }
       }
   }
}

```

--------------------------------

### XMLConverter Class Initialization

Source: https://cre-dev.github.io/xml2db/api/xml_converter

Initializes the XMLConverter with a DataModel object and an optional document tree. The DataModel is required for parsing XML files.

```python
def __init__(self, data_model: "DataModel", document_tree: dict = None):
    """A class to convert data from document tree format (nested dict) to and from XML.

    Args:
        data_model: The [`DataModel`](./data_model.md#xml2db.model.DataModel) object used to parse XML files
        document_tree: Data in the document tree format (optional, can be built later by the `parse_xml` method)
    """
    self.model = data_model
    self.document_tree = document_tree

```

--------------------------------

### Add SQLAlchemy Index via Extra Arguments

Source: https://cre-dev.github.io/xml2db/configuring

Pass a SQLAlchemy Index object within 'extra_args' to customize table creation, such as adding an index on specific columns. This requires importing the 'sqlalchemy' library.

```python
model_config = {
    "tables": {
        "my_table": {
            "extra_args": sqlalchemy.Index("my_index", "my_column1", "my_column2"),
        }
    }
}
```

--------------------------------

### Parse XML and Insert into Database

Source: https://cre-dev.github.io/xml2db/getting_started

Parses a single XML file and inserts its content into the target database tables. The `insert_into_target_tables` method handles the data loading.

```python
document = data_model.parse_xml(
    xml_file="path/to/file.xml",
)
document.insert_into_target_tables()
```

--------------------------------

### Insert and Merge Data into Target Tables

Source: https://cre-dev.github.io/xml2db/api/document

Inserts data into temporary tables and then merges them into the final target tables. Use this function to populate your database from XML data, with options to control transaction scope and batch size for insertions.

```python
def insert_into_target_tables(
    self,
    single_transaction: bool = True,
    max_lines: int = -1,
) -> int:
    """Insert and merge data into the database

    Insert data into temporary tables and then merge temporary tables into target tables.

    Args:
        single_transaction: Should we run all queries in a single transaction, or isolate queries at the minimum
            scope required to ensure database consistency?
        max_lines: The maximum number of lines to insert in a single statement when loading data to the temporary
            tables

    Returns:
        The number of inserted rows
    """
    try:
        self.model.create_db_schema()
    except Exception as e:
        logger.error(
            f"Error while creating database schema '{self.model.db_schema}'"
        )
        logger.error(e)
        raise
    try:
        self.insert_into_temp_tables(max_lines)
    except Exception as e:
        logger.error(
            f"Error while importing into temporary tables from {self.xml_file_path}"
        )
        logger.error(e)
        raise
    else:
        logger.info(
            f"Merging temporary tables into target tables for {self.xml_file_path}"
        )
        try:
            self.model.create_all_tables()  # Create target tables if not exist
            inserted_rows = self.merge_into_target_tables(single_transaction)
        except Exception as e:
            logger.error(
                f"Error while merging temporary tables into target tables for {self.xml_file_path}"
            )
            logger.error(e)
            raise
    finally:
        logger.info(f"Dropping temporary tables for {self.xml_file_path}")
        self.model.drop_all_temp_tables()

    return inserted_rows
```

--------------------------------

### Python: Convert Flat Data to Document Tree

Source: https://cre-dev.github.io/xml2db/api/document

This method converts data stored in flat tables into a document tree. It first indexes the data for efficient access and then recursively builds the tree structure.

```python
def flat_data_to_doc_tree(self) -> tuple:\n    """Convert the data stored in flat tables into a document tree\n\n    Returns:\n        A tuple (node_type, content, hash) containing the document tree\n    """\n    data_index = {}\n\n    # convert data to keyed dict for easier access\n    temp = (\n        ""\n        if f"pk_{self.model.tables[self.model.root_table].name}"\n        in self.data[self.model.root_table]["records"][0]\n        else "temp_"\n    )\n    for tb in self.model.tables.values():\n        data_index[tb.type_name] = {\n            "records": {},\n            "relations_n": {},\n        }\n        if tb.type_name in self.data:\n            data_index[tb.type_name]["records"] = {\n                row[f"{temp}pk_{tb.name}"]: row\n                for row in self.data[tb.type_name]["records"]\n            }\n        for rel in tb.relations_n.values():\n            index = {}\n            if rel.other_table.is_reused:\n                if tb.type_name in self.data:\n                    for row in self.data[tb.type_name]["relations_n"ционного][\n                        rel.rel_table_name\n                    ]["records"]:\n                        if row[f"{temp}fk_{tb.name}"] not in index:\n                            index[row[f"{temp}fk_{tb.name}"]] = []\n                        index[row[f"{temp}fk_{tb.name}"]].append(\n                            row[f"{temp}fk_{rel.other_table.name}"]\n                        )\n            else:\n                if rel.other_table.type_name in self.data:\n                    for row in self.data[rel.other_table.type_name]["records"]:\n                        if row[f"{temp}fk_parent_{tb.name}"] not in index:\n                            index[row[f"{temp}fk_parent_{tb.name}"]] = []\n                        index[row[f"{temp}fk_parent_{tb.name}"]].append(\n                            row[f"{temp}pk_{rel.other_table.name}"]\n                        )\n            data_index[tb.type_name]["relations_n"][rel.rel_table_name] = index\n\n    def _build_node(node_type: str, node_pk: int) -> tuple:\n        """Build a dict node recursively\n\n        Args:\n            node_type: The node type\n            node_pk: The node primary key\n\n        Returns:\n            A node as a tuple (node_type, content, hash)\n        """\n        tb = self.model.tables[node_type]\n        content = {}\n\n        record = data_index[node_type]["records"][node_pk]\n        for field_type, rel_name, rel in tb.fields:\n            if field_type == "col" and record[rel_name] is not None:\n                content_key = (\n                    (\n                        f"{rel_name[:-5]}__attr"\n                        if rel.has_suffix\n                        else f"{rel_name}__attr"\n                    )\n                    if rel.is_attr\n                    else rel_name\n                )\n                if isinstance(record[rel_name], datetime.datetime):\n                    content[content_key] = [\n                        record[rel_name].isoformat(timespec="milliseconds")\n                    ]\n                else:\n                    content[content_key] = (\n                        list(csv.reader([str(record[rel_name])], escapechar="\"))[0]\n                        if "," in str(record[rel_name])\n                        else [record[rel_name]]\n                    )\n            elif (\n                field_type == "rel1"\n                and record[f"{temp}{rel.field_name}"] is not None\n            ):\n                content[rel_name] = [\n                    _build_node(\n                        rel.other_table.type_name, record[f"{temp}{rel.field_name}"]\n                    )\n                ]\n            elif (\n                field_type == "reln"\n                and node_pk\n                in data_index[tb.type_name]["relations_n"][rel.rel_table_name]\n            ):\n                content[rel_name] = [\n                    _build_node(rel.other_table.type_name, pk)\n                    for pk in data_index[tb.type_name]["relations_n"ционного][\n                        rel.rel_table_name\n                    ][node_pk]\n                ]\n        return node_type, content\n\n    return _build_node(\n        self.model.root_table,\n        int(list(data_index[self.model.root_table]["records"].keys())[0]),\n    )\n
```

--------------------------------

### Insert Data into Temporary Tables

Source: https://cre-dev.github.io/xml2db/api/document

Recreates temporary tables and inserts data into them from an XML file. This function is a prerequisite for merging data into target tables. It supports batch insertion controlled by `max_lines`.

```python
def insert_into_temp_tables(self, max_lines: int = -1) -> None:
    """Insert data into temporary tables

    (Re)creates temp tables before inserting data.

    Args:
        max_lines: The maximum number of lines to insert in a single statement
    """
    logger.info(f"Dropping temp tables if exist for {self.xml_file_path}")
    self.model.drop_all_temp_tables()

    logger.info(f"Creating temp tables for {self.xml_file_path}")
    self.model.create_all_tables(temp=True)

    logger.info(f"Inserting data into temporary tables from {self.xml_file_path}")
    # insert data (order does not really matter)
    for tb in self.model.fk_ordered_tables:
        for query, data in tb.get_insert_temp_records_statements(
            self.data.get(tb.type_name, None)
        ):
            if max_lines is None or max_lines < 0:
                max_lines = len(data)
            start_idx = 0
            while start_idx < len(data):
                with self.model.engine.begin() as conn:
                    self.model.dialect.bulk_insert(
                        conn,
                        query.table,
                        data[start_idx : (start_idx + max_lines)],
                    )
                start_idx = start_idx + max_lines
```

--------------------------------

### Convert Document Tree to XML

Source: https://cre-dev.github.io/xml2db/api/xml_converter

Use this method to convert a nested dictionary representing a document tree into an XML file. You can specify an output file path, a namespace map, and the indentation string for the XML output. The method returns the root XML node.

```python
def to_xml(
    self, out_file: str = None, nsmap: dict = None, indent: str = "  "
) -> etree.Element:
    """Convert a document tree (nested dict) into an XML file

    Args:
        out_file: If provided, write output to a file.
        nsmap: An optional namespace mapping.
        indent: A string used as indentin XML output.

    Returns:
        The etree object corresponding to the root XML node.
    """
    doc = self._make_xml_node(
        self.document_tree,
        self.model.tables[self.document_tree[0]].name,
        nsmap,
    )
    if self.model.tables[self.model.root_table].is_virtual_node:
        child = None
        for child in doc:
            break
        doc = child
    if out_file:
        etree.indent(doc, space=indent)
        with open(out_file, "wt") as f:
            f.write(
                etree.tostring(
                    doc,
                    pretty_print=True,
                    encoding="utf-8",
                    xml_declaration=True,
                ).decode("utf-8")
            )
    return doc
```

--------------------------------

### create_db_schema

Source: https://cre-dev.github.io/xml2db/api/data_model

Creates the database schema if it does not already exist. This method does not need to be called explicitly when using `Document.insert_into_target_tables()`.

```APIDOC
## create_db_schema()

### Description
Create database schema if it does not already exist. You do not have to call this method explicitly when using `Document.insert_into_target_tables()`.

### Method
POST

### Endpoint
/create_db_schema
```

--------------------------------

### Drop All Tables

Source: https://cre-dev.github.io/xml2db/api/data_model

Drops all target (unprefixed) tables associated with the data model. Use with extreme caution as this action is irreversible and performed without further confirmation.

```python
def drop_all_tables(self):
    """Drop the data model target (unprefixed) tables.

    Danger:
        BE CAUTIOUS, THIS METHOD DROPS TABLES WITHOUT FURTHER NOTICE!
    """
    for tb in self.fk_ordered_tables_reversed:
        tb.drop_tables(self.engine)
```

--------------------------------

### to_xml

Source: https://cre-dev.github.io/xml2db/api/xml_converter

Converts a document tree (nested dictionary) into an XML file. It can optionally write the output to a specified file and supports custom namespace mappings and indentation.

```APIDOC
## to_xml

### Description
Converts a document tree (nested dictionary) into an XML file. It can optionally write the output to a specified file and supports custom namespace mappings and indentation.

### Method
This is a method of a class, not a direct HTTP endpoint.

### Parameters
#### Path Parameters
None

#### Query Parameters
None

#### Request Body
None

### Parameters
- **out_file** (str) - Optional - If provided, write output to a file.
- **nsmap** (dict) - Optional - An optional namespace mapping.
- **indent** (str) - Optional - A string used as indentin XML output. Defaults to '  '.

### Returns
- **Element** - The etree object corresponding to the root XML node.

### Request Example
```python
# Assuming 'converter' is an instance of the class containing to_xml
converter.to_xml(out_file='output.xml', nsmap={'ns': 'http://example.com'})
```

### Response
#### Success Response
- **Element** - The etree object corresponding to the root XML node.

#### Response Example
```python
# The returned etree object can be further processed or inspected
```
```

--------------------------------

### get_entity_rel_diagram

Source: https://cre-dev.github.io/xml2db/api/data_model

Builds an entity relationship diagram for the data model using mermaid.js syntax. This diagram can be visualized in various platforms like PyCharm or GitHub.

```APIDOC
## get_entity_rel_diagram

### Description
Generates a string representation of the Entity Relationship Diagram (ERD) for the data model. The ERD uses mermaid.js syntax for visualization.

### Method
Python Function

### Parameters
#### Arguments
- **text_context** (bool) - Optional - Determines whether to include a title, text explanation, and other contextual information along with the ERD. Defaults to True.

### Returns
- **str** - A string containing the ERD definition in mermaid.js format.
```

--------------------------------

### Extract Data from Database to XML

Source: https://cre-dev.github.io/xml2db/getting_started

Extracts data from the database based on a WHERE clause and saves it into an XML file. This is useful for testing and round-trip data validation.

```python
document = data_model.extract_from_database(
    root_select_where="xml2db_input_file_path='path/to/file.xml'",
)
document.to_xml("extracted_file.xml")
```

--------------------------------

### Drop All Temporary Tables

Source: https://cre-dev.github.io/xml2db/api/data_model

Drops all temporary (prefixed) tables used for data import. This operation is irreversible and should be used with caution.

```python
def drop_all_temp_tables(self):
    """Drop the data model temporary (prefixed) tables.

    Danger:
        BE CAUTIOUS, THIS METHOD DROPS TABLES WITHOUT FURTHER NOTICE!
    """
    for tb in self.fk_ordered_tables_reversed:
        tb.drop_temp_tables(self.engine)
```

--------------------------------

### insert_into_target_tables

Source: https://cre-dev.github.io/xml2db/api/document

Inserts and merges data into the database by first loading it into temporary tables and then merging those into the final target tables. It supports transactional integrity and controls the batch size for insertions.

```APIDOC
## insert_into_target_tables(single_transaction=True, max_lines=-1)

### Description
Insert and merge data into the database. This method inserts data into temporary tables and then merges these temporary tables into the final target tables. It ensures database consistency by optionally running all queries within a single transaction.

### Method
This is a Python method call.

### Parameters
#### Arguments
- **single_transaction** (bool) - Optional - Should we run all queries in a single transaction, or isolate queries at the minimum scope required to ensure database consistency? Defaults to True.
- **max_lines** (int) - Optional - The maximum number of lines to insert in a single statement when loading data to the temporary tables. Defaults to -1.

### Returns
- **int** - The number of inserted rows.

### Source
`xml2db/document.py`
```

--------------------------------

### Python: Parse XML Document

Source: https://cre-dev.github.io/xml2db/api/document

Parses an XML file, applies transformations, and prepares data for database insertion. Supports validation skipping, iterative parsing, and recovery from malformed XML. Can append to existing data structures.

```python
def parse_xml(
    self,
    xml_file: Union[str, BytesIO],
    metadata: dict = None,
    skip_validation: bool = True,
    iterparse: bool = True,
    recover: bool = False,
    flat_data: dict = None,
) -> None:
    """Parse an XML document and apply transformation corresponding to the target data model

    This method will first parse the XML file into a dict (document tree) using lxml
    and then compute hash for all nodes based on their content, and finally convert
    the document tree to tables data, creating primary keys and relations, ready to
    be inserted in the database.

    Args:
        xml_file: The path or the file object of an XML file to parse
        metadata: A dict of metadata values to add to the root table (a value for each key defined in
            `metadata_columns` passed to model config)
        skip_validation: Should we validate the document against the schema first?
        iterparse: Parse XML using iterative parsing, which is a bit slower but uses less memory
        recover: Should we try to parse incorrect XML? (argument passed to lxml parser)
        flat_data: An existing `document.data` dict from a previously parsed document. When provided, records
            from this XML file are appended to it rather than starting fresh, allowing multiple files to be
            accumulated in memory and inserted together with a single
            [`insert_into_target_tables`][xml2db.document.Document.insert_into_target_tables] call.
    """
    self.xml_file_path = xml_file[:255] if isinstance(xml_file, str) else "<stream>"

    document_tree = self.model.xml_converter.parse_xml(
        xml_file=xml_file,
        file_path=self.xml_file_path,
        skip_validation=skip_validation,
        recover=recover,
        iterparse=iterparse,
    )

    if self.model.model_config["document_tree_hook"] is not None:
        logger.info(f"Running document_tree_hook function for {self.xml_file_path}")
        document_tree = self.model.model_config["document_tree_hook"](document_tree)

    logger.info(f"Adding records to data model for {self.xml_file_path}")
    self.data = self.doc_tree_to_flat_data(
        document_tree,
        metadata=metadata,
        flat_data=flat_data,
    )

    logger.debug(self.__repr__())

```

--------------------------------

### insert_into_temp_tables

Source: https://cre-dev.github.io/xml2db/api/document

Inserts data into temporary tables. This method first recreates the temporary tables and then populates them with data, allowing control over the maximum number of lines inserted per statement.

```APIDOC
## insert_into_temp_tables(max_lines=-1)

### Description
Insert data into temporary tables. This method first (re)creates the temporary tables and then inserts data into them. It allows specifying the maximum number of lines to insert in a single statement for efficiency.

### Method
This is a Python method call.

### Parameters
#### Arguments
- **max_lines** (int) - Optional - The maximum number of lines to insert in a single statement. Defaults to -1.

### Returns
- **None**

### Source
`xml2db/document.py`
```

--------------------------------

### Generate Entity Relationship Diagram (ERD)

Source: https://cre-dev.github.io/xml2db/getting_started

Write a markdown file containing a visual representation of the data model as an Entity Relationship Diagram using the get_entity_rel_diagram method.

```python
with open(f"target_data_model_erd.md", "w") as f:
   f.write(data_model.get_entity_rel_diagram())

```

--------------------------------

### drop_all_tables

Source: https://cre-dev.github.io/xml2db/api/data_model

Drops all target (unprefixed) tables for the data model. Use with caution as this method drops tables without further notice.

```APIDOC
## drop_all_tables()

### Description
Drop the data model target (unprefixed) tables. BE CAUTIOUS, THIS METHOD DROPS TABLES WITHOUT FURTHER NOTICE!

### Method
DELETE

### Endpoint
/drop_all_tables
```

--------------------------------

### Convert Document Tree to XML

Source: https://cre-dev.github.io/xml2db/api/document

Use this function to convert a nested dictionary representation of a document into an XML file. You can specify an output file path, a namespace map, and the indentation string for the XML output.

```python
def to_xml(
    self, out_file: str = None, nsmap: dict = None, indent: str = "  "
) -> etree.Element:
    """Convert a document tree (nested dict) into an XML file

    Args:
        out_file: If provided, write output to a file.
        nsmap: An optional namespace mapping.
        indent: A string used as indent in XML output.

    Returns:
        The etree object corresponding to the root XML node.
    """
    converter = XMLConverter(self.model)
    converter.document_tree = self.flat_data_to_doc_tree()
    return converter.to_xml(out_file=out_file, nsmap=nsmap, indent=indent)
```

--------------------------------

### Generate Entity Relationship Diagram

Source: https://cre-dev.github.io/xml2db/api/data_model

Generates a string representation of the data model's ERD using Mermaid.js syntax. Use this method to visualize the data model structure, with an option to include explanatory text.

```python
def get_entity_rel_diagram(self, text_context: bool = True) -> str:
    """Build an entity relationship diagram for the data model

    The ERD syntax is used by mermaid.js to create a visual representation of the diagram, which is supported
    by Pycharm IDE or GitHub in markdown files, among others

    Args:
        text_context: Should we add a title, a text explanation, etc. or just the ERD?

    Returns:
        A string representation of the ERD
    """
    out = ["erDiagram"]
    for tb in self.fk_ordered_tables_reversed:
        out += tb.get_entity_rel_diagram()

    if text_context:
        out = (
            [
                f"# {self.data_flow_long_name}\n",
                f"### Data model name: `{self.data_flow_name}`\n",
                (
                    "The following *Entity Relationships Diagram* represents the target data model, after the "
                    "simplification of the source data model, but before the transformations performed to optimize "
                    "data storage (transformation of `1-1` and `1-n` relationships into `n-1` and `n-n` "
                    "relationships, respectively.\n"
                ),
                (
                    "As a consequence, not all tables of the actual data model used in the database are shown. "
                    "Specifically, `1-n` relationships presented may be stored in the database using an additional "
                    "relationship table (noted with an asterisk in the relationship name).\n"
                ),
                "```mermaid",
            ]
            + out
            + [
                "```",
                (
                    "`-N` suffix in field type indicates that the field can have multiple values, which will be "
                    "stored as comma separated values."
                ),
            ]
        )
    return "\n".join(out)
```

--------------------------------

### Extract Document from Database

Source: https://cre-dev.github.io/xml2db/api/data_model

Extracts a Document object from the database using a WHERE clause on the root table. This method is not optimized for performance and is primarily used for integration testing.

```python
def extract_from_database(
    self,
    root_select_where: str,
    force_tz: Union[str, None] = None,
) -> Document:
    """Extract a document from the database, based on a where clause applied to the root table. For instance, you
        can use the column `xml2db_input_file_path` to filter the data loaded from a specific file.

    It will query all the data in the database corresponding to the rows that you select from the root table of your
        data model. Typically, a single XML file will correspond to a single row in the root table. This function
        will query the data tree below this record.

    This method was not optimized for performance and can be quite slow. It is used in integration tests to check
        the output against the data inserted into the database.

    Args:
        root_select_where: A where clause to filter the root table of the model, as a string
        force_tz: Apply this timezone if database returns timezone-naïve datetime

    Returns:
        A [`Document`](document.md) object containing extracted data

    Examples:

    """
    doc = Document(self)
    doc.extract_from_database(self.root_table, root_select_where, force_tz=force_tz)
    return doc

```

--------------------------------

### drop_all_temp_tables

Source: https://cre-dev.github.io/xml2db/api/data_model

Drops all temporary (prefixed) tables for the data model. Use with caution as this method drops tables without further notice.

```APIDOC
## drop_all_temp_tables()

### Description
Drop the data model temporary (prefixed) tables. BE CAUTIOUS, THIS METHOD DROPS TABLES WITHOUT FURTHER NOTICE!

### Method
DELETE

### Endpoint
/drop_all_temp_tables
```