### Bayesian Hyperparameter Optimization Setup

Source: https://context7.com/opentabular/deeptab/llms.txt

Illustrates the setup for built-in Bayesian hyperparameter optimization within the DeepTab library. This example initializes a MambularRegressor and prepares data for tuning, although the full optimization call is omitted.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from deeptab.models import MambularRegressor

# Generate data
np.random.seed(42)
X = pd.DataFrame(np.random.randn(1000, 8), columns=[f"f{i}" for i in range(8)])
y = np.dot(X.values, np.random.randn(8)) + np.random.randn(1000) * 0.1

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize model
model = MambularRegressor(numerical_preprocessing="standardization")
```

--------------------------------

### Custom Training Loop Setup with DeepTab (Python)

Source: https://github.com/opentabular/deeptab/blob/master/docs/homepage.md

Sets up a custom training loop using DeepTab's Mambular base model. This example includes initializing the model with feature information, defining a loss function (MSELoss), and an optimizer (Adam). It highlights that base models expect lists of features as input.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from deeptab.base_models import Mambular
from deeptab.configs import DefaultMambularConfig

# Dummy data and configuration
cat_feature_info = {
    "cat1": {
        "preprocessing": "imputer -> continuous_ordinal",
        "dimension": 1,
        "categories": 4,
    }
}  # Example categorical feature information
num_feature_info = {
    "num1": {"preprocessing": "imputer -> scaler", "dimension": 1, "categories": None}
} # Example numerical feature information
num_classes = 1
config = DefaultMambularConfig()  # Use the desired configuration

# Initialize model, loss function, and optimizer
model = Mambular(cat_feature_info, num_feature_info, num_classes, config)
criterion = nn.MSELoss()  # Use MSE for regression; change as appropriate for your task
optimizer = optim.Adam(model.parameters(), lr=0.001)
```

--------------------------------

### Install DeepTab Package

Source: https://github.com/opentabular/deeptab/blob/master/docs/index.md

Installs the deeptab library using pip. This is the primary command for setting up the project.

```sh
pip install deeptab
```

--------------------------------

### Install DeepTab from Source using Poetry

Source: https://github.com/opentabular/deeptab/blob/master/docs/installation.md

Installs the DeepTab package by building it from the source code using Poetry. This method requires navigating to the project directory containing the 'pyproject.toml' file and then running the 'poetry install' command. It's useful for developers who need to work with the latest code or make modifications.

```bash
cd deeptab

poetry install
```

--------------------------------

### Install deeptab and Mamba for CUDA

Source: https://context7.com/opentabular/deeptab/llms.txt

Instructions for installing the deeptab library and its dependencies for native Mamba support with CUDA acceleration. This includes installing PyTorch with CUDA support and the mamba-ssm package.

```bash
pip install deeptab
pip install torch==2.0.0+cu118 torchvision==0.15.0+cu118 -f https://download.pytorch.org/whl/cu118/torch_stable.html
pip install mamba-ssm
```

--------------------------------

### Implementing Distributional Regression with MambularLSS

Source: https://github.com/opentabular/deeptab/blob/master/README.md

Provides an example of initializing and fitting a MambularLSS model to predict full distributions, specifying the distribution family in the fit method.

```python
from deeptab.models import MambularLSS

model = MambularLSS(
    dropout=0.2,
    d_model=64,
    n_layers=8
)

model.fit(
    X,
    y,
    max_epochs=150,
    lr=1e-04,
    patience=10,
    family="normal"
)
```

--------------------------------

### Install DeepTab from PyPI using Pip

Source: https://github.com/opentabular/deeptab/blob/master/docs/installation.md

Installs the DeepTab package directly from the Python Package Index (PyPI) using pip. This is the recommended method for most users as it provides a stable, pre-compiled version of the library. The command ensures the latest version is installed or upgraded.

```bash
pip install -U deeptab
```

--------------------------------

### Install DeepTab with Pip

Source: https://github.com/opentabular/deeptab/blob/master/docs/homepage.md

Installs the deeptab library using pip. Optionally installs mamba-ssm for specific implementations and specifies PyTorch and CUDA versions.

```sh
pip install deeptab

```

```sh
pip install mamba-ssm
```

```sh
pip install torch==2.0.0+cu118 torchvision==0.15.0+cu118 torchaudio==2.0.0+cu118 -f https://download.pytorch.org/whl/cu118/torch_stable.html
pip install mamba-ssm
```

--------------------------------

### Implement Custom Model and Sklearn Wrapper

Source: https://context7.com/opentabular/deeptab/llms.txt

Demonstrates how to define a custom configuration, implement a model class inheriting from BaseModel, and wrap it for scikit-learn compatibility. The example includes embedding layers and MLP architecture definition.

```python
from dataclasses import dataclass
import torch
import torch.nn as nn
from deeptab.base_models.utils import BaseModel
from deeptab.arch_utils.layer_utils.embedding_layer import EmbeddingLayer
from deeptab.configs import BaseConfig
from deeptab.models import SklearnBaseRegressor

@dataclass
class MyConfig(BaseConfig):
    lr: float = 1e-04
    weight_decay: float = 1e-06
    d_model: int = 64
    hidden_size: int = 128
    n_layers: int = 2
    pooling_method: str = "avg"

class MyCustomModel(BaseModel):
    def __init__(self, feature_information: tuple, num_classes: int = 1, config=None, **kwargs):
        super().__init__(config=config, **kwargs)
        self.save_hyperparameters(ignore=["feature_information"])
        self.returns_ensemble = False
        self.embedding_layer = EmbeddingLayer(*feature_information, config=config)
        layers = []
        input_dim = sum(len(info) for info in feature_information) * config.d_model
        for _ in range(config.n_layers):
            layers.extend([
                nn.Linear(input_dim if not layers else config.hidden_size, config.hidden_size),
                nn.ReLU(),
                nn.Dropout(0.1)
            ])
        layers.append(nn.Linear(config.hidden_size, num_classes))
        self.mlp = nn.Sequential(*layers)

    def forward(self, *data) -> torch.Tensor:
        x = self.embedding_layer(*data)
        B, S, D = x.shape
        x = x.reshape(B, S * D)
        return self.mlp(x)

class MyRegressor(SklearnBaseRegressor):
    def __init__(self, **kwargs):
        super().__init__(model=MyCustomModel, config=MyConfig, **kwargs)
```
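The `input_dim` computation in `MyCustomModel` is plain arithmetic over the feature metadata, so it can be checked without building a model. The sketch below assumes that `feature_information` is a tuple of dictionaries with one key per feature, and that the embedding layer emits one `d_model`-sized token per feature. The feature names and dictionary contents are hypothetical.

```python
# Hypothetical feature metadata: one dict per feature group, one key per feature.
num_feature_info = {
    "age": {"dimension": 1},
    "income": {"dimension": 1},
    "score": {"dimension": 1},
}
cat_feature_info = {
    "category": {"dimension": 1, "categories": 4},
    "region": {"dimension": 1, "categories": 4},
}
feature_information = (num_feature_info, cat_feature_info)

d_model = 64  # embedding width per feature token


def flattened_input_dim(feature_information: tuple, d_model: int) -> int:
    """Size of the first linear layer's input after reshaping (B, S, D) to (B, S * D)."""
    seq_len = sum(len(info) for info in feature_information)  # S: one token per feature
    return seq_len * d_model


input_dim = flattened_input_dim(feature_information, d_model)
print(input_dim)  # 5 features * 64 = 320
```

If the embedding layer adds extra tokens (for example a class token), the sequence length and therefore `input_dim` would be larger than this count.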

--------------------------------

### PyTorch Training Loop Example

Source: https://github.com/opentabular/deeptab/blob/master/docs/homepage.md

This code snippet demonstrates a typical training loop for a deep learning model using PyTorch. It includes forward and backward passes, optimizer steps, and loss calculation. It assumes the existence of a PyTorch model, optimizer, criterion, and dummy data generation.

```python
for epoch in range(10):
    model.train()
    optimizer.zero_grad()

    # Dummy Data
    num_features = [torch.randn(32, 1) for _ in num_feature_info]
    cat_features = [torch.randint(0, 5, (32,)) for _ in cat_feature_info]
    labels = torch.randn(32, num_classes)

    # Forward pass
    outputs = model(num_features, cat_features)
    loss = criterion(outputs, labels)

    # Backward pass and optimization
    loss.backward()
    optimizer.step()

    # Print loss for monitoring
    print(f"Epoch [{epoch+1}/10], Loss: {loss.item():.4f}")
```

--------------------------------

### Install PyTorch with CUDA Support

Source: https://github.com/opentabular/deeptab/blob/master/docs/index.md

Installs specific versions of PyTorch, Torchvision, and Torchaudio with CUDA 11.8 support. This is crucial for GPU acceleration when working with compatible hardware.

```sh
pip install torch==2.0.0+cu118 torchvision==0.15.0+cu118 torchaudio==2.0.0+cu118 -f https://download.pytorch.org/whl/cu118/torch_stable.html
```

--------------------------------

### Install Mamba-SSM for Mamba Models

Source: https://github.com/opentabular/deeptab/blob/master/docs/index.md

Installs the mamba-ssm package, which is required for utilizing the original mamba and mamba2 implementations within DeepTab.

```sh
pip install mamba-ssm
```

--------------------------------

### Initialize Deep Learning Models for Tabular Data in Python

Source: https://github.com/opentabular/deeptab/blob/master/efficiency/efficiency.ipynb

This snippet comes from a notebook that benchmarks several deep learning models for tabular data, including Mambular, FTTransformer, TabulaRNN, MLP, ResNet, and MambAttention. The model initialization itself is not part of this excerpt; it shows only the empty results DataFrame that later stores per-model CUDA memory and timing measurements.

```python
import pandas as pd

# Initialize an empty DataFrame to store the results
df_results = pd.DataFrame(columns=["Model", "Num Layers", "Total CUDA Memory (MB)", "Total CUDA Time (ms)"])
```

--------------------------------

### Initialize Accelerator for Profiling

Source: https://github.com/opentabular/deeptab/blob/master/efficiency/efficiency.ipynb

Initializes the Accelerate Accelerator with memory profiling enabled. It configures the profiler to capture CPU and CUDA activities, including memory usage and recorded shapes. This setup is crucial for performance analysis.

```python
import re
import warnings

import pandas as pd
import torch
from accelerate import Accelerator
from accelerate.utils import ProfileKwargs

from mambular.base_models.ft_transformer import FTTransformer
from mambular.base_models.mambattn import MambAttention
from mambular.base_models.mambular import Mambular
from mambular.base_models.mlp import MLP
from mambular.base_models.resnet import ResNet
from mambular.base_models.tabularnn import TabulaRNN

warnings.filterwarnings("ignore")

# Initialize an empty DataFrame to store the results
df_results = pd.DataFrame(columns=["Model", "Num Features", "Total CUDA Memory (MB)", "Total CUDA Time (ms)"])

# Set up the profiler with memory profiling enabled
profile_kwargs = ProfileKwargs(activities=["cpu", "cuda"], profile_memory=True, record_shapes=True)
accelerator = Accelerator(cpu=False, kwargs_handlers=[profile_kwargs])
```

--------------------------------

### Python: DeepTab MambularLSS Distributional Regression Example

Source: https://github.com/opentabular/deeptab/blob/master/docs/examples/distributional.md

This Python code snippet demonstrates how to perform distributional regression using the MambularLSS model from the deeptab package. It includes data simulation, model training, and evaluation. Dependencies include numpy, pandas, and scikit-learn.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

from deeptab.models import MambularLSS

np.random.seed(0)

n_samples = 1000
n_features = 5

X = np.random.randn(n_samples, n_features)
coefficients = np.random.randn(n_features)

y = np.dot(X, coefficients) + np.random.randn(n_samples)

data = pd.DataFrame(X, columns=[f"feature_{i}" for i in range(n_features)])
data["target"] = y

X = data.drop(columns=["target"])
y = np.array(data["target"])


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


regressor = MambularLSS()

regressor.fit(X_train, y_train, family="normal", max_epochs=10)

print(regressor.evaluate(X_test, y_test))
```
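Fitting with `family="normal"` means the model predicts the parameters of a normal distribution rather than a single point estimate. A minimal sketch of the objective such a model would typically minimize, the Gaussian negative log-likelihood, is shown below. The mean/standard-deviation parameterization is an assumption here, and DeepTab's internal loss may be parameterized differently.

```python
import math


def normal_nll(y: float, mu: float, sigma: float) -> float:
    """Negative log-likelihood of y under Normal(mu, sigma**2)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return 0.5 * math.log(2 * math.pi * sigma**2) + (y - mu) ** 2 / (2 * sigma**2)


def mean_nll(ys, mus, sigmas) -> float:
    """Average NLL over a batch of (target, predicted mean, predicted scale) triples."""
    return sum(normal_nll(y, m, s) for y, m, s in zip(ys, mus, sigmas)) / len(ys)


# A confident but wrong prediction is penalized more than an uncertain one.
confident = normal_nll(y=2.0, mu=0.0, sigma=0.5)
uncertain = normal_nll(y=2.0, mu=0.0, sigma=2.0)
print(confident > uncertain)  # True
```

Because the scale enters the loss, the model is rewarded for widening its predicted distribution where the data is noisy, which is the point of distributional regression.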

--------------------------------

### Initialize MLP Model

Source: https://github.com/opentabular/deeptab/blob/master/docs/api/base_models/BaseModels.md

Demonstrates how to instantiate the MLP model for tabular data. It supports customizable layer sizes, dropout, and activation functions.

```python
from deeptab.base_models import MLP
from deeptab.configs import DefaultMLPConfig

# `feat_info` is assumed to be defined already (the feature metadata for your dataset)
config = DefaultMLPConfig(layer_sizes=[256, 128, 32], dropout=0.2)
model = MLP(feature_information=feat_info, num_classes=1, config=config)
```

--------------------------------

### Initialize Mambular Model

Source: https://github.com/opentabular/deeptab/blob/master/docs/api/base_models/BaseModels.md

Demonstrates how to instantiate the Mambular model with a custom configuration. The model processes tabular data through embedding layers and Mamba-based transformations.

```python
from deeptab.base_models import Mambular
from deeptab.configs import DefaultMambularConfig

# `feat_info` is assumed to be defined already (the feature metadata for your dataset)
config = DefaultMambularConfig(d_model=64, n_layers=4)
model = Mambular(feature_information=feat_info, num_classes=1, config=config)
```

--------------------------------

### GET /encode

Source: https://context7.com/opentabular/deeptab/llms.txt

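Documents the model's `encode` method, which extracts latent representations from a trained model. The entry is written in HTTP-endpoint style, but it describes a Python method call, not a web API.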

```APIDOC
## GET /encode

### Description
Extracts learned feature representations (encoded embeddings) from the model for a given input.

### Method
GET

### Parameters
#### Query Parameters
- **X** (DataFrame) - Required - Input data to encode
- **batch_size** (int) - Optional - Batch size for processing

### Response
#### Success Response (200)
- **encoded** (ndarray) - The latent representation matrix of shape (n_samples, seq_len, d_model).
```

--------------------------------

### Hyperparameter Optimization with Random Search

Source: https://github.com/opentabular/deeptab/blob/master/README.md

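Demonstrates how to define a parameter distribution dictionary that includes preprocessing arguments, and how to read the best parameters and score from a fitted random search object.

```python
from scipy.stats import randint, uniform

# Search space: preprocessing arguments can be tuned alongside model hyperparameters
param_dist = {
    "d_model": randint(32, 128),
    "n_layers": randint(2, 10),
    "lr": uniform(1e-5, 1e-3),  # scipy's uniform(loc, scale) samples from [loc, loc + scale]
    "numerical_preprocessing": ["ple", "standardization", "box-cox"],
}

# `random_search` is assumed to be an already-fitted search object
# (e.g. scikit-learn's RandomizedSearchCV) built from `param_dist`
print("Best Parameters:", random_search.best_params_)
print("Best Score:", random_search.best_score_)
```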
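The `scipy.stats` conventions behind `param_dist` are easy to misread: `randint(low, high)` excludes `high`, and `uniform(loc, scale)` covers `[loc, loc + scale]` rather than `[loc, scale]`. The following stdlib-only sketch mirrors those conventions to show what a single sampled candidate looks like. The `sample_params` helper is hypothetical and is not part of DeepTab or scikit-learn.

```python
import random


def sample_params(rng: random.Random) -> dict:
    """Draw one candidate configuration, mirroring the scipy.stats conventions."""
    return {
        "d_model": rng.randrange(32, 128),   # like randint(32, 128): upper bound excluded
        "n_layers": rng.randrange(2, 10),    # like randint(2, 10)
        "lr": 1e-5 + 1e-3 * rng.random(),    # like uniform(loc=1e-5, scale=1e-3)
        "numerical_preprocessing": rng.choice(["ple", "standardization", "box-cox"]),
    }


rng = random.Random(0)
candidates = [sample_params(rng) for _ in range(5)]
for params in candidates:
    print(params)
```

Note that the learning rate can reach roughly `1.01e-3` under this definition; a log-uniform distribution is a common alternative when the rate spans several orders of magnitude.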

--------------------------------

### Get Latent Representations with DeepTab Model

Source: https://github.com/opentabular/deeptab/blob/master/README.md

Illustrates how to use the `encode` method of a DeepTab model to obtain latent representations for each feature in the input data X. This is useful for understanding feature embeddings.

```python
# simple encoding
model.encode(X)
```

--------------------------------

### MambularClassifier: Tabular Classification with Mamba

Source: https://context7.com/opentabular/deeptab/llms.txt

Example of using MambularClassifier for tabular classification tasks. Demonstrates data generation, model initialization with custom architecture parameters, training with early stopping, prediction, and evaluation.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification
from deeptab.models import MambularClassifier

# Generate sample classification data
X, y = make_classification(
    n_samples=2000, n_features=20, n_informative=15,
    n_classes=3, n_clusters_per_class=1, random_state=42
)
X = pd.DataFrame(X, columns=[f"feature_{i}" for i in range(20)])
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize classifier with custom architecture
classifier = MambularClassifier(
    d_model=64,           # Embedding dimension
    n_layers=4,           # Number of Mamba layers
    d_state=128,          # State dimension for Mamba
    d_conv=4,             # Convolution kernel size
    dropout=0.1,          # Regularization dropout
    numerical_preprocessing="standardization",  # Preprocessing method
    pooling_method="avg"  # Pooling method: avg, max, sum, last
)

# Train the model with early stopping
classifier.fit(
    X_train, y_train,
    max_epochs=100,
    lr=1e-04,
    batch_size=128,
    patience=15,          # Early stopping patience
    val_size=0.2          # Validation split
)

# Make predictions
predictions = classifier.predict(X_test)
probabilities = classifier.predict_proba(X_test)

# Evaluate model performance
results = classifier.evaluate(X_test, y_test)
print(f"Results: {results}")  # dict of metric names to scores, e.g. {'Accuracy': ...}
```
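The `patience` argument controls early stopping: training halts once the validation loss has stopped improving for that many consecutive epochs. The sketch below illustrates the general rule in plain Python. It is not DeepTab's implementation, and details such as the monitored metric or a minimum improvement threshold are assumptions left out here.

```python
def early_stopping_epoch(val_losses, patience: int) -> int:
    """Return the 1-based epoch at which training would stop.

    Training stops once the validation loss has failed to improve for
    `patience` consecutive epochs; otherwise it runs through every epoch.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses)


# Hypothetical validation curve: improves for three epochs, then plateaus.
losses = [0.90, 0.70, 0.65, 0.66, 0.67, 0.66, 0.68]
print(early_stopping_epoch(losses, patience=3))  # 6
```

A larger `patience` tolerates longer plateaus at the cost of extra epochs, which is why it is usually set together with `max_epochs`.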

--------------------------------

### POST /models/resnet/init

Source: https://github.com/opentabular/deeptab/blob/master/docs/api/base_models/BaseModels.md

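Describes constructing DeepTab's `ResNet` base model from feature metadata and a configuration object. The entry is written in HTTP-endpoint style, but it corresponds to instantiating a Python class, not calling a web API.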

```APIDOC
## POST /models/resnet/init

### Description
Instantiates the ResNet model class with provided feature metadata and configuration settings.

### Method
POST

### Endpoint
/models/resnet/init

### Parameters
#### Request Body
- **cat_feature_info** (dict) - Required - Dictionary defining categorical feature names and dimensions.
- **num_feature_info** (dict) - Required - Dictionary defining numerical feature names and dimensions.
- **num_classes** (int) - Optional - Number of output classes (default: 1).
- **config** (object) - Optional - Hyperparameter configuration object.

### Request Example
{
  "cat_feature_info": {"gender": 2, "region": 5},
  "num_feature_info": {"age": 1, "income": 1},
  "num_classes": 1
}

### Response
#### Success Response (200)
- **status** (string) - Confirmation of model initialization.

#### Response Example
{
  "status": "ResNet model initialized successfully"
}
```

--------------------------------

### Pretrain Embeddings with Contrastive Learning

Source: https://context7.com/opentabular/deeptab/llms.txt

Shows how to pretrain the embedding layer using contrastive learning to boost performance on limited labeled datasets. The process involves building the model, running the pretrain method, and then fine-tuning on the target task.

```python
model.build_model(X_train, y_train, batch_size=128)
model.pretrain(
    pretrain_epochs=15,
    k_neighbors=10,
    temperature=0.1,
    save_path="pretrained_embeddings.pth",
    lr=1e-3,
    use_positive=True,
    use_negative=False
)
model.fit(X_train, y_train, max_epochs=50, rebuild=False)
```
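The `temperature` argument is easiest to understand through the softmax it scales. The sketch below is a generic InfoNCE-style illustration in plain Python, not DeepTab's pretraining loss, which is not shown in this excerpt. The similarity values and both function names are hypothetical.

```python
import math


def softmax_with_temperature(similarities, temperature: float):
    """Convert similarity scores into probabilities; lower temperature sharpens them."""
    scaled = [s / temperature for s in similarities]
    shift = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def contrastive_loss(similarities, positive_index: int, temperature: float) -> float:
    """InfoNCE-style loss: negative log-probability assigned to the positive pair."""
    probs = softmax_with_temperature(similarities, temperature)
    return -math.log(probs[positive_index])


# Hypothetical cosine similarities between an anchor and four candidates.
sims = [0.9, 0.2, 0.1, -0.3]
sharp = softmax_with_temperature(sims, temperature=0.1)
smooth = softmax_with_temperature(sims, temperature=1.0)
print(max(sharp) > max(smooth))  # True
```

A low temperature such as `0.1` concentrates probability on the most similar candidate, so small differences in similarity produce large differences in the loss.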

--------------------------------

### Initialize and Train DeepTab Model

Source: https://github.com/opentabular/deeptab/blob/master/docs/index.md

Demonstrates how to initialize a MambularClassifier with specific architectural parameters and train it using the standard fit method.

```python
from deeptab.models import MambularClassifier
model = MambularClassifier(
    d_model=64,
    n_layers=4,
    numerical_preprocessing="ple",
    n_bins=50,
    d_conv=8
)
model.fit(X, y, max_epochs=150, lr=1e-04)
```
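The `numerical_preprocessing="ple"` option refers to piecewise linear encoding, which turns each numerical value into a vector with one entry per bin. A minimal sketch of the idea follows. It assumes pre-computed, sorted bin edges and clamps every entry to `[0, 1]`, which is a simplification; how DeepTab derives its bins is not shown here.

```python
def ple_encode(x: float, bin_edges):
    """Piecewise linear encoding of a scalar given sorted bin edges.

    Bins entirely below x are "on" (1.0), bins entirely above x are "off" (0.0),
    and the bin containing x holds the fractional position of x inside it.
    Values outside the edge range are clamped.
    """
    encoding = []
    for left, right in zip(bin_edges[:-1], bin_edges[1:]):
        fraction = (x - left) / (right - left)
        encoding.append(min(1.0, max(0.0, fraction)))
    return encoding


edges = [0.0, 1.0, 2.0, 3.0, 4.0]  # four bins; n_bins=50 would give 50 entries
print(ple_encode(2.5, edges))  # [1.0, 1.0, 0.5, 0.0]
```

Compared with standardization, this representation lets a model treat different ranges of a feature differently, at the cost of a wider input per feature.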

--------------------------------

### MambularRegressor: Tabular Regression with Mamba

Source: https://context7.com/opentabular/deeptab/llms.txt

Example of using MambularRegressor for tabular regression tasks. Demonstrates creating synthetic regression data with mixed types, initializing the regressor with Piecewise Linear Encoding, training with a custom validation set, prediction, and evaluation using MSE and R2 scores.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
from deeptab.models import MambularRegressor

# Create synthetic regression data with mixed types
np.random.seed(42)
n_samples = 1500
data = pd.DataFrame({
    'age': np.random.randint(18, 80, n_samples),
    'income': np.random.uniform(20000, 150000, n_samples),
    'category': np.random.choice(['A', 'B', 'C', 'D'], n_samples),
    'score': np.random.uniform(0, 100, n_samples),
    'region': np.random.choice(['North', 'South', 'East', 'West'], n_samples)
})
# Target variable
y = (data['age'] * 100 + data['income'] * 0.1 +
     data['score'] * 50 + np.random.normal(0, 1000, n_samples))

X_train, X_test, y_train, y_test = train_test_split(
    data, y.values, test_size=0.2, random_state=42
)

# Initialize regressor with PLE preprocessing for numerical features
regressor = MambularRegressor(
    d_model=64,
    n_layers=4,
    numerical_preprocessing="ple",  # Piecewise Linear Encoding
    n_bins=50,                      # Number of bins for PLE
    bidirectional=False,            # Process features sequentially
    pooling_method="avg"
)

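# Fit with an explicit validation set. The test split is reused here only for
# brevity; in practice, validate on a separate hold-out split to avoid leakage.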
regressor.fit(
    X_train, y_train,
    X_val=X_test, y_val=y_test,  # Custom validation set
    max_epochs=150,
    lr=1e-04,
    patience=20
)

# Predict and evaluate
predictions = regressor.predict(X_test)
results = regressor.evaluate(X_test, y_test, metrics={
    "MSE": mean_squared_error,
    "R2": r2_score
})
print(f"Results: {results}")
```
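The `metrics` argument maps display names to callables of the form `fn(y_true, y_pred)`. The sketch below shows that pattern in plain Python, with small stand-ins for the scikit-learn metrics. The `evaluate`, `mse`, and `r2` functions here are hypothetical illustrations, not DeepTab's or scikit-learn's implementations.

```python
def mse(y_true, y_pred) -> float:
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred) -> float:
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def evaluate(y_true, y_pred, metrics: dict) -> dict:
    """Apply each named metric callable to (y_true, y_pred)."""
    return {name: fn(y_true, y_pred) for name, fn in metrics.items()}


y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.0, 7.5, 9.0]
results = evaluate(y_true, y_pred, {"MSE": mse, "R2": r2})
print(results)
```

Any callable with the same two-argument signature can be added to the dictionary, and its key becomes the name under which the score is reported.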

--------------------------------

### MambularClassifier Training and Evaluation

Source: https://github.com/opentabular/deeptab/blob/master/docs/examples/classification.md

Demonstrates how to initialize the MambularClassifier, train it on a dataset, and evaluate its performance.

```APIDOC
## MambularClassifier.fit

### Description
Trains the MambularClassifier model on the provided training dataset.

### Method
Python Method Call

### Parameters
#### Arguments
- **X_train** (pd.DataFrame) - Required - The feature matrix for training.
- **y_train** (np.array) - Required - The target labels for training.
- **max_epochs** (int) - Optional - The number of training epochs (default: 10).

### Request Example
classifier.fit(X_train, y_train, max_epochs=10)

## MambularClassifier.evaluate

### Description
Evaluates the trained model on a test dataset and returns performance metrics.

### Method
Python Method Call

### Parameters
#### Arguments
- **X_test** (pd.DataFrame) - Required - The feature matrix for testing.
- **y_test** (np.array) - Required - The target labels for testing.

### Response
#### Success Response
- **metrics** (dict) - Returns a dictionary containing evaluation metrics such as accuracy or loss.

### Response Example
{"accuracy": 0.85, "loss": 0.42}
```

--------------------------------

### Implement PyTorch Training Loop for Tabular Models

Source: https://github.com/opentabular/deeptab/blob/master/docs/index.md

This snippet demonstrates a standard training loop for a tabular deep learning model in PyTorch. It includes dummy data generation, forward pass, loss calculation, and backpropagation steps.

```python
for epoch in range(10):
    model.train()
    optimizer.zero_grad()
    num_features = [torch.randn(32, 1) for _ in num_feature_info]
    cat_features = [torch.randint(0, 5, (32,)) for _ in cat_feature_info]
    labels = torch.randn(32, num_classes)
    outputs = model(num_features, cat_features)
    loss = criterion(outputs, labels)
    loss.backward()
    optimizer.step()
    print(f"Epoch [{epoch+1}/10], Loss: {loss.item():.4f}")
```

--------------------------------

### Initialize and Fit MambularLSS for Distributional Regression

Source: https://github.com/opentabular/deeptab/blob/master/docs/homepage.md

This Python code shows how to initialize the MambularLSS model for distributional regression and fit it to the data. It specifies model parameters, training configurations like max epochs and learning rate, and the desired distribution family (e.g., 'normal').

```python
from deeptab.models import MambularLSS

# Initialize the MambularLSS model
model = MambularLSS(
    dropout=0.2,
    d_model=64,
    n_layers=8,
)

# Fit the model to your data
model.fit(
    X,
    y,
    max_epochs=150,
    lr=1e-04,
    patience=10,
    family="normal",  # define your distribution
)
```

--------------------------------

### Fit MambularClassifier Model in DeepTab

Source: https://github.com/opentabular/deeptab/blob/master/docs/homepage.md

Demonstrates how to initialize and fit a MambularClassifier model from the deeptab library. It shows setting model parameters and using the fit method with training data (X, y) and training configurations.

```python
from deeptab.models import MambularClassifier
# Initialize and fit your model
model = MambularClassifier(
    d_model=64,
    n_layers=4,
    numerical_preprocessing="ple",
    n_bins=50,
    d_conv=8
)
# X can be a pd.DataFrame or anything easily converted to one, such as a np.array
model.fit(X, y, max_epochs=150, lr=1e-04)
```

--------------------------------

### Initialize and Fit MambularClassifier in DeepTab

Source: https://github.com/opentabular/deeptab/blob/master/README.md

Demonstrates how to initialize a MambularClassifier with specified preprocessing and model parameters, and then fit it to the training data (X, y). It highlights the use of `max_epochs` and `lr` for training control. X can be a pandas DataFrame or a NumPy array convertible to one.

```python
from deeptab.models import MambularClassifier
# Initialize and fit your model
model = MambularClassifier(
    d_model=64,
    n_layers=4,
    numerical_preprocessing="ple",
    n_bins=50,
    d_conv=8
)

# X can be a pd.DataFrame or anything easily converted to one, such as a np.array
model.fit(X, y, max_epochs=150, lr=1e-04)
```

--------------------------------

### Train Custom DeepTab Model (Python)

Source: https://github.com/opentabular/deeptab/blob/master/docs/homepage.md

Demonstrates how to train a custom DeepTab regressor model. It initializes the MyRegressor with specified preprocessing and then fits the model to the training data, setting the maximum number of epochs.

```python
regressor = MyRegressor(numerical_preprocessing="ple")
regressor.fit(X_train, y_train, max_epochs=50)
```

--------------------------------

### Train a Mambular Classifier using deeptab

Source: https://github.com/opentabular/deeptab/blob/master/docs/homepage.md

Demonstrates how to initialize and train a MambularClassifier model. It follows the standard scikit-learn fit pattern, accepting dataframes or numpy arrays as input.

```python
from deeptab.models import MambularClassifier
# Initialize and fit your model
model = MambularClassifier()

# X can be a pd.DataFrame or anything easily converted to one, such as a np.array
model.fit(X, y, max_epochs=150, lr=1e-04)
```

--------------------------------

### NODEClassifier: Training and Evaluation

Source: https://context7.com/opentabular/deeptab/llms.txt

Demonstrates how to initialize, train, and evaluate the NODEClassifier model using the DeepTab library. It loads a dataset, splits it into training and testing sets, configures the NODEClassifier with specified parameters, and then trains the model. Finally, it evaluates the model's accuracy on the test set.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_breast_cancer
from deeptab.models import NODEClassifier

# Load real dataset
data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize NODE classifier
model = NODEClassifier(
    num_trees=1024,
    depth=6,
    tree_dim=3,
    numerical_preprocessing="standardization"
)

# Train
model.fit(
    X_train, y_train,
    max_epochs=100,
    lr=1e-03,
    batch_size=256,
    patience=20
)

# Evaluate
results = model.evaluate(X_test, y_test)
print(f"Accuracy: {results['Accuracy']:.4f}")
```

--------------------------------

### Implement Custom Model Architecture

Source: https://github.com/opentabular/deeptab/blob/master/README.md

Define a custom model by inheriting from BaseModel, implementing the __init__ method to handle feature information, and defining the forward pass logic.

```python
import numpy as np
import torch
import torch.nn as nn

from deeptab.arch_utils.layer_utils.embedding_layer import EmbeddingLayer
from deeptab.base_models.utils import BaseModel

class MyCustomModel(BaseModel):
    def __init__(self, feature_information: tuple, num_classes: int = 1, config=None, **kwargs):
        super().__init__(**kwargs)
        self.save_hyperparameters(ignore=["feature_information"])
        self.returns_ensemble = False
        self.embedding_layer = EmbeddingLayer(*feature_information, config=config)
        input_dim = np.sum([len(info) * self.hparams.d_model for info in feature_information])
        self.linear = nn.Linear(input_dim, num_classes)

    def forward(self, *data) -> torch.Tensor:
        x = self.embedding_layer(*data)
        B, S, D = x.shape
        x = x.reshape(B, S * D)
        return self.linear(x)
```

--------------------------------

### DeepTab Custom Model Implementation

Source: https://github.com/opentabular/deeptab/blob/master/README.md

This section details the steps to implement a custom model in DeepTab, including defining a configuration class, creating the model architecture by inheriting from BaseModel, and integrating it with DeepTab's API.

````APIDOC
## Implementing a Custom Model in DeepTab

This guide outlines the process of creating and integrating your own PyTorch models within the DeepTab framework. You will define a configuration class and then your custom model by inheriting from `deeptab.base_models.utils.BaseModel`.

### 1. Define Your Configuration

Create a configuration class that inherits from `deeptab.configs.BaseConfig` to specify hyperparameters and settings for your model.

```python
from dataclasses import dataclass
from deeptab.configs import BaseConfig

@dataclass
class MyConfig(BaseConfig):
    lr: float = 1e-04
    lr_patience: int = 10
    weight_decay: float = 1e-06
    n_layers: int = 4
    pooling_method: str = "avg"
```

### 2. Define Your Custom Model

Inherit from `BaseModel` and define the model's architecture and forward pass. The `__init__` method should accept `feature_information`, `num_classes`, and an optional `config`.

```python
import numpy as np
import torch
import torch.nn as nn

from deeptab.arch_utils.layer_utils.embedding_layer import EmbeddingLayer
from deeptab.base_models.utils import BaseModel
from deeptab.utils.get_feature_dimensions import get_feature_dimensions


class MyCustomModel(BaseModel):
    def __init__(
        self,
        feature_information: tuple,
        num_classes: int = 1,
        config=None,
        **kwargs,
    ):
        super().__init__(**kwargs)
        self.save_hyperparameters(ignore=["feature_information"])
        self.returns_ensemble = False

        # Embedding layer
        self.embedding_layer = EmbeddingLayer(
            *feature_information,
            config=config,
        )

        # Flattened size of the embedded sequence
        input_dim = np.sum(
            [len(info) * self.hparams.d_model for info in feature_information]
        )

        self.linear = nn.Linear(input_dim, num_classes)

    def forward(self, *data) -> torch.Tensor:
        x = self.embedding_layer(*data)
        B, S, D = x.shape
        x = x.reshape(B, S * D)

        # Pass through linear layer
        output = self.linear(x)
        return output
```

### 3. Leverage the DeepTab API

To integrate your custom model with DeepTab's training and evaluation methods, you can create a wrapper class that inherits from `SklearnBaseRegressor`, `SklearnBaseClassifier`, or `SklearnBaseLSS`.

```python
from deeptab.models.utils import SklearnBaseRegressor

class MyRegressor(SklearnBaseRegressor):
    def __init__(self, **kwargs):
        super().__init__(model=MyCustomModel, config=MyConfig, **kwargs)
```

### 4. Train and Evaluate Your Model

Once your custom model is set up, you can train and evaluate it using the familiar DeepTab API.

```python
regressor = MyRegressor(numerical_preprocessing="ple")
regressor.fit(X_train, y_train, max_epochs=50)

regressor.evaluate(X_test, y_test)
```
````

--------------------------------

### Define Custom Model Configuration

Source: https://github.com/opentabular/deeptab/blob/master/README.md

Create a configuration class for your model by inheriting from BaseConfig and using a dataclass to define hyperparameters.

```python
from dataclasses import dataclass
from deeptab.configs import BaseConfig

@dataclass
class MyConfig(BaseConfig):
    lr: float = 1e-04
    lr_patience: int = 10
    weight_decay: float = 1e-06
    n_layers: int = 4
    pooling_method: str = "avg"
```
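Because the configuration is a plain dataclass, individual hyperparameters can be overridden at construction time and the whole object can be inspected as a dictionary. The sketch below illustrates this with a stand-in `BaseConfig` (hypothetical, defined here only so the snippet runs without DeepTab installed); with the library available, import `BaseConfig` from `deeptab.configs` as shown above.

```python
from dataclasses import asdict, dataclass, replace


@dataclass
class BaseConfig:
    """Hypothetical stand-in for deeptab.configs.BaseConfig."""


@dataclass
class MyConfig(BaseConfig):
    lr: float = 1e-04
    lr_patience: int = 10
    weight_decay: float = 1e-06
    n_layers: int = 4
    pooling_method: str = "avg"


# Override selected hyperparameters; the rest keep their defaults
config = MyConfig(lr=3e-4, n_layers=6)

# Derive a variant without mutating the original object
deeper = replace(config, n_layers=8)

print(asdict(config))
print(deeper.n_layers)
```

Keeping defaults in the dataclass and overriding only what differs makes it easy to see at a glance how one experiment departs from the baseline.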

--------------------------------

### POST /fit

Source: https://context7.com/opentabular/deeptab/llms.txt

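Trains the model on tabular data, optionally combined with pre-computed multimodal embeddings. `fit` is a Python method called on a DeepTab estimator rather than an HTTP endpoint, for example `model.fit(X_train, y_train, embeddings=[img_embs, txt_embs])`.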

```APIDOC
## POST /fit

### Description
Fits the model to the provided tabular data, optionally incorporating external embeddings.

### Method
POST

### Parameters
#### Request Body
- **X_train** (DataFrame) - Required - Training features
- **y_train** (Series/Array) - Required - Training labels
- **embeddings** (list) - Optional - List of pre-computed embedding arrays (e.g., image/text)
- **max_epochs** (int) - Optional - Training epochs

### Response
#### Success Response (200)
- **model** (object) - The trained model instance.
```

--------------------------------

### POST /pretrain

Source: https://context7.com/opentabular/deeptab/llms.txt

Performs contrastive pretraining on the model's embedding layer.

```APIDOC
## POST /pretrain

### Description
Pretrains the embedding layer using contrastive learning to improve performance on downstream tasks.

### Method
POST

### Parameters
#### Request Body
- **pretrain_epochs** (int) - Required - Number of epochs for pretraining
- **k_neighbors** (int) - Optional - Neighbors for contrastive loss
- **temperature** (float) - Optional - Temperature for softmax
- **save_path** (str) - Optional - Path to save the model weights

### Response
#### Success Response (200)
- **status** (str) - Confirmation of successful pretraining.
```
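To give a feel for what `temperature` and the neighbor count control, here is a minimal sketch of a temperature-scaled contrastive loss for a single anchor. It is not DeepTab's implementation: the helper names `cosine` and `contrastive_loss`, the use of cosine similarity, and the convention that the caller supplies the indices of the positive neighbors are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(anchor, candidates, positive_idx, temperature=0.1):
    """InfoNCE-style loss for one anchor (hypothetical helper).

    positive_idx lists the candidates treated as positives, e.g. the
    k nearest neighbors of the anchor; every candidate contributes to
    the softmax denominator.
    """
    sims = [cosine(anchor, c) / temperature for c in candidates]
    # Numerically stable log-sum-exp over all candidates
    m = max(sims)
    log_denominator = m + math.log(sum(math.exp(s - m) for s in sims))
    # Average negative log-probability assigned to the positives
    return -sum(sims[i] - log_denominator for i in positive_idx) / len(positive_idx)


anchor = [1.0, 0.0]
candidates = [[0.9, 0.1], [0.8, 0.3], [-1.0, 0.2], [0.0, -1.0]]
positives = [0, 1]  # two neighbors treated as positives

sharp = contrastive_loss(anchor, candidates, positives, temperature=0.1)
soft = contrastive_loss(anchor, candidates, positives, temperature=1.0)
print(f"temperature=0.1: {sharp:.4f}, temperature=1.0: {soft:.4f}")
```

In this toy setup the positives are already the most similar candidates, so the lower temperature sharpens the softmax around them and yields the smaller loss.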

--------------------------------

### Configure MambularDataModule for PyTorch Lightning

Source: https://github.com/opentabular/deeptab/blob/master/docs/api/data_utils/Datautils.md

Sets up a data module to manage training and validation loaders. This class automates data splitting, preprocessing, and batching for integration with PyTorch Lightning training loops.

```python
from deeptab.data_utils import MambularDataModule

# `my_preprocessor` is assumed to be a preprocessor instance created beforehand
data_module = MambularDataModule(
    preprocessor=my_preprocessor,
    batch_size=32,
    shuffle=True,
    val_size=0.2,
    random_state=101
)

# Prepare data and loaders
data_module.preprocess_data(X_train, y_train)
data_module.setup(stage='fit')
```

--------------------------------

### Define Hyperparameter Search Space with Preprocessing Options

Source: https://github.com/opentabular/deeptab/blob/master/docs/homepage.md

Defines a parameter distribution dictionary for randomized hyperparameter search. Alongside model hyperparameters (`d_model`, `n_layers`, `lr`), it contains a `prepro__`-prefixed key, so the numerical preprocessing method is tuned together with the model.

```python
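from scipy.stats import randint, uniform

param_dist = {
    'd_model': randint(32, 128),  # integers in [32, 128)
    'n_layers': randint(2, 10),   # integers in [2, 10)
    'lr': uniform(1e-5, 1e-3),    # uniform(loc, scale): floats in [1e-5, 1e-5 + 1e-3]
    # A list is sampled uniformly; the "prepro__" prefix targets preprocessing options
    "prepro__numerical_preprocessing": ["ple", "standardization", "box-cox"]
}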
```
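To make the sampling behavior concrete, the sketch below draws a few candidates from an equivalent search space using only the standard library. It mirrors, rather than reproduces, what a randomized search does internally; `sample_candidate` is a hypothetical helper. Note that `scipy.stats.randint(low, high)` excludes `high`, and `scipy.stats.uniform(loc, scale)` covers `[loc, loc + scale]`, which the ranges below imitate.

```python
import random


def sample_candidate(rng):
    """Draw one hyperparameter combination (hypothetical helper)."""
    return {
        # Integers in [32, 128), matching randint(32, 128)
        "d_model": rng.randrange(32, 128),
        # Integers in [2, 10), matching randint(2, 10)
        "n_layers": rng.randrange(2, 10),
        # Floats in [1e-5, 1e-5 + 1e-3], matching uniform(loc=1e-5, scale=1e-3)
        "lr": rng.uniform(1e-5, 1e-5 + 1e-3),
        # A list of options is sampled uniformly at random
        "prepro__numerical_preprocessing": rng.choice(
            ["ple", "standardization", "box-cox"]
        ),
    }


rng = random.Random(0)  # fixed seed for reproducible draws
candidates = [sample_candidate(rng) for _ in range(5)]
for candidate in candidates:
    print(candidate)
```

Learning rates are often sampled on a log scale instead; that would be a change to the search space itself and is not what the dictionary above does.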

--------------------------------

### Print Best Parameters and Score

Source: https://github.com/opentabular/deeptab/blob/master/docs/homepage.md

Prints the best parameter combination and its score from a fitted `random_search` object (for example, scikit-learn's `RandomizedSearchCV`), which summarizes the outcome of the hyperparameter search.

```python
print("Best Parameters:", random_search.best_params_)
print("Best Score:", random_search.best_score_)
```

--------------------------------

### TabTransformer Model

Source: https://github.com/opentabular/deeptab/blob/master/docs/api/base_models/BaseModels.md

Documents the TabTransformer base model, a Transformer-based PyTorch model for tabular data, together with its constructor parameters and configuration attributes.

```APIDOC
## TabTransformer

### Description
A PyTorch model for tabular tasks utilizing the Transformer architecture and various normalization techniques.

### Parameters
- **cat_feature_info** (dict) - Required - Dictionary of categorical features.
- **num_feature_info** (dict) - Required - Dictionary of numerical features.
- **num_classes** (int) - Optional - Number of output classes (default 1).
- **config** (DefaultTabTransformerConfig) - Optional - Configuration object containing hyperparameters.

### Configuration Attributes
- **lr** (float) - Learning rate.
- **lr_patience** (int) - Patience for learning rate scheduler.
- **weight_decay** (float) - Weight decay for optimizer.
- **pooling_method** (str) - Method to pool the features.
```

--------------------------------

### Train and Evaluate MambularRegressor

Source: https://github.com/opentabular/deeptab/blob/master/docs/examples/regression.md

Demonstrates the end-to-end workflow for regression using MambularRegressor. It covers synthetic data generation, train-test splitting, model fitting with specified epochs, and performance evaluation.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from deeptab.models import MambularRegressor

np.random.seed(0)
n_samples = 1000
n_features = 5

X = np.random.randn(n_samples, n_features)
coefficients = np.random.randn(n_features)
y = np.dot(X, coefficients) + np.random.randn(n_samples)

data = pd.DataFrame(X, columns=[f"feature_{i}" for i in range(n_features)])
data["target"] = y

X = data.drop(columns=["target"])
y = np.array(data["target"])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

regressor = MambularRegressor()
regressor.fit(X_train, y_train, max_epochs=10)

print(regressor.evaluate(X_test, y_test))
```

--------------------------------

### Optimize Hyperparameters with Bayesian Search

Source: https://context7.com/opentabular/deeptab/llms.txt

Uses the `optimize_hparams` method to run Bayesian optimization over model hyperparameters. The call supports epoch-based pruning, trial limits, and fixing selected parameters to narrow the search.

```python
best_params = model.optimize_hparams(
    X_train, y_train,
    X_val=X_test, y_val=y_test,
    time=50,
    max_epochs=100,
    prune_by_epoch=True,
    prune_epoch=5,
    fixed_params={
        "pooling_method": "avg",
        "cat_encoding": "int"
    }
)
print(f"Best hyperparameters: {best_params}")
```
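The idea behind `prune_by_epoch` and `prune_epoch` can be sketched as follows: a trial is abandoned early when its validation loss at the pruning epoch is worse than the best value seen at that epoch in earlier trials. This is an illustrative assumption about the pruning rule, not DeepTab's actual implementation; `run_trial` and the loss curves are hypothetical.

```python
def run_trial(loss_curve, threshold, prune_epoch=5):
    """Replay a validation-loss curve and apply epoch-based pruning.

    Returns (last_observed_loss, pruned). Hypothetical helper.
    """
    for epoch, loss in enumerate(loss_curve, start=1):
        # Stop early if this trial lags behind the best one at prune_epoch
        if epoch == prune_epoch and threshold is not None and loss > threshold:
            return loss, True
    return loss_curve[-1], False


# Made-up validation losses per epoch for three trials
curves = {
    "trial_a": [1.0, 0.8, 0.6, 0.5, 0.45, 0.40, 0.38, 0.37],
    "trial_b": [1.2, 1.1, 1.0, 0.9, 0.85, 0.80, 0.78, 0.77],
    "trial_c": [0.9, 0.7, 0.5, 0.4, 0.35, 0.30, 0.28, 0.27],
}

prune_epoch = 5
threshold = None  # best loss at prune_epoch among unpruned trials
results = {}
for name, curve in curves.items():
    final_loss, pruned = run_trial(curve, threshold, prune_epoch)
    results[name] = (final_loss, pruned)
    if not pruned:
        at_prune = curve[prune_epoch - 1]
        threshold = at_prune if threshold is None else min(threshold, at_prune)

for name, (final_loss, pruned) in results.items():
    print(name, final_loss, "pruned" if pruned else "completed")
```

Pruning at a fixed early epoch trades a small risk of discarding slow starters for a large saving in training time per trial.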

--------------------------------

### Benchmark Tabular Models GPU Efficiency

Source: https://github.com/opentabular/deeptab/blob/master/efficiency/efficiency.ipynb

This script iterates through a range of feature counts, initializes various deep learning models, and profiles their CUDA memory consumption and execution time. It uses regex to parse profiler output and aggregates the results into a pandas DataFrame for analysis.

```python
import re
import pandas as pd
import torch
from accelerate import Accelerator
from accelerate.utils import ProfileKwargs
from torch.profiler import profile
from mambular.base_models.ft_transformer import FTTransformer
from mambular.base_models.mambattn import MambAttention
from mambular.base_models.mambular import Mambular
from mambular.base_models.mlp import MLP
from mambular.base_models.resnet import ResNet
from mambular.base_models.tabularnn import TabulaRNN

df_results = pd.DataFrame(columns=["Model", "Num Features", "Total CUDA Memory (MB)", "Total CUDA Time (ms)"])
profile_kwargs = ProfileKwargs(activities=["cpu", "cuda"], profile_memory=True, record_shapes=True)
accelerator = Accelerator(cpu=False, kwargs_handlers=[profile_kwargs])

for n_features in range(10, 100, 10):
    cat_feature_info = {f"cat_feature_{i}": 10 for i in range(int(n_features / 2))}
    num_feature_info = {f"num_feature_{i}": 64 for i in range(int(n_features / 2))}
    num_features = [torch.randn(32, 64).cuda() for _ in range(int(n_features / 2))]
    cat_features = [torch.randint(low=0, high=10, size=(32, 1)).cuda() for _ in range(int(n_features / 2))]

    models = [
        Mambular(num_feature_info=num_feature_info, cat_feature_info=cat_feature_info, numerical_preprocessing="ple", n_bins=64, d_model=64).cuda(),
        FTTransformer(num_feature_info=num_feature_info, cat_feature_info=cat_feature_info, numerical_preprocessing="ple", n_bins=64, d_model=64, n_layers=5).cuda(),
        TabulaRNN(num_feature_info=num_feature_info, cat_feature_info=cat_feature_info, d_model=128, dim_feedforward=256, numerical_preprocessing="ple", n_bins=64, n_layers=4).cuda(),
        MLP(num_feature_info=num_feature_info, cat_feature_info=cat_feature_info, numerical_preprocessing="ple", n_bins=64, layer_sizes=[512, 256, 128, 32]).cuda(),
        ResNet(num_feature_info=num_feature_info, cat_feature_info=cat_feature_info, numerical_preprocessing="ple", n_bins=64, layer_sizes=[512, 256, 16]).cuda(),
        MambAttention(num_feature_info=num_feature_info, cat_feature_info=cat_feature_info, numerical_preprocessing="ple", n_bins=64, d_state=172).cuda()
    ]

    for model in models:
        with profile(profile_memory=True, record_shapes=True) as prof:
            with torch.no_grad():
                outputs = model(num_features, cat_features)
        key_averages = prof.key_averages()
        key_avg_output = str(key_averages.total_average())
        cuda_memory_match = re.search(r"cuda_memory_usage=(\d+)", key_avg_output)
        total_cuda_memory = int(cuda_memory_match.group(1)) / (1024**2) if cuda_memory_match else 0.0
        cuda_time_match = re.search(r"self_cuda_time=([\d.]+)ms", key_avg_output)
        total_cuda_time = float(cuda_time_match.group(1)) if cuda_time_match else 0.0
        new_row = {"Model": model.__class__.__name__, "Num Features": n_features, "Total CUDA Time (ms)": total_cuda_time, "Total CUDA Memory (MB)": total_cuda_memory}
        df_results = pd.concat([df_results, pd.DataFrame([new_row])], ignore_index=True)
```
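The regex step can be exercised in isolation. The sketch below applies the same two patterns to a made-up summary string and shows the fallback to `0.0` when a field is missing. The format of `sample` is an assumption written only to exercise the patterns; real profiler output may differ, and `parse_profile_summary` is a hypothetical helper.

```python
import re


def parse_profile_summary(summary):
    """Extract CUDA memory (MB) and CUDA time (ms) from a summary string."""
    memory_match = re.search(r"cuda_memory_usage=(\d+)", summary)
    time_match = re.search(r"self_cuda_time=([\d.]+)ms", summary)
    # Bytes are converted to megabytes; missing fields fall back to 0.0
    memory_mb = int(memory_match.group(1)) / (1024**2) if memory_match else 0.0
    time_ms = float(time_match.group(1)) if time_match else 0.0
    return memory_mb, time_ms


# Hypothetical summary text, not captured from a real profiler run
sample = "<FunctionEventAvg key=Total self_cuda_time=12.500ms cuda_memory_usage=4194304>"

print(parse_profile_summary(sample))
print(parse_profile_summary("no cuda fields"))
```

Because the patterns only accept digits, a negative memory value or a time reported in a unit other than `ms` would silently yield `0.0`, which is worth checking before trusting the aggregated table.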

--------------------------------

### Fit DeepTab Model with Unstructured Data Embeddings

Source: https://github.com/opentabular/deeptab/blob/master/README.md

Demonstrates how to fit a DeepTab model on tabular data (`X_train`, `y_train`) together with pre-computed embeddings from unstructured data such as images and text, enabling multimodal learning.

```python
# load pretrained models
image_model = ...
nlp_model = ...

# create embeddings
img_embs = image_model.encode(images)
txt_embs = nlp_model.encode(texts)

# fit model on tabular data and unstructured data
model.fit(X_train, y_train, embeddings=[img_embs, txt_embs])
```