### Bayesian Hyperparameter Optimization Setup

Source: https://context7.com/opentabular/deeptab/llms.txt

Illustrates the setup for built-in Bayesian hyperparameter optimization within the DeepTab library. This example initializes a MambularRegressor and prepares data for tuning, although the full optimization call is omitted.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from deeptab.models import MambularRegressor

# Generate data
np.random.seed(42)
X = pd.DataFrame(np.random.randn(1000, 8), columns=[f"f{i}" for i in range(8)])
y = np.dot(X.values, np.random.randn(8)) + np.random.randn(1000) * 0.1
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize model
model = MambularRegressor(numerical_preprocessing="standardization")
```

--------------------------------

### Custom Training Loop Setup with DeepTab (Python)

Source: https://github.com/opentabular/deeptab/blob/master/docs/homepage.md

Sets up a custom training loop using DeepTab's Mambular base model. This example includes initializing the model with feature information, defining a loss function (MSELoss), and an optimizer (Adam). It highlights that base models expect lists of features as input.
```python
import torch
import torch.nn as nn
import torch.optim as optim
from deeptab.base_models import Mambular
from deeptab.configs import DefaultMambularConfig

# Dummy data and configuration
cat_feature_info = {
    "cat1": {
        "preprocessing": "imputer -> continuous_ordinal",
        "dimension": 1,
        "categories": 4,
    }
}  # Example categorical feature information
num_feature_info = {
    "num1": {"preprocessing": "imputer -> scaler", "dimension": 1, "categories": None}
}  # Example numerical feature information
num_classes = 1
config = DefaultMambularConfig()  # Use the desired configuration

# Initialize model, loss function, and optimizer
model = Mambular(cat_feature_info, num_feature_info, num_classes, config)
criterion = nn.MSELoss()  # Use MSE for regression; change as appropriate for your task
optimizer = optim.Adam(model.parameters(), lr=0.001)
```

--------------------------------

### Install DeepTab Package

Source: https://github.com/opentabular/deeptab/blob/master/docs/index.md

Installs the deeptab library using pip. This is the primary command for setting up the project.

```sh
pip install deeptab
```

--------------------------------

### Install DeepTab from Source using Poetry

Source: https://github.com/opentabular/deeptab/blob/master/docs/installation.md

Installs the DeepTab package by building it from the source code using Poetry. This method requires navigating to the project directory containing the 'pyproject.toml' file and then running the 'poetry install' command. It's useful for developers who need to work with the latest code or make modifications.

```bash
cd deeptab
poetry install
```

--------------------------------

### Install deeptab and Mamba for CUDA

Source: https://context7.com/opentabular/deeptab/llms.txt

Instructions for installing the deeptab library and its dependencies for native Mamba support with CUDA acceleration. This includes installing PyTorch with CUDA support and the mamba-ssm package.
```bash
pip install deeptab
pip install torch==2.0.0+cu118 torchvision==0.15.0+cu118 -f https://download.pytorch.org/whl/cu118/torch_stable.html
pip install mamba-ssm
```

--------------------------------

### Implementing Distributional Regression with MambularLSS

Source: https://github.com/opentabular/deeptab/blob/master/README.md

Provides an example of initializing and fitting a MambularLSS model to predict full distributions, specifying the distribution family in the fit method.

```python
from deeptab.models import MambularLSS

model = MambularLSS(
    dropout=0.2,
    d_model=64,
    n_layers=8
)

model.fit(
    X,
    y,
    max_epochs=150,
    lr=1e-04,
    patience=10,
    family="normal"
)
```

--------------------------------

### Install DeepTab from PyPI using Pip

Source: https://github.com/opentabular/deeptab/blob/master/docs/installation.md

Installs the DeepTab package directly from the Python Package Index (PyPI) using pip. This is the recommended method for most users as it provides a stable, pre-compiled version of the library. The command ensures the latest version is installed or upgraded.

```bash
pip install -U deeptab
```

--------------------------------

### Install DeepTab with Pip

Source: https://github.com/opentabular/deeptab/blob/master/docs/homepage.md

Installs the deeptab library using pip. Optionally installs mamba-ssm for specific implementations and specifies PyTorch and CUDA versions.

```sh
pip install deeptab
```

```sh
pip install mamba-ssm
```

```sh
pip install torch==2.0.0+cu118 torchvision==0.15.0+cu118 torchaudio==2.0.0+cu118 -f https://download.pytorch.org/whl/cu118/torch_stable.html
pip install mamba-ssm
```

--------------------------------

### Implement Custom Model and Sklearn Wrapper

Source: https://context7.com/opentabular/deeptab/llms.txt

Demonstrates how to define a custom configuration, implement a model class inheriting from BaseModel, and wrap it for scikit-learn compatibility. The example includes embedding layers and MLP architecture definition.
```python
from dataclasses import dataclass
import torch
import torch.nn as nn
from deeptab.base_models.utils import BaseModel
from deeptab.arch_utils.layer_utils.embedding_layer import EmbeddingLayer
from deeptab.configs import BaseConfig
from deeptab.models import SklearnBaseRegressor


@dataclass
class MyConfig(BaseConfig):
    lr: float = 1e-04
    weight_decay: float = 1e-06
    d_model: int = 64
    hidden_size: int = 128
    n_layers: int = 2
    pooling_method: str = "avg"


class MyCustomModel(BaseModel):
    def __init__(self, feature_information: tuple, num_classes: int = 1, config=None, **kwargs):
        super().__init__(config=config, **kwargs)
        self.save_hyperparameters(ignore=["feature_information"])
        self.returns_ensemble = False
        self.embedding_layer = EmbeddingLayer(*feature_information, config=config)
        layers = []
        input_dim = sum(len(info) for info in feature_information) * config.d_model
        for _ in range(config.n_layers):
            layers.extend([
                nn.Linear(input_dim if not layers else config.hidden_size, config.hidden_size),
                nn.ReLU(),
                nn.Dropout(0.1)
            ])
        layers.append(nn.Linear(config.hidden_size, num_classes))
        self.mlp = nn.Sequential(*layers)

    def forward(self, *data) -> torch.Tensor:
        x = self.embedding_layer(*data)
        B, S, D = x.shape
        x = x.reshape(B, S * D)
        return self.mlp(x)


class MyRegressor(SklearnBaseRegressor):
    def __init__(self, **kwargs):
        super().__init__(model=MyCustomModel, config=MyConfig, **kwargs)
```

--------------------------------

### PyTorch Training Loop Example

Source: https://github.com/opentabular/deeptab/blob/master/docs/homepage.md

This code snippet demonstrates a typical training loop for a deep learning model using PyTorch. It includes forward and backward passes, optimizer steps, and loss calculation. It assumes the existence of a PyTorch model, optimizer, criterion, and dummy data generation.
```python
for epoch in range(10):
    model.train()
    optimizer.zero_grad()

    # Dummy Data
    num_features = [torch.randn(32, 1) for _ in num_feature_info]
    cat_features = [torch.randint(0, 5, (32,)) for _ in cat_feature_info]
    labels = torch.randn(32, num_classes)

    # Forward pass
    outputs = model(num_features, cat_features)
    loss = criterion(outputs, labels)

    # Backward pass and optimization
    loss.backward()
    optimizer.step()

    # Print loss for monitoring
    print(f"Epoch [{epoch+1}/10], Loss: {loss.item():.4f}")
```

--------------------------------

### Install PyTorch with CUDA Support

Source: https://github.com/opentabular/deeptab/blob/master/docs/index.md

Installs specific versions of PyTorch, Torchvision, and Torchaudio with CUDA 11.8 support. This is crucial for GPU acceleration when working with compatible hardware.

```sh
pip install torch==2.0.0+cu118 torchvision==0.15.0+cu118 torchaudio==2.0.0+cu118 -f https://download.pytorch.org/whl/cu118/torch_stable.html
```

--------------------------------

### Install Mamba-SSM for Mamba Models

Source: https://github.com/opentabular/deeptab/blob/master/docs/index.md

Installs the mamba-ssm package, which is required for utilizing the original mamba and mamba2 implementations within DeepTab.

```sh
pip install mamba-ssm
```

--------------------------------

### Initialize Deep Learning Models for Tabular Data in Python

Source: https://github.com/opentabular/deeptab/blob/master/efficiency/efficiency.ipynb

This Python snippet initializes various deep learning models for tabular data, including Mambular, FTTransformer, TabulaRNN, MLP, ResNet, and MambAttention. It sets up feature information dictionaries and prepares the models for use, potentially with CUDA acceleration. This code serves as a setup for further performance analysis.
```python
# Initialize models with updated feature info

# Initialize an empty DataFrame to store the results
df_results = pd.DataFrame(columns=["Model", "Num Layers", "Total CUDA Memory (MB)", "Total CUDA Time (ms)"])
```

--------------------------------

### Initialize Accelerator for Profiling

Source: https://github.com/opentabular/deeptab/blob/master/efficiency/efficiency.ipynb

Initializes the Accelerate Accelerator with memory profiling enabled. It configures the profiler to capture CPU and CUDA activities, including memory usage and recorded shapes. This setup is crucial for performance analysis.

```python
import re
import warnings

import pandas as pd
import torch
from accelerate import Accelerator
from accelerate.utils import ProfileKwargs
from mambular.base_models.ft_transformer import FTTransformer
from mambular.base_models.mambattn import MambAttention
from mambular.base_models.mambular import Mambular
from mambular.base_models.mlp import MLP
from mambular.base_models.resnet import ResNet
from mambular.base_models.tabularnn import TabulaRNN

warnings.filterwarnings("ignore")

import torch

# Initialize models with updated feature info

# Initialize an empty DataFrame to store the results
df_results = pd.DataFrame(columns=["Model", "Num Features", "Total CUDA Memory (MB)", "Total CUDA Time (ms)"])

# Set up the profiler with memory profiling enabled
profile_kwargs = ProfileKwargs(activities=["cpu", "cuda"], profile_memory=True, record_shapes=True)
accelerator = Accelerator(cpu=False, kwargs_handlers=[profile_kwargs])
```

--------------------------------

### Python: DeepTab MambularLSS Distributional Regression Example

Source: https://github.com/opentabular/deeptab/blob/master/docs/examples/distributional.md

This Python code snippet demonstrates how to perform distributional regression using the MambularLSS model from the deeptab package. It includes data simulation, model training, and evaluation. Dependencies include numpy, pandas, and scikit-learn.
```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from deeptab.models import MambularLSS

# Simulate a linear regression problem with Gaussian noise
np.random.seed(0)
n_samples = 1000
n_features = 5
X = np.random.randn(n_samples, n_features)
coefficients = np.random.randn(n_features)
y = np.dot(X, coefficients) + np.random.randn(n_samples)

data = pd.DataFrame(X, columns=[f"feature_{i}" for i in range(n_features)])
data["target"] = y

X = data.drop(columns=["target"])
y = np.array(data["target"])
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit a distributional model with a normal family and evaluate it
regressor = MambularLSS()
regressor.fit(X_train, y_train, family="normal", max_epochs=10)
print(regressor.evaluate(X_test, y_test))
```

--------------------------------

### Initialize MLP Model

Source: https://github.com/opentabular/deeptab/blob/master/docs/api/base_models/BaseModels.md

Demonstrates how to instantiate the MLP model for tabular data. It supports customizable layer sizes, dropout, and activation functions.

```python
from deeptab.base_models import MLP
from deeptab.configs import DefaultMLPConfig

# feat_info is the feature-information tuple; it is assumed to be defined beforehand
config = DefaultMLPConfig(layer_sizes=[256, 128, 32], dropout=0.2)
model = MLP(feature_information=feat_info, num_classes=1, config=config)
```

--------------------------------

### Initialize Mambular Model

Source: https://github.com/opentabular/deeptab/blob/master/docs/api/base_models/BaseModels.md

Demonstrates how to instantiate the Mambular model with a custom configuration. The model processes tabular data through embedding layers and Mamba-based transformations.

```python
from deeptab.base_models import Mambular
from deeptab.configs import DefaultMambularConfig

# feat_info is the feature-information tuple; it is assumed to be defined beforehand
config = DefaultMambularConfig(d_model=64, n_layers=4)
model = Mambular(feature_information=feat_info, num_classes=1, config=config)
```

--------------------------------

### GET /encode

Source: https://context7.com/opentabular/deeptab/llms.txt

Extracts latent representations from the trained model. The entry is written in endpoint style, but it documents the Python method `encode`, not an HTTP route.
```APIDOC
## GET /encode

### Description
Extracts learned feature representations (encoded embeddings) from the model for a given input.

### Method
GET

### Parameters
#### Query Parameters
- **X** (DataFrame) - Required - Input data to encode
- **batch_size** (int) - Optional - Batch size for processing

### Response
#### Success Response (200)
- **encoded** (ndarray) - The latent representation matrix of shape (n_samples, seq_len, d_model).
```

--------------------------------

### Hyperparameter Optimization with Random Search

Source: https://github.com/opentabular/deeptab/blob/master/README.md

Demonstrates how to access the best parameters and scores from a random search object, and how to define a parameter distribution dictionary including preprocessing arguments.

```python
from scipy.stats import randint, uniform

# random_search is assumed to be an already fitted search object
print("Best Parameters:", random_search.best_params_)
print("Best Score:", random_search.best_score_)

param_dist = {
    'd_model': randint(32, 128),
    'n_layers': randint(2, 10),
    'lr': uniform(1e-5, 1e-3),  # scipy's uniform(loc, scale) samples from [loc, loc + scale]
    "numerical_preprocessing": ["ple", "standardization", "box-cox"]
}
```

--------------------------------

### Get Latent Representations with Deeptab Model

Source: https://github.com/opentabular/deeptab/blob/master/README.md

Illustrates how to use the `encode` method of a Deeptab model to obtain latent representations for each feature in the input data X. This is useful for understanding feature embeddings.

```python
# simple encoding
model.encode(X)
```

--------------------------------

### MambularClassifier: Tabular Classification with Mamba

Source: https://context7.com/opentabular/deeptab/llms.txt

Example of using MambularClassifier for tabular classification tasks. Demonstrates data generation, model initialization with custom architecture parameters, training with early stopping, prediction, and evaluation.
```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification
from deeptab.models import MambularClassifier

# Generate sample classification data
X, y = make_classification(
    n_samples=2000,
    n_features=20,
    n_informative=15,
    n_classes=3,
    n_clusters_per_class=1,
    random_state=42
)
X = pd.DataFrame(X, columns=[f"feature_{i}" for i in range(20)])
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize classifier with custom architecture
classifier = MambularClassifier(
    d_model=64,  # Embedding dimension
    n_layers=4,  # Number of Mamba layers
    d_state=128,  # State dimension for Mamba
    d_conv=4,  # Convolution kernel size
    dropout=0.1,  # Regularization dropout
    numerical_preprocessing="standardization",  # Preprocessing method
    pooling_method="avg"  # Pooling method: avg, max, sum, last
)

# Train the model with early stopping
classifier.fit(
    X_train,
    y_train,
    max_epochs=100,
    lr=1e-04,
    batch_size=128,
    patience=15,  # Early stopping patience
    val_size=0.2  # Validation split
)

# Make predictions
predictions = classifier.predict(X_test)
probabilities = classifier.predict_proba(X_test)

# Evaluate model performance
results = classifier.evaluate(X_test, y_test)
print(f"Results: {results}")  # {'Accuracy': 0.85}
```

--------------------------------

### POST /models/resnet/init

Source: https://github.com/opentabular/deeptab/blob/master/docs/api/base_models/BaseModels.md

Initializes a new ResNet model instance with specific feature configurations and hyperparameters. The entry is written in endpoint style, but it documents the constructor of the Python `ResNet` class, not an HTTP route.

```APIDOC
## POST /models/resnet/init

### Description
Instantiates the ResNet model class with provided feature metadata and configuration settings.

### Method
POST

### Endpoint
/models/resnet/init

### Parameters
#### Request Body
- **cat_feature_info** (dict) - Required - Dictionary defining categorical feature names and dimensions.
- **num_feature_info** (dict) - Required - Dictionary defining numerical feature names and dimensions.
- **num_classes** (int) - Optional - Number of output classes (default: 1).
- **config** (object) - Optional - Hyperparameter configuration object.

### Request Example
{
  "cat_feature_info": {"gender": 2, "region": 5},
  "num_feature_info": {"age": 1, "income": 1},
  "num_classes": 1
}

### Response
#### Success Response (200)
- **status** (string) - Confirmation of model initialization.

#### Response Example
{
  "status": "ResNet model initialized successfully"
}
```

--------------------------------

### Pretrain Embeddings with Contrastive Learning

Source: https://context7.com/opentabular/deeptab/llms.txt

Shows how to pretrain the embedding layer using contrastive learning to boost performance on limited labeled datasets. The process involves building the model, running the pretrain method, and then fine-tuning on the target task.

```python
model.build_model(X_train, y_train, batch_size=128)

model.pretrain(
    pretrain_epochs=15,
    k_neighbors=10,
    temperature=0.1,
    save_path="pretrained_embeddings.pth",
    lr=1e-3,
    use_positive=True,
    use_negative=False
)

model.fit(X_train, y_train, max_epochs=50, rebuild=False)
```

--------------------------------

### Initialize and Train DeepTab Model

Source: https://github.com/opentabular/deeptab/blob/master/docs/index.md

Demonstrates how to initialize a MambularClassifier with specific architectural parameters and train it using the standard fit method.

```python
from deeptab.models import MambularClassifier

model = MambularClassifier(
    d_model=64,
    n_layers=4,
    numerical_preprocessing="ple",
    n_bins=50,
    d_conv=8
)

model.fit(X, y, max_epochs=150, lr=1e-04)
```

--------------------------------

### MambularRegressor: Tabular Regression with Mamba

Source: https://context7.com/opentabular/deeptab/llms.txt

Example of using MambularRegressor for tabular regression tasks.
Demonstrates creating synthetic regression data with mixed types, initializing the regressor with Piecewise Linear Encoding, training with a custom validation set, prediction, and evaluation using MSE and R2 scores.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
from deeptab.models import MambularRegressor

# Create synthetic regression data with mixed types
np.random.seed(42)
n_samples = 1500
data = pd.DataFrame({
    'age': np.random.randint(18, 80, n_samples),
    'income': np.random.uniform(20000, 150000, n_samples),
    'category': np.random.choice(['A', 'B', 'C', 'D'], n_samples),
    'score': np.random.uniform(0, 100, n_samples),
    'region': np.random.choice(['North', 'South', 'East', 'West'], n_samples)
})

# Target variable
y = (data['age'] * 100 + data['income'] * 0.1 + data['score'] * 50 +
     np.random.normal(0, 1000, n_samples))

X_train, X_test, y_train, y_test = train_test_split(
    data, y.values, test_size=0.2, random_state=42
)

# Initialize regressor with PLE preprocessing for numerical features
regressor = MambularRegressor(
    d_model=64,
    n_layers=4,
    numerical_preprocessing="ple",  # Piecewise Linear Encoding
    n_bins=50,  # Number of bins for PLE
    bidirectional=False,  # Process features sequentially
    pooling_method="avg"
)

# Fit with explicit validation set
regressor.fit(
    X_train,
    y_train,
    X_val=X_test,
    y_val=y_test,  # Custom validation set
    max_epochs=150,
    lr=1e-04,
    patience=20
)

# Predict and evaluate
predictions = regressor.predict(X_test)
results = regressor.evaluate(X_test, y_test, metrics={
    "MSE": mean_squared_error,
    "R2": r2_score
})
print(f"Results: {results}")
```

--------------------------------

### MambularClassifier Training and Evaluation

Source: https://github.com/opentabular/deeptab/blob/master/docs/examples/classification.md

Demonstrates how to initialize the MambularClassifier, train it on a dataset, and evaluate its performance.
```APIDOC
## MambularClassifier.fit

### Description
Trains the MambularClassifier model on the provided training dataset.

### Method
Python Method Call

### Parameters
#### Arguments
- **X_train** (pd.DataFrame) - Required - The feature matrix for training.
- **y_train** (np.array) - Required - The target labels for training.
- **max_epochs** (int) - Optional - The number of training epochs (default: 10).

### Request Example
classifier.fit(X_train, y_train, max_epochs=10)

## MambularClassifier.evaluate

### Description
Evaluates the trained model on a test dataset and returns performance metrics.

### Method
Python Method Call

### Parameters
#### Arguments
- **X_test** (pd.DataFrame) - Required - The feature matrix for testing.
- **y_test** (np.array) - Required - The target labels for testing.

### Response
#### Success Response
- **metrics** (dict) - Returns a dictionary containing evaluation metrics such as accuracy or loss.

### Response Example
{"accuracy": 0.85, "loss": 0.42}
```

--------------------------------

### Implement PyTorch Training Loop for Tabular Models

Source: https://github.com/opentabular/deeptab/blob/master/docs/index.md

This snippet demonstrates a standard training loop for a tabular deep learning model in PyTorch. It includes dummy data generation, forward pass, loss calculation, and backpropagation steps.
```python
for epoch in range(10):
    model.train()
    optimizer.zero_grad()

    num_features = [torch.randn(32, 1) for _ in num_feature_info]
    cat_features = [torch.randint(0, 5, (32,)) for _ in cat_feature_info]
    labels = torch.randn(32, num_classes)

    outputs = model(num_features, cat_features)
    loss = criterion(outputs, labels)

    loss.backward()
    optimizer.step()

    print(f"Epoch [{epoch+1}/10], Loss: {loss.item():.4f}")
```

--------------------------------

### Initialize and Fit MambularLSS for Distributional Regression

Source: https://github.com/opentabular/deeptab/blob/master/docs/homepage.md

This Python code shows how to initialize the MambularLSS model for distributional regression and fit it to the data. It specifies model parameters, training configurations like max epochs and learning rate, and the desired distribution family (e.g., 'normal').

```python
from deeptab.models import MambularLSS

# Initialize the MambularLSS model
model = MambularLSS(
    dropout=0.2,
    d_model=64,
    n_layers=8,
)

# Fit the model to your data
model.fit(
    X,
    y,
    max_epochs=150,
    lr=1e-04,
    patience=10,
    family="normal"  # define your distribution
)
```

--------------------------------

### Fit MambularClassifier Model in DeepTab

Source: https://github.com/opentabular/deeptab/blob/master/docs/homepage.md

Demonstrates how to initialize and fit a MambularClassifier model from the deeptab library. It shows setting model parameters and using the fit method with training data (X, y) and training configurations.
```python
from deeptab.models import MambularClassifier

# Initialize and fit your model
model = MambularClassifier(
    d_model=64,
    n_layers=4,
    numerical_preprocessing="ple",
    n_bins=50,
    d_conv=8
)

# X can be a DataFrame or anything easily converted to one, such as a np.array
model.fit(X, y, max_epochs=150, lr=1e-04)
```

--------------------------------

### Initialize and Fit MambularClassifier in Deeptab

Source: https://github.com/opentabular/deeptab/blob/master/README.md

Demonstrates how to initialize a MambularClassifier with specified preprocessing and model parameters, and then fit it to the training data (X, y). It highlights the use of `max_epochs` and `lr` for training control. X can be a pandas DataFrame or a NumPy array convertible to one.

```python
from deeptab.models import MambularClassifier

# Initialize and fit your model
model = MambularClassifier(
    d_model=64,
    n_layers=4,
    numerical_preprocessing="ple",
    n_bins=50,
    d_conv=8
)

# X can be a DataFrame or anything easily converted to one, such as a np.array
model.fit(X, y, max_epochs=150, lr=1e-04)
```

--------------------------------

### Train Custom DeepTab Model (Python)

Source: https://github.com/opentabular/deeptab/blob/master/docs/homepage.md

Demonstrates how to train a custom DeepTab regressor model. It initializes the MyRegressor with specified preprocessing and then fits the model to the training data, setting the maximum number of epochs.

```python
regressor = MyRegressor(numerical_preprocessing="ple")
regressor.fit(X_train, y_train, max_epochs=50)
```

--------------------------------

### Train a Mambular Classifier using deeptab

Source: https://github.com/opentabular/deeptab/blob/master/docs/homepage.md

Demonstrates how to initialize and train a MambularClassifier model. It follows the standard scikit-learn fit pattern, accepting dataframes or numpy arrays as input.
```python
from deeptab.models import MambularClassifier

# Initialize and fit your model
model = MambularClassifier()

# X can be a DataFrame or anything easily converted to one, such as a np.array
model.fit(X, y, max_epochs=150, lr=1e-04)
```

--------------------------------

### NODEClassifier: Training and Evaluation

Source: https://context7.com/opentabular/deeptab/llms.txt

Demonstrates how to initialize, train, and evaluate the NODEClassifier model using the DeepTab library. It loads a dataset, splits it into training and testing sets, configures the NODEClassifier with specified parameters, and then trains the model. Finally, it evaluates the model's accuracy on the test set.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_breast_cancer
from deeptab.models import NODEClassifier

# Load real dataset
data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize NODE classifier
model = NODEClassifier(
    num_trees=1024,
    depth=6,
    tree_dim=3,
    numerical_preprocessing="standardization"
)

# Train
model.fit(
    X_train,
    y_train,
    max_epochs=100,
    lr=1e-03,
    batch_size=256,
    patience=20
)

# Evaluate
results = model.evaluate(X_test, y_test)
print(f"Accuracy: {results['Accuracy']:.4f}")
```

--------------------------------

### Implement Custom Model Architecture

Source: https://github.com/opentabular/deeptab/blob/master/README.md

Define a custom model by inheriting from BaseModel, implementing the __init__ method to handle feature information, and defining the forward pass logic.
```python
from deeptab.base_models.utils import BaseModel
from deeptab.arch_utils.layer_utils.embedding_layer import EmbeddingLayer
import torch
import torch.nn as nn
import numpy as np


class MyCustomModel(BaseModel):
    def __init__(self, feature_information: tuple, num_classes: int = 1, config=None, **kwargs):
        super().__init__(**kwargs)
        self.save_hyperparameters(ignore=["feature_information"])
        self.returns_ensemble = False

        # Embed every feature, then flatten the embeddings into one vector
        self.embedding_layer = EmbeddingLayer(*feature_information, config=config)
        input_dim = np.sum([len(info) * self.hparams.d_model for info in feature_information])
        self.linear = nn.Linear(input_dim, num_classes)

    def forward(self, *data) -> torch.Tensor:
        x = self.embedding_layer(*data)
        B, S, D = x.shape
        x = x.reshape(B, S * D)
        return self.linear(x)
```

--------------------------------

### DeepTab Custom Model Implementation

Source: https://github.com/opentabular/deeptab/blob/master/README.md

This section details the steps to implement a custom model in DeepTab, including defining a configuration class, creating the model architecture by inheriting from BaseModel, and integrating it with DeepTab's API.

```APIDOC
## Implementing a Custom Model in DeepTab

This guide outlines the process of creating and integrating your own PyTorch models within the DeepTab framework. You will define a configuration class and then your custom model by inheriting from `deeptab.base_models.utils.BaseModel`.

### 1. Define Your Configuration

Create a configuration class that inherits from `deeptab.configs.BaseConfig` to specify hyperparameters and settings for your model.

```python
from dataclasses import dataclass
from deeptab.configs import BaseConfig


@dataclass
class MyConfig(BaseConfig):
    lr: float = 1e-04
    lr_patience: int = 10
    weight_decay: float = 1e-06
    n_layers: int = 4
    pooling_method: str = "avg"
```

### 2. Define Your Custom Model

Inherit from `BaseModel` and define the model's architecture and forward pass. The `__init__` method should accept `feature_information`, `num_classes`, and an optional `config`.
```python
from deeptab.base_models.utils import BaseModel
from deeptab.arch_utils.layer_utils.embedding_layer import EmbeddingLayer
from deeptab.utils.get_feature_dimensions import get_feature_dimensions
import numpy as np
import torch
import torch.nn as nn


class MyCustomModel(BaseModel):
    def __init__(
        self,
        feature_information: tuple,
        num_classes: int = 1,
        config=None,
        **kwargs,
    ):
        super().__init__(**kwargs)
        self.save_hyperparameters(ignore=["feature_information"])
        self.returns_ensemble = False

        # embedding layer
        self.embedding_layer = EmbeddingLayer(
            *feature_information,
            config=config,
        )

        input_dim = np.sum(
            [len(info) * self.hparams.d_model for info in feature_information]
        )
        self.linear = nn.Linear(input_dim, num_classes)

    def forward(self, *data) -> torch.Tensor:
        x = self.embedding_layer(*data)
        B, S, D = x.shape
        x = x.reshape(B, S * D)

        # Pass through linear layer
        output = self.linear(x)
        return output
```

### 3. Leverage the DeepTab API

To integrate your custom model with DeepTab's training and evaluation methods, you can create a wrapper class that inherits from `SklearnBaseRegressor`, `SklearnBaseClassifier`, or `SklearnBaseLSS`.

```python
from deeptab.models.utils import SklearnBaseRegressor


class MyRegressor(SklearnBaseRegressor):
    def __init__(self, **kwargs):
        super().__init__(model=MyCustomModel, config=MyConfig, **kwargs)
```

### 4. Train and Evaluate Your Model

Once your custom model is set up, you can train and evaluate it using the familiar DeepTab API.

```python
regressor = MyRegressor(numerical_preprocessing="ple")
regressor.fit(X_train, y_train, max_epochs=50)
regressor.evaluate(X_test, y_test)
```
```

--------------------------------

### Define Custom Model Configuration

Source: https://github.com/opentabular/deeptab/blob/master/README.md

Create a configuration class for your model by inheriting from BaseConfig and using a dataclass to define hyperparameters.
```python
from dataclasses import dataclass
from deeptab.configs import BaseConfig


@dataclass
class MyConfig(BaseConfig):
    lr: float = 1e-04
    lr_patience: int = 10
    weight_decay: float = 1e-06
    n_layers: int = 4
    pooling_method: str = "avg"
```

--------------------------------

### POST /fit

Source: https://context7.com/opentabular/deeptab/llms.txt

Trains the model using tabular data and optional multi-modal embeddings. The entry is written in endpoint style, but it documents the Python method `fit`, not an HTTP route.

```APIDOC
## POST /fit

### Description
Fits the model to the provided tabular data, optionally incorporating external embeddings.

### Method
POST

### Parameters
#### Request Body
- **X_train** (DataFrame) - Required - Training features
- **y_train** (Series/Array) - Required - Training labels
- **embeddings** (list) - Optional - List of pre-computed embedding arrays (e.g., image/text)
- **max_epochs** (int) - Optional - Training epochs

### Response
#### Success Response (200)
- **model** (object) - The trained model instance.
```

--------------------------------

### POST /pretrain

Source: https://context7.com/opentabular/deeptab/llms.txt

Performs contrastive pretraining on the model's embedding layer. As with the other endpoint-style entries, this documents the Python method `pretrain`, not an HTTP route.

```APIDOC
## POST /pretrain

### Description
Pretrains the embedding layer using contrastive learning to improve performance on downstream tasks.

### Method
POST

### Parameters
#### Request Body
- **pretrain_epochs** (int) - Required - Number of epochs for pretraining
- **k_neighbors** (int) - Optional - Neighbors for contrastive loss
- **temperature** (float) - Optional - Temperature for softmax
- **save_path** (str) - Optional - Path to save the model weights

### Response
#### Success Response (200)
- **status** (str) - Confirmation of successful pretraining.
```

--------------------------------

### Configure MambularDataModule for PyTorch Lightning

Source: https://github.com/opentabular/deeptab/blob/master/docs/api/data_utils/Datautils.md

Sets up a data module to manage training and validation loaders.
This class automates data splitting, preprocessing, and batching for integration with PyTorch Lightning training loops.

```python
from deeptab.data_utils import MambularDataModule

# my_preprocessor, X_train, and y_train are assumed to be defined beforehand
data_module = MambularDataModule(
    preprocessor=my_preprocessor,
    batch_size=32,
    shuffle=True,
    val_size=0.2,
    random_state=101
)

# Prepare data and loaders
data_module.preprocess_data(X_train, y_train)
data_module.setup(stage='fit')
```

--------------------------------

### Define Hyperparameter Search Space with Preprocessing Options

Source: https://github.com/opentabular/deeptab/blob/master/docs/homepage.md

Defines a parameter-distribution dictionary for random-search hyperparameter optimization. It covers the model dimension, number of layers, learning rate, and numerical preprocessing method, showing that preprocessing choices can be tuned alongside model hyperparameters.

```python
from scipy.stats import randint, uniform

param_dist = {
    # randint(low, high) samples integers from [low, high)
    'd_model': randint(32, 128),
    'n_layers': randint(2, 10),
    # uniform(loc, scale) samples from [loc, loc + scale]
    'lr': uniform(1e-5, 1e-3),
    "prepro__numerical_preprocessing": ["ple", "standardization", "box-cox"]
}
```

--------------------------------

### Print Best Parameters and Score

Source: https://github.com/opentabular/deeptab/blob/master/docs/homepage.md

Prints the best parameters found by the random search and the corresponding best score, summarizing the outcome of hyperparameter optimization.

```python
# random_search is the fitted search object from the preceding step
print("Best Parameters:", random_search.best_params_)
print("Best Score:", random_search.best_score_)
```

--------------------------------

### TabTransformer Model

Source: https://github.com/opentabular/deeptab/blob/master/docs/api/base_models/BaseModels.md

Initializes the TabTransformer model for tasks utilizing Transformer architecture and normalization techniques.

```APIDOC
## TabTransformer

### Description
A PyTorch model for tabular tasks utilizing the Transformer architecture and various normalization techniques.

### Parameters
- **cat_feature_info** (dict) - Required - Dictionary of categorical features.
- **num_feature_info** (dict) - Required - Dictionary of numerical features.
- **num_classes** (int) - Optional - Number of output classes (default 1).
- **config** (DefaultTabTransformerConfig) - Optional - Configuration object containing hyperparameters.

### Configuration Attributes
- **lr** (float) - Learning rate.
- **lr_patience** (int) - Patience for learning rate scheduler.
- **weight_decay** (float) - Weight decay for optimizer.
- **pooling_method** (str) - Method to pool the features.
```

--------------------------------

### Train and Evaluate MambularRegressor

Source: https://github.com/opentabular/deeptab/blob/master/docs/examples/regression.md

Demonstrates the end-to-end workflow for regression using MambularRegressor. It covers synthetic data generation, train-test splitting, model fitting with specified epochs, and performance evaluation.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from deeptab.models import MambularRegressor

# Generate synthetic regression data
np.random.seed(0)
n_samples = 1000
n_features = 5
X = np.random.randn(n_samples, n_features)
coefficients = np.random.randn(n_features)
y = np.dot(X, coefficients) + np.random.randn(n_samples)

# Wrap the features in a DataFrame with named columns
data = pd.DataFrame(X, columns=[f"feature_{i}" for i in range(n_features)])
data["target"] = y
X = data.drop(columns=["target"])
y = np.array(data["target"])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit and evaluate the model
regressor = MambularRegressor()
regressor.fit(X_train, y_train, max_epochs=10)
print(regressor.evaluate(X_test, y_test))
```

--------------------------------

### Optimize Hyperparameters with Bayesian Search

Source: https://context7.com/opentabular/deeptab/llms.txt

Demonstrates the use of the optimize_hparams method to perform Bayesian optimization on model hyperparameters. It allows for pruning, trial limits, and fixing specific parameters to streamline the search process.
```python
best_params = model.optimize_hparams(
    X_train,
    y_train,
    X_val=X_test,
    y_val=y_test,
    time=50,
    max_epochs=100,
    prune_by_epoch=True,
    prune_epoch=5,
    fixed_params={
        "pooling_method": "avg",
        "cat_encoding": "int"
    }
)

print(f"Best hyperparameters: {best_params}")
```

--------------------------------

### Benchmark Tabular Models GPU Efficiency

Source: https://github.com/opentabular/deeptab/blob/master/efficiency/efficiency.ipynb

This script iterates through a range of feature counts, initializes various deep learning models, and profiles their CUDA memory consumption and execution time. It uses regex to parse profiler output and aggregates the results into a pandas DataFrame for analysis.

```python
import re

import pandas as pd
import torch
from accelerate import Accelerator
from accelerate.utils import ProfileKwargs
from torch.profiler import profile

from mambular.base_models.ft_transformer import FTTransformer
from mambular.base_models.mambattn import MambAttention
from mambular.base_models.mambular import Mambular
from mambular.base_models.mlp import MLP
from mambular.base_models.resnet import ResNet
from mambular.base_models.tabularnn import TabulaRNN

df_results = pd.DataFrame(columns=["Model", "Num Features", "Total CUDA Memory (MB)", "Total CUDA Time (ms)"])

profile_kwargs = ProfileKwargs(activities=["cpu", "cuda"], profile_memory=True, record_shapes=True)
accelerator = Accelerator(cpu=False, kwargs_handlers=[profile_kwargs])

for n_features in range(10, 100, 10):
    # Half of the features are categorical, half numerical
    cat_feature_info = {f"cat_feature_{i}": 10 for i in range(int(n_features / 2))}
    num_feature_info = {f"num_feature_{i}": 64 for i in range(int(n_features / 2))}

    num_features = [torch.randn(32, 64).cuda() for _ in range(int(n_features / 2))]
    cat_features = [torch.randint(low=0, high=10, size=(32, 1)).cuda() for _ in range(int(n_features / 2))]

    models = [
        Mambular(
            num_feature_info=num_feature_info, cat_feature_info=cat_feature_info,
            numerical_preprocessing="ple", n_bins=64, d_model=64,
        ).cuda(),
        FTTransformer(
            num_feature_info=num_feature_info, cat_feature_info=cat_feature_info,
            numerical_preprocessing="ple", n_bins=64, d_model=64, n_layers=5,
        ).cuda(),
        TabulaRNN(
            num_feature_info=num_feature_info, cat_feature_info=cat_feature_info,
            d_model=128, dim_feedforward=256, numerical_preprocessing="ple", n_bins=64, n_layers=4,
        ).cuda(),
        MLP(
            num_feature_info=num_feature_info, cat_feature_info=cat_feature_info,
            numerical_preprocessing="ple", n_bins=64, layer_sizes=[512, 256, 128, 32],
        ).cuda(),
        ResNet(
            num_feature_info=num_feature_info, cat_feature_info=cat_feature_info,
            numerical_preprocessing="ple", n_bins=64, layer_sizes=[512, 256, 16],
        ).cuda(),
        MambAttention(
            num_feature_info=num_feature_info, cat_feature_info=cat_feature_info,
            numerical_preprocessing="ple", n_bins=64, d_state=172,
        ).cuda(),
    ]

    for model in models:
        # Profile a single forward pass without gradient tracking
        with profile(profile_memory=True, record_shapes=True) as prof:
            with torch.no_grad():
                outputs = model(num_features, cat_features)

        key_averages = prof.key_averages()
        key_avg_output = str(key_averages.total_average())

        # Extract total CUDA memory (bytes -> MB) from the profiler summary string
        cuda_memory_match = re.search(r"cuda_memory_usage=(\d+)", key_avg_output)
        total_cuda_memory = int(cuda_memory_match.group(1)) / (1024**2) if cuda_memory_match else 0.0

        # Extract total CUDA time (ms) from the profiler summary string
        cuda_time_match = re.search(r"self_cuda_time=([\d.]+)ms", key_avg_output)
        total_cuda_time = float(cuda_time_match.group(1)) if cuda_time_match else 0.0

        new_row = {
            "Model": model.__class__.__name__,
            "Num Features": n_features,
            "Total CUDA Time (ms)": total_cuda_time,
            "Total CUDA Memory (MB)": total_cuda_memory,
        }
        df_results = pd.concat([df_results, pd.DataFrame([new_row])], ignore_index=True)
```

--------------------------------

### Fit DeepTab Model with Unstructured Data Embeddings

Source: https://github.com/opentabular/deeptab/blob/master/README.md

Demonstrates how to fit a DeepTab model using both tabular data (X_train, y_train) and pre-computed embeddings from unstructured data like images and text. This allows for multimodal learning.

```python
# load pretrained models
image_model = ...
nlp_model = ...

# create embeddings
img_embs = image_model.encode(images)
txt_embs = nlp_model.encode(texts)

# fit model on tabular data and unstructured data
model.fit(X_train, y_train, embeddings=[img_embs, txt_embs])
```
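
Before passing embeddings to `fit`, it can help to confirm that each array lines up with the tabular rows. The following is a minimal sketch, assuming each embedding source yields a 2-D NumPy array with one row per training sample. The stand-in encoder `fake_encode`, the helper `check_embeddings`, and the embedding widths are hypothetical and not part of DeepTab.

```python
import numpy as np


def fake_encode(items, dim, seed):
    """Hypothetical stand-in for a pretrained encoder: returns one vector per item."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((len(items), dim)).astype(np.float32)


def check_embeddings(n_samples, embeddings):
    """Raise ValueError if any array is not 2-D or is misaligned with the tabular rows."""
    for i, emb in enumerate(embeddings):
        if emb.ndim != 2:
            raise ValueError(f"embedding {i} must be 2-D, got shape {emb.shape}")
        if emb.shape[0] != n_samples:
            raise ValueError(
                f"embedding {i} has {emb.shape[0]} rows, expected {n_samples}"
            )
    return [emb.shape for emb in embeddings]


n_samples = 100
images = [f"img_{i}.png" for i in range(n_samples)]  # placeholder identifiers
texts = [f"document {i}" for i in range(n_samples)]  # placeholder texts

# Assumed embedding widths, chosen only for illustration
img_embs = fake_encode(images, dim=512, seed=0)
txt_embs = fake_encode(texts, dim=384, seed=1)

shapes = check_embeddings(n_samples, [img_embs, txt_embs])
print(shapes)
```

With real encoders, the same check would run on their outputs just before the `model.fit(X_train, y_train, embeddings=[img_embs, txt_embs])` call, with `n_samples` set to `len(X_train)`.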