# RelBench

RelBench is a benchmark for deep learning on relational databases, providing a standardized framework for end-to-end representation learning across multiple interconnected tables. It enables researchers and practitioners to develop, train, and evaluate machine learning models that operate directly on relational data without manual feature engineering.

The benchmark includes 11 realistic databases spanning domains such as medical records, social networks, e-commerce, and sports, with 66 pre-defined predictive tasks covering entity prediction, recommendation, and autocomplete scenarios. The framework is ML-framework agnostic while providing first-class support for PyTorch Geometric and PyTorch Frame for GNN-based approaches.

RelBench handles automatic data downloading, caching, temporal train/val/test splitting, and standardized evaluation metrics. It supports three main task types: `EntityTask` for node-level predictions, `RecommendationTask` for link prediction, and `AutoCompleteTask` for predicting existing column values. The benchmark prevents temporal data leakage by design and provides a public leaderboard for tracking state-of-the-art progress.

## Installation

Install RelBench core functionality alone, or with optional dependencies for modeling and examples.

```bash
# Core installation
pip install relbench

# Full installation with PyTorch Geometric and PyTorch Frame
pip install relbench[full]

# For running examples
pip install relbench[example]

# For CTU datasets integration
pip install relbench[ctu]

# For 4DBInfer datasets integration
pip install relbench[dbinfer]
```

## get_dataset_names

Returns a list of all registered dataset names available in RelBench. Use this to discover available databases before loading them.
```python
from relbench.datasets import get_dataset_names

# List all available datasets
available_datasets = get_dataset_names()
print(available_datasets)
# Output: ['rel-amazon', 'rel-avito', 'rel-event', 'rel-f1', 'rel-hm',
#          'rel-stack', 'rel-mimic', 'rel-trial', 'rel-arxiv', 'rel-salt',
#          'rel-ratebeer', 'dbinfer-avs', 'dbinfer-diginetica', ...]
```

## get_dataset

Loads a RelBench dataset by name, optionally downloading it from the RelBench server. The dataset contains a database with multiple tables linked by primary-foreign key relationships, along with validation and test timestamps for temporal splitting.

```python
from relbench.datasets import get_dataset
from relbench.base import Dataset, Database

# Load dataset with automatic download
dataset: Dataset = get_dataset("rel-f1", download=True)

# Access temporal split timestamps
print(f"Validation cutoff: {dataset.val_timestamp}")  # 2005-01-01
print(f"Test cutoff: {dataset.test_timestamp}")       # 2010-01-01

# Get database (excludes rows after test_timestamp by default)
db: Database = dataset.get_db()

# Get full database without temporal filtering
full_db: Database = dataset.get_db(upto_test_timestamp=False)

# Check database time range
print(f"Data spans: {db.min_timestamp} to {db.max_timestamp}")
```

## Database.table_dict

Access all tables in the database through a dictionary mapping table names to Table objects. Each table contains a pandas DataFrame along with metadata about primary keys, foreign keys, and time columns.
```python
from relbench.datasets import get_dataset

dataset = get_dataset("rel-f1", download=True)
db = dataset.get_db()

# List all tables in the database
print(db.table_dict.keys())
# Output: dict_keys(['constructors', 'results', 'standings',
#                    'constructor_results', 'drivers', 'qualifying',
#                    'races', 'circuits', 'constructor_standings'])

# Access a specific table
drivers_table = db.table_dict["drivers"]

# Access the underlying DataFrame
df = drivers_table.df
print(df.head())
#    driverId  driverRef code forename   surname        dob nationality
# 0         0   hamilton  HAM    Lewis  Hamilton 1985-01-07     British
# 1         1   heidfeld  HEI     Nick  Heidfeld 1977-05-10      German

# Check table metadata
print(f"Primary key: {drivers_table.pkey_col}")  # driverId
print(f"Time column: {drivers_table.time_col}")  # None (static entity)
print(f"Foreign keys: {drivers_table.fkey_col_to_pkey_table}")  # {}

# Access a table with foreign keys
results_table = db.table_dict["results"]
print(f"Foreign keys: {results_table.fkey_col_to_pkey_table}")
# {'raceId': 'races', 'driverId': 'drivers', 'constructorId': 'constructors'}
print(f"Time column: {results_table.time_col}")  # date
```

## get_task_names

Returns a list of all registered task names for a given dataset. Each dataset has multiple predictive tasks defined on it.

```python
from relbench.tasks import get_task_names

# List all tasks for a dataset
f1_tasks = get_task_names("rel-f1")
print(f1_tasks)
# Output: ['driver-position', 'driver-dnf', 'driver-top3',
#          'driver-circuit-compete', 'results-position', 'qualifying-position']

amazon_tasks = get_task_names("rel-amazon")
print(amazon_tasks)
# Output: ['user-churn', 'user-ltv', 'item-churn', 'item-ltv',
#          'user-item-purchase', 'user-item-rate', 'user-item-review',
#          'review-rating']
```

## get_task

Loads a specific task for a dataset. Tasks define the prediction target, entity table, temporal parameters, and evaluation metrics. Returns an EntityTask, RecommendationTask, or AutoCompleteTask depending on the task type.
```python
from relbench.tasks import get_task
from relbench.base import EntityTask, TaskType

# Load an entity prediction task
task: EntityTask = get_task("rel-f1", "driver-top3", download=True)

# Check task properties
print(f"Task type: {task.task_type}")        # TaskType.BINARY_CLASSIFICATION
print(f"Entity table: {task.entity_table}")  # drivers
print(f"Entity column: {task.entity_col}")   # driverId
print(f"Target column: {task.target_col}")   # qualifying
print(f"Time delta: {task.timedelta}")       # 30 days

# Load a recommendation task
from relbench.base import RecommendationTask

rec_task: RecommendationTask = get_task("rel-hm", "user-item-purchase", download=True)
print(f"Source entity: {rec_task.src_entity_table}")  # customer
print(f"Dest entity: {rec_task.dst_entity_table}")    # article
print(f"Eval k: {rec_task.eval_k}")                   # 12
```

## task.get_table

Retrieves train, validation, or test tables for a task. Each table contains (timestamp, entity_id, label) triples. Test labels are masked by default to prevent data leakage.

```python
from relbench.tasks import get_task
from relbench.base import Table

task = get_task("rel-f1", "driver-top3", download=True)

# Get training table with labels
train_table: Table = task.get_table("train")
print(train_table.df.head())
#         date  driverId  qualifying
# 0 2004-08-04        12           0
# 1 2004-08-04        20           0
# 2 2004-07-05        10           0

print(f"Training samples: {len(train_table)}")
print(f"Time column: {train_table.time_col}")  # date
print(f"Foreign keys: {train_table.fkey_col_to_pkey_table}")  # {'driverId': 'drivers'}

# Get validation table
val_table: Table = task.get_table("val")
print(f"Validation samples: {len(val_table)}")

# Get test table (labels hidden by default)
test_table: Table = task.get_table("test")
print(test_table.df.columns.tolist())  # ['date', 'driverId'] - no 'qualifying'

# Get test table with labels (use carefully!)
test_table_full: Table = task.get_table("test", mask_input_cols=False)
print(test_table_full.df.columns.tolist())  # ['date', 'driverId', 'qualifying']
```

## task.evaluate

Evaluates model predictions against ground-truth labels using task-specific metrics. Pass predictions as a NumPy array matching the row order of the target table.

```python
import numpy as np

from relbench.tasks import get_task

task = get_task("rel-f1", "driver-top3", download=True)

# Get validation table for evaluation
val_table = task.get_table("val")

# Generate predictions (example: random predictions)
val_pred = np.random.rand(len(val_table))

# Evaluate predictions
metrics = task.evaluate(val_pred, val_table)
print(metrics)
# {'average_precision': 0.23, 'accuracy': 0.65, 'f1': 0.18, 'roc_auc': 0.52}

# Evaluate on the test set (uses the test table by default)
test_table = task.get_table("test", mask_input_cols=False)
test_pred = np.random.rand(len(test_table))
test_metrics = task.evaluate(test_pred)
print(test_metrics)

# For regression tasks
task_reg = get_task("rel-f1", "driver-position", download=True)
val_table_reg = task_reg.get_table("val")
pred_reg = np.random.rand(len(val_table_reg)) * 20  # Positions in [0, 20)
metrics_reg = task_reg.evaluate(pred_reg, val_table_reg)
print(metrics_reg)
# {'r2': -0.5, 'mae': 8.2, 'rmse': 9.1}
```

## make_pkey_fkey_graph

Constructs a heterogeneous PyTorch Geometric graph from a RelBench database, using primary-foreign key relationships as edges. Returns the graph data and column statistics for each table.
```python
import torch
from torch_frame import stype
from torch_frame.config import TextEmbedderConfig

from relbench.datasets import get_dataset
from relbench.modeling.graph import make_pkey_fkey_graph
from relbench.modeling.utils import get_stype_proposal

dataset = get_dataset("rel-f1", download=True)
db = dataset.get_db()

# Get automatic semantic type proposals for columns
col_to_stype_dict = get_stype_proposal(db)
# Example output:
# {'drivers': {'driverRef': stype.text_embedded, 'nationality': stype.categorical},
#  'results': {'position': stype.numerical, 'points': stype.numerical}, ...}

# Build heterogeneous graph
data, col_stats_dict = make_pkey_fkey_graph(
    db,
    col_to_stype_dict=col_to_stype_dict,
    text_embedder_cfg=None,  # Or provide TextEmbedderConfig for text columns
    cache_dir="./cache/materialized",
)

# Inspect the graph structure
print(data.node_types)  # ['constructors', 'results', 'drivers', ...]
print(data.edge_types)  # [('results', 'f2p_driverId', 'drivers'), ...]

# Access node features (TensorFrame format)
print(data['drivers'].tf)  # TensorFrame with driver features
print(data['results'].tf)  # TensorFrame with result features

# Access edge indices
edge_type = ('results', 'f2p_driverId', 'drivers')
print(data[edge_type].edge_index.shape)  # [2, num_edges]

# Access time attributes for temporal tables
if hasattr(data['results'], 'time'):
    print(data['results'].time.shape)  # [num_results]
```

## get_node_train_table_input

Converts a task table into the format required by PyTorch Geometric's NeighborLoader for node prediction tasks. Returns node indices, timestamps, targets, and a transform function.
```python
import torch
from torch_geometric.loader import NeighborLoader

from relbench.datasets import get_dataset
from relbench.tasks import get_task
from relbench.modeling.graph import make_pkey_fkey_graph, get_node_train_table_input
from relbench.modeling.utils import get_stype_proposal

dataset = get_dataset("rel-f1", download=True)
task = get_task("rel-f1", "driver-top3", download=True)
db = dataset.get_db()

# Build graph
col_to_stype_dict = get_stype_proposal(db)
data, col_stats_dict = make_pkey_fkey_graph(db, col_to_stype_dict)

# Get training table input
train_table = task.get_table("train")
table_input = get_node_train_table_input(table=train_table, task=task)

print(f"Node type: {table_input.nodes[0]}")                 # 'drivers'
print(f"Node indices shape: {table_input.nodes[1].shape}")  # [num_train_samples]
print(f"Time shape: {table_input.time.shape}")              # [num_train_samples]
print(f"Target shape: {table_input.target.shape}")          # [num_train_samples]

# Create NeighborLoader for training
train_loader = NeighborLoader(
    data,
    num_neighbors=[128, 64],  # Neighbors per layer
    time_attr="time",
    input_nodes=table_input.nodes,
    input_time=table_input.time,
    transform=table_input.transform,  # Attaches labels to batch
    batch_size=512,
    temporal_strategy="uniform",
    shuffle=True,
)

# Iterate through batches
for batch in train_loader:
    print(f"Batch node types: {batch.node_types}")
    print(f"Target labels: {batch['drivers'].y.shape}")
    break
```

## get_link_train_table_input

Converts a recommendation task table into the format required for link prediction training. Returns source nodes, destination nodes (as a sparse tensor), and timestamps.
```python
import torch

from relbench.datasets import get_dataset
from relbench.tasks import get_task
from relbench.modeling.graph import make_pkey_fkey_graph, get_link_train_table_input
from relbench.modeling.loader import LinkNeighborLoader
from relbench.modeling.utils import get_stype_proposal

dataset = get_dataset("rel-hm", download=True)
task = get_task("rel-hm", "user-item-purchase", download=True)
db = dataset.get_db()

# Build graph
col_to_stype_dict = get_stype_proposal(db)
data, col_stats_dict = make_pkey_fkey_graph(db, col_to_stype_dict)

# Get link training input
train_table = task.get_table("train")
table_input = get_link_train_table_input(train_table, task)

print(f"Source node type: {table_input.src_nodes[0]}")  # 'customer'
print(f"Source indices: {table_input.src_nodes[1].shape}")
print(f"Dest node type: {table_input.dst_nodes[0]}")    # 'article'
print(f"Num dest nodes: {table_input.num_dst_nodes}")
print(f"Source time: {table_input.src_time.shape}")

# Create LinkNeighborLoader for training
train_loader = LinkNeighborLoader(
    data=data,
    num_neighbors=[64, 32],
    time_attr="time",
    src_nodes=table_input.src_nodes,
    dst_nodes=table_input.dst_nodes,
    num_dst_nodes=table_input.num_dst_nodes,
    src_time=table_input.src_time,
    share_same_time=True,
    batch_size=512,
    temporal_strategy="uniform",
)

# Each batch returns (src_batch, pos_dst_batch, neg_dst_batch)
for src_batch, pos_dst_batch, neg_dst_batch in train_loader:
    print(f"Source batch nodes: "
          f"{src_batch['customer'].x.shape if hasattr(src_batch['customer'], 'x') else 'TensorFrame'}")
    break
```

## Table Class

The Table class wraps a pandas DataFrame with metadata about primary keys, foreign keys, and time columns. It supports temporal filtering and serialization.
```python
import pandas as pd

from relbench.base import Table

# Create a custom table
df = pd.DataFrame({
    'user_id': [0, 1, 2, 3],
    'product_id': [10, 20, 10, 30],
    'rating': [4.5, 3.0, 5.0, 2.5],
    'timestamp': pd.to_datetime(['2024-01-01', '2024-01-02', '2024-01-03', '2024-01-04']),
})

table = Table(
    df=df,
    fkey_col_to_pkey_table={'user_id': 'users', 'product_id': 'products'},
    pkey_col=None,  # No primary key for this event table
    time_col='timestamp',
)

# Access properties
print(f"Number of rows: {len(table)}")
print(f"Min timestamp: {table.min_timestamp}")
print(f"Max timestamp: {table.max_timestamp}")

# Temporal filtering
cutoff = pd.Timestamp('2024-01-02')
table_before = table.upto(cutoff)   # Rows with timestamp <= cutoff
table_after = table.from_(cutoff)   # Rows with timestamp >= cutoff
print(f"Rows before cutoff: {len(table_before)}")  # 2
print(f"Rows after cutoff: {len(table_after)}")    # 3

# Save and load
table.save("./my_table.parquet")
loaded_table = Table.load("./my_table.parquet")
```

## Database Class

The Database class manages a collection of tables, with methods for temporal filtering, saving, and loading.
```python
import pandas as pd

from relbench.base import Database, Table

# Create tables
users_df = pd.DataFrame({
    'user_id': [0, 1, 2],
    'name': ['Alice', 'Bob', 'Charlie'],
})
users_table = Table(users_df, {}, pkey_col='user_id', time_col=None)

events_df = pd.DataFrame({
    'event_id': [0, 1, 2],
    'user_id': [0, 1, 0],
    'action': ['click', 'purchase', 'click'],
    'timestamp': pd.to_datetime(['2024-01-01', '2024-01-02', '2024-01-03']),
})
events_table = Table(events_df, {'user_id': 'users'}, pkey_col='event_id', time_col='timestamp')

# Create database
db = Database({
    'users': users_table,
    'events': events_table,
})

# Access tables
print(db.table_dict.keys())  # dict_keys(['users', 'events'])

# Temporal properties
print(f"Database time range: {db.min_timestamp} to {db.max_timestamp}")

# Filter database by time
cutoff = pd.Timestamp('2024-01-02')
db_filtered = db.upto(cutoff)  # All tables filtered to timestamp <= cutoff

# Save and load database
db.save("./my_database/")
loaded_db = Database.load("./my_database/")
```

## Evaluation Metrics

RelBench provides standardized metrics for different task types. Metrics are selected automatically based on task type but can be customized.
```python
import numpy as np

from relbench.metrics import (
    # Classification metrics
    accuracy, f1, roc_auc, average_precision, macro_f1, micro_f1, mrr,
    # Regression metrics
    mae, mse, rmse, r2,
    # Link prediction metrics
    link_prediction_recall, link_prediction_precision,
    link_prediction_map, link_prediction_ndcg,
    # Multilabel metrics
    multilabel_auprc_macro, multilabel_auroc_micro,
)

# Binary classification
y_true = np.array([0, 1, 1, 0, 1])
y_pred = np.array([0.2, 0.8, 0.6, 0.3, 0.9])
print(f"Accuracy: {accuracy(y_true, y_pred):.3f}")
print(f"F1 Score: {f1(y_true, y_pred):.3f}")
print(f"ROC AUC: {roc_auc(y_true, y_pred):.3f}")
print(f"Average Precision: {average_precision(y_true, y_pred):.3f}")

# Regression
y_true_reg = np.array([1.0, 2.0, 3.0, 4.0])
y_pred_reg = np.array([1.1, 2.2, 2.8, 4.1])
print(f"MAE: {mae(y_true_reg, y_pred_reg):.3f}")
print(f"RMSE: {rmse(y_true_reg, y_pred_reg):.3f}")
print(f"R2: {r2(y_true_reg, y_pred_reg):.3f}")

# Multiclass classification
y_true_mc = np.array([0, 1, 2, 1])
y_pred_mc = np.array([
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.2, 0.7],
    [0.3, 0.5, 0.2],
])
print(f"MRR: {mrr(y_true_mc, y_pred_mc):.3f}")
print(f"Macro F1: {macro_f1(y_true_mc, y_pred_mc):.3f}")

# Link prediction (pred_isin: bool array, dst_count: int array)
pred_isin = np.array([[True, False, True], [False, True, False]])
dst_count = np.array([2, 3])
print(f"MAP: {link_prediction_map(pred_isin, dst_count):.3f}")
print(f"NDCG: {link_prediction_ndcg(pred_isin, dst_count):.3f}")
```

## Complete GNN Training Example for Entity Prediction

End-to-end example training a graph neural network on a RelBench entity prediction task.
```python
import numpy as np
import torch
import torch.nn.functional as F
from torch.nn import BCEWithLogitsLoss
from torch_frame import stype
from torch_geometric.loader import NeighborLoader
from torch_geometric.nn import SAGEConv

from relbench.datasets import get_dataset
from relbench.tasks import get_task
from relbench.modeling.graph import make_pkey_fkey_graph, get_node_train_table_input
from relbench.modeling.utils import get_stype_proposal

# Setup
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
dataset = get_dataset("rel-f1", download=True)
task = get_task("rel-f1", "driver-top3", download=True)
db = dataset.get_db()

# Build heterogeneous graph
col_to_stype_dict = get_stype_proposal(db)
data, col_stats_dict = make_pkey_fkey_graph(
    db,
    col_to_stype_dict=col_to_stype_dict,
    cache_dir="./cache/f1_materialized",
)

# Create data loaders
loaders = {}
for split in ["train", "val", "test"]:
    table = task.get_table(split)
    table_input = get_node_train_table_input(table=table, task=task)
    loaders[split] = NeighborLoader(
        data,
        num_neighbors=[64, 32],
        time_attr="time",
        input_nodes=table_input.nodes,
        input_time=table_input.time,
        transform=table_input.transform,
        batch_size=256,
        shuffle=(split == "train"),
    )

# Training loop
loss_fn = BCEWithLogitsLoss()
# ... model definition and training code ...

# Evaluation
val_preds = []
for batch in loaders["val"]:
    batch = batch.to(device)
    # pred = model(batch, task.entity_table)
    # val_preds.append(torch.sigmoid(pred).cpu().numpy())
# val_pred = np.concatenate(val_preds)
# metrics = task.evaluate(val_pred, task.get_table("val"))
# print(f"Validation metrics: {metrics}")
```

## Complete Link Prediction Example

End-to-end example for training a recommendation model on a RelBench link prediction task.
```python
import numpy as np
import torch
import torch.nn.functional as F
from torch_geometric.loader import NeighborLoader

from relbench.datasets import get_dataset
from relbench.tasks import get_task
from relbench.modeling.graph import make_pkey_fkey_graph, get_link_train_table_input
from relbench.modeling.loader import LinkNeighborLoader
from relbench.modeling.utils import get_stype_proposal

# Setup
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
dataset = get_dataset("rel-hm", download=True)
task = get_task("rel-hm", "user-item-purchase", download=True)
db = dataset.get_db()

# Build graph
col_to_stype_dict = get_stype_proposal(db)
data, col_stats_dict = make_pkey_fkey_graph(db, col_to_stype_dict)

# Training loader
train_table_input = get_link_train_table_input(task.get_table("train"), task)
train_loader = LinkNeighborLoader(
    data=data,
    num_neighbors=[64, 32],
    time_attr="time",
    src_nodes=train_table_input.src_nodes,
    dst_nodes=train_table_input.dst_nodes,
    num_dst_nodes=train_table_input.num_dst_nodes,
    src_time=train_table_input.src_time,
    share_same_time=True,
    batch_size=512,
)

# Evaluation loaders
val_table = task.get_table("val")
seed_time = int(dataset.val_timestamp.timestamp())
src_indices = torch.from_numpy(val_table.df[task.src_entity_col].values)
src_loader = NeighborLoader(
    data,
    num_neighbors=[64, 32],
    time_attr="time",
    input_nodes=(task.src_entity_table, src_indices),
    input_time=torch.full((len(src_indices),), seed_time, dtype=torch.long),
    batch_size=512,
    shuffle=False,
)
dst_loader = NeighborLoader(
    data,
    num_neighbors=[64, 32],
    time_attr="time",
    input_nodes=task.dst_entity_table,
    input_time=torch.full((task.num_dst_nodes,), seed_time, dtype=torch.long),
    batch_size=512,
    shuffle=False,
)

# Training with BPR loss
for src_batch, pos_dst_batch, neg_dst_batch in train_loader:
    # x_src = model(src_batch, task.src_entity_table)
    # x_pos = model(pos_dst_batch, task.dst_entity_table)
    # x_neg = model(neg_dst_batch, task.dst_entity_table)
    #
    # pos_score = (x_src * x_pos).sum(dim=1)
    # neg_score = (x_src * x_neg).sum(dim=1)
    # loss = F.softplus(neg_score - pos_score).mean()
    break

# Evaluation: compute embeddings and top-k predictions
# dst_embs = [model(batch, task.dst_entity_table) for batch in dst_loader]
# dst_emb = torch.cat(dst_embs, dim=0)
#
# pred_list = []
# for batch in src_loader:
#     src_emb = model(batch, task.src_entity_table)
#     _, pred_idx = torch.topk(src_emb @ dst_emb.t(), k=task.eval_k, dim=1)
#     pred_list.append(pred_idx.cpu())
#
# predictions = torch.cat(pred_list, dim=0).numpy()
# metrics = task.evaluate(predictions, val_table)
# print(f"Metrics: {metrics}")
# Expected output: {'link_prediction_recall': 0.02, 'link_prediction_precision': 0.01,
#                   'link_prediction_map': 0.008, 'link_prediction_ndcg': 0.015}
```

## Custom Dataset Creation

Create your own RelBench-compatible dataset from custom relational data.

```python
import numpy as np
import pandas as pd

from relbench.base import Database, Dataset, Table


class MyCustomDataset(Dataset):
    # Define temporal split points
    val_timestamp = pd.Timestamp("2023-06-01")
    test_timestamp = pd.Timestamp("2023-09-01")

    def __init__(self, cache_dir=None):
        super().__init__(cache_dir=cache_dir)

    def make_db(self) -> Database:
        # Create users table (static entities)
        users_df = pd.DataFrame({
            'user_id': range(1000),
            'age': np.random.randint(18, 65, 1000),
            'country': np.random.choice(['US', 'UK', 'DE', 'FR'], 1000),
        })
        users_table = Table(
            df=users_df,
            fkey_col_to_pkey_table={},
            pkey_col='user_id',
            time_col=None,
        )

        # Create products table
        products_df = pd.DataFrame({
            'product_id': range(500),
            'category': np.random.choice(['electronics', 'clothing', 'food'], 500),
            'price': np.random.uniform(10, 500, 500),
        })
        products_table = Table(
            df=products_df,
            fkey_col_to_pkey_table={},
            pkey_col='product_id',
            time_col=None,
        )

        # Create purchases table (events with foreign keys)
        n_purchases = 10000
        purchases_df = pd.DataFrame({
            'purchase_id': range(n_purchases),
            'user_id': np.random.randint(0, 1000, n_purchases),
            'product_id': np.random.randint(0, 500, n_purchases),
            'amount': np.random.uniform(1, 100, n_purchases),
            'timestamp': pd.date_range('2023-01-01', periods=n_purchases, freq='H'),
        })
        purchases_table = Table(
            df=purchases_df,
            fkey_col_to_pkey_table={'user_id': 'users', 'product_id': 'products'},
            pkey_col='purchase_id',
            time_col='timestamp',
        )

        return Database({
            'users': users_table,
            'products': products_table,
            'purchases': purchases_table,
        })


# Use the custom dataset
dataset = MyCustomDataset(cache_dir="./cache/my_dataset")
db = dataset.get_db()
print(f"Tables: {list(db.table_dict.keys())}")
print(f"Users: {len(db.table_dict['users'])}")
print(f"Products: {len(db.table_dict['products'])}")
print(f"Purchases: {len(db.table_dict['purchases'])}")
```

## Summary

RelBench provides a complete ecosystem for developing machine learning models on relational databases. The primary use cases are: (1) benchmarking new relational deep learning algorithms across diverse domains with standardized evaluation, (2) developing production ML pipelines that operate directly on normalized database schemas without manual feature engineering, and (3) researching temporal graph neural networks with built-in support for time-aware sampling and train/val/test splitting that prevents data leakage.

The framework integrates with PyTorch Geometric for graph-based modeling and PyTorch Frame for tabular feature processing. Common integration patterns include loading datasets with `get_dataset()`, defining prediction tasks with `get_task()`, constructing heterogeneous graphs with `make_pkey_fkey_graph()`, and using `NeighborLoader` or `LinkNeighborLoader` for mini-batch training. The `task.evaluate()` method ensures consistent metric computation across experiments. For custom applications, users can subclass `Dataset` to create their own benchmarks or extend the task classes to define novel prediction problems on their relational data.
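As a concrete illustration of the leakage-free temporal splitting discussed throughout, the sketch below mimics with plain pandas (entirely outside RelBench, so all names here are illustrative rather than library API) how an entity task's label table can be materialized: for each seed timestamp, candidate entities are drawn only from events up to that time, and labels come only from the window immediately after it.

```python
import pandas as pd

# Toy event log standing in for a "purchases"-style table.
events = pd.DataFrame({
    "user_id": [0, 0, 1, 1, 2, 0, 2],
    "timestamp": pd.to_datetime([
        "2023-01-05", "2023-02-10", "2023-01-20", "2023-03-01",
        "2023-02-15", "2023-04-02", "2023-04-20",
    ]),
})

delta = pd.Timedelta(days=60)

def make_label_table(events, cutoff, delta):
    """Label each user seen up to `cutoff` with whether they have
    at least one event in the future window (cutoff, cutoff + delta]."""
    seen = sorted(events.loc[events["timestamp"] <= cutoff, "user_id"].unique())
    future = events[(events["timestamp"] > cutoff)
                    & (events["timestamp"] <= cutoff + delta)]
    active = set(future["user_id"])
    return pd.DataFrame({
        "timestamp": cutoff,            # seed time of this window
        "user_id": seen,                # only entities known at seed time
        "label": [int(u in active) for u in seen],  # future-window activity
    })

# Stack label tables from multiple seed timestamps, as a training table would be.
train = pd.concat(
    [make_label_table(events, t, delta)
     for t in pd.to_datetime(["2023-02-01", "2023-03-01"])],
    ignore_index=True,
)
print(train)
```

Because the entity set is frozen at each seed time and labels are computed strictly from the window after it, no row can encode information from its own future, which is the same invariant RelBench's task tables enforce by construction.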