# TS2Vec

TS2Vec is a universal framework for learning representations of time series through contrastive learning. Introduced in the paper "TS2Vec: Towards Universal Representation of Time Series" (AAAI-22), it learns both timestamp-level and instance-level representations that serve a variety of downstream tasks, including classification, forecasting, and anomaly detection.

The framework employs hierarchical contrastive learning, applying instance-wise and temporal contrasts at multiple scales. Using a dilated convolutional encoder and a contextual-consistency objective, TS2Vec produces robust representations that remain semantically consistent across augmented views of the same series. The learned representations generalize across domains and tasks without task-specific architectural changes.

## TS2Vec Model Initialization

The TS2Vec class is the main interface for creating and training time series representation models. It accepts configuration for input/output dimensions, network depth, device placement, and training hyperparameters.

```python
from ts2vec import TS2Vec
import numpy as np

# Initialize a TS2Vec model for univariate time series
model = TS2Vec(
    input_dims=1,           # Number of input features (1 for univariate)
    output_dims=320,        # Representation dimension
    hidden_dims=64,         # Hidden layer dimension
    depth=10,               # Number of residual blocks
    device='cuda',          # Device for training ('cuda', 'cpu', or a GPU index)
    lr=0.001,               # Learning rate
    batch_size=16,          # Training batch size
    max_train_length=3000,  # Sequences longer than this are cropped for training
    temporal_unit=0         # Minimum hierarchy level for temporal contrast
)

# Initialize for multivariate time series (e.g., 7 features)
multivariate_model = TS2Vec(
    input_dims=7,
    output_dims=320,
    hidden_dims=64,
    depth=10,
    device=0  # Use GPU 0
)
```

## Training with fit()

The fit method trains the model on time series data using hierarchical contrastive learning.
It accepts training data as a 3D numpy array and supports iteration- or epoch-based training, with optional callbacks for checkpointing.

```python
from ts2vec import TS2Vec
import numpy as np

# Generate sample training data: 100 instances, 200 timestamps, 3 features
train_data = np.random.randn(100, 200, 3).astype(np.float32)

model = TS2Vec(
    input_dims=3,
    output_dims=320,
    device='cuda'
)

# Train for a fixed number of iterations
loss_log = model.fit(
    train_data,
    n_iters=200,  # Train for 200 iterations
    verbose=True  # Print the loss after each epoch
)
# Prints e.g.: Epoch #0: loss=1.234

# Alternative: train for a fixed number of epochs
loss_log = model.fit(
    train_data,
    n_epochs=10,
    verbose=True
)

# Training with a callback for checkpointing
def checkpoint_callback(model, loss):
    if model.n_epochs % 5 == 0:
        model.save(f'checkpoint_epoch_{model.n_epochs}.pkl')

model_with_callback = TS2Vec(
    input_dims=3,
    output_dims=320,
    device='cuda',
    after_epoch_callback=checkpoint_callback
)
loss_log = model_with_callback.fit(train_data, n_epochs=20, verbose=True)
```

## Encoding with encode()

The encode method computes representations for time series data. It supports multiple encoding modes, including timestamp-level, instance-level, multiscale, and sliding-window inference for efficient processing of long sequences.
```python
from ts2vec import TS2Vec
import numpy as np

# Sample test data: 50 instances, 150 timestamps, 3 features
test_data = np.random.randn(50, 150, 3).astype(np.float32)

# Assume the model is already trained
model = TS2Vec(input_dims=3, output_dims=320, device='cuda')
# model.fit(train_data, n_iters=200)

# Timestamp-level representations (default)
repr_timestamp = model.encode(test_data)
# Shape: (50, 150, 320) - one representation per timestamp

# Instance-level representations (pooled over the entire series)
repr_instance = model.encode(test_data, encoding_window='full_series')
# Shape: (50, 320) - one representation per instance

# Multiscale representations
repr_multiscale = model.encode(test_data, encoding_window='multiscale')
# Shape: (50, 150, 320 * num_scales) - concatenated multiscale features

# Fixed-window pooling
repr_windowed = model.encode(test_data, encoding_window=10)
# Shape: (50, 150, 320) - pooled over windows of size 10

# Sliding inference for long sequences (causal mode for forecasting)
repr_sliding = model.encode(
    test_data,
    causal=True,         # Only use past information
    sliding_length=1,    # Slide by 1 timestamp
    sliding_padding=50,  # Use 50 timestamps of context
    batch_size=256
)
# Shape: (50, 150, 320) - each timestamp uses only the [t-50, t] context

# Apply masking during encoding
repr_masked = model.encode(test_data, mask='mask_last')
# Masks the last timestamp (useful for anomaly detection)
```

## Saving and Loading Models

The save and load methods persist trained models to disk and restore them for later use. Models are saved as PyTorch state dictionaries.
```python
from ts2vec import TS2Vec
import numpy as np

# Train and save a model
train_data = np.random.randn(100, 200, 1).astype(np.float32)
model = TS2Vec(input_dims=1, output_dims=320, device='cuda')
model.fit(train_data, n_iters=200, verbose=True)

# Save the trained model
model.save('trained_model.pkl')

# Load the model for inference (constructor arguments must match the saved model)
loaded_model = TS2Vec(input_dims=1, output_dims=320, device='cuda')
loaded_model.load('trained_model.pkl')

# Use the loaded model for encoding
test_data = np.random.randn(10, 200, 1).astype(np.float32)
representations = loaded_model.encode(test_data, encoding_window='full_series')
print(f"Representations shape: {representations.shape}")  # (10, 320)
```

## Loading UCR Classification Datasets

The load_UCR function loads univariate time series classification datasets from the UCR archive. It handles data loading, label transformation, and optional normalization automatically.

```python
import datautils

# Load the ECG200 dataset
train_data, train_labels, test_data, test_labels = datautils.load_UCR('ECG200')
print(f"Train data shape: {train_data.shape}")      # (100, 96, 1)
print(f"Train labels shape: {train_labels.shape}")  # (100,)
print(f"Test data shape: {test_data.shape}")        # (100, 96, 1)
print(f"Unique labels: {set(train_labels)}")        # {0, 1}

# Load another dataset - GunPoint
train_data, train_labels, test_data, test_labels = datautils.load_UCR('GunPoint')
print(f"GunPoint train shape: {train_data.shape}")  # (50, 150, 1)

# Load FordA (a larger dataset)
train_data, train_labels, test_data, test_labels = datautils.load_UCR('FordA')
print(f"FordA train shape: {train_data.shape}")  # (3601, 500, 1)
```

## Loading UEA Multivariate Datasets

The load_UEA function loads multivariate time series classification datasets from the UEA archive. It parses ARFF files, applies standard scaling, and maps labels to integer indices.
```python
import datautils

# Load the BasicMotions multivariate dataset
train_data, train_labels, test_data, test_labels = datautils.load_UEA('BasicMotions')
print(f"Train data shape: {train_data.shape}")         # (40, 100, 6) - 6 channels
print(f"Train labels shape: {train_labels.shape}")     # (40,)
print(f"Number of classes: {len(set(train_labels))}")  # 4

# Load ArticularyWordRecognition
train_data, train_labels, test_data, test_labels = datautils.load_UEA('ArticularyWordRecognition')
print(f"ArticularyWordRecognition shape: {train_data.shape}")  # (275, 144, 9)
```

## Loading Forecasting Datasets

The load_forecast_csv and load_forecast_npy functions load time series forecasting datasets with automatic train/validation/test splitting, time-feature extraction, and normalization.

```python
import datautils

# Load the ETTh1 dataset for multivariate forecasting
data, train_slice, valid_slice, test_slice, scaler, pred_lens, n_covariate_cols = \
    datautils.load_forecast_csv('ETTh1')

print(f"Data shape: {data.shape}")         # (1, 17420, 14) - 7 features + 7 time covariates
print(f"Train slice: {train_slice}")       # slice(None, 8640)
print(f"Valid slice: {valid_slice}")       # slice(8640, 11520)
print(f"Test slice: {test_slice}")         # slice(11520, 14400)
print(f"Prediction lengths: {pred_lens}")  # [24, 48, 168, 336, 720]
print(f"Covariate columns: {n_covariate_cols}")  # 7

# Extract training data
train_data = data[:, train_slice]
print(f"Training data shape: {train_data.shape}")  # (1, 8640, 14)

# Load for univariate forecasting
data_univar, _, _, _, _, _, _ = datautils.load_forecast_csv('ETTh1', univar=True)
print(f"Univariate data shape: {data_univar.shape}")  # (1, 17420, 8) - 1 target + 7 covariates

# Load the electricity dataset (each variable becomes a separate instance)
data_elec, train_slice, valid_slice, test_slice, scaler, pred_lens, n_cov = \
    datautils.load_forecast_csv('electricity')
print(f"Electricity shape: {data_elec.shape}")  # (321, 26304, 8) - 321 households
```

## Loading Anomaly Detection Datasets
The load_anomaly function loads anomaly detection datasets containing multiple labeled, timestamped series. The gen_ano_train_data function prepares the loaded series for model training.

```python
import datautils

# Load the Yahoo anomaly detection dataset
all_train_data, all_train_labels, all_train_timestamps, \
all_test_data, all_test_labels, all_test_timestamps, delay = \
    datautils.load_anomaly('yahoo')

print(f"Number of series: {len(all_train_data)}")
print(f"Delay threshold: {delay}")

# Examine individual series
for key in list(all_train_data.keys())[:3]:
    print(f"Series {key}: train={len(all_train_data[key])}, test={len(all_test_data[key])}")

# Generate training data for the model
train_data = datautils.gen_ano_train_data(all_train_data)
print(f"Training data shape: {train_data.shape}")  # (num_series, max_length, 1)

# Load the KPI dataset
all_train_data, all_train_labels, all_train_timestamps, \
all_test_data, all_test_labels, all_test_timestamps, delay = \
    datautils.load_anomaly('kpi')
```

## Classification Evaluation

The eval_classification function evaluates time series representations on classification tasks using SVM, linear (logistic regression), or k-NN classifiers. It computes accuracy and AUPRC metrics.
```python
import datautils
import tasks
from ts2vec import TS2Vec

# Load a dataset and train a model
train_data, train_labels, test_data, test_labels = datautils.load_UCR('ECG200')
model = TS2Vec(input_dims=1, output_dims=320, device='cuda')
model.fit(train_data, n_iters=200, verbose=True)

# Evaluate with an SVM classifier
predictions, eval_results = tasks.eval_classification(
    model, train_data, train_labels, test_data, test_labels,
    eval_protocol='svm'
)
print(f"Accuracy: {eval_results['acc']:.4f}")
print(f"AUPRC: {eval_results['auprc']:.4f}")
# Example output: Accuracy: 0.8700, AUPRC: 0.9234

# Evaluate with a linear classifier
_, eval_results_linear = tasks.eval_classification(
    model, train_data, train_labels, test_data, test_labels,
    eval_protocol='linear'
)
print(f"Linear accuracy: {eval_results_linear['acc']:.4f}")

# Evaluate with a k-NN classifier
_, eval_results_knn = tasks.eval_classification(
    model, train_data, train_labels, test_data, test_labels,
    eval_protocol='knn'
)
print(f"KNN accuracy: {eval_results_knn['acc']:.4f}")
```

## Forecasting Evaluation

The eval_forecasting function evaluates time series representations on forecasting tasks. It uses sliding inference with causal encoding and fits ridge regression models for multiple prediction horizons.
```python
import datautils
import tasks
from ts2vec import TS2Vec

# Load the ETTh1 forecasting dataset
data, train_slice, valid_slice, test_slice, scaler, pred_lens, n_covariate_cols = \
    datautils.load_forecast_csv('ETTh1')

# Train a model on the training split
train_data = data[:, train_slice]
model = TS2Vec(
    input_dims=train_data.shape[-1],
    output_dims=320,
    device='cuda',
    max_train_length=3000
)
model.fit(train_data, n_iters=200, verbose=True)

# Evaluate forecasting performance
out_log, eval_results = tasks.eval_forecasting(
    model, data, train_slice, valid_slice, test_slice,
    scaler, pred_lens, n_covariate_cols
)

# Print results for each prediction length
print("Forecasting Results (normalized):")
for pred_len in pred_lens:
    mse = eval_results['ours'][pred_len]['norm']['MSE']
    mae = eval_results['ours'][pred_len]['norm']['MAE']
    print(f"  Horizon {pred_len}: MSE={mse:.4f}, MAE={mae:.4f}")
# Example output:
# Forecasting Results (normalized):
#   Horizon 24: MSE=0.0523, MAE=0.1734
#   Horizon 48: MSE=0.0687, MAE=0.1956
#   Horizon 168: MSE=0.1234, MAE=0.2567
#   Horizon 336: MSE=0.1567, MAE=0.2934
#   Horizon 720: MSE=0.2134, MAE=0.3456

print(f"\nInference time: {eval_results['ts2vec_infer_time']:.2f}s")
```

## Anomaly Detection Evaluation

The eval_anomaly_detection and eval_anomaly_detection_coldstart functions evaluate time series representations for detecting anomalies. They compare representations computed with and without masking to score anomalous points.
```python
import datautils
import tasks
from ts2vec import TS2Vec

# Load an anomaly detection dataset
all_train_data, all_train_labels, all_train_timestamps, \
all_test_data, all_test_labels, all_test_timestamps, delay = \
    datautils.load_anomaly('yahoo')

# Prepare training data
train_data = datautils.gen_ano_train_data(all_train_data)

# Train a model
model = TS2Vec(
    input_dims=1,
    output_dims=320,
    device='cuda'
)
model.fit(train_data, n_iters=200, verbose=True)

# Standard anomaly detection evaluation
predictions, eval_results = tasks.eval_anomaly_detection(
    model,
    all_train_data, all_train_labels, all_train_timestamps,
    all_test_data, all_test_labels, all_test_timestamps,
    delay
)

print("Anomaly Detection Results:")
print(f"  F1 Score: {eval_results['f1']:.4f}")
print(f"  Precision: {eval_results['precision']:.4f}")
print(f"  Recall: {eval_results['recall']:.4f}")
print(f"  Inference time: {eval_results['infer_time']:.2f}s")
# Example output:
#   F1 Score: 0.7234
#   Precision: 0.6890
#   Recall: 0.7612
#   Inference time: 12.34s

# Cold-start anomaly detection (no target-specific training data)
train_data_coldstart, _, _, _ = datautils.load_UCR('FordA')
model_coldstart = TS2Vec(input_dims=1, output_dims=320, device='cuda')
model_coldstart.fit(train_data_coldstart, n_iters=200)

predictions_cs, eval_results_cs = tasks.eval_anomaly_detection_coldstart(
    model_coldstart,
    all_train_data, all_train_labels, all_train_timestamps,
    all_test_data, all_test_labels, all_test_timestamps,
    delay
)
print(f"\nCold-start F1: {eval_results_cs['f1']:.4f}")
```

## Command-Line Training Interface

The train.py script provides a command-line interface for training and evaluating TS2Vec models on various datasets. It supports classification, forecasting, and anomaly detection tasks with customizable hyperparameters.
```bash
# Train and evaluate on a UCR classification dataset
python train.py ECG200 my_experiment \
    --loader UCR \
    --batch-size 8 \
    --repr-dims 320 \
    --gpu 0 \
    --epochs 100 \
    --eval

# Train on a UEA multivariate classification dataset
python train.py BasicMotions experiment_uea \
    --loader UEA \
    --batch-size 16 \
    --repr-dims 320 \
    --gpu 0 \
    --eval

# Train for time series forecasting
python train.py ETTh1 forecast_exp \
    --loader forecast_csv \
    --batch-size 8 \
    --repr-dims 320 \
    --max-train-length 3000 \
    --gpu 0 \
    --eval

# Train for univariate forecasting
python train.py ETTh1 univar_forecast \
    --loader forecast_csv_univar \
    --batch-size 8 \
    --gpu 0 \
    --eval

# Train for anomaly detection
python train.py yahoo anomaly_exp \
    --loader anomaly \
    --batch-size 8 \
    --repr-dims 320 \
    --gpu 0 \
    --eval

# Cold-start anomaly detection (transfer learning)
python train.py yahoo coldstart_exp \
    --loader anomaly_coldstart \
    --batch-size 8 \
    --gpu 0 \
    --eval

# Training with custom hyperparameters and checkpointing
python train.py FordA custom_exp \
    --loader UCR \
    --batch-size 16 \
    --lr 0.0005 \
    --repr-dims 512 \
    --max-train-length 1000 \
    --iters 500 \
    --save-every 100 \
    --seed 42 \
    --gpu 0 \
    --eval

# Handle irregular/missing data
python train.py ECG200 irregular_exp \
    --loader UCR \
    --irregular 0.1 \
    --gpu 0 \
    --eval
```

## TSEncoder Neural Network Architecture

The TSEncoder class implements the encoder network, using dilated convolutions to extract hierarchical features from time series. It supports several masking strategies for contrastive learning and handles missing data gracefully.
```python
from models import TSEncoder
import torch

# Initialize the encoder directly (for advanced use cases)
encoder = TSEncoder(
    input_dims=3,         # Number of input features
    output_dims=320,      # Representation dimension
    hidden_dims=64,       # Hidden layer size
    depth=10,             # Number of dilated conv layers
    mask_mode='binomial'  # Default masking strategy
)

# Forward pass with sample data
batch_size, seq_len, features = 32, 100, 3
x = torch.randn(batch_size, seq_len, features)

# During training (applies the binomial mask automatically)
encoder.train()
output_train = encoder(x)
print(f"Training output shape: {output_train.shape}")  # (32, 100, 320)

# During inference (no mask applied)
encoder.eval()
output_eval = encoder(x)
print(f"Eval output shape: {output_eval.shape}")  # (32, 100, 320)

# Custom masking modes
output_continuous = encoder(x, mask='continuous')  # Continuous segment masking
output_mask_last = encoder(x, mask='mask_last')    # Mask only the last timestamp
output_all_true = encoder(x, mask='all_true')      # No masking

# Handle data with NaN values (missing observations)
x_with_nan = x.clone()
x_with_nan[0, 10:20, :] = float('nan')  # Missing segment
output_nan = encoder(x_with_nan)  # NaN positions are masked out automatically
```

## Hierarchical Contrastive Loss

The hierarchical_contrastive_loss function implements the core training objective, combining instance-level and temporal contrastive losses at multiple scales through a max-pooling hierarchy.
```python
from models.losses import hierarchical_contrastive_loss, instance_contrastive_loss, temporal_contrastive_loss
import torch

# Sample encoder outputs for two augmented views
batch_size, seq_len, repr_dim = 16, 64, 320
z1 = torch.randn(batch_size, seq_len, repr_dim, requires_grad=True)
z2 = torch.randn(batch_size, seq_len, repr_dim, requires_grad=True)

# Compute the hierarchical contrastive loss (default alpha=0.5)
loss = hierarchical_contrastive_loss(z1, z2)
print(f"Hierarchical loss: {loss.item():.4f}")

# alpha balances instance vs. temporal contrast:
# alpha=1.0 uses only instance contrast, alpha=0.0 only temporal contrast
loss_instance_heavy = hierarchical_contrastive_loss(z1, z2, alpha=0.8)
loss_temporal_heavy = hierarchical_contrastive_loss(z1, z2, alpha=0.2)

# Set temporal_unit to skip the finest levels for long sequences
loss_long_seq = hierarchical_contrastive_loss(z1, z2, temporal_unit=2)

# Individual loss components for analysis
inst_loss = instance_contrastive_loss(z1, z2)
temp_loss = temporal_contrastive_loss(z1, z2)
print(f"Instance loss: {inst_loss.item():.4f}")
print(f"Temporal loss: {temp_loss.item():.4f}")

# Backpropagation
loss.backward()
print(f"Gradient computed: z1.grad.shape = {z1.grad.shape}")
```

## Summary

TS2Vec provides a comprehensive framework for learning universal time series representations through hierarchical contrastive learning. The primary use cases are: (1) time series classification, where instance-level representations are extracted and fed to conventional classifiers such as SVMs or linear models; (2) time series forecasting, using causal sliding-window encoding combined with ridge regression for multi-horizon prediction; and (3) anomaly detection, which compares masked and unmasked representations to identify abnormal points. The framework handles both univariate and multivariate time series, supports missing data through NaN handling, and scales to long sequences via sliding inference.
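The masked-vs-unmasked comparison behind use case (3) can be sketched in a few lines. The two arrays below are random stand-ins for encoder outputs (in the real workflow they would come from `model.encode(x)` and `model.encode(x, mask='mask_last')`), and the 4-sigma threshold is purely illustrative, not the library's scoring protocol:

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 500, 320

# Stand-ins for timestamp-level representations of one series:
# repr_full   ~ model.encode(x)                    (each point visible to the encoder)
# repr_masked ~ model.encode(x, mask='mask_last')  (point hidden from the encoder)
repr_full = rng.normal(size=(T, d))
repr_masked = repr_full + rng.normal(scale=0.01, size=(T, d))

# Inject a synthetic anomaly: the two views disagree strongly at t=250
repr_masked[250] += 5.0

# Anomaly score per timestamp: mean absolute difference between the views
score = np.abs(repr_full - repr_masked).mean(axis=1)

# A simple illustrative threshold (mean + 4 std) flags the anomalous point
threshold = score.mean() + 4 * score.std()
flagged = np.where(score > threshold)[0]
print(flagged)  # contains index 250
```

The intuition: for normal points, hiding a timestamp barely changes its representation because the context predicts it well; for anomalous points, the masked and unmasked views diverge sharply.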
Integration with existing workflows is straightforward through the modular API design. Users can leverage pre-built data loaders for popular benchmarks (UCR, UEA, ETT, Yahoo, KPI) or prepare custom datasets as numpy arrays with shape (n_instances, n_timestamps, n_features). The trained encoder produces fixed-dimensional representations that can be directly consumed by downstream models, enabling transfer learning across different time series tasks. For production deployments, models can be saved and loaded using standard PyTorch serialization, and the command-line interface facilitates experimentation with different hyperparameters and evaluation protocols.
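As a closing sketch of the downstream-classifier pattern described above: here a random array (with a synthetic class shift so the toy classifier has something to learn) stands in for the `(n_instances, 320)` output of `model.encode(data, encoding_window='full_series')` from a trained model:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(42)
n_instances, repr_dim = 200, 320

# Stand-in for instance-level TS2Vec representations:
# reprs = model.encode(data, encoding_window='full_series')  # (n_instances, 320)
reprs = rng.normal(size=(n_instances, repr_dim)).astype(np.float32)
labels = rng.integers(0, 2, size=n_instances)
reprs[labels == 1] += 0.5  # synthetic shift so the two classes are separable

# Fixed-dimensional representations plug directly into any sklearn classifier
X_train, X_test, y_train, y_test = train_test_split(
    reprs, labels, test_size=0.25, random_state=0
)
clf = SVC(kernel='linear').fit(X_train, y_train)
acc = clf.score(X_test, y_test)
print(f"Downstream accuracy: {acc:.3f}")
```

Swapping in real encoded data requires no changes to the classifier side, which is what makes the representations reusable across tasks.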