Datamapplot (tutteinstitute/datamapplot)

https://github.com/tutteinstitute/datamapplot

# DataMapPlot

DataMapPlot is a Python library for creating publication-quality visualizations of data maps. It specializes in rendering 2D embeddings produced by dimensionality reduction techniques such as UMAP, t-SNE, PaCMAP, and PyMDE, with automatic labeling, coloring, and styling. The library makes the aesthetic choices automatically while still exposing extensive customization options.

The library offers two primary modes of visualization: static plots rendered via matplotlib for high-quality publication graphics, and interactive HTML plots powered by Deck.GL that support zooming, panning, searching, and hovering. Both modes automatically place text labels, generate color palettes based on cluster positions, and create visually appealing glow effects around clusters.

## Installation

```bash
pip install datamapplot
```

or with conda:

```bash
conda install -c conda-forge datamapplot
```
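
After installing, it can be useful to confirm which version ended up in the active environment. The sketch below relies only on the standard library's `importlib.metadata`; the helper name `installed_version` is ours and is not part of DataMapPlot.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str = "datamapplot"):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version()
if found:
    print(f"datamapplot version: {found}")
else:
    print("datamapplot is not installed in this environment")
```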

## Core API

### create_plot - Static Data Map Visualization

Creates a static matplotlib figure from 2D coordinates with automatic label placement, color palette generation, and optional glow effects. This is the primary function for generating publication-ready data map visualizations that can be saved as PNG, PDF, or SVG files.

```python
import datamapplot
import numpy as np

# Load your data map coordinates (from UMAP, t-SNE, etc.)
data_map_coords = np.load("data_map.npz")["arr_0"]
labels = np.load("cluster_labels.npz", allow_pickle=True)["arr_0"]

# Create a basic data map plot
fig, ax = datamapplot.create_plot(
    data_map_coords,
    labels,
    title="My Data Map",
    sub_title="A visualization of embedded documents",
    figsize=(12, 12),
    dpi=150
)
fig.savefig("data_map.png", bbox_inches="tight")

# Create a dark mode plot with custom fonts and highlighted labels
fig, ax = datamapplot.create_plot(
    data_map_coords,
    labels,
    title="ArXiv ML Landscape",
    sub_title="Papers from the Machine Learning section of ArXiv",
    darkmode=True,
    font_family="Playfair Display SC",
    label_font_size=8,
    label_linespacing=1.25,
    highlight_labels=["Deep Learning", "Neural Networks", "Transformers"],
    highlight_label_keywords={"fontsize": 11, "fontweight": "bold"},
    title_keywords={"fontsize": 28}
)

# Create a word-cloud style plot with labels over points
fig, ax = datamapplot.create_plot(
    data_map_coords,
    labels,
    label_over_points=True,
    dynamic_label_size=True,
    label_wrap_width=10,
    min_font_size=4,
    max_font_size=36,
    min_font_weight=100,
    max_font_weight=1000,
    font_family="Roboto Condensed"
)

# Use a custom color map for palette generation
# (colorcet is a separate package: pip install colorcet)
import colorcet
fig, ax = datamapplot.create_plot(
    data_map_coords,
    labels,
    cmap=colorcet.cm.CET_C2,
    palette_hue_shift=-90,
    cvd_safer=True  # Use colorblind-safe palette
)
```
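
Options such as `palette_hue_shift` are easier to reason about with a mental model of position-based coloring. The sketch below is a deliberately simplified illustration, not DataMapPlot's actual algorithm: it gives each cluster a hue taken from the angle of its centroid around the center of the map, so neighboring clusters receive similar colors. The function name `hues_from_positions` and the fixed saturation and value are assumptions made for this example.

```python
import colorsys
import math


def hues_from_positions(centroids, hue_shift_degrees=0.0):
    """Map each (x, y) centroid to a hex color based on its angle around the mean centroid."""
    cx = sum(x for x, _ in centroids) / len(centroids)
    cy = sum(y for _, y in centroids) / len(centroids)
    colors = []
    for x, y in centroids:
        angle = math.degrees(math.atan2(y - cy, x - cx))      # range -180..180
        hue = ((angle + hue_shift_degrees) % 360.0) / 360.0   # range 0..1
        r, g, b = colorsys.hsv_to_rgb(hue, 0.65, 0.85)        # fixed saturation and value (assumed)
        colors.append("#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255)))
    return colors


# Four invented cluster centroids arranged around the origin
centroids = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
palette = hues_from_positions(centroids)
shifted = hues_from_positions(centroids, hue_shift_degrees=-90)
print(palette)
print(shifted)
```

In this toy model a hue shift of -90 degrees rotates the whole palette by a quarter turn: each cluster takes the color its clockwise neighbor had before.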

### create_interactive_plot - Interactive HTML Data Map

Creates an interactive HTML visualization using Deck.GL that supports zooming, panning, searching, hover tooltips, and multi-layer label hierarchies. The result can be displayed in Jupyter notebooks or saved as standalone HTML files.

```python
import datamapplot
import numpy as np

# Load data with multiple label layers (coarse to fine resolution)
data_map_coords = np.load("data_map.npz")["arr_0"]
hover_text = np.load("hover_text.npz", allow_pickle=True)["arr_0"]

# Load hierarchical labels at different resolutions
label_layers = []
for i in range(6):
    label_layers.append(
        np.load(f"layer{i}_cluster_labels.npz", allow_pickle=True)["arr_0"]
    )

# Create interactive plot with search and hierarchical labels
plot = datamapplot.create_interactive_plot(
    data_map_coords,
    label_layers[0],  # Fine-grained labels
    label_layers[2],  # Medium resolution
    label_layers[5],  # Coarse labels
    hover_text=hover_text,
    title="CORD-19 Data Map",
    sub_title="Papers relating to COVID-19 and SARS-CoV-2",
    font_family="Cinzel",
    enable_search=True,
    darkmode=True,
    cluster_boundary_polygons=True,
    on_click="window.open(`https://google.com/search?q=\"{hover_text}\"`)"
)

# Save to HTML file
plot.save("interactive_map.html")

# Display in a Jupyter notebook: leave `plot` as the last expression in a cell
plot

# Create plot with custom marker sizes and colors
# (citation_counts: one non-negative count per point; file name is a placeholder)
citation_counts = np.load("citations.npz")["arr_0"]
marker_sizes = np.log(1 + citation_counts)
plot = datamapplot.create_interactive_plot(
    data_map_coords,
    label_layers[0],
    hover_text=hover_text,
    marker_size_array=marker_sizes,
    noise_color="#aaaaaa66",
    initial_zoom_fraction=0.99,
    logo="https://example.com/logo.png",
    logo_width=128
)

# Non-inline data for large datasets (creates separate data files)
plot = datamapplot.create_interactive_plot(
    data_map_coords,
    label_layers[0],
    hover_text=hover_text,
    inline_data=False,
    offline_data_prefix="my_dataset"
)
plot.save("large_dataset.html")
```
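
The marker-size example above compresses citation counts with a logarithm so that a few heavily cited items do not dwarf everything else. Here is a minimal sketch of that transform on made-up counts; the values in `counts` are illustrative, and `math.log1p(c)` computes `log(1 + c)` one value at a time.

```python
import math

# Invented citation counts spanning several orders of magnitude
counts = [0, 1, 10, 100, 10_000]
sizes = [math.log1p(c) for c in counts]

raw_spread = max(counts) - min(counts)
log_spread = max(sizes) - min(sizes)

for c, s in zip(counts, sizes):
    print(f"count={c:>6}  size={s:.2f}")
print(f"raw spread: {raw_spread}, log spread: {log_spread:.2f}")
```

Adding 1 before taking the logarithm keeps zero counts valid (they map to a size of 0) instead of producing negative infinity.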

### Interactive Plot with Histogram Filtering

Adds a histogram widget to interactive plots that allows filtering data points by a numeric or datetime attribute, such as publication date.

```python
import datamapplot
import numpy as np
import pandas as pd

# Load data (file names are placeholders)
data_map_coords = np.load("data_map.npz")["arr_0"]
labels = np.load("labels.npz", allow_pickle=True)["arr_0"]
titles = np.load("hover_text.npz", allow_pickle=True)["arr_0"]
publication_dates = pd.Series(np.load("dates.npz", allow_pickle=True)["arr_0"])

# Create interactive plot with histogram for date filtering
plot = datamapplot.create_interactive_plot(
    data_map_coords,
    labels,
    hover_text=titles,
    histogram_data=publication_dates,
    histogram_group_datetime_by="month",
    histogram_n_bins=50,
    histogram_settings={
        "histogram_title": "Publication Date",
        "histogram_bin_fill_color": "#6290C3",
        "histogram_bin_selected_fill_color": "#2EBFA5",
        "histogram_log_scale": True
    },
    enable_search=True
)
plot.save("data_map_with_histogram.html")
```
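
To make the effect of `histogram_group_datetime_by="month"` concrete, the sketch below bins a handful of invented dates by calendar month using the standard library. It illustrates the idea of datetime grouping only. It is not how DataMapPlot builds its histogram, and `count_by_month` is a hypothetical helper.

```python
from collections import Counter
from datetime import date


def count_by_month(dates):
    """Count dates per (year, month) bucket, returned in chronological order."""
    counts = Counter((d.year, d.month) for d in dates)
    return dict(sorted(counts.items()))


# Invented publication dates
dates = [date(2020, 1, 3), date(2020, 1, 28), date(2020, 2, 14), date(2020, 4, 1)]
monthly = count_by_month(dates)
for (year, month), n in monthly.items():
    print(f"{year}-{month:02d}: {n}")
```

Months with no data (March 2020 here) simply have no bucket in this sketch; an actual histogram widget may render them as empty bins.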

### Interactive Plot with Topic Tree Navigation

Enables a hierarchical topic tree sidebar for navigating between label layers and exploring the data map structure.

```python
import datamapplot
import numpy as np

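data_map_coords = np.load("data_map.npz")["arr_0"]
label_layers = [np.load(f"layer{i}_labels.npz", allow_pickle=True)["arr_0"] for i in range(5)]
# One hover string per point (file name is a placeholder)
document_titles = np.load("hover_text.npz", allow_pickle=True)["arr_0"]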

# Create plot with topic tree navigation
plot = datamapplot.create_interactive_plot(
    data_map_coords,
    label_layers[0],
    label_layers[2],
    label_layers[4],
    hover_text=document_titles,
    enable_topic_tree=True,
    topic_tree_kwds={
        "title": "Topic Hierarchy",
        "font_size": "12pt",
        "max_width": "30vw",
        "max_height": "42vh",
        "color_bullets": True,
        "button_on_click": "console.log({hover_text})",  # Access selected points
        "button_icon": "&#128194;"  # HTML entity for an open-folder icon
    },
    darkmode=True
)
plot.save("topic_tree_map.html")
```
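
A topic tree is easiest to read when the label layers nest, that is, when every fine-grained cluster sits inside a single coarser cluster. The sketch below checks that property for two layers and derives a parent-to-children mapping. It is a standalone illustration with invented labels; `build_hierarchy` is a hypothetical helper and does not reflect how DataMapPlot constructs its tree internally.

```python
from collections import defaultdict


def build_hierarchy(fine_labels, coarse_labels, noise_label="Unlabelled"):
    """Return {coarse label: sorted fine labels}.

    Raises ValueError if a fine label appears under more than one coarse label.
    Points carrying the noise label in either layer are skipped.
    """
    parent_of = {}
    for fine, coarse in zip(fine_labels, coarse_labels):
        if fine == noise_label or coarse == noise_label:
            continue
        if parent_of.setdefault(fine, coarse) != coarse:
            raise ValueError(f"{fine!r} spans both {parent_of[fine]!r} and {coarse!r}")
    children = defaultdict(set)
    for fine, coarse in parent_of.items():
        children[coarse].add(fine)
    return {coarse: sorted(fines) for coarse, fines in sorted(children.items())}


# One label per point in each layer (invented data)
fine = ["CNNs", "RNNs", "CNNs", "Bandits", "Unlabelled", "Q-learning"]
coarse = ["Deep Learning", "Deep Learning", "Deep Learning", "RL", "RL", "RL"]
tree = build_hierarchy(fine, coarse)
print(tree)
```

Running a check like this before plotting can surface layers that were produced by independent clustering runs and therefore do not form a clean hierarchy.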

### Interactive Plot with Colormaps

Adds a colormap selector dropdown allowing users to color points by different metadata fields.

```python
import datamapplot
import numpy as np

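data_map_coords = np.load("data_map.npz")["arr_0"]
labels = np.load("labels.npz", allow_pickle=True)["arr_0"]
# One hover string per point (file name is a placeholder)
titles = np.load("hover_text.npz", allow_pickle=True)["arr_0"]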

# Metadata for coloring
citation_counts = np.load("citations.npz")["arr_0"]
years = np.load("years.npz")["arr_0"]
categories = np.load("categories.npz", allow_pickle=True)["arr_0"]

# Simple colormap setup using dict
plot = datamapplot.create_interactive_plot(
    data_map_coords,
    labels,
    hover_text=titles,
    colormaps={
        "Citations": citation_counts,
        "Year": years,
        "Category": categories
    }
)

# Advanced colormap setup with explicit metadata
plot = datamapplot.create_interactive_plot(
    data_map_coords,
    labels,
    hover_text=titles,
    colormap_rawdata=[citation_counts, years],
    colormap_metadata=[
        {"field": "Citations", "description": "Number of citations", "cmap": "viridis"},
        {"field": "Year", "description": "Publication year", "cmap": "plasma"}
    ],
    cluster_layer_colormaps=True  # Add per-layer cluster coloring options
)
```
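
When several metadata fields are involved, keeping `colormap_rawdata` and `colormap_metadata` in the same order by hand is error-prone. The helper below builds both lists from one specification so they cannot drift apart. `build_colormap_args` is a hypothetical convenience function, and it uses only the metadata keys shown in the example above (`field`, `description`, `cmap`); check the library documentation for any further keys your version expects.

```python
def build_colormap_args(spec):
    """Split a list of (field, description, cmap, values) tuples into two aligned lists.

    Returns (rawdata, metadata), where rawdata[i] holds the values described by metadata[i].
    """
    rawdata, metadata = [], []
    for field, description, cmap, values in spec:
        rawdata.append(values)
        metadata.append({"field": field, "description": description, "cmap": cmap})
    return rawdata, metadata


# Invented per-point metadata; in practice these would be NumPy arrays
citations = [3, 120, 45]
years = [2019, 2021, 2020]

rawdata, metadata = build_colormap_args([
    ("Citations", "Number of citations", "viridis", citations),
    ("Year", "Publication year", "plasma", years),
])
# The results are intended to be passed as
# colormap_rawdata=rawdata and colormap_metadata=metadata.
print([m["field"] for m in metadata])
```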

### render_plot - Low-Level Static Plot Rendering

Provides fine-grained control over static plot rendering when direct manipulation of label positions, colors, or other visual elements is needed.

```python
import datamapplot
import numpy as np

data_map_coords = np.load("data_map.npz")["arr_0"]
labels = np.load("labels.npz", allow_pickle=True)["arr_0"]
color_list = ["#ff0000" if label == "Important" else "#999999" for label in labels]

# Compute custom label locations (centroids here; medoids are a common alternative)
unique_labels = sorted(set(labels) - {"Unlabelled"})  # sorted for a reproducible order
label_locations = np.array([
    data_map_coords[labels == lbl].mean(axis=0) for lbl in unique_labels
])
label_text = unique_labels
label_cluster_sizes = np.array([np.sum(labels == lbl) for lbl in unique_labels])

fig, ax = datamapplot.render_plot(
    data_map_coords,
    color_list,
    label_text,
    label_locations,
    label_cluster_sizes,
    title="Custom Rendered Plot",
    figsize=(14, 14),
    dpi=200,
    font_family="Roboto",
    font_weight=600,
    label_font_size=10,
    add_glow=True,
    glow_keywords={
        "kernel": "gaussian",
        "kernel_bandwidth": 0.3,
        "approx_patch_size": 64
    },
    arrowprops={"arrowstyle": "->", "color": "gray", "lw": 0.5},
    darkmode=True
)
```
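
The example above places labels at cluster centroids. A centroid (the mean position) can fall in empty space for arc-shaped or ring-shaped clusters, which is why a medoid, the actual cluster member with the smallest total distance to all other members, is sometimes a better anchor for a label. Below is a minimal pure-Python sketch of the difference on invented points; `centroid` and `medoid` are illustrative helpers, and the quadratic-time medoid search is only suitable for small clusters.

```python
import math


def centroid(points):
    """Mean position of the points; not necessarily one of the points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def medoid(points):
    """The member point with the smallest summed distance to all other members."""
    return min(points, key=lambda p: sum(math.dist(p, q) for q in points))


# An arch-shaped cluster: the mean falls underneath the arch, away from the data
cluster = [(0.0, 0.0), (1.0, 2.0), (2.0, 3.0), (3.0, 2.0), (5.0, 0.0)]

c = centroid(cluster)
m = medoid(cluster)
print("centroid:", c, "is a data point:", c in cluster)
print("medoid:  ", m, "is a data point:", m in cluster)
```

Either result can be stacked into the `label_locations` array; only the rule for picking one representative position per cluster changes.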

### render_html - Low-Level Interactive HTML Rendering

Provides direct control over HTML rendering when custom point and label dataframes are needed.

```python
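import datamapplot
import numpy as np
import pandas as pd

# Inputs: 2D coordinates, one label per point, one hover string per point
# (file names are placeholders)
data_map_coords = np.load("data_map.npz")["arr_0"]
labels = np.load("labels.npz", allow_pickle=True)["arr_0"]
document_titles = np.load("hover_text.npz", allow_pickle=True)["arr_0"]

# One entry per cluster: name, centroid, and number of points
label_names = sorted(set(labels) - {"Unlabelled"})
label_centroids = np.array([data_map_coords[labels == name].mean(axis=0) for name in label_names])
cluster_sizes = np.array([np.sum(labels == name) for name in label_names])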

# Create point dataframe with required columns
point_df = pd.DataFrame({
    "x": data_map_coords[:, 0],
    "y": data_map_coords[:, 1],
    "r": np.full(len(data_map_coords), 100, dtype=np.uint8),
    "g": np.full(len(data_map_coords), 150, dtype=np.uint8),
    "b": np.full(len(data_map_coords), 200, dtype=np.uint8),
    "a": np.full(len(data_map_coords), 180, dtype=np.uint8),
    "hover_text": document_titles
})

# Create label dataframe
label_df = pd.DataFrame({
    "x": label_centroids[:, 0],
    "y": label_centroids[:, 1],
    "label": label_names,
    "size": cluster_sizes,
    "r": np.full(len(label_names), 50, dtype=np.uint8),
    "g": np.full(len(label_names), 50, dtype=np.uint8),
    "b": np.full(len(label_names), 50, dtype=np.uint8),
    "a": np.full(len(label_names), 64, dtype=np.uint8)
})

html_str = datamapplot.render_html(
    point_df,
    label_df,
    title="Custom HTML Plot",
    font_family="Inter",
    enable_search=True,
    point_size_scale=0.5,
    text_min_pixel_size=12,
    text_max_pixel_size=28
)

# Save directly (explicit UTF-8 so non-ASCII labels and hover text survive)
with open("custom_plot.html", "w", encoding="utf-8") as f:
    f.write(html_str)
```
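
In the example above, the dataframes carry colors as separate `r`, `g`, `b`, `a` columns, whereas colors elsewhere in this document are written as hex strings such as `#aaaaaa66`. A small converter bridges the two. `hex_to_rgba` is a hypothetical helper, and treating a missing alpha pair as fully opaque (255) is an assumption of this sketch.

```python
def hex_to_rgba(color, default_alpha=255):
    """Convert '#rrggbb' or '#rrggbbaa' to a tuple of four integers in 0..255."""
    digits = color.lstrip("#")
    if len(digits) not in (6, 8):
        raise ValueError(f"expected 6 or 8 hex digits, got {color!r}")
    r, g, b = (int(digits[i:i + 2], 16) for i in (0, 2, 4))
    a = int(digits[6:8], 16) if len(digits) == 8 else default_alpha
    return r, g, b, a


print(hex_to_rgba("#aaaaaa66"))
print(hex_to_rgba("#6290C3"))
```

The four values can then be broadcast into columns with `np.full(n_points, value, dtype=np.uint8)`, as the example does with literal numbers.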

## Selection Handlers

Selection handlers enable custom behavior when users select points in interactive plots using lasso selection or other methods.

### DisplaySample - Show Random Sample of Selected Items

```python
import datamapplot
from datamapplot.selection_handlers import DisplaySample

# Display a random sample of selected items in a sidebar
# (data_map_coords, labels, and titles are loaded as in the earlier examples)
plot = datamapplot.create_interactive_plot(
    data_map_coords,
    labels,
    hover_text=titles,
    selection_handler=DisplaySample(
        n_samples=100,
        font_family="Roboto"
    )
)
```

### WordCloud - Generate Word Cloud from Selection

```python
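import datamapplot
from datamapplot.selection_handlers import WordCloud

# Generate a word cloud from selected text items
# (data_map_coords and labels as in the earlier examples; document_text is one string per point)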
plot = datamapplot.create_interactive_plot(
    data_map_coords,
    labels,
    hover_text=document_text,
    selection_handler=WordCloud(
        n_words=200,
        width=400,
        height=400,
        font_family="Impact",
        color_scale="YlGnBu",
        n_rotations=3,
        use_idf=True,  # Use TF-IDF weighting
        location="bottom-right"
    )
)
```
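
The `use_idf` option refers to inverse document frequency weighting, which down-weights words that appear in most documents so that distinctive words dominate the cloud. The sketch below computes one common TF-IDF variant (raw term counts multiplied by `log(N / df)`) over a toy corpus. It illustrates the general technique only; the exact weighting, tokenization, and stop-word handling used by the `WordCloud` handler may differ.

```python
import math
from collections import Counter


def tf_idf(selected_docs, all_docs):
    """Score each word in selected_docs by term count times log(N / document frequency).

    selected_docs must be a subset of all_docs so every word has a document frequency.
    """
    tokenized = [set(doc.lower().split()) for doc in all_docs]
    n_docs = len(tokenized)
    doc_freq = Counter(word for words in tokenized for word in words)
    term_counts = Counter(word for doc in selected_docs for word in doc.lower().split())
    return {word: count * math.log(n_docs / doc_freq[word])
            for word, count in term_counts.items()}


all_docs = ["the cat sat", "the dog ran", "the cat ran far", "the bird sang"]
selected = [all_docs[0], all_docs[2]]  # stands in for a lasso selection

scores = tf_idf(selected, all_docs)
for word, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{word:>4}  {score:.3f}")
```

A word such as "the", which occurs in every document, scores zero and would vanish from the cloud, while words concentrated in the selection keep a positive weight.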

### TagSelection - Create and Save Tags for Selected Items

```python
import datamapplot
from datamapplot.selection_handlers import TagSelection

# Allow users to create and save tags for selected items
# (data_map_coords, labels, and titles are loaded as in the earlier examples)
plot = datamapplot.create_interactive_plot(
    data_map_coords,
    labels,
    hover_text=titles,
    selection_handler=TagSelection(
        location="top-right",
        width=308,
        max_height="80vh"
    )
)
# Users can create tags, add selections to existing tags, and export to JSON
```

### CohereSummary - AI-Generated Summary of Selection

```python
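import datamapplot
from datamapplot.selection_handlers import CohereSummary

# Generate AI summaries of selected items using Cohere API
# (data_map_coords and labels as in the earlier examples; abstracts is one string per point)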
plot = datamapplot.create_interactive_plot(
    data_map_coords,
    labels,
    hover_text=abstracts,
    selection_handler=CohereSummary(
        model="command-r",
        n_keywords=128,
        n_samples=64,
        width=500,
        location="top-right"
    )
)
# Users enter their Cohere API key in the resulting HTML page
```

## Configuration Management

DataMapPlot supports persistent configuration through a JSON config file.

```python
from datamapplot.config import ConfigManager

cfg = ConfigManager()

# Set default values
cfg["dpi"] = 150
cfg["figsize"] = (14, 14)
cfg["cdn_url"] = "unpkg.com"

# Save configuration (persists across sessions)
cfg.save()

# Configuration values are automatically applied to function calls
# unless explicitly overridden
```
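
The precedence rule described in the comments above (saved defaults apply unless a call passes its own value) can be pictured as a dictionary merge. The sketch below models that rule with plain dictionaries; `resolve_options` is a hypothetical function for illustration and is not part of `datamapplot.config`.

```python
def resolve_options(saved_defaults, **explicit):
    """Merge saved defaults with explicit keyword arguments; explicit values win."""
    return {**saved_defaults, **explicit}


saved = {"dpi": 150, "figsize": (14, 14)}

print(resolve_options(saved))           # no overrides: saved defaults apply
print(resolve_options(saved, dpi=300))  # explicit dpi replaces the saved value
```

The merge returns a new dictionary, so a one-off override never alters the saved defaults.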

## Offline Mode and System Fonts

For environments without internet access, DataMapPlot supports offline mode and system fonts.

```python
import datamapplot

# data_map_coords, labels, and titles are loaded as in the earlier examples

# Use only system-installed fonts (no Google Fonts download)
fig, ax = datamapplot.create_plot(
    data_map_coords,
    labels,
    use_system_fonts=True,
    font_family="DejaVu Sans"  # Use a system font
)

# Interactive plot with offline mode (embeds all dependencies)
plot = datamapplot.create_interactive_plot(
    data_map_coords,
    labels,
    hover_text=titles,
    offline_mode=True,
    offline_mode_js_data_file="js_deps.json",
    offline_mode_font_data_file="font_data.json"
)
```

## Summary

DataMapPlot excels at creating beautiful visualizations of high-dimensional data embeddings with minimal code. Its primary use cases include visualizing document collections, topic models, image embeddings, and any 2D projections from dimensionality reduction. The library automatically handles color palette generation, label placement, and aesthetic styling while providing extensive customization options for publication-quality outputs.

For integration, DataMapPlot takes as input the 2D coordinates produced by common dimensionality reduction libraries such as UMAP, t-SNE, PaCMAP, and PyMDE. The static plotting mode produces matplotlib figures that can be further customized or saved in any format matplotlib supports. The interactive mode generates standalone HTML files that can be shared, embedded in web pages, or displayed in Jupyter notebooks. Selection handlers extend the interactive mode with custom behaviors such as word clouds, AI summaries, and tagging for data exploration workflows.