Datamapplot
https://github.com/tutteinstitute/datamapplot
Datamapplot is a Python library designed for creating 'data maps' by aggregating data points into a
...
# DataMapPlot

DataMapPlot is a Python library designed to create beautiful, publication-quality visualizations of data maps. It specializes in rendering 2D embeddings from dimensionality reduction techniques like UMAP, t-SNE, PaCMAP, and PyMDE with automatic labeling, coloring, and styling. The library handles the complex aesthetic choices automatically while providing extensive customization options.

The library offers two primary modes of visualization: static plots rendered via matplotlib for high-quality publication graphics, and interactive HTML plots powered by Deck.GL that support zooming, panning, searching, and hovering. Both modes automatically place text labels, generate color palettes based on cluster positions, and create visually appealing glow effects around clusters.

## Installation

```bash
pip install datamapplot
```

or with conda:

```bash
conda install -c conda-forge datamapplot
```

## Core API

### create_plot - Static Data Map Visualization

Creates a static matplotlib figure from 2D coordinates with automatic label placement, color palette generation, and optional glow effects. This is the primary function for generating publication-ready data map visualizations that can be saved as PNG, PDF, or SVG files.

```python
import colorcet  # only needed for the custom color map example at the end
import datamapplot
import numpy as np

# Load your data map coordinates (from UMAP, t-SNE, etc.)
data_map_coords = np.load("data_map.npz")["arr_0"]
labels = np.load("cluster_labels.npz", allow_pickle=True)["arr_0"]

# Create a basic data map plot
fig, ax = datamapplot.create_plot(
    data_map_coords,
    labels,
    title="My Data Map",
    sub_title="A visualization of embedded documents",
    figsize=(12, 12),
    dpi=150
)
fig.savefig("data_map.png", bbox_inches="tight")

# Create a dark mode plot with custom fonts and highlighted labels
fig, ax = datamapplot.create_plot(
    data_map_coords,
    labels,
    title="ArXiv ML Landscape",
    sub_title="Papers from the Machine Learning section of ArXiv",
    darkmode=True,
    font_family="Playfair Display SC",
    label_font_size=8,
    label_linespacing=1.25,
    highlight_labels=["Deep Learning", "Neural Networks", "Transformers"],
    highlight_label_keywords={"fontsize": 11, "fontweight": "bold"},
    title_keywords={"fontsize": 28}
)

# Create a word-cloud style plot with labels over points
fig, ax = datamapplot.create_plot(
    data_map_coords,
    labels,
    label_over_points=True,
    dynamic_label_size=True,
    label_wrap_width=10,
    min_font_size=4,
    max_font_size=36,
    min_font_weight=100,
    max_font_weight=1000,
    font_family="Roboto Condensed"
)

# Use a custom color map for palette generation
fig, ax = datamapplot.create_plot(
    data_map_coords,
    labels,
    cmap=colorcet.cm.CET_C2,
    palette_hue_shift=-90,
    cvd_safer=True  # Use colorblind-safe palette
)
```

### create_interactive_plot - Interactive HTML Data Map

Creates an interactive HTML visualization using Deck.GL that supports zooming, panning, searching, hover tooltips, and multi-layer label hierarchies. The result can be displayed in Jupyter notebooks or saved as standalone HTML files.
```python
import datamapplot
import numpy as np

# Load the coordinates and per-point hover text
data_map_coords = np.load("data_map.npz")["arr_0"]
hover_text = np.load("hover_text.npz", allow_pickle=True)["arr_0"]

# Load hierarchical labels at different resolutions
# (layer 0 is the finest, layer 5 the coarsest)
label_layers = []
for i in range(6):
    label_layers.append(
        np.load(f"layer{i}_cluster_labels.npz", allow_pickle=True)["arr_0"]
    )

# Create interactive plot with search and hierarchical labels
plot = datamapplot.create_interactive_plot(
    data_map_coords,
    label_layers[0],  # Fine-grained labels
    label_layers[2],  # Medium resolution
    label_layers[5],  # Coarse labels
    hover_text=hover_text,
    title="CORD-19 Data Map",
    sub_title="Papers relating to COVID-19 and SARS-CoV-2",
    font_family="Cinzel",
    enable_search=True,
    darkmode=True,
    cluster_boundary_polygons=True,
    on_click="window.open(`https://google.com/search?q=\"{hover_text}\"`)"
)

# Save to HTML file
plot.save("interactive_map.html")

# Display in Jupyter notebook
plot  # Automatically renders in notebook

# Create plot with custom marker sizes and colors
# (citation_counts is a per-point numeric array)
citation_counts = np.load("citations.npz")["arr_0"]
marker_sizes = np.log1p(citation_counts)
plot = datamapplot.create_interactive_plot(
    data_map_coords,
    label_layers[0],
    hover_text=hover_text,
    marker_size_array=marker_sizes,
    noise_color="#aaaaaa66",
    initial_zoom_fraction=0.99,
    logo="https://example.com/logo.png",
    logo_width=128
)

# Non-inline data for large datasets (creates separate data files)
plot = datamapplot.create_interactive_plot(
    data_map_coords,
    label_layers[0],
    hover_text=hover_text,
    inline_data=False,
    offline_data_prefix="my_dataset"
)
plot.save("large_dataset.html")
```

### Interactive Plot with Histogram Filtering

Adds a histogram widget to interactive plots that allows filtering data points by a numeric or datetime attribute, such as publication date.
```python
import datamapplot
import numpy as np
import pandas as pd

# Load data
data_map_coords = np.load("data_map.npz")["arr_0"]
labels = np.load("labels.npz", allow_pickle=True)["arr_0"]
hover_text = np.load("hover_text.npz", allow_pickle=True)["arr_0"]

# Convert explicitly so the series has a datetime dtype
publication_dates = pd.Series(
    pd.to_datetime(np.load("dates.npz", allow_pickle=True)["arr_0"])
)

# Create interactive plot with histogram for date filtering
plot = datamapplot.create_interactive_plot(
    data_map_coords,
    labels,
    hover_text=hover_text,
    histogram_data=publication_dates,
    histogram_group_datetime_by="month",
    histogram_n_bins=50,
    histogram_settings={
        "histogram_title": "Publication Date",
        "histogram_bin_fill_color": "#6290C3",
        "histogram_bin_selected_fill_color": "#2EBFA5",
        "histogram_log_scale": True
    },
    enable_search=True
)
plot.save("data_map_with_histogram.html")
```

### Interactive Plot with Topic Tree Navigation

Enables a hierarchical topic tree sidebar for navigating between label layers and exploring the data map structure.

```python
import datamapplot
import numpy as np

data_map_coords = np.load("data_map.npz")["arr_0"]
hover_text = np.load("hover_text.npz", allow_pickle=True)["arr_0"]
label_layers = [
    np.load(f"layer{i}_labels.npz", allow_pickle=True)["arr_0"]
    for i in range(5)
]

# Create plot with topic tree navigation
plot = datamapplot.create_interactive_plot(
    data_map_coords,
    label_layers[0],
    label_layers[2],
    label_layers[4],
    hover_text=hover_text,
    enable_topic_tree=True,
    topic_tree_kwds={
        "title": "Topic Hierarchy",
        "font_size": "12pt",
        "max_width": "30vw",
        "max_height": "42vh",
        "color_bullets": True,
        "button_on_click": "console.log({hover_text})",  # Access selected points
        "button_icon": "📂"
    },
    darkmode=True
)
plot.save("topic_tree_map.html")
```

### Interactive Plot with Colormaps

Adds a colormap selector dropdown allowing users to color points by different metadata fields.
```python
import datamapplot
import numpy as np

data_map_coords = np.load("data_map.npz")["arr_0"]
labels = np.load("labels.npz", allow_pickle=True)["arr_0"]
hover_text = np.load("hover_text.npz", allow_pickle=True)["arr_0"]

# Metadata for coloring (one value per point)
citation_counts = np.load("citations.npz")["arr_0"]
years = np.load("years.npz")["arr_0"]
categories = np.load("categories.npz", allow_pickle=True)["arr_0"]

# Simple colormap setup using dict
plot = datamapplot.create_interactive_plot(
    data_map_coords,
    labels,
    hover_text=hover_text,
    colormaps={
        "Citations": citation_counts,
        "Year": years,
        "Category": categories
    }
)

# Advanced colormap setup with explicit metadata
plot = datamapplot.create_interactive_plot(
    data_map_coords,
    labels,
    hover_text=hover_text,
    colormap_rawdata=[citation_counts, years],
    colormap_metadata=[
        {"field": "Citations", "description": "Number of citations", "cmap": "viridis"},
        {"field": "Year", "description": "Publication year", "cmap": "plasma"}
    ],
    cluster_layer_colormaps=True  # Add per-layer cluster coloring options
)
```

### render_plot - Low-Level Static Plot Rendering

Provides fine-grained control over static plot rendering when direct manipulation of label positions, colors, or other visual elements is needed.
```python
import datamapplot
import numpy as np

data_map_coords = np.load("data_map.npz")["arr_0"]
labels = np.load("cluster_labels.npz", allow_pickle=True)["arr_0"]

# One color per point: highlight a single cluster and grey out the rest
color_list = ["#ff0000" if label == "Important" else "#999999" for label in labels]

# Compute custom label locations. Centroids (cluster means) are used here;
# medoids could be substituted in the same way.
unique_labels = sorted(set(labels) - {"Unlabelled"})
label_locations = np.array([
    data_map_coords[labels == lbl].mean(axis=0)
    for lbl in unique_labels
])
label_text = unique_labels
label_cluster_sizes = np.array([np.sum(labels == lbl) for lbl in unique_labels])

fig, ax = datamapplot.render_plot(
    data_map_coords,
    color_list,
    label_text,
    label_locations,
    label_cluster_sizes,
    title="Custom Rendered Plot",
    figsize=(14, 14),
    dpi=200,
    font_family="Roboto",
    font_weight=600,
    label_font_size=10,
    add_glow=True,
    glow_keywords={
        "kernel": "gaussian",
        "kernel_bandwidth": 0.3,
        "approx_patch_size": 64
    },
    arrowprops={"arrowstyle": "->", "color": "gray", "lw": 0.5},
    darkmode=True
)
```

### render_html - Low-Level Interactive HTML Rendering

Provides direct control over HTML rendering when custom point and label dataframes are needed.
```python
import datamapplot
import numpy as np
import pandas as pd

# Inputs, loaded as in the earlier examples
data_map_coords = np.load("data_map.npz")["arr_0"]
labels = np.load("cluster_labels.npz", allow_pickle=True)["arr_0"]
document_titles = np.load("hover_text.npz", allow_pickle=True)["arr_0"]

# Per-cluster label names, centroids, and sizes
label_names = sorted(set(labels) - {"Unlabelled"})
label_centroids = np.array(
    [data_map_coords[labels == lbl].mean(axis=0) for lbl in label_names]
)
cluster_sizes = np.array([np.sum(labels == lbl) for lbl in label_names])

# Create point dataframe with required columns
point_df = pd.DataFrame({
    "x": data_map_coords[:, 0],
    "y": data_map_coords[:, 1],
    "r": np.full(len(data_map_coords), 100, dtype=np.uint8),
    "g": np.full(len(data_map_coords), 150, dtype=np.uint8),
    "b": np.full(len(data_map_coords), 200, dtype=np.uint8),
    "a": np.full(len(data_map_coords), 180, dtype=np.uint8),
    "hover_text": document_titles
})

# Create label dataframe
label_df = pd.DataFrame({
    "x": label_centroids[:, 0],
    "y": label_centroids[:, 1],
    "label": label_names,
    "size": cluster_sizes,
    "r": np.full(len(label_names), 50, dtype=np.uint8),
    "g": np.full(len(label_names), 50, dtype=np.uint8),
    "b": np.full(len(label_names), 50, dtype=np.uint8),
    "a": np.full(len(label_names), 64, dtype=np.uint8)
})

plot = datamapplot.render_html(
    point_df,
    label_df,
    title="Custom HTML Plot",
    font_family="Inter",
    enable_search=True,
    point_size_scale=0.5,
    text_min_pixel_size=12,
    text_max_pixel_size=28
)

# render_html returns an interactive figure object rather than a plain string,
# so save it the same way as the result of create_interactive_plot
plot.save("custom_plot.html")
```

## Selection Handlers

Selection handlers enable custom behavior when users select points in interactive plots using lasso selection or other methods.
### DisplaySample - Show Random Sample of Selected Items

```python
import datamapplot
from datamapplot.selection_handlers import DisplaySample

# data_map_coords, labels, and titles are per-point arrays,
# loaded as in the earlier examples

# Display a random sample of selected items in a sidebar
plot = datamapplot.create_interactive_plot(
    data_map_coords,
    labels,
    hover_text=titles,
    selection_handler=DisplaySample(
        n_samples=100,
        font_family="Roboto"
    )
)
```

### WordCloud - Generate Word Cloud from Selection

```python
import datamapplot
from datamapplot.selection_handlers import WordCloud

# document_text holds the text of each point, one string per point

# Generate a word cloud from selected text items
plot = datamapplot.create_interactive_plot(
    data_map_coords,
    labels,
    hover_text=document_text,
    selection_handler=WordCloud(
        n_words=200,
        width=400,
        height=400,
        font_family="Impact",
        color_scale="YlGnBu",
        n_rotations=3,
        use_idf=True,  # Use TF-IDF weighting
        location="bottom-right"
    )
)
```

### TagSelection - Create and Save Tags for Selected Items

```python
import datamapplot
from datamapplot.selection_handlers import TagSelection

# Allow users to create and save tags for selected items
plot = datamapplot.create_interactive_plot(
    data_map_coords,
    labels,
    hover_text=titles,
    selection_handler=TagSelection(
        location="top-right",
        width=308,
        max_height="80vh"
    )
)
# Users can create tags, add selections to existing tags, and export to JSON
```

### CohereSummary - AI-Generated Summary of Selection

```python
import datamapplot
from datamapplot.selection_handlers import CohereSummary

# abstracts holds the abstract of each point, one string per point

# Generate AI summaries of selected items using Cohere API
plot = datamapplot.create_interactive_plot(
    data_map_coords,
    labels,
    hover_text=abstracts,
    selection_handler=CohereSummary(
        model="command-r",
        n_keywords=128,
        n_samples=64,
        width=500,
        location="top-right"
    )
)
# Users enter their Cohere API key in the resulting HTML page
```

## Configuration Management

DataMapPlot supports persistent configuration through a JSON config file.
```python
from datamapplot.config import ConfigManager

cfg = ConfigManager()

# Set default values
cfg["dpi"] = 150
cfg["figsize"] = (14, 14)
cfg["cdn_url"] = "unpkg.com"

# Save configuration (persists across sessions)
cfg.save()

# Configuration values are automatically applied to function calls
# unless explicitly overridden
```

## Offline Mode and System Fonts

For environments without internet access, DataMapPlot supports offline mode and system fonts.

```python
import datamapplot

# data_map_coords, labels, and titles are per-point arrays,
# loaded as in the earlier examples

# Use only system-installed fonts (no Google Fonts download)
fig, ax = datamapplot.create_plot(
    data_map_coords,
    labels,
    use_system_fonts=True,
    font_family="DejaVu Sans"  # Use a system font
)

# Interactive plot with offline mode (embeds all dependencies)
plot = datamapplot.create_interactive_plot(
    data_map_coords,
    labels,
    hover_text=titles,
    offline_mode=True,
    offline_mode_js_data_file="js_deps.json",
    offline_mode_font_data_file="font_data.json"
)
```

## Summary

DataMapPlot creates polished visualizations of high-dimensional data embeddings with minimal code. Its primary use cases include visualizing document collections, topic models, image embeddings, and any other 2D projection produced by dimensionality reduction. The library automatically handles color palette generation, label placement, and aesthetic styling while providing extensive customization options for publication-quality outputs.

For integration, DataMapPlot takes the 2D coordinates produced by common dimensionality reduction methods such as UMAP, t-SNE, PaCMAP, and PyMDE. The static plotting mode produces matplotlib figures that can be further customized or saved in any format matplotlib supports. The interactive mode generates standalone HTML files that can be shared, embedded in web pages, or displayed in Jupyter notebooks. Selection handlers extend the interactive mode with custom behaviors like word clouds, AI summaries, and tagging systems for data exploration workflows.
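
The examples above assume that string `labels` and per-cluster label positions already exist. As a minimal NumPy-only sketch of how they might be prepared from integer cluster IDs (for instance the output of a clustering step, where negative IDs mark noise), the helpers `ids_to_labels` and `medoid_label_layout` below are hypothetical and not part of DataMapPlot. They only build arrays in the shape that the `create_plot` and `render_plot` examples expect, using medoids rather than centroids for the label locations.

```python
import numpy as np

NOISE_LABEL = "Unlabelled"  # same noise label as in the render_plot example


def ids_to_labels(cluster_ids, names, noise_label=NOISE_LABEL):
    """Map integer cluster IDs to strings; IDs missing from `names` become noise."""
    cluster_ids = np.asarray(cluster_ids)
    labels = np.full(cluster_ids.shape, noise_label, dtype=object)
    for cluster_id, name in names.items():
        labels[cluster_ids == cluster_id] = name
    return labels


def medoid_label_layout(coords, labels, noise_label=NOISE_LABEL):
    """Return (label_text, label_locations, label_cluster_sizes) using medoids."""
    coords = np.asarray(coords, dtype=float)
    label_text = sorted(set(labels) - {noise_label})
    locations, sizes = [], []
    for name in label_text:
        members = coords[labels == name]
        # Medoid: the member with the smallest total distance to all other members.
        # This builds an n x n distance matrix, so it suits moderate cluster sizes.
        pairwise = np.linalg.norm(members[:, None, :] - members[None, :, :], axis=-1)
        locations.append(members[pairwise.sum(axis=1).argmin()])
        sizes.append(len(members))
    return label_text, np.array(locations), np.array(sizes)


# Toy data: cluster 0 contains one far-away outlier, and the last point is noise
coords = np.array(
    [[0.0, 0.0], [2.0, 0.0], [0.0, 1.0], [10.0, 10.0], [5.0, 5.0], [5.0, 6.0]]
)
cluster_ids = [0, 0, 0, 0, 1, -1]

labels = ids_to_labels(cluster_ids, {0: "Topic A", 1: "Topic B"})
label_text, label_locations, label_cluster_sizes = medoid_label_layout(coords, labels)
```

On this toy data the medoid of "Topic A" is the actual point (0, 1), whereas its centroid (3, 2.75) would be dragged toward the outlier. The three returned values line up with the `label_text`, `label_locations`, and `label_cluster_sizes` arguments used in the `render_plot` example.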