### Install upsetplot using pip Source: https://github.com/jnothman/upsetplot/blob/master/README.rst Installs the upsetplot library using pip. Ensure you have Python and pip configured. ```bash pip install upsetplot ``` -------------------------------- ### Generate and Plot Example Data Source: https://github.com/jnothman/upsetplot/blob/master/doc/index.md Generates sample data using `generate_counts` and then plots it using the `plot` function. Requires matplotlib to display the plot. ```python from upsetplot import generate_counts example = generate_counts() print(example) ``` ```python from upsetplot import plot plot(example) from matplotlib import pyplot pyplot.show() ``` -------------------------------- ### Import upsetplot library Source: https://github.com/jnothman/upsetplot/blob/master/README.rst Imports the upsetplot library in a Python environment after installation. This confirms the installation was successful. ```python import upsetplot ``` -------------------------------- ### Create Data from Memberships Source: https://github.com/jnothman/upsetplot/blob/master/doc/index.md Reconstructs the example data structure from a list of category memberships for each data point and their corresponding counts. This is an alternative to `generate_counts`. ```python from upsetplot import from_memberships example = from_memberships( [[], ['cat2'], ['cat1'], ['cat1', 'cat2'], ['cat0'], ['cat0', 'cat2'], ['cat0', 'cat1'], ['cat0', 'cat1', 'cat2'], ], data=[56, 283, 1279, 5882, 24, 90, 429, 1957] ) print(example) ``` -------------------------------- ### Create UpSet Plot from Memberships Source: https://github.com/jnothman/upsetplot/blob/master/README.rst Reconstructs the example data format for UpSetPlot using category memberships for each data point. This is an alternative to `generate_counts`. 
```python from upsetplot import from_memberships example = from_memberships( [[], ['cat2'], ['cat1'], ['cat1', 'cat2'], ['cat0'], ['cat0', 'cat2'], ['cat0', 'cat1'], ['cat0', 'cat1', 'cat2'], ], data=[56, 283, 1279, 5882, 24, 90, 429, 1957] ) print(example) ``` -------------------------------- ### Generate Example Sample Values for UpSetPlot Source: https://github.com/jnothman/upsetplot/blob/master/doc/formats.ipynb Generates a pandas Series with a 3-level MultiIndex, where each row represents an observation with associated floating-point values. Use this when you have raw data points and want to aggregate them. ```python from upsetplot import generate_samples example_values = generate_samples().value example_values ``` -------------------------------- ### Generate Example Counts for UpSetPlot Source: https://github.com/jnothman/upsetplot/blob/master/doc/formats.ipynb Generates a pandas Series with a 3-level MultiIndex representing subset memberships and their counts. Use this as input for plotting when you have aggregate counts. ```python from upsetplot import generate_counts example_counts = generate_counts() example_counts ``` -------------------------------- ### Generate Example Counts for UpSetPlot Source: https://github.com/jnothman/upsetplot/blob/master/README.rst Generates a pandas Series with counts for subset sizes, used as input for plotting. The index represents intersections of named categories. ```python from upsetplot import generate_counts example = generate_counts() print(example) ``` -------------------------------- ### Prepare Data with from_indicators Source: https://context7.com/jnothman/upsetplot/llms.txt Demonstrates creating indicator-based data structures for UpSet plots using column names, callables, or missingness. 
```python
import pandas as pd
from upsetplot import from_indicators

# Using column names from a DataFrame
df = pd.DataFrame({
    "value": [5, 4, 6, 4],
    "cat1": [True, False, True, False],
    "cat2": [False, True, False, False],
    "cat3": [True, True, False, False]
})
result = from_indicators(["cat1", "cat3"], data=df)

# Using a callable to identify indicator columns
result = from_indicators(lambda data: data.select_dtypes(bool), data=df)

# Using missingness as an indicator (useful for missing data analysis)
missing_data = pd.DataFrame({
    "val1": [pd.NA, 0.7, pd.NA, 0.9],
    "val2": ["male", pd.NA, "female", "female"],
    "val3": [pd.NA, pd.NA, 23000, 78000]
})
result = from_indicators(pd.isna, data=missing_data)
```

--------------------------------

### Importing from Indicators

Source: https://github.com/jnothman/upsetplot/blob/master/doc/formats.ipynb

Imports `from_indicators`. Use it when your data is already in boolean form, with one column per set.

```python
from upsetplot import from_indicators
```

--------------------------------

### Generate sample DataFrame

Source: https://github.com/jnothman/upsetplot/blob/master/doc/formats.ipynb

Creates a sample DataFrame for use with UpSetPlot.

```python
from upsetplot import generate_samples

example_samples_df = generate_samples()
example_samples_df.head()
```

--------------------------------

### Build Data with Associated Information from Contents

Source: https://context7.com/jnothman/upsetplot/llms.txt

Builds data from category contents and attaches additional information, such as metadata for each member. The DataFrame passed as `data` is indexed by member identifier.

```python
import pandas as pd
from upsetplot import from_contents

# `contents` is the category -> member IDs dictionary from
# "Build Data from Category Contents with UpSetPlot"

# With associated data
info = pd.DataFrame({
    'color': ['red', 'blue', 'green', 'yellow', 'purple']
}, index=['a', 'b', 'c', 'd', 'e'])
result = from_contents(contents, data=info)
print(result)
```

--------------------------------

### Create Plots with plot()

Source: https://context7.com/jnothman/upsetplot/llms.txt

Convenience function for quick visualization with various sorting, filtering, and formatting options.

```python
from upsetplot import plot, generate_counts
from matplotlib import pyplot as plt

example = generate_counts()

# Basic plot with default settings
plot(example)
plt.suptitle("Basic UpSet Plot")
plt.show()

# With sorting by cardinality instead of degree
plot(example, sort_by="cardinality")
plt.suptitle("Sorted by cardinality")
plt.show()

# Show counts and percentages on bars
plot(example, show_counts="{:,}", show_percentages=True)
plt.suptitle("With counts and percentages")
plt.show()

# Vertical orientation
plot(example, orientation="vertical", show_counts="{:d}")
plt.suptitle("Vertical orientation")
plt.show()

# Custom figure size
fig = plt.figure(figsize=(12, 4))
plot(example, fig=fig, element_size=None)
plt.show()

# Filtering by subset size
plot(example, min_subset_size=100, max_subset_size=2000)
plt.suptitle("Filtered by size")
plt.show()

# Filtering by degree (number of intersecting categories)
plot(example, min_degree=1, max_degree=2)
plt.suptitle("Degree 1 or 2 only")
plt.show()
```

--------------------------------

### Access Specific Subset Data from Samples

Source: https://context7.com/jnothman/upsetplot/llms.txt

Selects the rows of the generated samples DataFrame that belong to one specific combination of categories, using the boolean index levels.

```python
from upsetplot import generate_samples

samples = generate_samples(seed=0, n_samples=10000, n_categories=3)

# Access specific subset data
print(samples.loc[(True, True, True)])  # Samples in all three categories
```

--------------------------------

### Generate Sample Data with UpSetPlot

Source: https://context7.com/jnothman/upsetplot/llms.txt

Generates artificial samples assigned to set intersections, returning a DataFrame with a 'value' column and boolean index levels. Useful when you need raw sample data rather than aggregated counts.

```python
from upsetplot import generate_samples

# Generate sample data with 3 categories
samples = generate_samples(seed=0, n_samples=10000, n_categories=3)
print(samples.head())
```

--------------------------------

### Build Data from Category Contents with UpSetPlot

Source: https://context7.com/jnothman/upsetplot/llms.txt

Builds data from category listings where keys are category names and values are sets of identifiers. Useful when data is organized as 'which items belong to each category'.

```python
from upsetplot import from_contents
import pandas as pd

# Basic usage: dictionary of category -> member IDs
contents = {
    'cat1': ['a', 'b', 'c'],
    'cat2': ['b', 'd'],
    'cat3': ['e']
}
data = from_contents(contents)
print(data)
```

--------------------------------

### Load Data with Associated Values from Memberships

Source: https://context7.com/jnothman/upsetplot/llms.txt

Loads data from category memberships and associates it with provided data values, one row of `data` per membership entry. Useful when each membership has associated numerical data.

```python
import numpy as np
from upsetplot import from_memberships

# With associated data values
values = from_memberships(
    [['A', 'B'], ['B', 'C'], ['A'], ['C']],
    data=np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
)
print(values)
```

--------------------------------

### Convert memberships to UpSet format

Source: https://github.com/jnothman/upsetplot/blob/master/doc/formats.ipynb

Converts a mapping of items to their memberships into a format suitable for UpSetPlot using `from_memberships`.

```python from upsetplot import from_memberships animal_memberships = { "Cat": "Mammal", "Dog": "Mammal,Domesticated", "Horse": "Mammal,Herbivore,Domesticated", "Sheep": "Mammal,Herbivore,Domesticated", "Pig": "Mammal,Domesticated", "Cattle": "Mammal,Herbivore,Domesticated", "Rhinoceros": "Mammal,Herbivore", "Moose": "Mammal,Herbivore", "Chicken": "Domesticated", "Duck": "Domesticated", } # Turn this into a list of lists: animal_membership_lists = [ categories.split(",") for categories in animal_memberships.values() ] animals = from_memberships(animal_membership_lists) animals ``` -------------------------------- ### Load Data from Category Memberships with UpSetPlot Source: https://context7.com/jnothman/upsetplot/llms.txt Loads data where each sample has a collection of category names indicating its set memberships. The output is suitable for UpSet or plot functions. ```python from upsetplot import from_memberships import numpy as np # Basic usage: list of category memberships data = from_memberships([ ['cat1', 'cat3'], # Item belongs to cat1 and cat3 ['cat2', 'cat3'], # Item belongs to cat2 and cat3 ['cat1'], # Item belongs only to cat1 [] # Item belongs to no categories ]) print(data) ``` -------------------------------- ### Creating Genre Indicator DataFrame Source: https://github.com/jnothman/upsetplot/blob/master/doc/formats.ipynb Convert a DataFrame with comma-separated genres into a boolean indicator DataFrame. Each column represents a genre, and `True` indicates the presence of that genre for a given row. This format is suitable for `from_indicators`. 
```python import pandas as pd from upsetplot import from_indicators genre_indicators = pd.DataFrame( [{cat: True for cat in cats} for cats in movies.Genre.str.split(",").values] ).fillna(False) genre_indicators ``` -------------------------------- ### Style Subsets with style_subsets Source: https://context7.com/jnothman/upsetplot/llms.txt Updates visual attributes of subsets based on category presence, size, or degree. ```python from upsetplot import UpSet, generate_counts from matplotlib import pyplot as plt example = generate_counts() # Style by category presence upset = UpSet(example) upset.style_subsets(present=["cat1", "cat2"], facecolor="blue", label="Both cat1 & cat2") upset.plot() plt.suptitle("Highlight subsets with both cat1 and cat2") plt.show() # Style by category absence upset = UpSet(example) upset.style_subsets(present="cat2", absent="cat1", edgecolor="red", linewidth=2) upset.plot() plt.suptitle("Border for subsets with cat2 but not cat1") plt.show() # Style by subset size upset = UpSet(example) upset.style_subsets(min_subset_size=1000, facecolor="lightblue", hatch="xx", label="Large (>1000)") upset.plot() plt.suptitle("Hatched subsets with size > 1000") plt.show() # Style by degree (color gradient) upset = UpSet(example) upset.style_subsets(min_degree=1, facecolor="blue") upset.style_subsets(min_degree=2, facecolor="purple") upset.style_subsets(min_degree=3, facecolor="red") upset.plot() plt.suptitle("Color by intersection degree") plt.show() # Multiple overlapping styles upset = UpSet(example, facecolor="gray") upset.style_subsets(present="cat0", label="Contains cat0", facecolor="blue") upset.style_subsets(present="cat1", label="Contains cat1", hatch="xx", edgecolor="black") upset.style_subsets(present="cat2", label="Contains cat2", edgecolor="red") upset.plot() plt.show() ``` -------------------------------- ### Convert dictionary of contents to UpSet format Source: https://github.com/jnothman/upsetplot/blob/master/doc/formats.ipynb Constructs a 
DataFrame from a dictionary of category members using from_contents. ```python from upsetplot import from_contents animals = from_contents( {"mammal": mammals, "herbivore": herbivores, "domesticated": domesticated} ) animals ``` -------------------------------- ### Load Data from Boolean Indicators with UpSetPlot Source: https://context7.com/jnothman/upsetplot/llms.txt Loads category membership indicated by a boolean indicator matrix. Supports column names, DataFrames, or callable functions to derive indicators from data. ```python from upsetplot import from_indicators import pandas as pd # Using a dictionary of boolean indicators indicators = { "cat1": [True, False, True, False], "cat2": [False, True, False, False], "cat3": [True, True, False, False] } data = from_indicators(indicators) print(data) ``` -------------------------------- ### Generate Default Counts with UpSetPlot Source: https://context7.com/jnothman/upsetplot/llms.txt Generates default counts for set intersections using 3 categories and 10000 samples. Useful for testing and demonstrations. ```python from upsetplot import generate_counts # Generate default counts with 3 categories and 10000 samples example = generate_counts() print(example) ``` -------------------------------- ### Customize Plots with UpSet Class Source: https://context7.com/jnothman/upsetplot/llms.txt Provides full control over styling, layout, and plot elements for complex visualizations. 
```python from upsetplot import UpSet, generate_counts from matplotlib import pyplot as plt example = generate_counts() # Basic UpSet plot upset = UpSet(example) axes = upset.plot() plt.show() # With custom sorting and display options upset = UpSet( example, orientation="horizontal", sort_by="cardinality", sort_categories_by="cardinality", show_counts=True, show_percentages="{:.1%}", element_size=32, intersection_plot_elements=6, totals_plot_elements=3 ) upset.plot() plt.show() # Customize colors upset = UpSet( example, facecolor="darkblue", other_dots_color=0.3, shading_color="lightgray" ) upset.plot() plt.show() # Disable totals plot or intersection bars upset = UpSet(example, totals_plot_elements=0) # No totals upset = UpSet(example, intersection_plot_elements=0) # No intersection bars ``` -------------------------------- ### Apply Matplotlib themes and custom colors Source: https://context7.com/jnothman/upsetplot/llms.txt Adapts UpSet plots to Matplotlib styles and allows manual color configuration for elements like faces, dots, and shading. ```python from upsetplot import plot, generate_counts from matplotlib import pyplot as plt example = generate_counts() # Using matplotlib stylesheets with plt.style.context("dark_background"): plot(example, show_counts=True) plt.suptitle("Dark background theme") plt.show() with plt.style.context("Solarize_Light2"): plot(example) plt.suptitle("Solarize theme") plt.show() # Manual color control on dark background with plt.style.context("dark_background"): plot( example, show_counts=True, facecolor="red", other_dots_color=0.4, # 40% opacity of facecolor shading_color=0.2 # 20% opacity of facecolor ) plt.suptitle("Custom colors on dark background") plt.show() ``` -------------------------------- ### Style Categories with style_categories Source: https://context7.com/jnothman/upsetplot/llms.txt Updates the visual style of category total bars and row shading. 
```python from upsetplot import UpSet, generate_counts from matplotlib import pyplot as plt example = generate_counts() # Style category shading upset = UpSet(example) upset.style_categories("cat2", shading_edgecolor="darkgreen", shading_linewidth=1) upset.style_categories("cat1", shading_facecolor="lavender") upset.plot() plt.suptitle("Custom category shading") plt.show() ``` -------------------------------- ### Add stacked bar charts to UpSet Source: https://context7.com/jnothman/upsetplot/llms.txt Visualizes the distribution of a categorical variable within each intersection using stacked bars, supporting custom color mappings. ```python from upsetplot import UpSet from matplotlib import pyplot as plt, cm import pandas as pd # Load Titanic data TITANIC_URL = "https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv" df = pd.read_csv(TITANIC_URL) # Create binary indicators for survival and first class df = df.set_index(df.Survived == 1).set_index(df.Pclass == 1, append=True) df.index.names = ["Survived", "FirstClass"] # UpSet with stacked bars by gender upset = UpSet(df, intersection_plot_elements=0) # Disable default bar chart upset.add_stacked_bars( by="Sex", colors=cm.Pastel1, title="Count by gender", elements=10 ) upset.plot() plt.suptitle("Gender distribution for survival/class combinations") plt.show() # With custom color mapping upset = UpSet(df, intersection_plot_elements=0) upset.add_stacked_bars( by="Sex", colors={"male": "steelblue", "female": "coral"}, title="Passengers by sex" ) upset.plot() plt.show() ``` -------------------------------- ### Create DataFrame with Genre Indicators Source: https://github.com/jnothman/upsetplot/blob/master/doc/formats.ipynb Concatenates the original movies DataFrame with genre indicator columns. This prepares the data for creating upset plots based on genre intersections. 
```python
import pandas as pd

# `movies` and `genre_indicators` come from the earlier notebook cells
movies_with_indicators = pd.concat([movies, genre_indicators], axis=1)
movies_with_indicators
```

--------------------------------

### Define categorical lists

Source: https://github.com/jnothman/upsetplot/blob/master/doc/formats.ipynb

Defines lists of items belonging to specific categories for conversion.

```python
mammals = ["Cat", "Dog", "Horse", "Sheep", "Pig", "Cattle", "Rhinoceros", "Moose"]
herbivores = ["Horse", "Sheep", "Cattle", "Moose", "Rhinoceros"]
domesticated = ["Dog", "Chicken", "Horse", "Sheep", "Pig", "Cattle", "Duck"]
(mammals, herbivores, domesticated)
```

--------------------------------

### Customize Count Generation in UpSetPlot

Source: https://context7.com/jnothman/upsetplot/llms.txt

Customizes the generation of artificial counts by specifying the random seed, number of samples, and number of categories.

```python
from upsetplot import generate_counts

# Customize generation parameters
counts = generate_counts(seed=42, n_samples=5000, n_categories=4)
```

--------------------------------

### Data Loading Functions

Source: https://context7.com/jnothman/upsetplot/llms.txt

Functions for loading data into a format suitable for UpSet plots from various input types.

```APIDOC
## from_memberships

### Description
Loads data where each sample has a collection of category names indicating its set memberships. The output is suitable for passing directly to UpSet or plot functions.

### Method
`from_memberships`

### Parameters
#### Arguments
- **memberships** (list of lists or similar) - Required - A list where each inner list contains category names for a sample.
- **data** (numpy.ndarray or similar) - Optional - Associated data values for each sample.

### Request Example ```python from upsetplot import from_memberships import numpy as np # Basic usage: list of category memberships data = from_memberships([ ['cat1', 'cat3'], # Item belongs to cat1 and cat3 ['cat2', 'cat3'], # Item belongs to cat2 and cat3 ['cat1'], # Item belongs only to cat1 [] # Item belongs to no categories ]) print(data) # With associated data values values = from_memberships( [['A', 'B'], ['B', 'C'], ['A'], ['C']], data=np.array([[1, 2], [3, 4], [5, 6], [7, 8]]) ) print(values) ``` ### Response #### Success Response - **pandas.Series or pandas.DataFrame** - Data structured for UpSet plots, with a MultiIndex representing set memberships. ``` ```APIDOC ## from_contents ### Description Builds data from category listings where keys are category names and values are sets of identifiers. Useful when your data is organized as "which items belong to each category" rather than "which categories each item belongs to." ### Method `from_contents` ### Parameters #### Arguments - **contents** (dict) - Required - A dictionary where keys are category names and values are lists or sets of member identifiers. - **data** (pandas.DataFrame or similar) - Optional - DataFrame containing associated data for each member identifier. ### Request Example ```python from upsetplot import from_contents import pandas as pd # Basic usage: dictionary of category -> member IDs contents = { 'cat1': ['a', 'b', 'c'], 'cat2': ['b', 'd'], 'cat3': ['e'] } data = from_contents(contents) print(data) # With associated data info = pd.DataFrame({ 'color': ['red', 'blue', 'green', 'yellow', 'purple'] }, index=['a', 'b', 'c', 'd', 'e']) result = from_contents(contents, data=info) print(result) ``` ### Response #### Success Response - **pandas.Series or pandas.DataFrame** - Data structured for UpSet plots, with a MultiIndex representing set memberships and columns for associated data if provided. 
``` ```APIDOC ## from_indicators ### Description Loads category membership indicated by a boolean indicator matrix. Supports column names, DataFrames, or callable functions to derive indicators from data. ### Method `from_indicators` ### Parameters #### Arguments - **indicators** (dict, pandas.DataFrame, or callable) - Required - Data representing boolean membership in categories. ### Request Example ```python from upsetplot import from_indicators import pandas as pd # Using a dictionary of boolean indicators indicators = { "cat1": [True, False, True, False], "cat2": [False, True, False, False], "cat3": [True, True, False, False] } data = from_indicators(indicators) print(data) ``` ### Response #### Success Response - **pandas.Series or pandas.DataFrame** - Data structured for UpSet plots, with a MultiIndex representing set memberships. ``` -------------------------------- ### Load CSV data into a Pandas DataFrame Source: https://github.com/jnothman/upsetplot/blob/master/doc/formats.ipynb Loads a CSV file into a Pandas DataFrame for further processing. Ensure the file path is correct. ```python import pandas as pd movies = pd.read_csv( "https://raw.githubusercontent.com/peetck/IMDB-Top1000-Movies/master/IMDB-Movie-Data.csv" ) movies.head() ``` -------------------------------- ### Add categorical plots to UpSet Source: https://context7.com/jnothman/upsetplot/llms.txt Integrates seaborn-style categorical plots like strip, box, or violin plots into UpSet intersections to visualize continuous variable distributions. 
```python from upsetplot import UpSet from matplotlib import pyplot as plt import pandas as pd from sklearn.datasets import load_diabetes # Load sample data diabetes = load_diabetes() df = pd.DataFrame(diabetes.data, columns=diabetes.feature_names) df["target"] = diabetes.target # Create boolean indicators for above-median values df["high_bmi"] = df["bmi"] > df["bmi"].median() df["high_bp"] = df["bp"] > df["bp"].median() df["high_s5"] = df["s5"] > df["s5"].median() # Set indicators as index df = df.set_index(["high_bmi", "high_bp", "high_s5"]) # Create UpSet with catplot upset = UpSet(df, subset_size="count", intersection_plot_elements=3) upset.add_catplot(value="target", kind="strip", color="blue") upset.add_catplot(value="bmi", kind="box", color="green") upset.plot() plt.suptitle("Distribution of target and BMI across subsets") plt.show() # Vertical orientation with violin plots upset = UpSet(df, subset_size="count", orientation="vertical") upset.add_catplot(value="target", kind="violin", color="purple") upset.plot() plt.show() ``` -------------------------------- ### Data Generation Functions Source: https://context7.com/jnothman/upsetplot/llms.txt Functions for generating artificial data for UpSet plots, useful for testing and demonstrations. ```APIDOC ## generate_counts ### Description Generates artificial counts corresponding to set intersections, useful for testing and demonstrations. Returns a pandas Series with a MultiIndex where each level is a boolean indicator for category membership. ### Method `generate_counts` ### Parameters #### Keyword Parameters - **seed** (int) - Optional - Seed for random number generation. - **n_samples** (int) - Optional - Number of samples to generate. - **n_categories** (int) - Optional - Number of categories to generate. 
### Request Example ```python from upsetplot import generate_counts # Generate default counts with 3 categories and 10000 samples example = generate_counts() print(example) # Customize generation parameters counts = generate_counts(seed=42, n_samples=5000, n_categories=4) ``` ### Response #### Success Response - **pandas.Series** - A Series with a MultiIndex representing set intersections and their counts. ``` ```APIDOC ## generate_samples ### Description Generates artificial samples assigned to set intersections, returning a DataFrame with a 'value' column and boolean index levels. This is useful when you need the raw sample data rather than aggregated counts. ### Method `generate_samples` ### Parameters #### Keyword Parameters - **seed** (int) - Optional - Seed for random number generation. - **n_samples** (int) - Optional - Number of samples to generate. - **n_categories** (int) - Optional - Number of categories to generate. ### Request Example ```python from upsetplot import generate_samples # Generate sample data with 3 categories samples = generate_samples(seed=0, n_samples=10000, n_categories=3) print(samples.head()) # Access specific subset data print(samples.loc[(True, True, True)]) # Samples in all three categories ``` ### Response #### Success Response - **pandas.DataFrame** - A DataFrame with a MultiIndex representing set intersections and a 'value' column for sample data. ``` -------------------------------- ### Plot UpSet Diagram Source: https://github.com/jnothman/upsetplot/blob/master/README.rst Plots an UpSet diagram from a pandas Series containing subset counts. Requires matplotlib to display the plot. 
```python
from upsetplot import generate_counts, plot
from matplotlib import pyplot

# 'example' can be any pandas Series of subset counts, e.g. from generate_counts()
example = generate_counts()
plot(example)
pyplot.show()
```

--------------------------------

### Generate UpSet Plot from Genre Indicators

Source: https://github.com/jnothman/upsetplot/blob/master/doc/formats.ipynb

Creates an UpSet plot by passing a list of genre column names together with the DataFrame that contains those indicator columns. Only the listed genres are used as categories.

```python
from upsetplot import UpSet, from_indicators

UpSet(
    from_indicators(
        ["Drama", "Action", "Comedy", "Adventure"], data=movies_with_indicators
    )
)
```

--------------------------------

### query

Source: https://context7.com/jnothman/upsetplot/llms.txt

Transforms, filters, and aggregates categorised data. Returns a QueryResult object containing filtered data, subset sizes, category totals, and the grand total.

```APIDOC
## query

### Description
Transforms, filters, and aggregates categorised data. Returns a QueryResult object containing filtered data, subset sizes, category totals, and the grand total.

### Parameters
#### Query Parameters
- **present** (str or list of str) - Optional - Categories that must be present.
- **absent** (str or list of str) - Optional - Categories that must be absent.
- **min_subset_size** (int or str) - Optional - Minimum size constraint; a string such as "5%" is read as a percentage.
- **max_subset_size** (int or str) - Optional - Maximum size constraint; a string such as "50%" is read as a percentage.
- **min_degree** (int) - Optional - Minimum number of categories in intersection.
- **max_degree** (int) - Optional - Maximum number of categories in intersection.
- **max_subset_rank** (int) - Optional - Limit to top N subsets by size.
- **sort_by** (str) - Optional - Sorting criteria for subsets.
- **sort_categories_by** (str) - Optional - Sorting criteria for categories.
- **include_empty_subsets** (bool) - Optional - Whether to include empty subsets.
``` -------------------------------- ### Plot UpSet Plot from Sample Values (Sum Weights) Source: https://github.com/jnothman/upsetplot/blob/master/doc/formats.ipynb Plots an UpSet plot from a pandas Series of sample values, weighting each subset's size by the sum of its corresponding series values. Use `subset_size='sum'` and `show_counts=True` to visualize weighted subset sizes with counts displayed. ```python from upsetplot import UpSet ax_dict = UpSet(example_values, subset_size="sum", show_counts=True).plot() ``` -------------------------------- ### Plot converted contents Source: https://github.com/jnothman/upsetplot/blob/master/doc/formats.ipynb Plots the UpSet diagram for the converted categorical data. ```python from upsetplot import UpSet ax_dict = UpSet(animals, subset_size="count").plot() ``` -------------------------------- ### Customize UpSetPlot Figure and Axes Source: https://context7.com/jnothman/upsetplot/llms.txt Access individual plot components using the dictionary returned by the plot method. This allows for fine-grained control over labels, titles, and layout adjustments. ```python from upsetplot import UpSet, generate_counts from matplotlib import pyplot as plt example = generate_counts() fig = plt.figure(figsize=(12, 6)) upset = UpSet(example, show_counts=True) axes_dict = upset.plot(fig=fig) # Access individual axes # axes_dict contains: 'matrix', 'intersections', 'totals', 'shading' axes_dict['intersections'].set_ylabel("Custom Y Label") axes_dict['matrix'].set_title("Matrix Title") plt.suptitle("Customized UpSet Plot") plt.tight_layout() plt.show() ``` -------------------------------- ### Save Plot to File Source: https://github.com/jnothman/upsetplot/blob/master/doc/index.md Saves the currently generated matplotlib plot to a file in PDF or PNG format. Ensure the directory exists. 
```python
from matplotlib import pyplot

pyplot.savefig("/path/to/myplot.pdf")
```

```python
from matplotlib import pyplot

pyplot.savefig("/path/to/myplot.png")
```

--------------------------------

### Style category total bars

Source: https://context7.com/jnothman/upsetplot/llms.txt

Customizes the appearance of category bars in an UpSet plot using the `bar_facecolor`, `bar_hatch`, and `bar_edgecolor` parameters.

```python
from upsetplot import UpSet, generate_counts
from matplotlib import pyplot as plt

example = generate_counts()

upset = UpSet(example)
upset.style_categories(
    ["cat2", "cat1"],
    bar_facecolor="aqua",
    bar_hatch="xx",
    bar_edgecolor="black"
)
upset.plot()
plt.suptitle("Custom category bar styling")
plt.show()
```

--------------------------------

### Plotting Upset Plot from Genre Data

Source: https://github.com/jnothman/upsetplot/blob/master/doc/formats.ipynb

Plots an UpSet plot from `movies_by_genre`, the result of calling `from_memberships` on the comma-separated `Genre` column. Set `min_subset_size` to filter out small intersections and `show_counts` to display the number of elements in each subset.

```python
from upsetplot import UpSet

UpSet(movies_by_genre, min_subset_size=15, show_counts=True).plot()
```

--------------------------------

### UpSet.add_stacked_bars

Source: https://context7.com/jnothman/upsetplot/llms.txt

Adds stacked bar charts showing the distribution of a categorical variable within each intersection.

```APIDOC
## UpSet.add_stacked_bars

### Description
Adds stacked bar charts showing the distribution of a categorical variable within each intersection.

### Parameters
#### Request Body
- **by** (str) - Required - The categorical variable to group by.
- **colors** (dict or colormap) - Optional - Color mapping for the categories.
- **title** (str) - Optional - Title for the plot element.
- **elements** (int) - Optional - Number of elements to display.
```

--------------------------------

### Plot subset counts from DataFrame

Source: https://github.com/jnothman/upsetplot/blob/master/doc/formats.ipynb

Plots the number of observations in each unique subset by passing `subset_size="count"`.

```python
from upsetplot import UpSet

ax_dict = UpSet(example_samples_df, subset_size="count").plot()
```

--------------------------------

### Save UpSet Plot to File

Source: https://github.com/jnothman/upsetplot/blob/master/README.rst

Saves the current matplotlib figure to a file in a format such as PDF or PNG. Ensure a plot is active before calling this.

```python
from matplotlib import pyplot

# Assuming a plot has been generated
pyplot.savefig("/path/to/myplot.pdf")
pyplot.savefig("/path/to/myplot.png")
```

--------------------------------

### Create Upset plot from category memberships

Source: https://github.com/jnothman/upsetplot/blob/master/doc/formats.ipynb

Converts a column of comma-separated categories into a DataFrame suitable for UpSet plots. Useful when each row lists its categories in a single column (here `Genre`); the original columns are kept by passing `data=movies`.

```python
from upsetplot import from_memberships

movies_by_genre = from_memberships(movies.Genre.str.split(","), data=movies)
movies_by_genre
```

--------------------------------

### Query categorized data

Source: https://context7.com/jnothman/upsetplot/llms.txt

Filters and aggregates data based on category presence, subset size, degree, or rank, returning a QueryResult object.

```python
from upsetplot import query, generate_samples

data = generate_samples(n_samples=1000)

# Filter by category presence
result = query(data, present="cat1", absent="cat0")
print(f"Total matching samples: {result.total}")
print(f"Subset sizes:\n{result.subset_sizes}")
print(f"Category totals:\n{result.category_totals}")

# Filter by size constraints
result = query(data, min_subset_size=50, max_subset_size=200)
print(result.subset_sizes)

# Filter by size as percentage
result = query(data, min_subset_size="5%", max_subset_size="50%")

# Filter by degree
result = query(data, min_degree=2, max_degree=2)  # Exactly 2 categories

# Limit to top N subsets by size
result = query(data, max_subset_rank=5)

# Custom sorting
result = query(data, sort_by="cardinality", sort_categories_by="-cardinality")

# Access individual subset DataFrames
result = query(data)
for categories, subset_df in result.subsets.items():
    print(f"Categories {categories}: {len(subset_df)} samples")

# Include empty subsets (all possible combinations)
result = query(data, include_empty_subsets=True)
```

--------------------------------

### Create Upset Plot from Boolean Columns

Source: https://github.com/jnothman/upsetplot/blob/master/doc/formats.ipynb

Use `DataFrame.select_dtypes(bool)` to select the boolean columns, pass them to `from_indicators`, and create an Upset plot from the result. Requires the `upsetplot` library and `pandas`.
```python
import pandas as pd
from upsetplot import UpSet, from_indicators

# Toy stand-in for movies_with_indicators; replace with your own DataFrame
# that has one boolean column per category.
movies_with_indicators = pd.DataFrame(
    {
        "col1": [True, False, True, True, False],
        "col2": [False, True, True, False, True],
        "col3": [True, True, False, True, True],
    }
)

# On the full dataset, add min_subset_size=15 to hide small intersections;
# the toy frame has only five rows, so no threshold is applied here.
UpSet(
    from_indicators(lambda df: df.select_dtypes(bool), data=movies_with_indicators),
    show_counts=True,
).plot()
```

--------------------------------

### Plot sum of variables from DataFrame

Source: https://github.com/jnothman/upsetplot/blob/master/doc/formats.ipynb

Plots the sum of one column (here `index`) within each unique subset.

```python
from upsetplot import UpSet

ax_dict = UpSet(example_samples_df, sum_over="index", subset_size="sum").plot()
```

--------------------------------

### Plot UpSet Plot from Sample Values (Count Subsets)

Source: https://github.com/jnothman/upsetplot/blob/master/doc/formats.ipynb

Plots an UpSet plot from a pandas Series of sample values. It counts the number of observations in each unique subset and visualizes these counts. Use `subset_size='count'` when your Series contains individual data points.

```python
from upsetplot import UpSet

ax_dict = UpSet(example_values, subset_size="count").plot()
```

--------------------------------

### Generate Upset Plot for Movie Genres

Source: https://github.com/jnothman/upsetplot/blob/master/doc/formats.ipynb

Generates an upset plot from the movie genre DataFrame. The DataFrame must already carry the category memberships in its index (for example, as built by `from_memberships`) before it is passed to `UpSet`.

```python
from upsetplot import UpSet

UpSet(movies_by_genre).plot()
```

--------------------------------

### UpSet.add_catplot

Source: https://context7.com/jnothman/upsetplot/llms.txt

Adds a seaborn categorical plot (strip, box, violin, etc.) over subsets to show distributions of continuous variables within each intersection.
```APIDOC
## UpSet.add_catplot

### Description
Adds a seaborn categorical plot (strip, box, violin, etc.) over subsets to show distributions of continuous variables within each intersection.

### Parameters
#### Request Body
- **value** (str) - Required when the data is a DataFrame - The column name to plot.
- **kind** (str) - Required - The type of seaborn plot (e.g., "strip", "box", "violin").
- **color** (str) - Optional - The color of the plot elements.
```

--------------------------------

### Plot UpSet Plot from Counts Series

Source: https://github.com/jnothman/upsetplot/blob/master/doc/formats.ipynb

Plots an UpSet plot using a pandas Series that contains pre-calculated subset counts. The Series must have a MultiIndex indicating set memberships.

```python
from upsetplot import UpSet

ax_dict = UpSet(example_counts).plot()
```
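--------------------------------

### Sketch: subset counts keyed by set membership

The counts Series described above pairs each combination of set memberships with a count. As a rough, library-independent illustration of that shape (not upsetplot's implementation), the standard-library sketch below counts observations per exclusive category combination, keyed by one boolean flag per category, and derives per-category totals. The helper names `count_subsets` and `category_totals`, the category names, and the sample observations are all assumptions made for this example.

```python
from collections import Counter
from itertools import product

# Hypothetical category names, chosen to mirror the examples in this document.
CATEGORIES = ("cat0", "cat1", "cat2")


def count_subsets(memberships, categories=CATEGORIES):
    """Count observations per exclusive combination of categories.

    Each key is a tuple of booleans, one flag per category, similar in
    spirit to one row of a boolean MultiIndex.
    """
    counts = Counter(
        tuple(cat in members for cat in categories) for members in memberships
    )
    # List every possible combination, including those never observed.
    all_keys = product((False, True), repeat=len(categories))
    return {key: counts.get(key, 0) for key in all_keys}


def category_totals(subset_counts, categories=CATEGORIES):
    """Sum the counts of every combination that includes each category."""
    return {
        cat: sum(n for key, n in subset_counts.items() if key[i])
        for i, cat in enumerate(categories)
    }


# Illustrative observations: each set lists the categories of one data point.
observations = [
    {"cat0"},
    {"cat1", "cat2"},
    {"cat1", "cat2"},
    set(),
    {"cat0", "cat1", "cat2"},
]

subset_counts = count_subsets(observations)
totals = category_totals(subset_counts)

for key, n in subset_counts.items():
    print(dict(zip(CATEGORIES, key)), n)
print(totals)
```

In practice, `from_memberships` builds the real pandas structure for you; this sketch only shows what the counts represent: each observation falls into exactly one combination, while a category total adds up every combination that contains that category.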