### Run Dash Application Server Manually

Source: https://github.com/x-tabdeveloping/topicwizard/blob/main/docs/usage.deployment.rst

This snippet starts a Dash application server manually. It assumes an `app` object already exists (for example, one returned by `topicwizard.get_dash_app`) and is typically placed in a main script.

```python
# main.py
if __name__ == "__main__":
    app.run_server(debug=False, port=8050)
```

--------------------------------

### Install topic-wizard Package

Source: https://github.com/x-tabdeveloping/topicwizard/blob/main/examples/basic_usage.ipynb

Installs the topic-wizard library from PyPI using pip. This is the first step to using the library for topic model visualization.

```python
%pip install topic-wizard
```

--------------------------------

### Create Dash App with Topicwizard

Source: https://github.com/x-tabdeveloping/topicwizard/blob/main/docs/usage.deployment.rst

This snippet shows how to create a Dash application using the Topicwizard library. It requires a TopicData object as input and initializes the Dash app.

```python
# main.py
import topicwizard

app = topicwizard.get_dash_app(topic_data)
```

--------------------------------

### Install topicwizard with pip

Source: https://github.com/x-tabdeveloping/topicwizard/blob/main/docs/index.rst

Installs the topicwizard Python package using pip. Note that the PyPI distribution is named `topic-wizard`, while the module is imported as `topicwizard`.

```shell
pip install topic-wizard
```

--------------------------------

### Precompute UMAP Projections for Faster Cold Starts (Python)

Source: https://github.com/x-tabdeveloping/topicwizard/blob/main/docs/_build/html/usage.deployment.html

This snippet shows how to precompute UMAP projections using `topicwizard.precompute_positions`. This optimization can significantly reduce cold start times for deployed topicwizard applications.

```python
topic_data_w_positions = topicwizard.precompute_positions(topic_data)
```

--------------------------------

### Clone HuggingFace Spaces Repository

Source: https://github.com/x-tabdeveloping/topicwizard/blob/main/docs/usage.deployment.rst

This bash command shows how to clone a HuggingFace Spaces repository using Git. This is the first step in deploying a Topicwizard application to HuggingFace Spaces.

```bash
git clone <link_to_space>
```

--------------------------------

### Deploy Topicwizard to HuggingFace Spaces

Source: https://github.com/x-tabdeveloping/topicwizard/blob/main/docs/usage.deployment.rst

These bash commands illustrate the process of deploying a Topicwizard application to HuggingFace Spaces after creating a Docker Space. It involves moving the deployment folder contents, staging, committing, and pushing the changes.

```bash
mv deployment/* /path/to/space_repo
cd /path/to/space_repo
git add -A
git commit -m "Added deployment"
git push
```

--------------------------------

### Implement Turftopic Model Similar to Top2Vec for TopicWizard

Source: https://github.com/x-tabdeveloping/topicwizard/blob/main/docs/usage.compatibility.rst

Provides an example of creating a Turftopic ClusteringTopicModel configured to replicate the behavior of Top2Vec models. This model can be directly used with TopicWizard. Requires installation of turftopic, umap-learn, and scikit-learn.

```bash
pip install turftopic
pip install umap-learn
# Quote the requirement so the shell does not treat ">" as a redirect
pip install "scikit-learn>=1.3.0"
```

```python
from turftopic import ClusteringTopicModel
from sklearn.cluster import HDBSCAN
import umap
import topicwizard

# This has the exact same behaviour as Top2Vec models.
top2vec = ClusteringTopicModel(
    dimensionality_reduction=umap.UMAP(
        n_neighbors=15,
        n_components=5,
        metric="cosine"
    ),
    clustering=HDBSCAN(
        min_cluster_size=15,
        metric="euclidean",
        cluster_selection_method="eom",
    ),
    feature_importance="centroid",
)

topicwizard.visualize(corpus, model=top2vec)
```

--------------------------------

### Create Rule-Based Classification Pipeline with Human-Learn

Source: https://github.com/x-tabdeveloping/topicwizard/blob/main/docs/usage.pipelines.rst

This example shows how to build a rule-based classification pipeline using the human-learn library. It first trains a topic pipeline, then defines a custom rule function to classify documents about 'corona', and finally combines the topic pipeline and classifier into a single scikit-learn pipeline. This is useful for labeling data when labeled examples are scarce.

```python
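# Install human-learn from PyPI:
# pip install human-learn

import topicwizard
from hulearn.classification import FunctionClassifier
from sklearn.pipeline import make_pipeline
from topicwizard.pipeline import make_topic_pipeline

# `vectorizer`, `model`, and `texts` are assumed to be defined earlier
topic_pipeline = make_topic_pipeline(vectorizer, model).fit(texts)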

# Investigate topics
topicwizard.visualize(topic_pipeline)

# Creating rule for classifying something as a corona document
def corona_rule(df, threshold=0.5):
    is_about_corona = df["11_vaccine_pandemic_virus_coronavirus"] > threshold
    return is_about_corona.astype(int)
   
# Freezing topic pipeline
topic_pipeline.freeze = True
classifier = FunctionClassifier(corona_rule)
cls_pipeline = make_pipeline(topic_pipeline, classifier)
```

--------------------------------

### Create Turftopic Model Similar to Top2Vec for topicwizard

Source: https://github.com/x-tabdeveloping/topicwizard/blob/main/docs/_build/html/usage.compatibility.html

This snippet illustrates how to construct a Turftopic `ClusteringTopicModel` that mimics the behavior of Top2Vec models. It requires installing `turftopic`, `umap-learn`, and `scikit-learn`. This model can then be used directly with topicwizard.

```python
from turftopic import ClusteringTopicModel
from sklearn.cluster import HDBSCAN
import umap
import topicwizard

# This has the exact same behaviour as Top2Vec models.
top2vec = ClusteringTopicModel(
    dimensionality_reduction=umap.UMAP(
        n_neighbors=15,
        n_components=5,
        metric="cosine"
    ),
    clustering=HDBSCAN(
        min_cluster_size=15,
        metric="euclidean",
        cluster_selection_method="eom",
    ),
    feature_importance="centroid",
)

corpus = ["Some example text", "More text here"]
topicwizard.visualize(corpus, model=top2vec)
```

--------------------------------

### Load 20newsgroups Corpus

Source: https://github.com/x-tabdeveloping/topicwizard/blob/main/README.md

Loads the 20newsgroups dataset from scikit-learn, commonly used for topic modeling examples. It fetches the data and extracts the corpus content.

```python
from sklearn.datasets import fetch_20newsgroups

newsgroups = fetch_20newsgroups(subset="all")
corpus = newsgroups.data
```

--------------------------------

### Integrate BERTopic Model with topicwizard

Source: https://github.com/x-tabdeveloping/topicwizard/blob/main/docs/_build/html/usage.compatibility.html

This example shows how to wrap a BERTopic model using `BERTopicWrapper` from topicwizard for direct use in the web app or for producing `TopicData` objects. The BERTopic model can be fitted automatically if not pre-trained.

```python
from bertopic import BERTopic
from topicwizard.compatibility import BERTopicWrapper
import topicwizard

model = BERTopic(language="english")
wrapped_model = BERTopicWrapper(model)

corpus = ["Some example text", "More text here"]

# Start the web app immediately
topicwizard.visualize(corpus, model=wrapped_model)

# Or produce a TopicData object for persistence or figures.
topic_data = wrapped_model.prepare_topic_data(corpus)
```

--------------------------------

### Get Feature Names from TopicPipeline

Source: https://github.com/x-tabdeveloping/topicwizard/blob/main/docs/usage.pipelines.rst

Illustrates how to retrieve feature names (topic names) after fitting a TopicPipeline. This feature is useful for understanding and utilizing the inferred topic names in subsequent steps of a pipeline.

```python
topic_pipeline.fit(texts)
print(topic_pipeline.get_feature_names_out())
```
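
One way to use these names downstream is to pair them with a document's topic weights. The sketch below assumes names of the form `index_word_word_word` and uses made-up weights; `label_topic_vector` is a hypothetical helper, not part of topicwizard.

```python
def label_topic_vector(feature_names, topic_vector):
    """Pair each topic name with its weight, strongest topic first."""
    if len(feature_names) != len(topic_vector):
        raise ValueError("expected one weight per topic name")
    pairs = zip(feature_names, topic_vector)
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Hypothetical names and weights; real ones come from a fitted pipeline.
names = ["0_game_team_season", "1_space_nasa_launch", "2_god_church_faith"]
weights = [0.05, 0.80, 0.15]
ranked = label_topic_vector(names, weights)
```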

--------------------------------

### Exclude Pages from Topicwizard Visualization (Python)

Source: https://github.com/x-tabdeveloping/topicwizard/blob/main/docs/_build/html/application.html

This example shows how to customize the topicwizard visualization by excluding specific pages, such as 'documents' and 'words'. This is useful for performance optimization or when only specific views are needed, for instance as a pyLDAvis replacement that focuses on word importances.

```python
import topicwizard

# Assumes `texts` and `pipeline` are already defined
topicwizard.visualize(texts, model=pipeline, exclude_pages=["documents", "words"])
```

--------------------------------

### Create and Run a Dash App with Topicwizard (Python)

Source: https://github.com/x-tabdeveloping/topicwizard/blob/main/docs/_build/html/usage.deployment.html

This snippet shows how to create a Dash application using `topicwizard.get_dash_app` and then run the server manually. It's a foundational step for deploying topicwizard applications.

```python
import topicwizard

app = topicwizard.get_dash_app(topic_data)

if __name__ == "__main__":
    app.run_server(debug=False, port=8050)
```

--------------------------------

### Easy Deployment with Topicwizard (Python)

Source: https://github.com/x-tabdeveloping/topicwizard/blob/main/docs/_build/html/usage.deployment.html

This Python code utilizes `topicwizard.easy_deploy` to create a Docker deployment folder. This function simplifies the process of packaging a topicwizard app with a fitted topic model, including a Dockerfile, main.py, and the topic_data.joblib file.

```python
import joblib
import topicwizard

# Load previously produced topic_data object
topic_data = joblib.load("topic_data.joblib")

topicwizard.easy_deploy(topic_data, dest_dir="deployment", port=7860)
```

--------------------------------

### Deploy Dash App with Gunicorn (Bash)

Source: https://github.com/x-tabdeveloping/topicwizard/blob/main/docs/_build/html/usage.deployment.html

This command demonstrates how to run a Dash application using Gunicorn, a production-ready WSGI server. This is recommended for deploying topicwizard in a production environment.

```bash
# -b expects an address; HOST:PORT binds the server to port 8050
gunicorn main:app.server -b 0.0.0.0:8050
```

--------------------------------

### Visualize topic models with topicwizard

Source: https://github.com/x-tabdeveloping/topicwizard/blob/main/docs/index.rst

Launches the topicwizard web application to visualize a topic model. It requires the text data and the trained model as input.

```python
import topicwizard

topicwizard.visualize(texts, model=model)
```

--------------------------------

### Prepare TopicData with Contextual Models

Source: https://github.com/x-tabdeveloping/topicwizard/blob/main/README.md

Illustrates preparing a TopicData object directly from contextually sensitive models like those from turftopic. The prepare_topic_data method on the model handles the necessary data transformations for visualization.

```python
from turftopic import SemanticSignalSeparation

model = SemanticSignalSeparation(10)
topic_data = model.prepare_topic_data(corpus)
```

--------------------------------

### Deploy to HuggingFace Spaces (Bash)

Source: https://github.com/x-tabdeveloping/topicwizard/blob/main/docs/_build/html/usage.deployment.html

This series of bash commands outlines the process for deploying a topicwizard application to HuggingFace Spaces using a Docker Space. It involves cloning the space repository, moving the deployment files into it, committing the changes, and pushing them to the remote repository.

```bash
git clone <link_to_space>

mv deployment/* /path/to/space_repo
cd /path/to/space_repo
git add -A
git commit -m "Added deployment"
git push
```
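
The `mv deployment/* ...` step can also be scripted. The following is a minimal Python sketch of that single step, assuming local directories; `move_deployment` is a hypothetical helper, and the Git commands still need to be run afterwards.

```python
import shutil
from pathlib import Path


def move_deployment(src: Path, dest: Path) -> list[str]:
    """Move every non-hidden entry of `src` into `dest`, like `mv src/* dest`."""
    dest.mkdir(parents=True, exist_ok=True)
    moved = []
    for entry in sorted(src.iterdir()):
        # The shell glob `*` skips dotfiles, so this sketch does too.
        if entry.name.startswith("."):
            continue
        shutil.move(str(entry), str(dest / entry.name))
        moved.append(entry.name)
    return moved
```

Calling `move_deployment(Path("deployment"), Path("/path/to/space_repo"))` returns the names of the entries that were moved.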

--------------------------------

### Prepare TopicData with a TopicPipeline

Source: https://github.com/x-tabdeveloping/topicwizard/blob/main/README.md

Demonstrates how to prepare a TopicData object using a TopicPipeline, which encapsulates the vectorization and topic modeling steps. This object contains all information needed for TopicWizard visualization.

```python
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import CountVectorizer
from topicwizard.pipeline import make_topic_pipeline

# `corpus` is assumed to be a list of document strings
pipeline = make_topic_pipeline(CountVectorizer(), NMF(10))
topic_data = pipeline.prepare_topic_data(corpus)
```

--------------------------------

### Create Scikit-learn NMF Model

Source: https://github.com/x-tabdeveloping/topicwizard/blob/main/README.md

Initializes a scikit-learn Non-negative Matrix Factorization (NMF) model with 10 components. NMF is a fast algorithm often used for topic modeling.

```python
from sklearn.decomposition import NMF

model = NMF(n_components=10)
```

--------------------------------

### Build contextually sensitive topic model with Turftopic

Source: https://github.com/x-tabdeveloping/topicwizard/blob/main/docs/index.rst

Initializes a contextually sensitive topic model using Turftopic's KeyNMF. This approach is suitable for capturing nuanced relationships in text data.

```python
from turftopic import KeyNMF

model = KeyNMF(n_components=10)
```

--------------------------------

### Initialize Topic Models (LDA, NMF, DMM)

Source: https://github.com/x-tabdeveloping/topicwizard/blob/main/docs/usage.pipelines.rst

Shows how to initialize various topic models compatible with Topicwizard. This includes Latent Dirichlet Allocation (LDA) and Non-negative Matrix Factorization (NMF) from scikit-learn, and the Dirichlet Multinomial Mixture Model (DMM) from tweetopic for short texts. These models must expose a `components_` attribute holding topic-term importances.

```python
# LDA for long texts
from sklearn.decomposition import LatentDirichletAllocation

model = LatentDirichletAllocation(n_components=10)

# You can use NMF too
from sklearn.decomposition import NMF

model = NMF(n_components=10)

# Or tweetopic's DMM for short texts
# pip install tweetopic

from tweetopic import DMM

model = DMM(n_components=10)
```
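
To make the role of `components_` concrete, the sketch below ranks vocabulary terms per topic from a small topic-term matrix. `TinyModel`, the vocabulary, and the weights are hypothetical stand-ins for a fitted model; this is an illustration of the idea, not topicwizard's internal code.

```python
def top_words_per_topic(components, vocabulary, top_n=3):
    """Rank vocabulary terms by importance for each topic row in `components`."""
    topics = []
    for row in components:
        if len(row) != len(vocabulary):
            raise ValueError("each topic row needs one weight per vocabulary term")
        order = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        topics.append([vocabulary[i] for i in order[:top_n]])
    return topics


class TinyModel:
    """Hypothetical stand-in for a fitted model exposing `components_`."""

    components_ = [
        [0.9, 0.1, 0.0, 0.4],
        [0.0, 0.7, 0.8, 0.1],
    ]


vocabulary = ["virus", "rocket", "orbit", "vaccine"]
model = TinyModel()
# Duck-typed compatibility check: the attribute must be present.
assert hasattr(model, "components_")
topics = top_words_per_topic(model.components_, vocabulary, top_n=2)
```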

--------------------------------

### Visualize Topic Model with topicwizard

Source: https://github.com/x-tabdeveloping/topicwizard/blob/main/docs/application.rst

Launches the topicwizard web application to visualize a trained topic model. It takes the corpus ('texts') and the fitted model ('pipeline') as input. The web app provides an interactive overview of the topic model.

```python
import topicwizard

topicwizard.visualize(texts, model=pipeline)
```

--------------------------------

### Visualize Topic Model with TopicWizard Web App

Source: https://github.com/x-tabdeveloping/topicwizard/blob/main/examples/basic_usage.ipynb

Launches the topicwizard web application for interactive exploration of a fitted topic model. This function can optionally exclude the 'documents' page for faster loading and can incorporate predefined group labels for richer analysis.

```python
import topicwizard

topicwizard.visualize(corpus, model=pipeline)
```

```python
topicwizard.visualize(corpus, model=pipeline, exclude_pages=["documents"])
```

```python
topicwizard.visualize(corpus, model=pipeline, exclude_pages=["documents"], group_labels=group_labels)
```

--------------------------------

### Initialize AutoEncodingTopicModel

Source: https://github.com/x-tabdeveloping/topicwizard/blob/main/docs/_build/html/usage.compatibility.html

Initializes AutoEncodingTopicModel for zero-shot or combined topic modeling. This model replicates CTM behavior and is part of the Turftopic library.

```python
from turftopic import AutoEncodingTopicModel

zeroshot_tm = AutoEncodingTopicModel(10, combined=False)
combined_tm = AutoEncodingTopicModel(10, combined=True)
```

--------------------------------

### Build scikit-learn compatible topic pipeline

Source: https://github.com/x-tabdeveloping/topicwizard/blob/main/docs/index.rst

Constructs a topic modeling pipeline compatible with scikit-learn conventions. It utilizes CountVectorizer for text processing and NMF for topic decomposition.

```python
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import CountVectorizer
from topicwizard.pipeline import make_topic_pipeline

bow_vectorizer = CountVectorizer()
nmf = NMF(n_components=10)
model = make_topic_pipeline(bow_vectorizer, nmf)
```

--------------------------------

### Fit Topic Pipeline and Visualize Model with TopicWizard

Source: https://github.com/x-tabdeveloping/topicwizard/blob/main/README.md

Demonstrates fitting a topic pipeline to a corpus and then visualizing the results using topicwizard.visualize. This is a core function for interpreting topic models.

```python
import topicwizard

# Fit the pipeline first, then launch the web app
topic_pipeline.fit(corpus)
topicwizard.visualize(corpus, model=topic_pipeline)
```

--------------------------------

### Create a Topicwizard TopicPipeline

Source: https://github.com/x-tabdeveloping/topicwizard/blob/main/docs/usage.pipelines.rst

Demonstrates the creation of a TopicPipeline using Topicwizard's `make_topic_pipeline` function. TopicPipelines offer enhanced convenience for downstream tasks and model interpretation compared to standard Scikit-learn pipelines.

```python
from topicwizard.pipeline import make_topic_pipeline

topic_pipeline = make_topic_pipeline(vectorizer, model)
```

--------------------------------

### Fit NMF Topic Model Pipeline with Scikit-learn

Source: https://github.com/x-tabdeveloping/topicwizard/blob/main/examples/basic_usage.ipynb

Sets up and fits a scikit-learn pipeline for Non-negative Matrix Factorization (NMF) topic modeling. The pipeline combines a CountVectorizer, which filters very common and very rare terms as well as English stop words, with an NMF model that extracts 20 topics.

```python
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline

# Setting up topic modelling pipeline
vectorizer = CountVectorizer(max_df=0.8, min_df=10, stop_words="english")
# NMF topic model with 20 topics
nmf = NMF(n_components=20)
# Build a pipeline from the two components
pipeline = make_pipeline(vectorizer, nmf)

# Fit the pipeline to the data
pipeline.fit(corpus)

```

--------------------------------

### Initialize CountVectorizer for Text Vectorization

Source: https://github.com/x-tabdeveloping/topicwizard/blob/main/docs/usage.pipelines.rst

Demonstrates how to initialize a CountVectorizer from scikit-learn, a common component for converting texts into bag-of-words vectors. This is a foundational step in many natural language processing pipelines.

```python
from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer()
```

--------------------------------

### Integrate BERTopic Model with TopicWizard

Source: https://github.com/x-tabdeveloping/topicwizard/blob/main/README.md

Demonstrates how to use BERTopic models with TopicWizard by employing the BERTopicWrapper compatibility layer. This enables the visualization of BERTopic's contextual topic models.

```python
import topicwizard
from bertopic import BERTopic
from topicwizard.compatibility import BERTopicWrapper

model = BERTopicWrapper(BERTopic(language="english"))
topicwizard.visualize(corpus, model=model)
```

--------------------------------

### Visualize SemanticSignalSeparation Model with topicwizard

Source: https://github.com/x-tabdeveloping/topicwizard/blob/main/docs/application.rst

Demonstrates visualizing a topic model trained with turftopic's SemanticSignalSeparation. It shows two methods: directly passing the model, or preparing TopicData first and then visualizing.

```python
import topicwizard
from turftopic import SemanticSignalSeparation

model = SemanticSignalSeparation(n_components=10)

topicwizard.visualize(texts, model=model)

## OR

topic_data = model.prepare_topic_data(texts)
topicwizard.visualize(topic_data=topic_data)
```

--------------------------------

### Visualize Turftopic Semantic Signal Separation Model - Python

Source: https://github.com/x-tabdeveloping/topicwizard/blob/main/docs/_build/html/usage.contextual_models.html

Demonstrates how to prepare topic data from a corpus using Turftopic's SemanticSignalSeparation model and then visualize it with topicwizard. Alternatively, the web application can be run directly with the corpus and model.

```python
import topicwizard
from turftopic import SemanticSignalSeparation

model = SemanticSignalSeparation(n_components=10)

# You can produce the topic data from a corpus before running the app
# This option should be preferred as the data can be saved and the app can be restarted
# Or you can use it for producing individual figures later.
topic_data = model.prepare_topic_data(corpus)
topicwizard.visualize(topic_data=topic_data)

# Or you can run the app directly with the model and a corpus
topicwizard.visualize(corpus, model=model)
```

--------------------------------

### Create a Scikit-learn Pipeline

Source: https://github.com/x-tabdeveloping/topicwizard/blob/main/docs/usage.pipelines.rst

Illustrates how to combine a vectorizer and a topic model into a standard Scikit-learn pipeline using `make_pipeline`. This pipeline can include additional transformations.

```python
from sklearn.pipeline import make_pipeline

topic_pipeline = make_pipeline(vectorizer, model)
```

--------------------------------

### Interpret turftopic Semantic Signal Separation Model with TopicWizard

Source: https://github.com/x-tabdeveloping/topicwizard/blob/main/README.md

Illustrates how to visualize a contextually sensitive topic model from turftopic using TopicWizard. The SemanticSignalSeparation model is prepared and then passed to topicwizard.visualize.

```python
import topicwizard
from turftopic import SemanticSignalSeparation

model = SemanticSignalSeparation(n_components=10)
topicwizard.visualize(corpus, model=model)
```

--------------------------------

### Train NMF Topic Model with topicwizard

Source: https://github.com/x-tabdeveloping/topicwizard/blob/main/docs/application.rst

Trains a Non-negative Matrix Factorization (NMF) topic model using scikit-learn's CountVectorizer and NMF, then prepares it for topicwizard visualization. Assumes 'texts' is a pre-defined corpus.

```python
# Training a compatible topic model
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import CountVectorizer
from topicwizard.pipeline import make_topic_pipeline

bow_vectorizer = CountVectorizer()
nmf = NMF(n_components=10)
pipeline = make_topic_pipeline(bow_vectorizer, nmf)
pipeline.fit(texts)
```

--------------------------------

### Visualize Topic Model with Group Labels using topicwizard

Source: https://github.com/x-tabdeveloping/topicwizard/blob/main/docs/application.rst

Trains an NMF topic model on the 20Newsgroups dataset and visualizes it using topicwizard, including custom group labels derived from the dataset's target names. Requires 'numpy' for label mapping.

```python
import numpy as np
import topicwizard
from sklearn.datasets import fetch_20newsgroups
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import CountVectorizer
from topicwizard.pipeline import make_topic_pipeline

newsgroups = fetch_20newsgroups(subset="all")
corpus = newsgroups.data
# Sklearn gives the labels back as integers, we have to map them back to
# the actual textual label.
group_labels = np.array(newsgroups.target_names)[newsgroups.target]

# Here we fit a topic model to the corpus
pipeline = make_topic_pipeline(
    CountVectorizer(stop_words="english"),
    NMF(n_components=30),
).fit(corpus)

# Notice that I'm passing the labels as the group_labels argument
topicwizard.visualize(corpus, model=pipeline, group_labels=group_labels)
```

--------------------------------

### Visualize Topic Model with Group Labels (Python)

Source: https://github.com/x-tabdeveloping/topicwizard/blob/main/docs/_build/html/application.html

This snippet demonstrates how to use the topicwizard library to visualize a topic model, incorporating optional group labels from a dataset like 20Newsgroups. It involves fetching data, creating a topic pipeline, and then calling the visualize function with corpus, model, and group labels.

```python
import topicwizard
from topicwizard.pipeline import make_topic_pipeline
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import NMF
import numpy as np

newsgroups = fetch_20newsgroups(subset="all")
corpus = newsgroups.data
# Sklearn gives the labels back as integers, we have to map them back to
# the actual textual label.
group_labels = np.array(newsgroups.target_names)[newsgroups.target]

# Here we fit a topic model to the corpus
pipeline = make_topic_pipeline(
    CountVectorizer(stop_words="english"),
    NMF(n_components=30),
).fit(corpus)

# Notice that I'm passing the labels as the group_labels argument
topicwizard.visualize(corpus, model=pipeline, group_labels=group_labels)
```

--------------------------------

### Create Gensim Pipeline for topicwizard

Source: https://github.com/x-tabdeveloping/topicwizard/blob/main/docs/_build/html/usage.compatibility.html

This snippet demonstrates how to create a scikit-learn compatible pipeline for Gensim models (LSI, LDA, NMF) to be used with topicwizard. It requires a pre-trained Gensim dictionary and topic model object.

```python
from gensim.corpora.dictionary import Dictionary
from gensim.models import LdaModel
import topicwizard
from topicwizard.compatibility import gensim_pipeline

texts: list[list[str]] = [
    ['computer', 'time', 'graph'],
    ['survey', 'response', 'eps'],
    ['human', 'system', 'computer'],
    ...
]
dictionary = Dictionary(texts)
bow_corpus = [dictionary.doc2bow(text) for text in texts]
lda = LdaModel(bow_corpus, num_topics=10)

pipeline = gensim_pipeline(dictionary, model=lda)
# Then you can use the pipeline as usual
corpus = [" ".join(text) for text in texts]
topicwizard.visualize(corpus, model=pipeline)
```

--------------------------------

### Load 20newsgroups Dataset with Scikit-learn

Source: https://github.com/x-tabdeveloping/topicwizard/blob/main/examples/basic_usage.ipynb

Loads the 20newsgroups dataset from scikit-learn, removing headers, footers, and quotes. It also maps integer labels back to their textual names. This data serves as the corpus for topic modeling.

```python
from sklearn.datasets import fetch_20newsgroups
import numpy as np

newsgroups = fetch_20newsgroups(subset="all", remove=("headers", "footers", "quotes"))
corpus = newsgroups.data
# Sklearn gives the labels back as integers, we have to map them back to
# the actual textual label.
group_labels = np.array(newsgroups.target_names)[newsgroups.target]

```
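
The label mapping relies on NumPy integer-array indexing. As a rough illustration, the pure-Python sketch below produces the same mapping on a small, hypothetical subset of label names; `map_labels` is an illustrative helper, not part of scikit-learn or topicwizard.

```python
def map_labels(target_names, target):
    """Pure-Python equivalent of np.array(target_names)[target]."""
    return [target_names[index] for index in target]


# Small hypothetical subset of newsgroup names and integer targets.
target_names = ["alt.atheism", "comp.graphics", "sci.space"]
target = [2, 0, 2, 1]
group_labels = map_labels(target_names, target)
```

Each integer in `target` is used as a position in `target_names`, so the result has one textual label per document.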

--------------------------------

### Integrate Gensim LDA Model with TopicWizard

Source: https://github.com/x-tabdeveloping/topicwizard/blob/main/README.md

Shows how to use Gensim's LDA models with TopicWizard by wrapping them in a TopicPipeline. This allows visualization of topic distributions derived from Gensim's corpus and dictionary.

```python
import topicwizard
from gensim.corpora.dictionary import Dictionary
from gensim.models import LdaModel
from topicwizard.compatibility import gensim_pipeline

texts: list[list[str]] = [
    ['computer', 'time', 'graph'],
    ['survey', 'response', 'eps'],
    ['human', 'system', 'computer'],
    ...
]

dictionary = Dictionary(texts)
bow_corpus = [dictionary.doc2bow(text) for text in texts]
lda = LdaModel(bow_corpus, num_topics=10)

pipeline = gensim_pipeline(dictionary, model=lda)
# Then you can use the pipeline as usual
corpus = [" ".join(text) for text in texts]
topicwizard.visualize(corpus, model=pipeline)
```

--------------------------------

### Load TopicData with joblib - Python

Source: https://github.com/x-tabdeveloping/topicwizard/blob/main/docs/_build/html/usage.persistence.html

This snippet shows how to deserialize a TopicData object from a joblib file. It uses the topicwizard library for visualization and joblib for loading the data. The input is a 'topic_data.joblib' file, and the output is a TopicData object ready for visualization.

```python
import topicwizard
# We import this only for type checking
from topicwizard.data import TopicData
import joblib

topic_data: TopicData = joblib.load("topic_data.joblib")

topicwizard.visualize(topic_data=topic_data)
```

--------------------------------

### Implement Turftopic AutoEncodingTopicModel for TopicWizard

Source: https://github.com/x-tabdeveloping/topicwizard/blob/main/docs/usage.compatibility.rst

Demonstrates how to use Turftopic's AutoEncodingTopicModel, which can replicate the behavior of CTM models, for use with TopicWizard. Supports both zero-shot and combined modes.

```python
from turftopic import AutoEncodingTopicModel
import topicwizard

zeroshot_tm = AutoEncodingTopicModel(10, combined=False)
combined_tm = AutoEncodingTopicModel(10, combined=True)

topicwizard.visualize(corpus, model=zeroshot_tm)
```

--------------------------------

### Convert Existing Pipeline to TopicPipeline

Source: https://github.com/x-tabdeveloping/topicwizard/blob/main/docs/usage.pipelines.rst

Shows how to convert an existing Scikit-learn Pipeline into a TopicPipeline using the `TopicPipeline.from_pipeline()` class method. This allows leveraging TopicPipeline's features with previously created pipelines.

```python
from topicwizard.pipeline import TopicPipeline

topic_pipeline = TopicPipeline.from_pipeline(pipeline)
```

--------------------------------

### Save TopicData with joblib - Python

Source: https://github.com/x-tabdeveloping/topicwizard/blob/main/docs/_build/html/usage.persistence.html

This snippet demonstrates how to serialize a TopicData object into a joblib file. It requires the turftopic library for model preparation and joblib for dumping the data. The input is a corpus, and the output is a file named 'topic_data.joblib'.

```python
from turftopic import KeyNMF
import joblib

model = KeyNMF(10)
topic_data = model.prepare_topic_data(corpus)

joblib.dump(topic_data, "topic_data.joblib")
```

--------------------------------

### Visualize BERTopic Model with topicwizard Compatibility Layer - Python

Source: https://github.com/x-tabdeveloping/topicwizard/blob/main/docs/_build/html/usage.contextual_models.html

Shows how to wrap a BERTopic model using topicwizard's BERTopicWrapper to make it compatible for visualization with the topicwizard web application. The BERTopic model can be fitted or unfitted before wrapping.

```python
import topicwizard
from bertopic import BERTopic
from topicwizard.compatibility import BERTopicWrapper

# The model can be fitted or not.
model = BERTopic()
wrapped_model = BERTopicWrapper(model)

topicwizard.visualize(corpus, model=wrapped_model)
```

--------------------------------

### Serialize and Deserialize Topic Data with Joblib

Source: https://github.com/x-tabdeveloping/topicwizard/blob/main/README.md

This section illustrates how to save (serialize) and load (deserialize) topic data using the joblib library. This is crucial for sharing or persisting topic modeling results across sessions or machines. It highlights the importance of version compatibility between the saving and loading environments.

```python
import joblib
from topicwizard.data import TopicData

# Assuming 'topic_data' is already prepared

# Save the data
joblib.dump(topic_data, "topic_data.joblib")

# Load the data
# (The type annotation is just for type checking, it doesn't do anything)
topic_data: TopicData = joblib.load("topic_data.joblib")
```
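Since joblib persistence is built on pickle, a dump can fail to load when the saving and loading environments differ. One way to make a mismatch visible is to write a small metadata file next to the dump. The sketch below uses only the standard library, with `pickle` standing in for `joblib` and a plain dict standing in for `TopicData`; `save_with_versions`, `load_with_versions`, and the sidecar file layout are hypothetical and not part of topicwizard.

```python
import json
import pickle
import platform
import tempfile
from pathlib import Path


def save_with_versions(obj, path: Path) -> None:
    """Pickle `obj` and record the Python version in a JSON sidecar file."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)
    meta = {"python": platform.python_version()}
    path.with_suffix(".meta.json").write_text(json.dumps(meta))


def load_with_versions(path: Path):
    """Load the pickle and report whether the saving Python version matches."""
    meta = json.loads(path.with_suffix(".meta.json").read_text())
    same_version = meta["python"] == platform.python_version()
    with open(path, "rb") as f:
        return pickle.load(f), same_version


# Stand-in for a real TopicData object (assumption for illustration).
topic_data = {"corpus": ["first doc", "second doc"], "topic_names": ["A", "B"]}

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "topic_data.pkl"
    save_with_versions(topic_data, target)
    restored, same_version = load_with_versions(target)
```

The same pattern could record package versions as well, so that a failed load can be traced to a specific mismatch.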

--------------------------------

### Visualize Topic Data using topicwizard

Source: https://github.com/x-tabdeveloping/topicwizard/blob/main/docs/topic_data.rst

Demonstrates how to use the TopicData object with topicwizard's visualization utilities, including topic maps and the web application. The TopicData object allows for the reproduction of interpretive visualizations.

```python
import topicwizard
from topicwizard.figures import topic_map

# Usage with figures
topic_map(topic_data)

# Usage with web app
# Beware that topic_data is a keyword argument
topicwizard.visualize(topic_data=topic_data)
```

--------------------------------

### Display Document Topic Timeline (Python)

Source: https://github.com/x-tabdeveloping/topicwizard/blob/main/docs/figures.rst

Generates a line chart illustrating the topic distribution over time within a single document. It accepts topic data and the document text. Window and step sizes can be adjusted for resolution control. This is useful for tracking topic evolution.

```python
from topicwizard.figures import document_topic_timeline

document_topic_timeline(
    topic_data,
    "New cure against type 2 diabetes in development.",
)
```
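The window and step sizes determine which token spans are scored for topics. As a rough illustration of the idea, and not topicwizard's actual implementation, the hypothetical helper `sliding_windows` below splits a whitespace-tokenized document into overlapping windows. Each window would then be passed through the topic model to obtain one point on the timeline.

```python
def sliding_windows(tokens, window_size, step):
    """Return token windows of length `window_size`, advancing by `step`."""
    if window_size <= 0 or step <= 0:
        raise ValueError("window_size and step must be positive")
    # Last index at which a full window still fits (0 for short documents).
    last_start = max(len(tokens) - window_size, 0)
    return [
        tokens[start : start + window_size]
        for start in range(0, last_start + 1, step)
    ]


text = "New cure against type 2 diabetes in development."
tokens = text.split()
windows = sliding_windows(tokens, window_size=4, step=2)
for window in windows:
    print(" ".join(window))
```

A smaller step yields a finer-grained timeline at the cost of more windows to score.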

--------------------------------

### Generate Group-Topic Barcharts with TopicWizard Figures API

Source: https://github.com/x-tabdeveloping/topicwizard/blob/main/examples/basic_usage.ipynb

Generates interactive barchart plots illustrating the relevance of topics to predefined group labels using the topicwizard figures API. This visualization helps in understanding how topics align with external categorical data.

```python
from topicwizard.figures import group_topic_barcharts

group_topic_barcharts(corpus, group_labels, pipeline=pipeline, top_n=5)
```
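Conceptually, relating topics to groups amounts to aggregating document-topic weights by label. The sketch below illustrates that idea with plain Python lists. The helper `group_topic_importances`, the toy weights, and the choice to sum rather than average or otherwise normalize are assumptions for illustration and may differ from how topicwizard computes the plotted values.

```python
from collections import defaultdict


def group_topic_importances(document_topic_matrix, group_labels):
    """Sum document-topic weights over all documents sharing a group label."""
    n_topics = len(document_topic_matrix[0])
    totals = defaultdict(lambda: [0.0] * n_topics)
    for weights, label in zip(document_topic_matrix, group_labels):
        for topic_id, weight in enumerate(weights):
            totals[label][topic_id] += weight
    return dict(totals)


# Toy data: three documents, two topics, two groups.
document_topic_matrix = [[0.5, 0.5], [0.25, 0.75], [1.0, 0.0]]
group_labels = ["sports", "sports", "politics"]

importances = group_topic_importances(document_topic_matrix, group_labels)
print(importances)
```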

--------------------------------

### Create Scikit-learn CountVectorizer

Source: https://github.com/x-tabdeveloping/topicwizard/blob/main/README.md

Initializes a scikit-learn CountVectorizer for text processing. It is configured to ignore terms that appear in fewer than 5 documents or in more than 80% of the documents, and to remove English stop words.

```python
from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer(min_df=5, max_df=0.8, stop_words="english")
```
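To make the effect of `min_df` and `max_df` concrete, the sketch below reimplements only the document-frequency filter in plain Python. It is not scikit-learn's code: `filter_by_document_frequency` is a hypothetical helper, tokenization is a simple lowercase whitespace split, and stop-word removal is omitted. As in scikit-learn, an integer threshold is treated as an absolute document count and a float as a proportion of documents.

```python
from collections import Counter


def filter_by_document_frequency(docs, min_df, max_df):
    """Keep terms found in at least `min_df` docs and at most `max_df` (a proportion) of docs."""
    doc_freq = Counter()
    for doc in docs:
        # Using a set counts each term once per document.
        doc_freq.update(set(doc.lower().split()))
    max_count = max_df * len(docs)
    return sorted(
        term for term, df in doc_freq.items() if min_df <= df <= max_count
    )


docs = [
    "the cat sat",
    "the dog sat",
    "the bird flew",
    "the cat flew",
    "the fish swam",
]
# min_df=2 rather than 5, so that the toy corpus keeps some terms.
vocabulary = filter_by_document_frequency(docs, min_df=2, max_df=0.8)
print(vocabulary)
```

Here "the" is dropped because it appears in every document, and words seen only once are dropped as too rare.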

--------------------------------

### Visualize Topic Data with TopicWizard

Source: https://github.com/x-tabdeveloping/topicwizard/blob/main/README.md

This snippet demonstrates how to prepare topic data using BERTopicWrapper and then visualize it using the topicwizard library. It assumes 'corpus' is a pre-defined variable containing the text data. The 'topic_data' object is central to both visualization and serialization.

```python
from bertopic import BERTopic
from topicwizard.compatibility import BERTopicWrapper

# Assuming 'corpus' is your list of documents
# corpus = ["document 1", "document 2", ...]

model = BERTopicWrapper(BERTopic())
topic_data = model.prepare_topic_data(corpus)

import topicwizard
topicwizard.visualize(topic_data=topic_data)
```

--------------------------------

### Generate Group Word Clouds with TopicWizard

Source: https://github.com/x-tabdeveloping/topicwizard/blob/main/docs/_build/html/figures.html

Creates word clouds for each group label, considering only word counts and not relevance. It requires the corpus, group labels, and a pipeline.

```python
from topicwizard.figures import group_wordclouds

group_wordclouds(corpus, group_labels, pipeline=pipeline)
```
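Since this plot relies on raw counts only, the quantity behind it can be approximated with a per-group word tally. This is a simplified sketch that assumes whitespace tokenization in place of the pipeline's vectorizer; `count_words_by_group` is a hypothetical helper rather than a topicwizard function.

```python
from collections import Counter, defaultdict


def count_words_by_group(corpus, group_labels):
    """Tally word occurrences separately for each group label."""
    counts = defaultdict(Counter)
    for doc, label in zip(corpus, group_labels):
        counts[label].update(doc.lower().split())
    return dict(counts)


corpus = ["goal scored late", "late goal wins", "vote held today"]
group_labels = ["sports", "sports", "politics"]

group_counts = count_words_by_group(corpus, group_labels)
print(group_counts["sports"].most_common(2))
```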

--------------------------------

### Visualize Embeddings with Topicwizard (Python)

Source: https://github.com/x-tabdeveloping/topicwizard/blob/main/docs/_build/html/application.html

This snippet illustrates how to use topicwizard to visualize embeddings, for instance, generated by LSI. It demonstrates disabling other pages like 'documents' and 'topics' to focus solely on the embedding visualization.

```python
import topicwizard

# Assuming 'texts' and 'pipeline' are already defined
topicwizard.visualize(texts, model=pipeline, exclude_pages=["documents", "topics"])
```

--------------------------------

### Generate Word Map with TopicWizard Figures API

Source: https://github.com/x-tabdeveloping/topicwizard/blob/main/examples/basic_usage.ipynb

Creates an interactive word map visualization using the topicwizard figures API. This plot illustrates the relationships and proximity between different words based on their co-occurrence within topics.

```python
from topicwizard.figures import word_map

word_map(corpus, pipeline=pipeline)
```

--------------------------------

### Implement Custom Contextual Model Interface

Source: https://github.com/x-tabdeveloping/topicwizard/blob/main/docs/_build/html/usage.compatibility.html

Implements the interface for custom contextual topic models in TopicWizard. Models must be able to produce TopicData objects and follow the TopicModel protocol.

```python
from topicwizard.model_interface import TopicModel
from topicwizard.data import TopicData

# TopicModel is only a Protocol; the model inherits no behaviour,
# it just provides static checks
class CustomTopicModel(TopicModel):
    def prepare_topic_data(
        self,
        corpus: list[str],
    ) -> TopicData:
        pass
```
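The remark about protocols can be illustrated with the standard library alone. In the sketch below, `SupportsTopicData` is a hypothetical stand-in for topicwizard's `TopicModel`, the returned dict is a placeholder for a real `TopicData`, and `runtime_checkable` is added only so the conformance can be demonstrated with `isinstance`. The point is that a class satisfies a `typing.Protocol` by having the right method, whether or not it inherits from the protocol.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class SupportsTopicData(Protocol):
    """Hypothetical stand-in for topicwizard's TopicModel protocol."""

    def prepare_topic_data(self, corpus: list[str]) -> dict: ...


class MyModel:
    """Does not inherit from the protocol, yet conforms to it structurally."""

    def prepare_topic_data(self, corpus: list[str]) -> dict:
        # Placeholder result; a real model would build a full TopicData here.
        return {"corpus": corpus, "topic_names": []}


class NotAModel:
    """Lacks prepare_topic_data, so it does not conform."""


print(isinstance(MyModel(), SupportsTopicData))
print(isinstance(NotAModel(), SupportsTopicData))
```

Note that the runtime check only looks for the method's presence; signatures and return types are verified by a static type checker, not at runtime.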

--------------------------------

### Wrap BERTopic Model for TopicWizard

Source: https://github.com/x-tabdeveloping/topicwizard/blob/main/docs/usage.compatibility.rst

Shows how to wrap a BERTopic model using BERTopicWrapper to make it compatible with TopicWizard. This allows direct usage in the web app or preparation of a TopicData object for persistence and figures.

```python
from bertopic import BERTopic
from topicwizard.compatibility import BERTopicWrapper
import topicwizard

model = BERTopic(language="english")
wrapped_model = BERTopicWrapper(model)

# Start the web app immediately
topicwizard.visualize(corpus, model=wrapped_model)

# Or produce a TopicData object for persistence or figures.
topic_data = wrapped_model.prepare_topic_data(corpus)
```

--------------------------------

### Visualize Document Topic Timeline with TopicWizard

Source: https://github.com/x-tabdeveloping/topicwizard/blob/main/docs/_build/html/figures.html

Displays topic distribution over time within a single document using a line chart. It can also analyze an entire corpus if texts are joined. Users can specify window and step sizes for token analysis.

```python
from topicwizard.figures import document_topic_timeline

document_topic_timeline(
    topic_data,
    "New cure against type 2 diabetes in development."
)
```

--------------------------------

### Generate Word Clouds - topicwizard Python

Source: https://github.com/x-tabdeveloping/topicwizard/blob/main/docs/_build/html/figures.html

Produces a joint word cloud plot for all topics, visualizing word relevance. The 'alpha' parameter can be used to specify the relevance metric for word sizing.

```python
from topicwizard.figures import topic_wordclouds

topic_wordclouds(topic_data)
```

--------------------------------

### Display Word Map - topicwizard Python

Source: https://github.com/x-tabdeveloping/topicwizard/blob/main/docs/_build/html/figures.html

Visualizes word relationships in a 2D space, offering an alternative to the interactive app. Words can be labeled based on a Z-value cutoff, and coloring indicates the most relevant topic. UMAP can be used for automatic axis discovery, or specific topics can be defined as axes.

```python
from topicwizard.figures import word_map

word_map(topic_data)
word_map(
    topic_data,
    topic_axes=(
        "9_api_apis_register_automatedsarcasmgenerator",
        "4_study_studying_assessments_exams",
    ),
)
```

--------------------------------

### Display Word Barplots - topicwizard Python

Source: https://github.com/x-tabdeveloping/topicwizard/blob/main/docs/_build/html/figures.html

Creates a joint plot displaying word importances across all topics as a bar chart. Allows customization of relevance metric via the 'alpha' parameter and controls the number of words displayed using 'top_n'.

```python
from topicwizard.figures import topic_barcharts

topic_barcharts(topic_data)
topic_barcharts(topic_data, top_n=5)
```

--------------------------------

### Python Base Topic Model Structure with BaseEstimator

Source: https://github.com/x-tabdeveloping/topicwizard/blob/main/docs/usage.compatibility.rst

This Python code defines the basic structure for a custom topic model inheriting from `BaseEstimator`. It mandates the implementation of a `transform` method to process vectorized documents and return topic distributions, and a `components_` property to expose topic-word distributions.

```python
from sklearn.base import BaseEstimator
import numpy as np

# Inheriting from BaseEstimator is a good thing to have
class CustomTopicModel(BaseEstimator):

    # All topic models should have a transform method, that takes
    # the vectorized documents and returns a sparse or dense array of
    # topic distributions with shape (n_docs, n_topics)
    def transform(self, X):
        pass

    # All topic models should have a property or attribute named
    # components_, that should be a dense or sparse array of topic-word
    # distributions of shape (n_topics, n_features)
    @property
    def components_(self) -> np.ndarray:
        pass
```
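To show the two requirements with concrete shapes, here is a minimal toy model. It is a sketch under several assumptions: it depends only on NumPy and therefore does not inherit from `BaseEstimator`, it handles dense input only, and the scoring rule (a dot product followed by row normalization) is invented for illustration. `ToyTopicModel` is a hypothetical name.

```python
import numpy as np


class ToyTopicModel:
    """Duck-typed toy model exposing transform() and components_."""

    def __init__(self, topic_word):
        self._topic_word = np.asarray(topic_word, dtype=float)

    @property
    def components_(self) -> np.ndarray:
        # Topic-word weights with shape (n_topics, n_features).
        return self._topic_word

    def transform(self, X) -> np.ndarray:
        # Score each document against each topic: (n_docs, n_topics).
        scores = np.asarray(X, dtype=float) @ self._topic_word.T
        totals = scores.sum(axis=1, keepdims=True)
        totals[totals == 0] = 1.0  # avoid division by zero for empty documents
        return scores / totals


topic_word = np.array([[0.9, 0.1, 0.0], [0.0, 0.2, 0.8]])  # 2 topics, 3 terms
X = np.array([[2, 0, 0], [0, 1, 1], [0, 0, 0]])  # 3 documents, 3 terms

model = ToyTopicModel(topic_word)
doc_topic = model.transform(X)
print(doc_topic.shape, model.components_.shape)
```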

--------------------------------

### Generate Topic Barcharts with TopicWizard Figures API

Source: https://github.com/x-tabdeveloping/topicwizard/blob/main/examples/basic_usage.ipynb

Generates interactive barchart plots showing the most important words for each discovered topic using the topicwizard figures API. This plot helps in understanding the thematic content of each topic.

```python
from topicwizard.figures import topic_barcharts

topic_barcharts(corpus, pipeline=pipeline, top_n=5)
```

--------------------------------

### Configure TopicPipeline for Pandas DataFrame Output

Source: https://github.com/x-tabdeveloping/topicwizard/blob/main/docs/usage.pipelines.rst

Demonstrates two methods for configuring a TopicPipeline to output pandas DataFrames. This is beneficial for analyzing topic content at the document level, especially when dealing with sparse outputs from vectorizers that pandas cannot directly handle.

```python
from topicwizard.pipeline import make_topic_pipeline

# Set a parameter
pipeline = make_topic_pipeline(vectorizer, model, pandas_out=True)

# Or use set_output API
pipeline = make_topic_pipeline(vectorizer, model).set_output(transform="pandas")
```

--------------------------------

### Create Group Topic Barcharts with TopicWizard

Source: https://github.com/x-tabdeveloping/topicwizard/blob/main/docs/_build/html/figures.html

Generates a joint plot displaying the topic content of all groups as bar charts. This function requires the corpus, group labels, and optionally a pipeline and the number of top topics to display.

```python
from topicwizard.figures import group_topic_barcharts

group_topic_barcharts(corpus, group_labels, pipeline=pipeline, top_n=5)
```

--------------------------------

### Visualize Word Association Barchart with TopicWizard

Source: https://github.com/x-tabdeveloping/topicwizard/blob/main/docs/_build/html/figures.html

Generates a bar chart visualizing the most relevant topics for a given set of words. It takes topic data and a list of words as input. Associations are not selected by default.

```python
from topicwizard.figures import word_association_barchart

word_association_barchart(topic_data, ["supreme", "court"])
```

--------------------------------

### Visualize Document Topics with Plotly

Source: https://github.com/x-tabdeveloping/topicwizard/blob/main/docs/usage.pipelines.rst

This snippet uses Plotly Express to visualize document-topic relationships as a heatmap. It assumes an existing, fitted `pipeline` that is configured to return pandas DataFrames. The pipeline transforms a list of texts, the texts are set as the DataFrame index, and the resulting heatmap is displayed interactively.

```python
import plotly.express as px

texts = [
    "Coronavirus killed 50000 people today.",
    "Donald Trump's presidential campaign is going very well",
    "Protests against police brutality have been going on all around the US.",
]
topic_df = pipeline.transform(texts)
topic_df.index = texts
px.imshow(topic_df).show()
```

--------------------------------

### Define Custom Topic Model for TopicPipeline

Source: https://github.com/x-tabdeveloping/topicwizard/blob/main/docs/_build/html/usage.compatibility.html

Defines a custom topic model component for TopicWizard's TopicPipeline. The model should inherit from BaseEstimator, implement a `transform` method, and expose a `components_` property or attribute.

```python
from sklearn.base import BaseEstimator
import numpy as np

# Inheriting from BaseEstimator is a good thing to have
class CustomTopicModel(BaseEstimator):

    # All topic models should have a transform method, that takes
    # the vectorized documents and returns a sparse or dense array of
    # topic distributions with shape (n_docs, n_topics)
    def transform(self, X):
        pass

    # All topic models should have a property or attribute named
    # components_, that should be a dense or sparse array of topic-word
    # distributions of shape (n_topics, n_features)
    @property
    def components_(self) -> np.ndarray:
        pass
```

--------------------------------

### Visualize word relationships with topicwizard word_map

Source: https://github.com/x-tabdeveloping/topicwizard/blob/main/docs/index.rst

Generates an interactive visualization of word relationships within a topic model using the `word_map` function from `topicwizard.figures`. This requires pre-processed topic data.

```python
from topicwizard.figures import word_map

topic_data = topic_pipeline.prepare_topic_data(corpus)

word_map(topic_data)
```

--------------------------------

### Generate Word Map (UMAP Discovery) - Python

Source: https://github.com/x-tabdeveloping/topicwizard/blob/main/docs/figures.rst

Displays a word map where UMAP discovers the axes and projects words into 2D space. This is useful for exploring word distances, relations, and potential clusters. It takes a TopicData object as input.

```python
from topicwizard.figures import word_map

word_map(topic_data)
```

--------------------------------

### Display Topic Map - topicwizard Python

Source: https://github.com/x-tabdeveloping/topicwizard/blob/main/docs/_build/html/figures.html

Generates a semantic map of topics within your topic model. Requires a TopicData object as input. This function visualizes the relationships between topics in a semantic space.

```python
from topicwizard.figures import topic_map

topic_map(topic_data)
```

--------------------------------

### Show Document Topic Distribution with TopicWizard

Source: https://github.com/x-tabdeveloping/topicwizard/blob/main/docs/_build/html/figures.html

Displays topic distributions for a given document or list of documents as a bar chart. It requires topic data and the document content as input.

```python
from topicwizard.figures import document_topic_distribution

document_topic_distribution(
    topic_data,
    "New cure against type 2 diabetes in development."
)
```

--------------------------------

### Freeze Topic Pipeline Components for Downstream Training

Source: https://github.com/x-tabdeveloping/topicwizard/blob/main/docs/usage.pipelines.rst

Explains how to freeze the vectorizer and topic model components within a TopicPipeline. Freezing prevents these components from being retrained when `fit()` or `partial_fit()` is called on an outer pipeline, which is useful for multi-stage training processes.

```python
import topicwizard
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from topicwizard.pipeline import make_topic_pipeline

topic_pipeline = make_topic_pipeline(vectorizer, model).fit(texts)

# Investigate topics
topicwizard.visualize(topic_pipeline)

# Freezing topic pipeline
topic_pipeline.freeze = True
# Constructing classification pipeline
cls_pipeline = make_pipeline(topic_pipeline, LogisticRegression())
cls_pipeline.fit(X, y)
```
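The effect of freezing can be pictured with a toy component whose `fit` becomes a no-op once a `freeze` flag is set. `FreezableCounter` below is purely hypothetical and does not reproduce `TopicPipeline`'s internals; it only shows why an outer `fit()` call leaves a frozen component unchanged.

```python
class FreezableCounter:
    """Toy component that learns a vocabulary and can be frozen."""

    def __init__(self):
        self.freeze = False
        self.vocabulary_ = []

    def fit(self, texts, y=None):
        if self.freeze:
            # Frozen: keep what was learned earlier and ignore the new data.
            return self
        self.vocabulary_ = sorted({tok for text in texts for tok in text.split()})
        return self

    def transform(self, texts):
        # Count occurrences of each known vocabulary term per text.
        return [
            [text.split().count(tok) for tok in self.vocabulary_]
            for text in texts
        ]


component = FreezableCounter().fit(["red green", "green blue"])
component.freeze = True
component.fit(["completely different words"])  # no effect while frozen
print(component.vocabulary_)
```

Because the vocabulary stays fixed, a downstream classifier trained afterwards sees features with the same meaning as those inspected before freezing.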

--------------------------------

### Generate Interactive Topic Figures with TopicWizard

Source: https://github.com/x-tabdeveloping/topicwizard/blob/main/README.md

This code imports various plotting functions from the topicwizard.figures module, allowing for the creation of customizable and interactive plots such as word maps, document-topic timelines, topic wordclouds, and word association bar charts. These figures are generated from a TopicData object.

```python
from topicwizard.figures import (
    word_map,
    document_topic_timeline,
    topic_wordclouds,
    word_association_barchart
)

# Assuming 'topic_data' is loaded or prepared

# Example usage:
word_map(topic_data)
document_topic_timeline(topic_data, "Joe Biden takes over presidential office from Donald Trump.")
topic_wordclouds(topic_data)
word_association_barchart(topic_data, ["supreme", "court"])
```

--------------------------------

### TopicData Class

Source: https://github.com/x-tabdeveloping/topicwizard/blob/main/docs/_build/html/topic_data.html

The TopicData type is the main abstraction in topicwizard. It's a dictionary-like object at runtime, providing static type checking for interoperability. It holds all necessary information to reproduce visualizations and inference results.

```APIDOC
## TopicData Class

### Description
Inference data used to produce visualizations in the application and figures. This type is a Python TypedDict, behaving like a dictionary at runtime while offering static type checking.

### Example
Illustrative values only, written as JSON; at runtime the matrices are arrays rather than nested lists.

{
  "topic_data": {
    "corpus": [
      "This is the first document.",
      "This is the second document."
    ],
    "vocab": ["this", "is", "the", "first", "document", "second"],
    "document_term_matrix": [[1, 1, 1, 1, 1, 0], [1, 1, 1, 0, 1, 1]],
    "document_topic_matrix": [[0.8, 0.2], [0.3, 0.7]],
    "topic_term_matrix": [[0.6, 0.4, 0.0, 0.0, 0.0, 0.0], [0.1, 0.1, 0.2, 0.3, 0.3, 0.0]],
    "document_representation": [[0.1, 0.2], [0.3, 0.4]],
    "topic_names": ["Topic A", "Topic B"]
  }
}

### Attributes

- **corpus** (`list` of `str`) - The corpus on which inference was run.
- **vocab** (`ndarray` of shape `(n_vocab,)`) - Array of all words in the vocabulary of the topic model.
- **document_term_matrix** (`ndarray` of shape `(n_documents, n_vocab)`) - Bag-of-words document representations. Elements are word importances/frequencies for given documents.
- **document_topic_matrix** (`ndarray` of shape `(n_documents, n_topics)`) - Topic importances for each document.
- **topic_term_matrix** (`ndarray` of shape `(n_topics, n_vocab)`) - Importances of each term for each topic in a matrix.
- **document_representation** (`ndarray` of shape `(n_documents, n_dimensions)`) - Embedded representations for documents. Can also be a sparse BoW matrix for classical models.
- **transform** (`(list[str]) -> ndarray`, optional) - Function that transforms documents to document-topic matrices. Can be `None` for transductive models.
- **topic_names** (`list` of `str`) - Names or topic descriptions inferred for topics by the model.
```
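Because a `TypedDict` is an ordinary `dict` at runtime, the fields can be read and validated with normal dictionary operations. The sketch below defines `MiniTopicData`, a hypothetical cut-down type that uses nested lists in place of `ndarray`s, and a hypothetical `check_shapes` helper that verifies that the documented dimensions agree with one another.

```python
from typing import TypedDict


class MiniTopicData(TypedDict):
    """Hypothetical, reduced version of TopicData using lists instead of arrays."""

    corpus: list[str]
    vocab: list[str]
    document_topic_matrix: list[list[float]]
    topic_term_matrix: list[list[float]]
    topic_names: list[str]


def check_shapes(data: MiniTopicData) -> bool:
    """Check that matrix dimensions match the corpus, vocabulary and topic counts."""
    n_documents = len(data["corpus"])
    n_vocab = len(data["vocab"])
    n_topics = len(data["topic_names"])
    return (
        len(data["document_topic_matrix"]) == n_documents
        and all(len(row) == n_topics for row in data["document_topic_matrix"])
        and len(data["topic_term_matrix"]) == n_topics
        and all(len(row) == n_vocab for row in data["topic_term_matrix"])
    )


data: MiniTopicData = {
    "corpus": ["This is the first document.", "This is the second document."],
    "vocab": ["this", "is", "the", "first", "document", "second"],
    "document_topic_matrix": [[0.8, 0.2], [0.3, 0.7]],
    "topic_term_matrix": [
        [0.6, 0.4, 0.0, 0.0, 0.0, 0.0],
        [0.1, 0.1, 0.2, 0.3, 0.3, 0.0],
    ],
    "topic_names": ["Topic A", "Topic B"],
}

print(type(data) is dict, check_shapes(data))
```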

--------------------------------

### Visualize Topics with Excluded Pages (Python)

Source: https://github.com/x-tabdeveloping/topicwizard/blob/main/docs/application.rst

Demonstrates how to use the topicwizard.visualize function to display topic visualizations while excluding specific pages such as 'documents' and 'words'. This is useful for customizing the output based on the data or analysis method.

```python
topicwizard.visualize(texts, model=pipeline, exclude_pages=["documents", "words"])
```

```python
topicwizard.visualize(texts, model=pipeline, exclude_pages=["documents", "topics"])
```

--------------------------------

### Customize TopicWizard Visualization by Excluding Pages

Source: https://github.com/x-tabdeveloping/topicwizard/blob/main/README.md

Shows how to exclude specific pages (e.g., 'documents') from the TopicWizard visualization to speed up preprocessing, especially for large corpora. This allows for focused visualization of desired components.

```python
# Computing 2D projections for a large corpus takes a long time,
# so you can speed up preprocessing by disabling the page altogether.
topicwizard.visualize(corpus, pipeline=topic_pipeline, exclude_pages=["documents"])
```