### Install Turftopic

Source: https://github.com/x-tabdeveloping/turftopic/blob/main/docs/index.md

Installs the turftopic library using pip. This is the first step to using Turftopic in your Python projects.

```bash
pip install turftopic
```

--------------------------------

### Basic KeyNMF Model Usage with Turftopic

Source: https://github.com/x-tabdeveloping/turftopic/blob/main/docs/index.md

Demonstrates how to use the KeyNMF model from Turftopic. It fetches the 20Newsgroups dataset, trains a KeyNMF model, and prints the topics. This example assumes familiarity with scikit-learn.

```python
from turftopic import KeyNMF
from sklearn.datasets import fetch_20newsgroups

newsgroups = fetch_20newsgroups(
    subset="all",
    remove=("headers", "footers", "quotes"),
)
corpus = newsgroups.data
model = KeyNMF(20).fit(corpus)
model.print_topics()
```

--------------------------------

### Install Development Dependencies with Pip

Source: https://github.com/x-tabdeveloping/turftopic/blob/main/CONTRIBUTING.md

Installs all necessary development dependencies for the Turftopic project. This command assumes you have pip installed and are in the project's root directory.

```console
pip install turftopic[dev]
```

--------------------------------

### Install Turftopic and Dependencies

Source: https://github.com/x-tabdeveloping/turftopic/blob/main/docs/tutorials/ideologies.md

Installs the Turftopic library along with Plotly for visualization, Pandas for data handling, and the `datasets` library for fetching data from the Hugging Face Hub. This is a prerequisite for running the tutorial.

```bash
pip install datasets plotly pandas turftopic

```

--------------------------------

### Install Turftopic and Dependencies

Source: https://github.com/x-tabdeveloping/turftopic/blob/main/docs/tutorials/reviews.md

Installs the turftopic library along with necessary packages like SpaCy, Plotly, and Pandas. It also downloads a small English language model for SpaCy.

```shell
pip install turftopic[spacy] plotly pandas
python -m spacy download en_core_web_sm
```

--------------------------------

### Install Turftopic and Dependencies

Source: https://github.com/x-tabdeveloping/turftopic/blob/main/docs/tutorials/arxiv_ml.md

Installs the necessary Python libraries for the project, including `datasets`, `plotly`, and `turftopic` with optional dependencies for UMAP-based clustering and datamapplot.

```bash
pip install datasets plotly "turftopic[umap-learn,datamapplot]"
```

--------------------------------

### Run Full Test Suite with Pytest

Source: https://github.com/x-tabdeveloping/turftopic/blob/main/CONTRIBUTING.md

Executes the entire test suite for the Turftopic project using pytest. Ensure you have pytest installed and are in the project's root directory.

```console
pytest tests/
```

--------------------------------

### Install Turftopic

Source: https://github.com/x-tabdeveloping/turftopic/blob/main/README.md

Installs the turftopic library from PyPI. The optional extras add dependencies for specific functionality: `pyro-ppl` for CTMs (using Pyro) and `umap-learn` for clustering models (using UMAP).

```bash
pip install turftopic

```

```bash
pip install "turftopic[pyro-ppl]"

```

```bash
pip install "turftopic[umap-learn]"

```

--------------------------------

### Install MTEB and Initialize KeyNMF with MTEB Encoder

Source: https://github.com/x-tabdeveloping/turftopic/blob/main/docs/multimodal.md

Installs the MTEB library and initializes a KeyNMF model with an encoder that is compatible with the MTEB multimodal encoder interface. This enables topic modeling on multimodal data.

```bash
pip install "mteb<2.0.0"
```

```python
from turftopic import KeyNMF
import mteb

encoder = mteb.get_model("kakaobrain/align-base")

# Pass the MTEB encoder loaded above to the model
multimodal_keynmf = KeyNMF(10, encoder=encoder)
```

--------------------------------

### Install Turftopic with Datamapplot (Bash)

Source: https://github.com/x-tabdeveloping/turftopic/blob/main/docs/model_interpretation.md

Installs the turftopic library with the datamapplot extra, enabling interactive cluster visualizations. This command is run in the terminal.

```bash
pip install turftopic[datamapplot]
```

--------------------------------

### Asymmetric Example with e5-large-v2

Source: https://github.com/x-tabdeveloping/turftopic/blob/main/docs/encoders.md

An example demonstrating the setup for an asymmetric encoding scenario using the 'intfloat/e5-large-v2' model with KeyNMF. It configures prompts for 'query' and 'passage' and sets 'query' as the default prompt name.

```python
from sentence_transformers import SentenceTransformer
from turftopic import KeyNMF

encoder = SentenceTransformer(
    "intfloat/e5-large-v2",
    prompts={
        "query": "query: ",
        "passage": "passage: ",
    },
    # Make sure to set default prompt to query!
    default_prompt_name="query",
)
model = KeyNMF(10, encoder=encoder)
```
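To make the effect of the prompt configuration concrete, the sketch below mimics the prefixing step in plain Python: the string registered under the selected prompt name is prepended to each text before it is encoded. `apply_prompt` and `PROMPTS` are hypothetical names written for illustration; they are not part of sentence-transformers or Turftopic.

```python
# Illustrative only: `apply_prompt` is a hypothetical helper, not a library function.
PROMPTS = {"query": "query: ", "passage": "passage: "}


def apply_prompt(texts, prompts, prompt_name):
    """Prepend the prompt registered under `prompt_name` to every text."""
    prefix = prompts[prompt_name]
    return [prefix + text for text in texts]


prefixed = apply_prompt(["topic models", "neural networks"], PROMPTS, "query")
print(prefixed)
```

With `default_prompt_name="query"`, every text the model embeds would receive the `"query: "` prefix unless another prompt name is requested explicitly.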

--------------------------------

### Install Turftopic with Jieba for Chinese Tokenization

Source: https://github.com/x-tabdeveloping/turftopic/blob/main/docs/chinese.md

Installs the Turftopic library with the necessary dependencies for Chinese text processing, specifically the jieba tokenizer.

```bash
pip install turftopic[jieba]
```

--------------------------------

### Install Turftopic with datamapplot and OpenAI support

Source: https://github.com/x-tabdeveloping/turftopic/blob/main/README.md

Installs the Turftopic library with necessary dependencies for datamapplot visualizations and OpenAI integration for topic naming. This allows for advanced topic analysis and visualization.

```bash
pip install "turftopic[datamapplot, openai]"
```

--------------------------------

### Install turftopic with Topic Wizard

Source: https://github.com/x-tabdeveloping/turftopic/blob/main/docs/tutorials/religious.md

Installs the turftopic library along with the `topic-wizard` extra, which the tutorial uses for interactive exploration of the fitted topic model.

```bash
pip install turftopic[topic-wizard]
```

--------------------------------

### Install Turftopic with UMAP-learn

Source: https://github.com/x-tabdeveloping/turftopic/blob/main/docs/clustering.md

Installs the Turftopic library along with the `umap-learn` extra, which provides UMAP dimensionality reduction for clustering topic models.

```bash
pip install turftopic[umap-learn]
```

--------------------------------

### Install Turftopic with topic-wizard support

Source: https://github.com/x-tabdeveloping/turftopic/blob/main/README.md

Installs the Turftopic library with the 'topic-wizard' extra, enabling integration with the topicwizard library for interactive topic model visualization. This is a simple way to explore topic models.

```bash
pip install "turftopic[topic-wizard]"
```

--------------------------------

### Install topic-wizard for Interactive Visualization - Bash

Source: https://github.com/x-tabdeveloping/turftopic/blob/main/docs/model_interpretation.md

Installs the topic-wizard library using pip. This library is used in conjunction with Turftopic for interactive exploration and visualization of topic models.

```bash
pip install topic-wizard
```

--------------------------------

### Install Turftopic with Pyro Support

Source: https://github.com/x-tabdeveloping/turftopic/blob/main/docs/ctm.md

Installs the Turftopic library with the `pyro-ppl` extra (Pyro), which is required for using Autoencoding Topic Models.

```bash
pip install turftopic[pyro-ppl]
```

--------------------------------

### Install turftopic with pyro-ppl support

Source: https://github.com/x-tabdeveloping/turftopic/blob/main/examples/basic_example_20newsgroups.ipynb

Installs or upgrades the turftopic Python package to the latest version, including the pyro-ppl backend for probabilistic modeling. This command requires pip and ensures all necessary dependencies are met.

```python
%pip install --upgrade turftopic[pyro-ppl]
```

--------------------------------

### Install Plotly for Dynamic Topic Modeling

Source: https://github.com/x-tabdeveloping/turftopic/blob/main/docs/dynamic.md

This command installs the Plotly library, which is a required dependency for visualizations when working with dynamic topic models in Turftopic. Ensure you have pip installed and configured correctly.

```bash
pip install plotly
```

--------------------------------

### Load and Prepare ML Paper Dataset

Source: https://github.com/x-tabdeveloping/turftopic/blob/main/docs/tutorials/arxiv_ml.md

Loads a dataset of machine learning papers from HuggingFace Hub using the `datasets` library. It then subsamples the dataset to 10,000 examples for faster processing and extracts the abstracts.

```python
from datasets import load_dataset

ds = load_dataset("CShorten/ML-ArXiv-Papers", split="train")
# Subsampling dataset
ds = ds.train_test_split(seed=42, test_size=10_000)["test"]
abstracts = ds["abstract"]
```
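The `train_test_split` call above is used purely as a seeded way to draw a fixed-size random subset. The same idea can be sketched with the standard library on a toy list; `subsample` and `records` are made-up names for illustration and have nothing to do with the `datasets` API.

```python
import random


def subsample(items, size, seed=42):
    """Draw `size` items without replacement, reproducibly for a given seed."""
    rng = random.Random(seed)
    return rng.sample(items, size)


records = [f"abstract {i}" for i in range(100)]  # toy stand-in for the dataset
first = subsample(records, 10)
second = subsample(records, 10)
print(len(first), first == second)
```

Fixing the seed means repeated runs select the same documents, which keeps downstream topic models comparable between runs.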

--------------------------------

### Install Keyphrase-Vectorizers

Source: https://github.com/x-tabdeveloping/turftopic/blob/main/docs/keyphrase.md

Installs the Keyphrase-Vectorizers library, which is required for extracting keyphrases from text. This library relies on SpaCy for POS-tagging to identify noun phrases.

```bash
pip install keyphrase-vectorizers

```

--------------------------------

### Visualization with datamapplot

Source: https://github.com/x-tabdeveloping/turftopic/blob/main/docs/clustering.md

Explains how to install datamapplot and use it for interactive cluster visualization within Turftopic.

You can interactively explore clusters using [datamapplot](https://github.com/TutteInstitute/datamapplot) directly in Turftopic! You will first have to install `datamapplot` for this to work:

```bash
pip install turftopic[datamapplot]
```

```python
from turftopic import ClusteringTopicModel
from turftopic.analyzers import OpenAIAnalyzer

model = ClusteringTopicModel(feature_importance="centroid").fit(corpus)

analyzer = OpenAIAnalyzer("gpt-5-nano")
analysis_res = model.analyze_topics(analyzer)

fig = model.plot_clusters_datamapplot()
fig.save("clusters_visualization.html")
fig
```

_See Figure 1_

!!! info
    If you are not running Turftopic from a Jupyter notebook, make sure to call `fig.show()`. This will open up a new browser tab with the interactive figure.

--------------------------------

### Speed Up Models with ONNX Backend

Source: https://github.com/x-tabdeveloping/turftopic/blob/main/docs/encoders.md

Illustrates how to use the ONNX backend for faster inference with sentence transformers, available in sentence-transformers 3.2.0 and later. This requires installing the `onnx` or `onnx-gpu` extra and passing `backend="onnx"` when initializing the `SentenceTransformer`.

```bash
pip install "sentence-transformers[onnx,onnx-gpu]"
```

```python
from turftopic import SemanticSignalSeparation
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2", backend="onnx")

model = SemanticSignalSeparation(10, encoder=encoder)
```

--------------------------------

### Initialize OpenAI Analyzer

Source: https://github.com/x-tabdeveloping/turftopic/blob/main/docs/analyzers.md

Initializes an OpenAIAnalyzer for topic analysis using the OpenAI API. Requires the 'turftopic[openai]' package to be installed and the OPENAI_API_KEY environment variable to be set. The default model used is 'gpt-5-nano'.

```bash
pip install turftopic[openai]
export OPENAI_API_KEY="sk-<your key goes here>"
```

```python
from turftopic.analyzers import OpenAIAnalyzer

analyzer = OpenAIAnalyzer('gpt-5-nano')
```
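Because the analyzer depends on `OPENAI_API_KEY` being set, it can help to fail early with a clear message when the variable is missing. A minimal sketch using only the standard library follows; `require_env` is a hypothetical helper, not part of Turftopic, and the placeholder value is there only so the snippet runs on its own.

```python
import os


def require_env(name):
    """Return the value of environment variable `name`, or raise if unset or empty."""
    value = os.environ.get(name, "")
    if not value:
        raise RuntimeError(f"{name} is not set; export it before creating the analyzer.")
    return value


# For demonstration only: fall back to a placeholder if no key is exported.
os.environ.setdefault("OPENAI_API_KEY", "sk-placeholder")
api_key = require_env("OPENAI_API_KEY")
print("API key found:", bool(api_key))
```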

--------------------------------

### Python Example for File Copying

Source: https://github.com/x-tabdeveloping/turftopic/blob/main/docs/tutorials/images/arxiv_ml_datamapplot.html

This Python snippet illustrates how to copy a file from one location to another. It utilizes the `shutil` module, a standard library for high-level file operations. This is useful for tasks like creating backups or staging files for further processing.

```python
import shutil
import os

def copy_file(source_path, destination_path):
    """Copies a file from source to destination."""
    try:
        if not os.path.exists(source_path):
            print(f"Error: Source file not found at {source_path}")
            return
            
        shutil.copy2(source_path, destination_path) # copy2 preserves metadata
        print(f"File copied from {source_path} to {destination_path}")
    except Exception as e:
        print(f"An error occurred during file copy: {e}")

# Example usage:
# source = 'path/to/original/file.txt'
# destination = 'path/to/destination/file.txt'
# copy_file(source, destination)

```

--------------------------------

### Run Code Formatting Checks

Source: https://github.com/x-tabdeveloping/turftopic/blob/main/CONTRIBUTING.md

Performs code formatting checks using either Black or Ruff. These commands should be run from the project's root directory to ensure consistency.

```console
black .
# or
ruff check .
```

--------------------------------

### Python: Dynamic Online Topic Modeling Setup

Source: https://github.com/x-tabdeveloping/turftopic/blob/main/docs/KeyNMF.md

Sets up the time bins required for dynamic online topic modeling. This involves defining a list of datetime objects that represent the boundaries for the time periods over which the corpus is to be analyzed. The model cannot infer these bins automatically.

```python
from datetime import datetime

# We will bin by years in a period of 2020-2030
bins = [datetime(year=y, month=1, day=1) for y in range(2020, 2030 + 2, 1)]
```
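To see what these boundaries mean in practice, the sketch below assigns a timestamp to the slice whose edges enclose it. It rebuilds the same yearly edges and uses only the standard library; `bin_index` is a hypothetical helper, and the way Turftopic handles edge inclusivity internally may differ.

```python
from bisect import bisect_right
from datetime import datetime

# Same yearly bin edges as above: 2020-01-01 through 2031-01-01.
bins = [datetime(year=y, month=1, day=1) for y in range(2020, 2030 + 2, 1)]


def bin_index(timestamp, edges):
    """Return i such that edges[i] <= timestamp < edges[i + 1]."""
    if not edges[0] <= timestamp < edges[-1]:
        raise ValueError(f"{timestamp} falls outside the bin edges")
    return bisect_right(edges, timestamp) - 1


print(len(bins) - 1)                             # number of time slices
print(bin_index(datetime(2020, 6, 15), bins))    # index of the 2020 slice
print(bin_index(datetime(2030, 12, 31), bins))   # index of the 2030 slice
```

Note that N + 1 edges define N slices, which is why the range runs one year past 2030.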

--------------------------------

### Install and Use LemmaCountVectorizer with SpaCy for Lemmatization

Source: https://github.com/x-tabdeveloping/turftopic/blob/main/docs/vectorizers.md

Demonstrates the installation of Turftopic with SpaCy support and a SpaCy model, followed by the usage of 'LemmaCountVectorizer' for lemmatizing words before topic modeling. This method relies on a SpaCy pipeline for accurate lemmatization.

```bash
pip install turftopic[spacy]
python -m spacy download en_core_web_sm
```

```python
from turftopic import KeyNMF
from turftopic.vectorizers.spacy import LemmaCountVectorizer

model = KeyNMF(10, vectorizer=LemmaCountVectorizer("en_core_web_sm"))
model.fit(corpus)
model.print_topics()
```

--------------------------------

### Initialize KeyNMF with SentenceTransformers Encoder

Source: https://github.com/x-tabdeveloping/turftopic/blob/main/docs/multimodal.md

Initializes the KeyNMF model with a specified multimodal encoder from SentenceTransformers. This allows for topic modeling on corpora containing both text and images. Ensure SentenceTransformers is installed.

```python
from turftopic import KeyNMF

multimodal_keynmf = KeyNMF(10, encoder="clip-ViT-B-32")
```

--------------------------------

### KeyNMF with Lemma Extraction

Source: https://github.com/x-tabdeveloping/turftopic/blob/main/docs/model_definition_and_training.md

Trains a KeyNMF model using LemmaCountVectorizer for extracting lemmas as features. This requires spaCy installation and a downloaded model. The output includes topics with their highest-ranking lemmas.

```bash
pip install turftopic[spacy]
python -m spacy download "en_core_web_sm"
```

```python
from turftopic import KeyNMF
from turftopic.vectorizers.spacy import LemmaCountVectorizer

model = KeyNMF(10, vectorizer=LemmaCountVectorizer("en_core_web_sm"))
model.fit(corpus)
model.print_topics()
```

--------------------------------

### Initialize T5 Analyzer

Source: https://github.com/x-tabdeveloping/turftopic/blob/main/docs/analyzers.md

Initializes a T5Analyzer for topic analysis. T5 models are generally less resource-intensive than causal language models but may produce lower-quality results, requiring potential tuning. This example uses the 'google/flan-t5-large' model.

```python
from turftopic.analyzers import T5Analyzer

analyzer = T5Analyzer("google/flan-t5-large")
```

--------------------------------

### Interactive Cluster Visualization with datamapplot

Source: https://github.com/x-tabdeveloping/turftopic/blob/main/docs/clustering.md

Visualizes topic clusters interactively using datamapplot. Requires the `datamapplot` package to be installed. This snippet first fits a model, analyzes topics using an OpenAI analyzer, and then generates an HTML visualization of the clusters.

```bash
pip install turftopic[datamapplot]
```

```python
from turftopic import ClusteringTopicModel
from turftopic.analyzers import OpenAIAnalyzer

model = ClusteringTopicModel(feature_importance="centroid").fit(corpus)

analyzer = OpenAIAnalyzer("gpt-5-nano")
analysis_res = model.analyze_topics(analyzer)

fig = model.plot_clusters_datamapplot()
fig.save("clusters_visualization.html")
fig
```

--------------------------------

### Print Representative Documents for a Topic

Source: https://github.com/x-tabdeveloping/turftopic/blob/main/README.md

Shows documents that are most representative of a specific topic. This requires the fitted model, the corpus, and the document-topic matrix. It helps to see real-world examples of content related to a topic.

```python
# Print highest ranking documents for topic 0
model.print_representative_documents(0, corpus, document_topic_matrix)
```
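Conceptually, "highest ranking" means the documents with the largest weight in the chosen topic's column of the document-topic matrix. The sketch below shows that ranking on toy data with plain Python lists; `top_documents`, `toy_corpus` and `toy_matrix` are hypothetical names, and the exact selection rule used by Turftopic is not confirmed here.

```python
def top_documents(topic_id, corpus, document_topic_matrix, top_k=3):
    """Return the `top_k` documents with the largest weight on `topic_id`."""
    weights = [row[topic_id] for row in document_topic_matrix]
    order = sorted(range(len(corpus)), key=lambda i: weights[i], reverse=True)
    return [corpus[i] for i in order[:top_k]]


# Toy data: three documents, two topics (rows are documents, columns are topics).
toy_corpus = ["doc about sports", "doc about politics", "doc about football"]
toy_matrix = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3]]
print(top_documents(0, toy_corpus, toy_matrix, top_k=2))
```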

--------------------------------

### Automated Topic Naming with OpenAI

Source: https://github.com/x-tabdeveloping/turftopic/blob/main/README.md

Assigns human-readable names to topics automatically using an OpenAI language model. Requires `turftopic[openai]` installation and an OpenAI API key. It fits a model and then renames topics.

```python
from turftopic import KeyNMF
from turftopic.analyzers import OpenAIAnalyzer

model = KeyNMF(10).fit(corpus)

namer = OpenAIAnalyzer("gpt-4o-mini")
model.rename_topics(namer)
model.print_topics()
```

--------------------------------

### Python Script for Data Processing

Source: https://github.com/x-tabdeveloping/turftopic/blob/main/docs/tutorials/images/arxiv_ml_datamapplot.html

This Python script demonstrates a common pattern for processing data, likely involving file I/O and data manipulation. It serves as a foundational example for data-related tasks within the turftopic project. Specific input/output formats and error handling might vary.

```python
import pandas as pd

def process_data(input_file, output_file):
    """Reads data from a CSV, performs some transformations, and saves to a new CSV."""
    try:
        df = pd.read_csv(input_file)
        # Example transformation: Add a new column based on existing ones
        if 'col1' in df.columns and 'col2' in df.columns:
            df['new_col'] = df['col1'] * df['col2']
        else:
            print("Warning: 'col1' or 'col2' not found for transformation.")
        
        df.to_csv(output_file, index=False)
        print(f"Data processed successfully and saved to {output_file}")
    except FileNotFoundError:
        print(f"Error: Input file not found at {input_file}")
    except Exception as e:
        print(f"An error occurred during data processing: {e}")

# Example usage:
# process_data('input_data.csv', 'processed_data.csv')

```

--------------------------------

### Initialize KeyNMF Model and Prepare Topic Data - Python

Source: https://github.com/x-tabdeveloping/turftopic/blob/main/docs/model_interpretation.md

Initializes a KeyNMF model with a specified number of topics and prepares the topic data from a corpus. This is a prerequisite for most interpretation and visualization functionalities.

```python
from turftopic import KeyNMF

model = KeyNMF(10)
topic_data = model.prepare_topic_data(corpus)
```

--------------------------------

### Set up and Use OpenAI Analyzer for Topic Analysis

Source: https://github.com/x-tabdeveloping/turftopic/blob/main/docs/model_definition_and_training.md

Configures and utilizes the OpenAIAnalyzer for topic analysis, requiring the `openai` package and an API key. This method allows for topic analysis using specified OpenAI models, with an option to enable document summaries. It also includes a sample of how topic results might be presented.

```bash
pip install openai
export OPENAI_API_KEY="sk-<your key goes here>"
```

```python
from turftopic.analyzers import OpenAIAnalyzer

# We enable document summaries for topic analysis
analyzer = OpenAIAnalyzer("gpt-5-nano", use_summaries=True)

analysis_res = model.analyze_topics(analyzer)
model.print_topics()
```

--------------------------------

### Launch topicwizard Web App with Model and Documents (Python)

Source: https://github.com/x-tabdeveloping/turftopic/blob/main/docs/model_interpretation.md

Launches the topicwizard web app for interactive topic model exploration using a model object and a representative sample of documents. Requires the topicwizard library and the documents and model objects.

```python
import topicwizard

topicwizard.visualize(corpus=documents, model=model)
```

--------------------------------

### Install and Use KeyphraseCountVectorizer for Keyphrase Extraction

Source: https://github.com/x-tabdeveloping/turftopic/blob/main/docs/vectorizers.md

This snippet shows how to install the 'keyphrase-vectorizers' package and use 'KeyphraseCountVectorizer' with 'KeyNMF' for extracting keyphrases from a corpus. It bypasses the need for SpaCy's dependency parser for potentially faster processing.

```bash
pip install keyphrase-vectorizers
```

```python
from keyphrase_vectorizers import KeyphraseCountVectorizer
from turftopic import KeyNMF

vectorizer = KeyphraseCountVectorizer()
model = KeyNMF(10, vectorizer=vectorizer).fit(corpus)
```

--------------------------------

### Initialize KeyNMF Topic Model

Source: https://github.com/x-tabdeveloping/turftopic/blob/main/docs/model_definition_and_training.md

Initializes a KeyNMF topic model with 10 components (topics) and `top_n=15`, which in KeyNMF sets how many keywords are extracted for each document. This is a foundational step for topic modeling in Turftopic.

```python
from turftopic import KeyNMF

model = KeyNMF(n_components=10, top_n=15)
```

--------------------------------

### Install and Use StemmingCountVectorizer with Snowball for Stemming

Source: https://github.com/x-tabdeveloping/turftopic/blob/main/docs/vectorizers.md

This snippet covers the installation of Turftopic with Snowball Stemmer support and its subsequent use with 'StemmingCountVectorizer' for aggressive word stemming. It's an alternative to lemmatization, useful for speed or specific stemming needs, using the Snowball Stemmer library.

```bash
pip install turftopic[snowball]
```

```python
from turftopic import KeyNMF
from turftopic.vectorizers.snowball import StemmingCountVectorizer

model = KeyNMF(10, vectorizer=StemmingCountVectorizer(language="english"))
model.fit(corpus)
model.print_topics()
```

--------------------------------

### Dimensionality Reduction with UMAP in Python

Source: https://github.com/x-tabdeveloping/turftopic/blob/main/docs/clustering.md

Sets up Turftopic's ClusteringTopicModel to utilize UMAP for dimensionality reduction. UMAP is a versatile non-linear technique, often preferred for topic discovery due to its speed and better preservation of global data structures compared to TSNE. Installation via `pip install umap-learn` is required.

```python
from umap import UMAP
from turftopic import ClusteringTopicModel

model = ClusteringTopicModel(dimensionality_reduction=UMAP(n_components=2, metric="cosine"))
```

--------------------------------

### Get Selected Indices

Source: https://github.com/x-tabdeveloping/turftopic/blob/main/docs/tutorials/images/arxiv_ml_datamapplot.html

Retrieves the set of currently selected data indices from the DataSelectionManager.

```javascript
getSelectedIndices(){return this.dataSelectionManager.getSelectedIndices();}
```

--------------------------------

### Prepare Topic Data for Interpretation with prepare_topic_data()

Source: https://github.com/x-tabdeveloping/turftopic/blob/main/docs/model_definition_and_training.md

Shows how to use the `prepare_topic_data()` method to fit a model (if not already fitted) and prepare data structures essential for model interpretation and visualization. It returns a `TopicData` object containing various attributes like corpus, vocabulary, and topic distributions.

```python
corpus: list[str] = ["this is a document", "this is yet another document", ...]

topic_data = model.prepare_topic_data(corpus)
# print to see what attributes you can access.
print(topic_data)
```

--------------------------------

### Fit and Print Topics with turftopic KeyNMF Model (Python)

Source: https://github.com/x-tabdeveloping/turftopic/blob/main/examples/basic_example_20newsgroups.ipynb

This snippet shows how to initialize and train a KeyNMF model using a corpus and precomputed embeddings. After fitting, it calls the `print_topics()` method to display the extracted topics. The `trf` encoder, `corpus`, and `embeddings` variables are assumed to have been defined earlier in the notebook.

```python
from turftopic import KeyNMF

model = KeyNMF(20, encoder=trf).fit(corpus, embeddings=embeddings)
model.print_topics()
```

--------------------------------

### Fitting Multimodal Models

Source: https://github.com/x-tabdeveloping/turftopic/blob/main/docs/multimodal.md

Provides examples of how to use the `fit_multimodal` method with various Turftopic models, including SemanticSignalSeparation, KeyNMF, Clustering Models, GMM, and AutoEncodingTopicModel.

All multimodal models in Turftopic provide a `fit_multimodal()` or `fit_transform_multimodal()` method to discover topics within multimodal corpora (text and images). Both take `texts: list[str]` and `images: list[PIL.Image.Image]`. After fitting, `plot_multimodal_topics()` can be used for visualization.

```python
from turftopic import (
    SemanticSignalSeparation,
    KeyNMF,
    ClusteringTopicModel,
    GMM,
    AutoEncodingTopicModel
)
from PIL import Image

# Sample data (replace with your actual data)
texts = ["text 1", "text 2"]
images = [Image.new('RGB', (60, 30), color='red'), Image.new('RGB', (60, 30), color='blue')]

# The number of topics is passed as the first positional argument

# SemanticSignalSeparation
model_sss = SemanticSignalSeparation(12, encoder="clip-ViT-B-32")
model_sss.fit_multimodal(texts, images=images)
# model_sss.plot_multimodal_topics()

# KeyNMF
model_knmf = KeyNMF(12, encoder="clip-ViT-B-32")
model_knmf.fit_multimodal(texts, images=images)
# model_knmf.plot_multimodal_topics()

# Clustering Models (BERTopic-style)
model_cluster_bt = ClusteringTopicModel(encoder="clip-ViT-B-32", feature_importance="c-tf-idf")
model_cluster_bt.fit_multimodal(texts, images=images)
# model_cluster_bt.plot_multimodal_topics()

# Clustering Models (Top2Vec-style)
model_cluster_t2v = ClusteringTopicModel(encoder="clip-ViT-B-32", feature_importance="centroid")
model_cluster_t2v.fit_multimodal(texts, images=images)
# model_cluster_t2v.plot_multimodal_topics()

# GMM
model_gmm = GMM(12, encoder="clip-ViT-B-32")
model_gmm.fit_multimodal(texts, images=images)
# model_gmm.plot_multimodal_topics()

# AutoEncodingTopicModel (CombinedTM)
model_aetm_combined = AutoEncodingTopicModel(12, combined=True, encoder="clip-ViT-B-32")
model_aetm_combined.fit_multimodal(texts, images=images)
# model_aetm_combined.plot_multimodal_topics()

# AutoEncodingTopicModel (ZeroShotTM)
model_aetm_zs = AutoEncodingTopicModel(12, combined=False, encoder="clip-ViT-B-32")
model_aetm_zs.fit_multimodal(texts, images=images)
# model_aetm_zs.plot_multimodal_topics()
```

--------------------------------

### Plotting Topics Over Time with KeyNMF

Source: https://github.com/x-tabdeveloping/turftopic/blob/main/docs/KeyNMF.md

Generates an interactive HTML figure to visualize topic trends over time. Requires the 'plotly' library to be installed. Hovering over terms reveals their importance.

```bash
pip install plotly
```

```python
model.plot_topics_over_time()
```

--------------------------------

### Load 20 Newsgroups Dataset

Source: https://github.com/x-tabdeveloping/turftopic/blob/main/docs/keyphrase.md

Loads a subset of the 20 Newsgroups dataset using scikit-learn. It removes headers, footers, and quotes, and filters for specific categories relevant to the demonstration.

```python
from sklearn.datasets import fetch_20newsgroups

corpus = fetch_20newsgroups(
    subset="all",
    remove=("headers", "footers", "quotes"),
    categories=[
        "comp.os.ms-windows.misc",
        "comp.sys.ibm.pc.hardware",
        "talk.religion.misc",
        "alt.atheism",
    ],
).data

```

--------------------------------

### Fit Dynamic Topic Model with KeyNMF

Source: https://github.com/x-tabdeveloping/turftopic/blob/main/docs/dynamic.md

Demonstrates how to fit a dynamic topic model using the KeyNMF algorithm. This involves initializing the model and then calling the `fit_transform_dynamic` method with a corpus and a list of timestamps. The `bins` parameter controls the number of time slices for analysis.

```python
from datetime import datetime

from turftopic import KeyNMF

corpus: list[str] = []
timestamps: list[datetime] = []

model = KeyNMF(5, top_n=5, random_state=42)
document_topic_matrix = model.fit_transform_dynamic(
    corpus, timestamps=timestamps, bins=10
)
```
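When `bins` is an integer, the time range has to be divided into that many slices. One plausible reading, which is an assumption here rather than documented behaviour, is equal-width slices between the earliest and latest timestamp. The sketch below computes such edges with the standard library; `equal_width_edges` and `stamps` are hypothetical names.

```python
from datetime import datetime


def equal_width_edges(timestamps, n_bins):
    """Return n_bins + 1 evenly spaced edges spanning min..max of `timestamps`."""
    start, end = min(timestamps), max(timestamps)
    step = (end - start) / n_bins
    return [start + i * step for i in range(n_bins + 1)]


stamps = [datetime(2020, 1, 1), datetime(2021, 6, 1), datetime(2025, 1, 1)]
edges = equal_width_edges(stamps, n_bins=10)
print(len(edges), edges[0], edges[-1])
```

If uneven periods are needed (for example calendar years), an explicit list of `datetime` edges is the more predictable choice.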

--------------------------------

### Get Initial Viewport Size in JavaScript

Source: https://github.com/x-tabdeveloping/turftopic/blob/main/docs/images/cluster_datamapplot.html

Retrieves the initial viewport size from the document's client width and height. This function is used to determine the initial dimensions for the DeckGL map.

```javascript
function getInitialViewportSize() {
  const width = document.documentElement.clientWidth;
  const height = document.documentElement.clientHeight;
  return { viewportWidth: width, viewportHeight: height };
}
```

--------------------------------

### Finetune KeyNMF Models on New Corpus

Source: https://github.com/x-tabdeveloping/turftopic/blob/main/docs/finetuning.md

Enables finetuning of pre-trained KeyNMF models on new, unseen text data using the `partial_fit()` method. The finetuned model can then be saved to disk.

```python
from turftopic import load_model

model = load_model("pretrained_keynmf_model")

print(type(model))
# turftopic.models.keynmf.KeyNMF

new_corpus: list[str] = [...] 
# Finetune the model to the new corpus
model.partial_fit(new_corpus)

model.to_disk("finetuned_model/")
```

--------------------------------

### Finetuning KeyNMF Model with New Data

Source: https://github.com/x-tabdeveloping/turftopic/blob/main/docs/online.md

Shows how to finetune a pre-trained KeyNMF model on a novel corpus. This allows the model's topics to adapt to new data without retraining from scratch. The process involves loading a saved model and then calling `partial_fit` with the new data.

```python
from turftopic import load_model

model = load_model("pretrained_keynmf_model")

new_corpus: list[str] = [...]  # New data
# Finetune the model to the new corpus
model.partial_fit(new_corpus)

model.to_disk("finetuned_model/")
```
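If the new corpus is too large to process in one call, it can be fed to `partial_fit` in chunks. The sketch below shows only the batching part with the standard library; `batched` is a hypothetical helper, `documents` is toy data, and the batch size of 200 in the commented loop is an arbitrary illustrative value.

```python
from itertools import islice


def batched(iterable, batch_size):
    """Yield successive lists of at most `batch_size` items from `iterable`."""
    iterator = iter(iterable)
    while batch := list(islice(iterator, batch_size)):
        yield batch


documents = [f"document {i}" for i in range(7)]  # toy stand-in for new_corpus
batches = list(batched(documents, 3))
print([len(b) for b in batches])

# With a loaded model, each batch could then be passed on in turn:
# for batch in batched(new_corpus, 200):
#     model.partial_fit(batch)
```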

--------------------------------

### Configure LLMAnalyzer with Context

Source: https://github.com/x-tabdeveloping/turftopic/blob/main/docs/analyzers.md

Initializes an `LLMAnalyzer` with custom context to guide the analysis process. This allows the LLM to focus on specific domains or tasks, such as analyzing financial documents. Dependencies include `turftopic.analyzers.LLMAnalyzer`.

```python
from turftopic.analyzers import LLMAnalyzer

analyzer = LLMAnalyzer(context="Analyze topical content in financial documents published by the central bank.")
```

--------------------------------

### Map Layer Order Management

Source: https://github.com/x-tabdeveloping/turftopic/blob/main/docs/tutorials/images/arxiv_ml_datamapplot.html

Defines the order of map layers and provides a function to get a layer's index based on its ID. This helps in managing the z-index of different map components.

```javascript
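// Layers are drawn in this order; later entries sit on top.
const LAYER_ORDER = ['dataPointLayer', 'boundaryLayer', 'LabelLayer'];

// Returns the position of a layer in LAYER_ORDER, or -1 if its id is unknown.
function getLayerIndex(object) {
  return LAYER_ORDER.indexOf(object.id);
}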
```
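
To illustrate how such an index can drive draw order, here is a minimal Python sketch of the same idea (the page itself uses JavaScript). The `layers` list and the `layer_index` helper are hypothetical and only mirror the snippet above; unknown ids map to -1, as `indexOf` does.

```python
# Hypothetical Python analogue of the JavaScript snippet above.
LAYER_ORDER = ["dataPointLayer", "boundaryLayer", "LabelLayer"]


def layer_index(layer: dict) -> int:
    """Return the layer's position in LAYER_ORDER, or -1 if its id is unknown."""
    try:
        return LAYER_ORDER.index(layer["id"])
    except ValueError:
        return -1


# Illustrative layers, deliberately out of order.
layers = [{"id": "LabelLayer"}, {"id": "dataPointLayer"}, {"id": "boundaryLayer"}]

# Sorting by index restores the intended bottom-to-top order.
ordered = sorted(layers, key=layer_index)
print([layer["id"] for layer in ordered])
```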

--------------------------------

### Fit and Interpret KeyNMF Model in Python

Source: https://github.com/x-tabdeveloping/turftopic/blob/main/docs/KeyNMF.md

Demonstrates the basic usage of KeyNMF for topic modeling. It initializes the model with a specified number of topics and an encoder, fits it to a corpus, and then prints the discovered topics. Requires the turftopic library.

```python
from turftopic import KeyNMF

model = KeyNMF(10, encoder="paraphrase-MiniLM-L3-v2")
model.fit(corpus)

model.print_topics()
```

--------------------------------

### KeyNMF with Multilingual Tokenization (Arabic)

Source: https://github.com/x-tabdeveloping/turftopic/blob/main/docs/model_definition_and_training.md

Configures and trains a KeyNMF model for Arabic text using TokenCountVectorizer and a multilingual sentence transformer encoder. This setup allows for topic modeling on non-English corpora.

```python
from turftopic import KeyNMF
from turftopic.vectorizers.spacy import TokenCountVectorizer

# CountVectorizer for Arabic
vectorizer = TokenCountVectorizer("ar", min_df=10)

model = KeyNMF(
    n_components=10,
    vectorizer=vectorizer,
    encoder="Omartificial-Intelligence-Space/Arabic-MiniLM-L12-v2-all-nli-triplet"
)
model.fit(corpus)
```

--------------------------------

### Launch topicwizard Web App with TopicData (Python)

Source: https://github.com/x-tabdeveloping/turftopic/blob/main/docs/model_interpretation.md

Launches the topicwizard web app for interactive topic model exploration using a TopicData object. This method requires a pre-existing TopicData object.

```python
topic_data.visualize_topicwizard()
```

--------------------------------

### Accessing TopicData Attributes

Source: https://github.com/x-tabdeveloping/turftopic/blob/main/docs/topic_data.md

Demonstrates that TopicData objects are dict-like and allow attribute access using dot notation. This example shows the equivalence of accessing the shape of the document_term_matrix using both dictionary key access and attribute access.

```python
# They are the same
assert topic_data["document_term_matrix"].shape == topic_data.document_term_matrix.shape
```
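
For readers unfamiliar with the pattern, the sketch below shows one common way a dict-like object can also expose its keys as attributes. `AttrDict` is a hypothetical toy class and is not Turftopic's `TopicData` implementation; the keys and values are made up for illustration.

```python
class AttrDict(dict):
    """Toy dict subclass that also exposes keys as attributes (not Turftopic's class)."""

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails, so dict methods still work.
        try:
            return self[name]
        except KeyError as err:
            raise AttributeError(name) from err


# Made-up contents standing in for real topic data.
data = AttrDict(vocab=["cat", "dog"], document_term_matrix=[[1, 0], [0, 2]])

# Key access and attribute access return the same object.
assert data["vocab"] is data.vocab
assert data["document_term_matrix"] is data.document_term_matrix
```

One caveat of this pattern is that keys named like existing dict methods (for example `keys` or `items`) remain reachable only through bracket access.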

--------------------------------

### Use Instruct Models with KeyNMF for Keyword Retrieval

Source: https://github.com/x-tabdeveloping/turftopic/blob/main/docs/encoders.md

Shows how to use instruction-tuned embedding models like Microsoft's E5 with KeyNMF for keyword retrieval. It highlights the importance of using prompts for these models and setting the default prompt name to 'query'. Documents act as queries and words as passages.

```python
from turftopic import KeyNMF
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer(
    "intfloat/multilingual-e5-large-instruct",
    prompts={
        "query": "Instruct: Retrieve relevant keywords from the given document. Query: ",
        "passage": "Passage: ",
    },
    # Make sure to set default prompt to query!
    default_prompt_name="query",
)
model = KeyNMF(10, encoder=encoder)
```
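
Sentence Transformers applies a prompt by prepending its text to each input before encoding. The sketch below mimics that step with plain strings, so it runs without downloading a model. The `apply_prompt` helper and the example texts are hypothetical, and the assignment of documents to the query prompt and words to the passage prompt follows the description above rather than an inspection of KeyNMF's internals.

```python
prompts = {
    "query": "Instruct: Retrieve relevant keywords from the given document. Query: ",
    "passage": "Passage: ",
}


def apply_prompt(text: str, prompt_name: str = "query") -> str:
    """Hypothetical helper: prefix the text with the named prompt."""
    return prompts[prompt_name] + text


# Documents use the default ("query") prompt, candidate words the "passage" prompt.
doc_input = apply_prompt("Central banks raised interest rates.")
word_input = apply_prompt("inflation", prompt_name="passage")

print(doc_input)
print(word_input)
```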

--------------------------------

### Load and Prepare Chinese Text Data

Source: https://github.com/x-tabdeveloping/turftopic/blob/main/docs/chinese.md

Loads the ThuNews corpus from the Chinese MTEB dataset and subsamples it for faster processing. Requires the `datasets` library.

```python
import itertools
import random

from datasets import load_dataset

# Loads the dataset
ds = load_dataset("C-MTEB/ThuNewsClusteringP2P", split="test")
# Wrangles the dataset from a list of lists to a single list
corpus = list(itertools.chain.from_iterable(ds["sentences"]))
# Subsampling the corpus so that the script runs faster
random.seed(42)
corpus = random.sample(corpus, 10000)
```
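
The flattening and seeded subsampling steps can be tried on toy data without downloading the dataset. In this sketch `nested` is a made-up stand-in for `ds["sentences"]`.

```python
import itertools
import random

# Toy stand-in for ds["sentences"]: a list of lists of strings.
nested = [[f"doc {i}-{j}" for j in range(4)] for i in range(5)]

# Flatten the nested lists into a single list of 20 documents.
flat = list(itertools.chain.from_iterable(nested))

# Re-seeding before each draw reproduces the same subsample.
random.seed(42)
first = random.sample(flat, 5)
random.seed(42)
second = random.sample(flat, 5)

assert first == second
print(len(flat), len(first))
```

Note that `random.sample` raises a `ValueError` when the requested sample is larger than the population, so the sample size has to stay at or below the corpus length.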

--------------------------------

### Viewport and Map Calculation Utilities

Source: https://github.com/x-tabdeveloping/turftopic/blob/main/docs/tutorials/images/arxiv_ml_datamapplot.html

Contains functions to get the initial viewport size and calculate the appropriate zoom level and center coordinates for the map based on provided bounds. These are essential for initializing the map view.

```javascript
function getInitialViewportSize() {
  const width = document.documentElement.clientWidth;
  const height = document.documentElement.clientHeight;
  return { viewportWidth: width, viewportHeight: height };
}

// bounds is [xMin, xMax, yMin, yMax]; padding is subtracted from the zoom level.
function calculateZoomLevel(bounds, viewportWidth, viewportHeight, padding = 0) {
  const lngRange = bounds[1] - bounds[0];
  const latRange = bounds[3] - bounds[2];
  const centerLng = (bounds[0] + bounds[1]) / 2;
  const centerLat = (bounds[2] + bounds[3]) / 2;
  const zoomX = Math.log2(360 / (lngRange / (viewportWidth / 256)));
  const zoomY = Math.log2(180 / (latRange / (viewportHeight / 256)));
  const zoom = Math.min(zoomX, zoomY) - padding;
  return { zoomLevel: zoom, dataCenter: [centerLng, centerLat] };
}
```
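
To make the zoom arithmetic easier to follow, here is a Python port of `calculateZoomLevel` with a worked example. The function name `calculate_zoom_level` and the sample bounds and viewport size are illustrative choices, not part of the original page.

```python
import math


def calculate_zoom_level(bounds, viewport_width, viewport_height, padding=0.0):
    """Python port of the JavaScript helper; bounds = [x_min, x_max, y_min, y_max]."""
    lng_range = bounds[1] - bounds[0]
    lat_range = bounds[3] - bounds[2]
    center = ((bounds[0] + bounds[1]) / 2, (bounds[2] + bounds[3]) / 2)
    # Each zoom level doubles the scale; 256 is the tile size in pixels.
    zoom_x = math.log2(360 / (lng_range / (viewport_width / 256)))
    zoom_y = math.log2(180 / (lat_range / (viewport_height / 256)))
    # The tighter axis wins so the whole data extent stays visible.
    return min(zoom_x, zoom_y) - padding, center


# Illustrative input: a 20 x 10 extent in a 1024 x 512 pixel viewport.
zoom, center = calculate_zoom_level([-10, 10, -5, 5], 1024, 512)
print(round(zoom, 2), center)
```

With these inputs the horizontal zoom is `log2(72)` and the vertical zoom is `log2(36)`, so the vertical axis is the limiting one.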

--------------------------------

### Visualize Topics using Turftopic and OpenAI Analyzer

Source: https://github.com/x-tabdeveloping/turftopic/blob/main/README.md

Demonstrates fitting a ClusteringTopicModel, renaming topics using OpenAIAnalyzer, and visualizing the model with datamapplot. Requires the 'turftopic' library and optionally 'openai'. Outputs an interactive figure.

```python
from turftopic import ClusteringTopicModel
from turftopic.analyzers import OpenAIAnalyzer

model = ClusteringTopicModel(feature_importance="centroid").fit(corpus)

namer = OpenAIAnalyzer("gpt-5-nano")
model.rename_topics(namer)

fig = model.plot_clusters_datamapplot()
fig.show()
```

--------------------------------

### Automated Topic Naming with OpenAIAnalyzer

Source: https://github.com/x-tabdeveloping/turftopic/blob/main/docs/model_interpretation.md

Analyzes topics using an OpenAI language model to generate meaningful names and descriptions. It requires a fitted topic model and an initialized OpenAIAnalyzer. The results are then printed.

```python
from turftopic import KeyNMF
from turftopic.analyzers import OpenAIAnalyzer

analyzer = OpenAIAnalyzer("gpt-5-nano")
analysis_res = model.analyze_topics(analyzer)

model.print_topics()
```

--------------------------------

### Use Local LLM Analyzer for Topic Analysis

Source: https://github.com/x-tabdeveloping/turftopic/blob/main/docs/model_definition_and_training.md

Integrates a local LLM analyzer to generate topic names and descriptions. It requires the `turftopic.analyzers.LLMAnalyzer` and enables document summaries for richer analysis. The output provides topic names derived from the model's analysis.

```python
from turftopic.analyzers import LLMAnalyzer

# We enable document summaries for topic analysis
analyzer = LLMAnalyzer(use_summaries=True)

analysis_res = model.analyze_topics(analyzer)
print(analysis_res.topic_names)
```

--------------------------------

### Load 20 Newsgroups Dataset using Scikit-learn

Source: https://github.com/x-tabdeveloping/turftopic/blob/main/examples/basic_example_20newsgroups.ipynb

This snippet demonstrates how to load the 20 Newsgroups dataset, a collection of approximately 20,000 newsgroup documents, suitable for topic modeling and text classification tasks. It utilizes the `fetch_20newsgroups` function from Scikit-learn to retrieve the data.

```python
from sklearn.datasets import fetch_20newsgroups

newsgroups = fetch_20newsgroups(subset="all")
corpus = newsgroups.data
```

--------------------------------

### Precomputing Embeddings for Large Corpora

Source: https://github.com/x-tabdeveloping/turftopic/blob/main/docs/online.md

An example script that precomputes sentence embeddings for a large corpus with the `sentence-transformers` library and saves them to disk. The saved embeddings can then be used with `partial_fit` to speed up KeyNMF training, which matters most for very large text datasets.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Assuming utils.py contains load_corpus function
from utils import load_corpus

corpus = load_corpus()

trf = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = trf.encode(corpus)

np.save("embeddings.npy", embeddings)
```
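
When precomputed embeddings are fed to a model in batches, each batch of texts has to stay aligned with its rows of embeddings. The sketch below shows one way to do that with plain slicing. `aligned_batches` is a hypothetical helper, the data is made up, and the `partial_fit` call is mentioned only in a comment because its exact keyword arguments are an assumption here.

```python
from typing import Iterator, Sequence


def aligned_batches(
    corpus: Sequence[str], embeddings: Sequence, batch_size: int
) -> Iterator[tuple[Sequence[str], Sequence]]:
    """Yield (texts, embeddings) slices that stay aligned row by row."""
    if len(corpus) != len(embeddings):
        raise ValueError("corpus and embeddings must have the same length")
    for start in range(0, len(corpus), batch_size):
        stop = start + batch_size
        yield corpus[start:stop], embeddings[start:stop]


# Toy data: 5 documents with 2-dimensional placeholder vectors.
corpus = [f"document {i}" for i in range(5)]
embeddings = [[float(i), float(i) + 0.5] for i in range(5)]

batches = list(aligned_batches(corpus, embeddings, batch_size=2))
for texts, vectors in batches:
    # Assumption: a call such as model.partial_fit(texts, embeddings=vectors)
    # would go here; it is omitted so the sketch runs without Turftopic.
    print(len(texts), len(vectors))
```

Because the helper relies only on `len` and slicing, it should work the same way with a NumPy array loaded from `embeddings.npy`.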

--------------------------------

### Add Metadata and Tooltip Functionality

Source: https://github.com/x-tabdeveloping/turftopic/blob/main/docs/tutorials/images/arxiv_ml_datamapplot.html

Adds metadata to the visualization, enabling tooltips and click event handling. It configures the deck.gl instance with a tooltip function and an optional click handler. It also preprocesses metadata for searching.

```javascript
// Class method excerpt, shown without its enclosing class body.
addMetaData(
  metaData,
  {
    tooltipFunction = ({ index }) => this.metaData.hover_text[index],
    onClickFunction = null,
    searchField = null,
  }
) {
  this.metaData = metaData;
  this.tooltipFunction = tooltipFunction;
  this.onClickFunction = onClickFunction;
  this.searchField = searchField;
  // Enable tooltips only when hover text is available.
  if (this.metaData.hasOwnProperty('hover_text')) {
    this.deckgl.setProps({ getTooltip: this.tooltipFunction });
  }
  if (this.onClickFunction) {
    this.deckgl.setProps({ onClick: this.onClickFunction });
  }
  // Precompute lowercased values for case-insensitive search.
  if (this.searchField) {
    this.searchArray = this.metaData[this.searchField].map(d => d.toLowerCase());
  }
}
```

--------------------------------

### KeyNMF with Noun Phrase Extraction

Source: https://github.com/x-tabdeveloping/turftopic/blob/main/docs/model_definition_and_training.md

Trains a KeyNMF model using NounPhraseCountVectorizer for extracting noun phrases as features. Requires installing Turftopic with spaCy support and downloading a spaCy model. Outputs a table of topics with their highest-ranking terms.

```bash
pip install turftopic[spacy]
python -m spacy download "en_core_web_sm"
```

```python
from turftopic import KeyNMF
from turftopic.vectorizers.spacy import NounPhraseCountVectorizer

model = KeyNMF(10, vectorizer=NounPhraseCountVectorizer("en_core_web_sm"))
model.fit(corpus)
model.print_topics()
```

--------------------------------

### Estimate word importance with Clustering model in Python

Source: https://github.com/x-tabdeveloping/turftopic/blob/main/docs/finetuning.md

Shows how to fit a ClusteringTopicModel with a specified feature importance method and then print the topics. This illustrates the process of obtaining topic representations with different importance estimations.

```python
from turftopic import ClusteringTopicModel

model = ClusteringTopicModel(n_reduce_to=5, feature_importance="soft-c-tf-idf").fit(corpus)
model.print_topics()
```

--------------------------------

### Use TokenCountVectorizer with SpaCy for Non-English Languages

Source: https://github.com/x-tabdeveloping/turftopic/blob/main/docs/vectorizers.md

Shows how to install Turftopic with SpaCy and use 'TokenCountVectorizer' for vectorizing text in non-English languages, leveraging SpaCy's language-specific tokenization without requiring a full SpaCy pipeline. This is beneficial for languages where default tokenization is insufficient.

```bash
pip install turftopic[spacy]
```

```python
from turftopic import KeyNMF
from turftopic.vectorizers.spacy import TokenCountVectorizer

# CountVectorizer for Arabic
vectorizer = TokenCountVectorizer("ar", min_df=10)

model = KeyNMF(
    n_components=10,
    vectorizer=vectorizer,
    encoder="Omartificial-Intelligence-Space/Arabic-MiniLM-L12-v2-all-nli-triplet"
)
model.fit(corpus)
```

--------------------------------

### Fit AutoEncodingTopicModel and Print Topics

Source: https://github.com/x-tabdeveloping/turftopic/blob/main/examples/basic_example_20newsgroups.ipynb

Initializes an `AutoEncodingTopicModel` with 20 topics and an encoder, fits it to the corpus using precomputed embeddings, and prints the extracted topics. The snippet assumes that `trf` (the encoder), `corpus`, and `embeddings` are already defined.

```python
from turftopic import AutoEncodingTopicModel

model = AutoEncodingTopicModel(20, encoder=trf).fit(corpus, embeddings=embeddings)
model.print_topics()
```

--------------------------------

### Define Custom Prompts for LLMAnalyzer in Python

Source: https://github.com/x-tabdeveloping/turftopic/blob/main/docs/analyzers.md

This snippet shows how to define custom system, namer, description, and summary prompts for the LLMAnalyzer. Prompts are formatted using Python's `str.format()`, expecting templated content within curly brackets. The analyzer is then instantiated with these custom prompts.

```python
from turftopic.analyzers import LLMAnalyzer

system_prompt = """
You are a topic analyzer.
Follow instructions closely and exactly.
"""

namer_prompt = """
Please provide a human-readable name for a topic.
The topic is described by the following set of keywords: {keywords}.
"""

description_prompt = """
Describe the following topic in a couple of sentences.
The topic is described by the following set of keywords: {keywords}.
"""

summary_prompt = """
Summarize the following document: {document}
"""

namer = LLMAnalyzer(
    system_prompt=system_prompt,
    namer_prompt=namer_prompt,
    description_prompt=description_prompt,
    summary_prompt=summary_prompt
)
```
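
To see how such a template is filled in, the sketch below calls `str.format()` on the namer prompt directly. The keyword list is made up, and joining the keywords with commas is an assumption for illustration; the analyzer may pass them in a different form.

```python
namer_prompt = """
Please provide a human-readable name for a topic.
The topic is described by the following set of keywords: {keywords}.
"""

# Hypothetical keywords; a real analyzer would supply the topic's top terms.
keywords = ["inflation", "interest", "rates"]
filled = namer_prompt.format(keywords=", ".join(keywords))
print(filled)

# Literal braces must be doubled so str.format() does not treat them as fields.
json_hint = 'Answer as JSON like {{"name": "..."}} for keywords: {keywords}'
print(json_hint.format(keywords="a, b"))
```

The second example matters when a custom prompt contains literal curly brackets, for instance a JSON sample: unescaped braces would be read as template fields and raise an error.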

--------------------------------

### DataMap Initialization and Configuration (JavaScript)

Source: https://github.com/x-tabdeveloping/turftopic/blob/main/docs/images/cluster_datamapplot.html

Initializes a DataMap instance with a container, geographical bounds, and item IDs for search and selection. This sets up the primary visualization component for the project.

```javascript
const container = document.getElementById('deck-container');
const datamap = new DataMap({
  container: container,
  bounds: [-8.976655139923096, 8.372218265533448, -8.90875467300415, 9.608116474151611],
  searchItemId: searchItemId,
  lassoSelectionItemId: selectionItemId,
});
```

--------------------------------

### Plot Concept Compass with Plotly

Source: https://github.com/x-tabdeveloping/turftopic/blob/main/docs/s3.md

Generates a concept compass plot to visualize the relationship between terms and semantic axes using Plotly. Requires the 'plotly' library to be installed. This function plots terms based on their scores along two specified topics (axes).

```bash
pip install plotly
```

```python
fig = model.plot_concept_compass(topic_x=1, topic_y=4)
fig.show()
```

--------------------------------

### Topic Modeling with Noun Phrase Vectorization using spaCy

Source: https://github.com/x-tabdeveloping/turftopic/blob/main/README.md

Performs topic modeling using BERTopic with Turftopic's spaCy-based `NounPhraseCountVectorizer`, which uses noun phrases rather than single words as features for the topic representation. Requires installing `turftopic[spacy]` and downloading a spaCy language model such as `en_core_web_sm`.

```python
from turftopic import BERTopic
from turftopic.vectorizers.spacy import NounPhraseCountVectorizer

model = BERTopic(
    n_components=10,
    vectorizer=NounPhraseCountVectorizer("en_core_web_sm"),
)
model.fit(corpus)
model.print_topics()
```

--------------------------------

### JavaScript: Initialize Web Workers for Data Processing

Source: https://github.com/x-tabdeveloping/turftopic/blob/main/docs/tutorials/images/arxiv_ml_datamapplot.html

Defines the element ids used for text search, the histogram, and lasso selection, looks up the corresponding DOM elements, and creates two Web Workers from a pre-defined `workerUrl`. One worker handles label data and the other point data, so large datasets can be processed asynchronously for the visualization.

```javascript
const searchItemId = "text-search";
const histogramItemId = "d3histogram-container";
const selectionItemId = "lasso-select";
const searchItem = document.getElementById(searchItemId);
let histogramItem = null;
const container = document.getElementById('deck-container');
const labelDataWorker = new Worker(workerUrl);
const pointDataWorker = new Worker(workerUrl);

```

--------------------------------

### Shell Command for Directory Operations

Source: https://github.com/x-tabdeveloping/turftopic/blob/main/docs/tutorials/images/arxiv_ml_datamapplot.html

Creates a new directory with `mkdir`, a basic file system operation often needed during project setup or in build scripts.

```bash
mkdir my_new_directory

```

--------------------------------

### Initiate Data Layer Loading - JavaScript

Source: https://github.com/x-tabdeveloping/turftopic/blob/main/docs/tutorials/images/arxiv_ml_datamapplot.html

Calls the functions to load the point, label, and meta-data layers sequentially. These functions are responsible for initiating the data fetching and rendering process within the Turftopic project.

```javascript
loadPointDataLayer();
loadLabelDataLayer();
loadMetaData();
```

--------------------------------

### Noun Phrase Vectorization with SpaCy

Source: https://github.com/x-tabdeveloping/turftopic/blob/main/docs/vectorizers.md

This snippet shows how to use Turftopic's NounPhraseCountVectorizer, which leverages SpaCy for extracting noun phrases as features. It requires `turftopic[spacy]` and a SpaCy language model (e.g., 'en_core_web_sm'). Installation instructions are provided. Model fitting can be slower but may yield higher quality results.

```bash
pip install turftopic[spacy]
```

```bash
python -m spacy download en_core_web_sm
```

```python
from turftopic import KeyNMF
from turftopic.vectorizers.spacy import NounPhraseCountVectorizer

model = KeyNMF(
    n_components=10,
    vectorizer=NounPhraseCountVectorizer("en_core_web_sm"),
)
model.fit(corpus)
model.print_topics()
```