### Development Setup

Source: https://github.com/ankane/tomoto-ruby/blob/master/README.md

Steps to clone the repository, install dependencies, compile, and test the gem locally for development.

```sh
git clone --recursive https://github.com/ankane/tomoto-ruby.git
cd tomoto-ruby
bundle install
bundle exec rake compile
bundle exec rake test
```

--------------------------------

### Getting Started with Tomoto-Ruby

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/README.md

Demonstrates the basic workflow of creating an LDA model, adding documents, training, and retrieving topic words. Ensure the 'tomoto' gem is required.

```ruby
require "tomoto"

# Create a model
model = Tomoto::LDA.new(k: 10)

# Add documents
model.add_doc(["machine", "learning"])
model.add_doc(["deep", "networks"])

# Train
model.train(100)

# Get results
puts model.topic_words(0, top_n: 10)
```
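
Other examples in this document treat the return value of `topic_words` as a word-to-probability mapping (`words.keys`, `each { |word, prob| ... }`). Under that assumption, a small formatting helper can make the output easier to read. The sketch below uses a hand-written sample hash in place of a trained model; `format_topic` is a hypothetical helper, not part of the gem.

```ruby
# Hypothetical helper: render a word => probability hash as one readable line.
def format_topic(topic_id, words, digits: 3)
  parts = words.map { |word, prob| "#{word} (#{prob.round(digits)})" }
  "Topic #{topic_id}: #{parts.join(", ")}"
end

# Sample data standing in for model.topic_words(0, top_n: 3)
sample = { "learning" => 0.4123, "machine" => 0.25, "deep" => 0.1 }
puts format_topic(0, sample)
# prints "Topic 0: learning (0.412), machine (0.25), deep (0.1)"
```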

--------------------------------

### Example: Listing Live Topics and Levels

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/hlda.md

An example that iterates through all topics, checks if they are live, and prints their ID and level. This provides an overview of the active topics in the hierarchy.

```ruby
model.k.times do |topic_id|
  if model.live_topic?(topic_id)
    level = model.level(topic_id)
    puts "Topic #{topic_id} is live (level #{level})"
  end
end
```

--------------------------------

### Example: Displaying Parent Topic

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/hlda.md

An example showing how to get and display the parent topic for topic ID 5. The code treats a negative return value as meaning the topic is not live. This illustrates navigating upwards in the hierarchy.

```ruby
parent = model.parent_topic(5)
if parent >= 0
  puts "Topic 5 parent: Topic #{parent}"
else
  puts "Topic 5 is not live"
end
```

--------------------------------

### Example: Displaying Document Counts per Topic

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/hlda.md

An example that iterates through live topics and prints the number of documents assigned to each. This helps in understanding the distribution of documents across topics.

```ruby
model.k.times do |topic_id|
  next unless model.live_topic?(topic_id)
  count = model.num_docs_of_topic(topic_id)
  puts "Topic #{topic_id}: #{count} documents"
end
```

--------------------------------

### Example: Displaying Child Topics

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/hlda.md

An example demonstrating how to fetch and display the child topics for topic ID 0. This helps visualize the hierarchical relationships.

```ruby
children = model.children_topics(0)
puts "Topic 0 has children: #{children.inspect}"
```

--------------------------------

### LLDA Usage Example

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/llda.md

A comprehensive example demonstrating the creation, training, and inference process for an LLDA model. Includes adding labeled documents, training for a specified number of iterations, and analyzing results.

```ruby
# Create model with 5 topics
model = Tomoto::LLDA.new(k: 5, min_cf: 2)

# Add labeled documents
model.add_doc(["machine", "learning", "algorithm"], labels: ["ml", "ai"])
model.add_doc(["deep", "neural", "network"], labels: ["ml", "ai"])
model.add_doc(["politics", "election", "vote"], labels: ["politics"])
model.add_doc(["government", "policy", "law"], labels: ["politics"])

# Train
model.train(100)

# View results
puts model.summary(topic_word_top_n: 10)

model.k.times do |k|
  puts "Topic ##{k}"
  model.topic_words(k).each { |word, prob| puts "  #{word}: #{prob}" }
end

# Inference on new documents
new_doc = model.make_doc(["learning", "algorithm"])
topics, ll = model.infer(new_doc)
puts "Topic distribution: #{topics.inspect}"
```
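
To read an inferred distribution at a glance, it can help to rank topics by probability. The sketch below assumes the first value returned by `infer` is an array of per-topic probabilities indexed by topic ID; `top_topics` is a hypothetical helper and the sample array is made up for illustration.

```ruby
# Hypothetical helper: return [topic_id, probability] pairs for the n most probable topics.
def top_topics(distribution, n: 2)
  distribution.each_with_index
              .map { |prob, topic_id| [topic_id, prob] }
              .max_by(n) { |_, prob| prob }
end

# Sample distribution standing in for the first value returned by model.infer
sample = [0.05, 0.6, 0.1, 0.2, 0.05]
top_topics(sample).each { |topic_id, prob| puts "Topic #{topic_id}: #{prob}" }
```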

--------------------------------

### Check Tomoto Gem Installation

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/README.md

Verify that the tomoto gem is installed on your system. This is a prerequisite for using the gem.

```bash
gem list tomoto
```

--------------------------------

### Example: Identifying Root-Level Topics

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/hlda.md

An example that iterates through all topics to identify and display root-level topics (level 0) that are also live. This helps in understanding the top-level structure of the model.

```ruby
root_topics = []
model.k.times do |topic_id|
  if model.live_topic?(topic_id) && model.level(topic_id) == 0
    root_topics << topic_id
  end
end
puts "Root-level topics: #{root_topics.inspect}"
```

--------------------------------

### Full PLDA Usage Example

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/plda.md

A comprehensive example demonstrating the complete lifecycle of a PLDA model, including initialization, adding labeled and unlabeled documents, setting burn-in period, training, viewing summary, inspecting topics, and performing inference. This illustrates how PLDA learns from partially labeled data.

```ruby
# Create model
model = Tomoto::PLDA.new(latent_topics: 2, min_cf: 2)

# Add documents - some with labels, some without
labeled_docs = [
  ["apple", "orange", "banana"],
  ["car", "truck", "bicycle"]
]

model.add_doc(labeled_docs[0], labels: ["fruit"])
model.add_doc(labeled_docs[1], labels: ["vehicle"])

# Add unlabeled documents
model.add_doc(["cherry", "mango", "grape"])
model.add_doc(["bus", "train", "subway"])

# Train
model.burn_in = 50
model.train(100)

# View results
puts model.summary

# Check inferred structure
topics = model.k
topics.times do |i|
  words = model.topic_words(i, top_n: 10)
  puts "Topic #{i}: #{words.keys.join(", ")}"
end

# Inference - the model learns to associate documents with labels
doc = model.make_doc(["strawberry", "blueberry"])
dist, ll = model.infer(doc)
puts "Topic distribution: #{dist.inspect}"
```

--------------------------------

### Install tomoto Gem

Source: https://github.com/ankane/tomoto-ruby/blob/master/README.md

Add the tomoto gem to your application's Gemfile, then run `bundle install`.

```ruby
gem "tomoto"
```

--------------------------------

### Example HLDA Model Initialization

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/hlda.md

An example of initializing an HLDA model with a specific depth, alpha, gamma, and a random seed for reproducibility. This demonstrates a common use case for setting up the model.

```ruby
model = Tomoto::HLDA.new(depth: 3, alpha: 0.1, gamma: 0.1, seed: 42)
```

--------------------------------

### GDMR Usage Example: Document Length and Publication Year

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/gdmr.md

A comprehensive example demonstrating GDMR model creation, document addition with numeric metadata (length and year), model training, and summary output. It also shows how to analyze topic word distributions.

```ruby
# Create model with two numeric metadata dimensions
# Degree 2 for document length (quadratic), degree 1 for year (linear)
model = Tomoto::GDMR.new(k: 10, degrees: [2, 1], min_cf: 2)

documents = [
  {text: ["machine", "learning", "algorithm"], length: 150, year: 2020},
  {text: ["deep", "neural", "network"], length: 200, year: 2021},
  {text: ["classification", "prediction"], length: 80, year: 2019},
  {text: ["regression", "statistical"], length: 120, year: 2020},
  {text: ["clustering", "unsupervised"], length: 160, year: 2021}
]

documents.each do |doc|
  model.add_doc(doc[:text], numeric_metadata: [doc[:length], doc[:year]])
end

model.burn_in = 50
model.train(100)

puts model.summary(topic_word_top_n: 10)

# Analyze how metadata influences topics
model.k.times do |k|
  words = model.topic_words(k, top_n: 5)
  puts "Topic #{k}: #{words.keys.join(", ")}"
end
```

--------------------------------

### Full MGLDA Training and Summary Example

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/mglda.md

Demonstrates the complete workflow of initializing an MGLDA model, adding documents with sentence delimiters, training the model, and printing a summary of the topics. Use this as a template for your own MGLDA projects.

```ruby
model = Tomoto::MGLDA.new(
  k_g: 3,      # 3 global topics
  k_l: 5,      # 5 local (phrase) topics
  t: 3,        # sentence window size (3 sentences)
  min_cf: 2
)

# Documents with sentence-level structure
documents = [
  ["machine", "learning", "is", "important", ".",
   "deep", "networks", "are", "powerful", "."],
  
  ["natural", "language", "processing", "uses", "deep",
   "learning", ".",
   "text", "classification", "is", "useful", "."],
  
  ["computer", "vision", "tasks", "include", "classification",
   ".",
   "image", "segmentation", "is", "challenging", "."]
]

documents.each { |doc| model.add_doc(doc, delimiter: ".") }

model.train(100)

puts model.summary(topic_word_top_n: 10)

# View topics (combines both global and local)
model.k.times do |topic_id|
  words = model.topic_words(topic_id, top_n: 8)
  puts "Topic #{topic_id}: #{words.keys.join(", ")}"
end
```
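
The `delimiter:` argument marks sentence boundaries inside a flat token list. How the gem segments documents internally is not shown in the source; the sketch below only illustrates the idea in plain Ruby, using a hypothetical `split_sentences` helper.

```ruby
# Hypothetical helper: split a flat token list into sentences at each delimiter token.
def split_sentences(tokens, delimiter: ".")
  tokens.slice_when { |previous, _| previous == delimiter }
        .map { |sentence| sentence.reject { |token| token == delimiter } }
        .reject(&:empty?)
end

doc = ["machine", "learning", "is", "important", ".",
       "deep", "networks", "are", "powerful", "."]
split_sentences(doc).each_with_index do |sentence, i|
  puts "Sentence #{i}: #{sentence.join(" ")}"
end
```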

--------------------------------

### SLDA Usage Example: Single Response Variable (Product Rating)

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/slda.md

Trains an SLDA model with a single linear response variable (product rating) and displays the summary and topics. This example shows the end-to-end process from initialization to training and output.

```ruby
# Create model with one linear response variable (rating scale 1-10)
model = Tomoto::SLDA.new(k: 5, vars: "l", min_cf: 2)

# Add reviews with ratings
model.add_doc(["excellent", "quality", "product"], y: [9.0])
model.add_doc(["good", "service", "value"], y: [8.0])
model.add_doc(["poor", "broken", "disappointed"], y: [2.0])
model.add_doc(["terrible", "waste", "money"], y: [1.0])

model.train(100)

puts model.summary(topic_word_top_n: 10)

# View topics
model.k.times do |i|
  puts "Topic ##{i}"
  model.topic_words(i).each do |word, prob|
    puts "  #{word}: #{prob}"
  end
end
```

--------------------------------

### Full HPA Model Usage Example

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/hpa.md

Demonstrates the complete workflow for HPA, including model creation, document addition, training, and analysis of hierarchical topic distributions. Ensure documents are preprocessed and tokenized before adding.

```ruby
# Create hierarchical structure with two levels
# First level: 4 broad topics
# Second level: 6 specific topics under each
model = Tomoto::HPA.new(k1: 4, k2: 6, min_cf: 2)

documents = [
  # Machine Learning
  ["supervised", "classification", "regression"],
  ["unsupervised", "clustering", "dimensionality"],
  ["deep", "neural", "networks"],
  
  # Natural Language
  ["text", "processing", "tokenization"],
  ["language", "modeling", "prediction"],
  ["translation", "sequence", "sequence"],
  
  # Computer Vision
  ["image", "classification", "recognition"],
  ["object", "detection", "localization"],
  ["segmentation", "semantic", "instance"],
  
  # Other
  ["reinforcement", "learning", "agent"],
  ["graph", "neural", "networks"]
]

documents.each { |doc| model.add_doc(doc) }

model.burn_in = 50
model.train(100)

puts model.summary

# Analyze hierarchical structure
puts "\nHierarchical topic distribution:"
model.k.times do |topic_id|
  words = model.topic_words(topic_id, top_n: 5)
  count = model.count_by_topics[topic_id]
  puts "Topic #{topic_id} (#{count} tokens): #{words.keys.join(", ")}"
end
```
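
The description above asks for documents to be preprocessed and tokenized before they are added. A minimal sketch of such a step follows; `tokenize` and its stopword list are hypothetical, handle ASCII letters only, and are not part of the gem.

```ruby
# Hypothetical preprocessing helper: lowercase the text, keep runs of ASCII letters,
# then drop stopwords and tokens shorter than min_length.
def tokenize(text, stopwords: %w[a an and are is of the to], min_length: 2)
  text.downcase
      .scan(/[a-z]+/)
      .reject { |token| token.length < min_length || stopwords.include?(token) }
end

p tokenize("Deep neural networks are powerful.")
# prints ["deep", "neural", "networks", "powerful"]
```

The resulting array can be passed to `add_doc` in place of the hand-written token lists used in the examples.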

--------------------------------

### Author-Based Metadata Usage Example

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/dmr.md

Demonstrates a complete workflow for DMR, including model initialization, adding documents with author metadata, training, and analyzing the influence of author on topics.

```ruby
# Create model where topic distribution depends on author
model = Tomoto::DMR.new(k: 10, min_cf: 2, sigma: 1.0)

documents = [
  {text: ["machine", "learning", "classification"], author: "alice"},
  {text: ["deep", "neural", "networks"], author: "alice"},
  {text: ["poetry", "emotion", "verse"], author: "bob"},
  {text: ["literature", "novel", "fiction"], author: "bob"},
  {text: ["quantum", "physics", "mechanics"], author: "charlie"},
  {text: ["relativity", "space", "time"], author: "charlie"}
]

documents.each do |doc|
  model.add_doc(doc[:text], metadata: doc[:author])
end

model.burn_in = 50
model.train(100)

puts model.summary(topic_word_top_n: 10)

# Examine how metadata affects topics
lambdas = model.lambdas
lambdas.each_with_index do |topic_lambdas, topic_id|
  puts "Topic #{topic_id} - Author influence (lambdas): #{topic_lambdas.inspect}"
end
```

--------------------------------

### Initialize LLDA Model with Custom Parameters

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/llda.md

Example of creating an LLDA model with a specific number of topics (k=20), alpha, eta, and a random seed for reproducibility.

```ruby
model = Tomoto::LLDA.new(k: 20, alpha: 0.1, eta: 0.01, seed: 42)
```

--------------------------------

### Initialize CT Model with Custom Parameters

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/ct.md

Example of initializing a CT model with a specified number of topics, alpha, eta, and a random seed for reproducibility.

```ruby
model = Tomoto::CT.new(k: 15, alpha: 0.1, eta: 0.01, seed: 42)
```

--------------------------------

### Train a Basic LDA Model

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/README.md

Initializes an LDA model with specified parameters and adds documents for training. Ensure you have the 'tomoto' gem installed.

```ruby
require "tomoto"

model = Tomoto::LDA.new(k: 20, min_cf: 3, seed: 42)

documents = [
  ["machine", "learning", "classification"],
  ["deep", "neural", "networks"],
  ["supervised", "training", "data"]
]

documents.each { |doc| model.add_doc(doc) }

model.burn_in = 50
model.train(200)

puts model.summary(topic_word_top_n: 10)
```
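
The `min_cf` option filters the vocabulary. Assuming it means minimum collection frequency, as in the underlying tomotopy library, words that occur fewer than `min_cf` times across the whole corpus are dropped. The plain-Ruby sketch below previews which words would survive a given threshold; both helper names are hypothetical.

```ruby
# Count how often each word occurs across the whole corpus (collection frequency).
def collection_frequency(documents)
  documents.flatten.tally
end

# Hypothetical preview of the words that would survive a given min_cf threshold.
def surviving_words(documents, min_cf)
  collection_frequency(documents).select { |_, count| count >= min_cf }.keys
end

documents = [
  ["machine", "learning", "classification"],
  ["deep", "learning", "networks"],
  ["machine", "learning", "data"]
]

p surviving_words(documents, 2) # prints ["machine", "learning"]
p surviving_words(documents, 3) # prints ["learning"]
```

On very small corpora a high threshold can remove most of the vocabulary, so a check like this before training may be worthwhile.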

--------------------------------

### PLDA Workflow: Add Documents and Train

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/plda.md

Demonstrates the typical workflow for a PLDA model: initializing the model, adding both labeled and unlabeled documents, and then training the model. This example showcases how to handle mixed-label datasets.

```ruby
model = Tomoto::PLDA.new(latent_topics: 3)

# Labeled documents
model.add_doc(["sports", "baseball", "team"], labels: ["sports"])
model.add_doc(["music", "concert", "song"], labels: ["music"])

# Unlabeled documents (model will infer labels)
model.add_doc(["game", "score", "players"])
model.add_doc("tennis tournament match")

# Train with partial labels
model.train(100)
```

--------------------------------

### Initialize SLDA Model with Custom Parameters

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/slda.md

Creates an SLDA model with one linear and one binary response variable. This example shows how to set the number of topics, specify response variable types, and set a random seed for reproducibility.

```ruby
# Model with one linear and one binary response variable
model = Tomoto::SLDA.new(k: 10, vars: "lb", alpha: 0.1, seed: 42)
```

--------------------------------

### Get Vocabulary

Source: https://github.com/ankane/tomoto-ruby/blob/master/README.md

Retrieve the entire vocabulary of words used in the trained model.

```ruby
model.vocabs
```

--------------------------------

### Get Model Summary

Source: https://github.com/ankane/tomoto-ruby/blob/master/README.md

Retrieve a summary of the trained topic model.

```ruby
model.summary
```

--------------------------------

### Compare Initial K with Final Topic Count

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/hdp.md

This example illustrates how the HDP model automatically determines the final number of topics, which may differ from the initial capacity specified. It prints the initial, final, and live topic counts, along with the token count for each topic.

```ruby
# HDP automatically grows topics as needed
model = Tomoto::HDP.new(initial_k: 3, gamma: 0.1)

documents = [
  ["sports", "athlete", "competition"],
  ["sports", "game", "match"],
  ["politics", "election", "vote"],
  ["politics", "government", "policy"],
  ["technology", "software", "computer"],
  ["technology", "internet", "digital"],
  ["food", "recipe", "cooking"],
  ["food", "restaurant", "meal"]
]

documents.each { |doc| model.add_doc(doc) }

model.train(100)

puts "Initial K specified: 3"
puts "Final K value: #{model.k}"
puts "Live (active) topics: #{model.live_k}"
puts "Inactive topics: #{model.k - model.live_k}"

# These inactive topics exist but have no document assignments
model.k.times do |topic_id|
  status = model.live_topic?(topic_id) ? "ACTIVE" : "INACTIVE"
  count = model.count_by_topics[topic_id]
  puts "Topic #{topic_id}: #{status} (#{count} tokens)"
end
```

--------------------------------

### Analyze Quarterly Business Reports

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/dt.md

This example shows how to analyze business topics across 8 quarters using the Tomoto DT model. It configures the model for 5 topics and 8 time periods, adds business-related documents for each quarter, trains the model after a burn-in period, and then prints the representative words for each identified business topic.

```ruby
# Analyze business topics over 8 quarters
model = Tomoto::DT.new(
  k: 5,
  t: 8,
  min_cf: 3,
  alpha_var: 0.1,
  eta_var: 0.1,
  phi_var: 0.1
)

quarters = [
  {period: "Q1 2021", docs: [["revenue", "sales", "growth"], ["costs", "expenses", "reduction"]]},
  {period: "Q2 2021", docs: [["acquisition", "expansion", "market"], ["profit", "margins", "strong"]]},
  # ... more quarters ...
]

quarters.each_with_index do |quarter, idx|
  quarter[:docs].each do |doc|
    model.add_doc(doc, timepoint: idx)
  end
end

model.burn_in = 30
model.train(100)

# Analyze business themes
puts "Business Topics Over Time:"
model.k.times do |topic_id|
  words = model.topic_words(topic_id, top_n: 5)
  puts "\nTopic #{topic_id}: #{words.keys.join(", ")}"
end
```
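
The DT examples pass an integer `timepoint` for each document. When raw data carries period labels instead, one option is to map the labels to consecutive indices first. The sketch below is plain Ruby with a hypothetical `timepoint_index` helper and made-up labels.

```ruby
# Hypothetical helper: map period labels to consecutive timepoint indices,
# in order of first appearance.
def timepoint_index(periods)
  periods.uniq.each_with_index.to_h
end

periods = ["Q1 2021", "Q2 2021", "Q1 2021", "Q3 2021"]
index = timepoint_index(periods)
periods.each { |period| puts "#{period} -> timepoint #{index[period]}" }
```

The number of distinct labels would presumably need to stay within the `t` value given to the model.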

--------------------------------

### Chinese Restaurant Process Metaphor Example

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/hdp.md

This snippet demonstrates the Chinese Restaurant Process (CRP) metaphor used by HDP to determine topics. It shows how to initialize the model, add documents, train, and then report the total number of tables (CRP clusters), the live topic count, and the current value of `k`, which can differ from `initial_k` after training.

```ruby
model = Tomoto::HDP.new(initial_k: 10)

# `documents` is assumed to be an array of token arrays defined earlier
documents.each { |doc| model.add_doc(doc) }
model.train(200)

puts "Total tables (CRP clusters): #{model.num_tables}"
puts "Live topics: #{model.live_k}"
puts "Current K: #{model.k}"
```

--------------------------------

### Create HPA Model with Custom Parameters

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/hpa.md

Example of creating an HPA model with specific topic counts (k1, k2), alpha, and a random seed for reproducibility.

```ruby
model = Tomoto::HPA.new(k1: 5, k2: 10, alpha: 0.1, seed: 42)
```

--------------------------------

### Initialize MGLDA Model with Default Parameters

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/mglda.md

Creates a new MGLDA model instance with default parameters. Useful for starting a new MGLDA analysis.

```ruby
Tomoto::MGLDA.new(
  tw: :one,
  min_cf: 0,
  min_df: 0,
  rm_top: 0,
  k_g: 1,
  k_l: 1,
  t: 3,
  alpha_g: 0.1,
  alpha_l: 0.1,
  alpha_mg: 0.1,
  alpha_ml: 0.1,
  eta_g: 0.01
)
```

--------------------------------

### HDP Automatic Topic Discovery Example

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/hdp.md

Demonstrates how to use the HDP model to automatically discover the number of topics from a collection of documents. It includes adding documents, training the model, and inspecting the results, including live topic counts and word distributions for active topics.

```ruby
# Create HDP model; initial_k is only a starting point, not a fixed topic count
model = Tomoto::HDP.new(initial_k: 5, min_cf: 2, gamma: 0.1)

# Add documents
documents = [
  ["machine", "learning", "classification"],
  ["deep", "neural", "networks"],
  ["sports", "basketball", "team"],
  ["football", "game", "score"],
  ["python", "programming", "code"],
  ["java", "software", "development"]
]

documents.each { |doc| model.add_doc(doc) }

puts "Initial K: 5"
puts "Before training - K: #{model.k}, Live K: #{model.live_k}"

# Train
model.burn_in = 50
100.times do |i|
  model.train(10)
  if i % 20 == 0
    puts "Iteration: #{(i + 1) * 10}, Live topics: #{model.live_k}, LL/word: #{model.ll_per_word}"
  end
end

puts "\nAfter training - K: #{model.k}, Live K: #{model.live_k}"

# Print the model summary
puts model.summary(topic_word_top_n: 10)

# View only active topics
model.k.times do |topic_id|
  next unless model.live_topic?(topic_id)
  puts "Topic ##{topic_id} (ACTIVE)"
  model.topic_words(topic_id, top_n: 8).each { |word, prob| puts "  #{word}: #{prob}" }
end
```

--------------------------------

### Initialize LDA Model with Default Parameters

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/lda.md

Creates a new LDA model instance using default parameters. Useful for quick setup when default values are suitable.

```ruby
Tomoto::LDA.new(
  tw: :one,
  min_cf: 0,
  min_df: 0,
  rm_top: 0,
  k: 1,
  alpha: 0.1,
  eta: 0.01,
  seed: nil
)
```

--------------------------------

### Explore Hierarchical Topic Structure

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/hlda.md

Initializes an HLDA model and demonstrates how to find root topics, traverse the hierarchy to get descendants, and display direct children with their top words. Useful for detailed analysis of topic relationships.

```ruby
model = Tomoto::HLDA.new(depth: 4, gamma: 0.1)

# ... add documents and train ...

# Find all root topics (level 0)
roots = (0...model.k).select { |t| model.live_topic?(t) && model.level(t) == 0 }
puts "Root topics: #{roots.inspect}"

# Collect all descendants of a topic recursively
def get_descendants(model, topic_id)
  children = model.children_topics(topic_id)
  all = children.dup
  children.each { |c| all.concat(get_descendants(model, c)) }
  all.uniq
end

# For each root, traverse hierarchy
roots.each do |root_id|
  puts "\nRoot Topic #{root_id}:"
  words = model.topic_words(root_id, top_n: 5)
  puts "  Top words: #{words.keys.join(", ")}"

  descendants = get_descendants(model, root_id)
  puts "  #{descendants.length} descendant topics"
  
  # Show first level children
  children = model.children_topics(root_id)
  puts "  Direct children: #{children.inspect}"
  children.each do |child|
    child_words = model.topic_words(child, top_n: 3)
    puts "    Topic #{child}: #{child_words.keys.join(", ")}"
  end
end
```
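
A hierarchy is often easier to inspect as an indented tree. The sketch below builds such a view in plain Ruby from any callable that maps a topic ID to its child IDs. With a trained model this would presumably be `->(id) { model.children_topics(id) }`; here a hand-written hash stands in for it, and `tree_lines` is a hypothetical helper.

```ruby
# Hypothetical helper: build indented lines for a topic tree.
# `children_of` is any callable mapping a topic ID to an array of child IDs.
def tree_lines(topic_id, children_of, depth = 0)
  lines = ["#{"  " * depth}Topic #{topic_id}"]
  children_of.call(topic_id).each do |child|
    lines.concat(tree_lines(child, children_of, depth + 1))
  end
  lines
end

# Sample hierarchy standing in for a trained HLDA model
sample = { 0 => [1, 2], 1 => [3], 2 => [], 3 => [] }
puts tree_lines(0, ->(id) { sample.fetch(id, []) })
```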

--------------------------------

### Analyze Prior Covariance Matrix

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/ct.md

Example of accessing and interpreting the prior covariance matrix, including its dimensions and the variances of individual topics (diagonal elements).

```ruby
model.train(100)
cov = model.prior_cov
puts "Covariance matrix dimensions: #{cov.length} × #{cov[0].length}"

# Print diagonal elements (variances)
cov.each_with_index do |row, i|
  puts "Variance of topic #{i}: #{row[i].round(4)}"
end
```
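
Variances on the diagonal are easier to compare once the matrix is normalized. The sketch below applies the standard conversion from covariance to correlation, `corr[i][j] = cov[i][j] / sqrt(cov[i][i] * cov[j][j])`, to a made-up 2 × 2 matrix. It is a general formula and is not claimed to be how the gem computes `correlations`; `correlation_from_covariance` is a hypothetical helper.

```ruby
# Convert a covariance matrix (array of rows) into a correlation matrix.
def correlation_from_covariance(cov)
  std = cov.each_index.map { |i| Math.sqrt(cov[i][i]) }
  cov.each_with_index.map do |row, i|
    row.each_with_index.map { |value, j| value / (std[i] * std[j]) }
  end
end

# Made-up covariance matrix standing in for model.prior_cov
sample_cov = [[4.0, 2.0], [2.0, 9.0]]
correlation_from_covariance(sample_cov).each do |row|
  puts row.map { |value| value.round(3) }.inspect
end
```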

--------------------------------

### Get Topic Correlations

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/ct.md

Retrieves the correlations between topics. Specify a `topic_id` to get correlations for that specific topic, or call without an argument to get all pairwise correlations.

```ruby
corr = model.correlations(0)
all_corr = model.correlations
```

--------------------------------

### Topic Correlation Network Analysis

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/ct.md

A comprehensive example demonstrating how to build a CT model, train it on documents, and then analyze and visualize the correlations between topics, highlighting related topics and their key words.

```ruby
model = Tomoto::CT.new(k: 8, min_cf: 2)

documents = [
  ["machine", "learning", "algorithms"],
  ["deep", "neural", "networks"],
  ["natural", "language", "processing"],
  ["computer", "vision", "images"],
  ["reinforcement", "learning", "agent"],
  ["supervised", "training", "data"],
  ["text", "documents", "analysis"],
  ["image", "recognition", "classification"]
]

documents.each { |doc| model.add_doc(doc) }

model.burn_in = 50
model.train(100)

# Analyze topic correlations
puts "Topic Correlation Network:"
puts "=" * 50

model.k.times do |topic_id|
  corr = model.correlations(topic_id)
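  # Pair each value with its topic ID (assumes correlations returns an Array of Floats indexed by topic)
  pairs = corr.each_with_index.map { |value, other| [other, value] }
  top_corr = pairs.reject { |other, _| other == topic_id }.sort_by { |_, v| -v.abs }.first(3)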
  
  words = model.topic_words(topic_id, top_n: 3).keys
  puts "\nTopic #{topic_id}: #{words.join(", ")}"
  
  top_corr.each do |other_topic, correlation|
    other_words = model.topic_words(other_topic, top_n: 3).keys
    status = correlation > 0 ? "+" : "-"
    puts "  #{status} Topic #{other_topic} (#{correlation.round(3)}): #{other_words.join(", ")}"
  end
end
```

--------------------------------

### Create PLDA Model with Custom Parameters

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/plda.md

Instantiates a PLDA model with a specific number of latent topics, alpha value, and a random seed for reproducibility. This is a common starting point for experiments.

```ruby
model = Tomoto::PLDA.new(latent_topics: 5, alpha: 0.1, seed: 42)
```

--------------------------------

### Annual Topic Evolution Analysis

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/dt.md

Models and analyzes topic evolution over several years using the DT model. This example demonstrates adding documents per year, training the model, and then inspecting topic distributions and evolution.

```ruby
# Model topic evolution over 5 years
model = Tomoto::DT.new(k: 10, t: 5, min_cf: 2)

documents = [
  # 2019 documents (timepoint 0)
  {text: ["machine", "learning", "basics"], year: 0},
  {text: ["neural", "network", "simple"], year: 0},
  
  # 2020 documents (timepoint 1)
  {text: ["deep", "learning", "advanced"], year: 1},
  {text: ["transformer", "models", "popular"], year: 1},
  
  # 2021 documents (timepoint 2)
  {text: ["large", "language", "models"], year: 2},
  {text: ["transformer", "scale", "performance"], year: 2},
  
  # 2022 documents (timepoint 3)
  {text: ["diffusion", "models", "generation"], year: 3},
  {text: ["generative", "ai", "creative"], year: 3},
  
  # 2023 documents (timepoint 4)
  {text: ["multimodal", "learning", "vision"], year: 4},
  {text: ["vision", "language", "integration"], year: 4}
]

documents.each { |doc| model.add_doc(doc[:text], timepoint: doc[:year]) }

puts "Token counts per topic (before training):"
model.k.times do |topic_id|
  count = model.count_by_topics[topic_id]
  puts "  Topic #{topic_id}: #{count} tokens"
end

model.burn_in = 50
model.train(100)

puts model.summary(topic_word_top_n: 10)

# Analyze topic evolution over time
puts "\nTopic Evolution:"
model.k.times do |topic_id|
  puts "\nTopic #{topic_id}:"
  
  # In DT, topics are indexed by time-topic pairs
  # This is a simplified view; actual time-specific analysis requires
  # accessing the underlying C++ structures
  words = model.topic_words(topic_id, top_n: 5)
  puts "  Top words: #{words.keys.join(", ")}"
end
```

--------------------------------

### PA Model Training and Summary

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/pa.md

Demonstrates creating a PA model, adding documents, training the model, and printing a summary. The total number of topics is k1 * k2.

```ruby
# Create model with 5 super-topics and 8 sub-topics per super-topic
# Total of 40 topics (5 * 8)
model = Tomoto::PA.new(k1: 5, k2: 8, min_cf: 2)

documents = [
  ["machine", "learning", "classification"],
  ["deep", "neural", "networks"],
  ["supervised", "regression", "prediction"],
  ["unsupervised", "clustering", "kmeans"],
  ["sports", "basketball", "game"],
  ["football", "soccer", "team"],
  ["tennis", "racket", "match"],
  ["politics", "election", "voting"],
  ["government", "policy", "law"],
  ["legislation", "congress", "bill"]
]

documents.each { |doc| model.add_doc(doc) }

model.burn_in = 50
model.train(100)

puts model.summary(topic_word_top_n: 10)

# The model has k = k1 * k2 total topics
puts "Total topics: #{model.k} (#{model.k1} super-topics × #{model.k2} sub-topics)"

# View topics at different levels
puts "\nTop-level structure:"
model.k.times do |topic_id|
  # In PA, topic structure reflects hierarchy
  super_topic = topic_id / model.k2
  sub_topic = topic_id % model.k2
  
  words = model.topic_words(topic_id, top_n: 5)
  puts "Super-topic #{super_topic}, Sub-topic #{sub_topic}: #{words.keys.join(", ")}"
end
```

--------------------------------

### optim_interval

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/lda.md

Gets or sets the parameter optimization interval for the LDA model.

````APIDOC
## optim_interval (getter/setter)

### Description
Gets or sets the parameter optimization interval.

### Method
```ruby
model.optim_interval = value
puts model.optim_interval
```

### Parameters
- **value** (Integer) - The optimization interval to set.

### Returns
Integer representing the optimization interval.
````

--------------------------------

### Get Topic Words

Source: https://github.com/ankane/tomoto-ruby/blob/master/README.md

Retrieve the words associated with each topic in the model.

```ruby
model.topic_words
```

--------------------------------

### Configure Parallel Training Options

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/README.md

Demonstrates various options for parallel training, including disabling parallelization, using copy-merge or partition algorithms, and specifying the number of worker threads.

```ruby
# Default: automatic parallelization
model.train(100)

# No parallelization (single-threaded)
model.train(100, parallel: :none)

# Copy-merge algorithm (good for small models)
model.train(100, parallel: :copy_merge)

# Partition algorithm (good for large models)
model.train(100, parallel: :partition)

# Specify worker count (0 = number of CPU cores)
model.train(100, workers: 4)
```

--------------------------------

### Configure SLDA Variables

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/errors.md

Demonstrates valid and invalid configurations for the `vars` parameter in `Tomoto::SLDA.new()`, handling potential RuntimeErrors for unknown variable types.

```ruby
begin
  # Valid: one linear variable
  model = Tomoto::SLDA.new(k: 10, vars: "l")
  
  # Valid: linear and binary
  model = Tomoto::SLDA.new(k: 10, vars: "lb")
  
  # Invalid: unknown variable type
  model = Tomoto::SLDA.new(k: 10, vars: "lx")
rescue RuntimeError => e
  puts "Invalid variable configuration: #{e.message}"
end
```
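
A `vars` string can also be checked before the model is constructed. The sketch below assumes, based only on the examples above, that `"l"` (linear) and `"b"` (binary) are the accepted type characters; `invalid_var_types` is a hypothetical helper and the gem may accept other characters not shown here.

```ruby
# Hypothetical pre-check: list the characters in a vars string that are not in the allowed set.
def invalid_var_types(vars, allowed: %w[l b])
  vars.chars.uniq - allowed
end

%w[l lb lx].each do |vars|
  bad = invalid_var_types(vars)
  status = bad.empty? ? "ok" : "unknown types: #{bad.join(", ")}"
  puts "#{vars}: #{status}"
end
```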

--------------------------------

### Get Model Perplexity

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/lda.md

Retrieves the perplexity score of the LDA model. Returns a float.

```ruby
perp = model.perplexity
```
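
In the underlying tomotopy library, perplexity is defined as the exponential of the negative per-word log-likelihood. Assuming tomoto-ruby follows the same definition, the value relates to `ll_per_word` as sketched below; `perplexity_from_ll` is a hypothetical helper and the input is a made-up number.

```ruby
# Perplexity as exp(-log-likelihood per word); lower values indicate a better fit.
def perplexity_from_ll(ll_per_word)
  Math.exp(-ll_per_word)
end

# Made-up value standing in for model.ll_per_word
puts perplexity_from_ll(-7.5).round(2)
```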

--------------------------------

### Get Filtered Vocabulary

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/lda.md

Retrieves the list of words remaining in the vocabulary after any filtering has been applied.

```ruby
vocab = model.used_vocabs
puts "Vocabulary size: #{vocab.length}"
puts "First 10 words: #{vocab.take(10).inspect}"
```

--------------------------------

### Get Number of Documents

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/lda.md

Retrieves the total count of documents added to the LDA model.

```ruby
count = model.num_docs
```

--------------------------------

### Get Number of Topics (k)

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/lda.md

Retrieves the configured number of topics in the LDA model.

```ruby
num_topics = model.k
```

--------------------------------

### Initialize PA Model with Default Parameters

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/pa.md

Creates a new PA model instance with default parameters. Adjust parameters like k1, k2, alpha, and eta for specific modeling needs.

```ruby
Tomoto::PA.new(
  tw: :one,
  min_cf: 0,
  min_df: 0,
  rm_top: 0,
  k1: 1,
  k2: 1,
  alpha: 0.1,
  eta: 0.01,
  seed: nil
)
```

--------------------------------

### Get Eta Value

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/lda.md

Retrieves the Dirichlet prior value for per-topic word distributions.

```ruby
eta_value = model.eta
```

--------------------------------

### Get Alpha Values

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/lda.md

Retrieves the Dirichlet prior values for per-document topic distributions.

```ruby
alpha_values = model.alpha
puts alpha_values.inspect
```

```ruby
model = Tomoto::LDA.new(k: 10, alpha: 0.1)
alpha_values = model.alpha
puts alpha_values.inspect
```

--------------------------------

### Load Model from File

Source: https://github.com/ankane/tomoto-ruby/blob/master/README.md

Load a previously saved topic model from a binary file.

```ruby
model = Tomoto::LDA.load("model.bin")
```

--------------------------------

### Set and Get Optimization Interval

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/lda.md

Sets or retrieves the interval in iterations between parameter optimizations.

```ruby
model.optim_interval = 5
puts model.optim_interval  # => 5
```

--------------------------------

### Count Words by Topic

Source: https://github.com/ankane/tomoto-ruby/blob/master/README.md

Get the total number of words assigned to each topic in the model.

```ruby
model.count_by_topics
```
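
Raw counts are easier to compare as shares of the total. The sketch below is plain Ruby and assumes the counts are available as an Array of non-negative Integers, one per topic; `topic_shares` is a hypothetical helper, not part of the gem.

```ruby
# Hypothetical helper (not part of the tomoto gem): converts per-topic word
# counts into fractions of the total word count.
def topic_shares(counts)
  total = counts.sum
  # Avoid dividing by zero when no words have been assigned yet.
  return counts.map { 0.0 } if total.zero?

  counts.map { |count| count.to_f / total }
end

# Made-up counts; real ones would come from model.count_by_topics.
p topic_shares([2, 6, 2])
```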

--------------------------------

### Initialize and Train LDA Model

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/lda.md

Initializes an LDA model, adds documents, trains it, and then prints its summary.

```ruby
model = Tomoto::LDA.new(k: 5)
model.add_doc(["word1", "word2", "word3"])
model.train(100)
puts model.summary
```

--------------------------------

### Get Topic Probabilities for a Document

Source: https://github.com/ankane/tomoto-ruby/blob/master/README.md

Access a document from the model and retrieve its topic distribution.

```ruby
doc = model.docs[0]
doc.topics
```
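
A common follow-up is to pick the most likely topic for a document. The sketch below is plain Ruby and assumes the distribution is an Array of Floats indexed by topic ID; `dominant_topic` is a hypothetical helper, so adapt it if `doc.topics` returns a different shape in your version of the gem.

```ruby
# Hypothetical helper (not part of the tomoto gem): returns the index and
# probability of the most likely topic in a topic distribution.
# Assumes `dist` is an Array of Floats indexed by topic ID.
def dominant_topic(dist)
  prob, topic_id = dist.each_with_index.max_by { |p, _i| p }
  [topic_id, prob]
end

# Made-up distribution; a real one would come from doc.topics.
topic_id, prob = dominant_topic([0.1, 0.7, 0.2])
puts "Topic #{topic_id} (p=#{prob})"
```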

--------------------------------

### Get All Documents

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/lda.md

Retrieves all documents currently stored in the model. Returns an array of Document objects.

```ruby
documents = model.docs
```

--------------------------------

### Initialize SLDA Model with Default Parameters

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/slda.md

Creates a new SLDA model instance, showing every constructor parameter with its default value. Adjust parameters such as `k` (number of topics) and `vars` (response variable types) to configure the model.

```ruby
Tomoto::SLDA.new(
  tw: :one,
  min_cf: 0,
  min_df: 0,
  rm_top: 0,
  k: 1,
  vars: "l",
  alpha: 0.1,
  eta: 0.01,
  mu: [],
  nu_sq: [],
  glm_param: [],
  seed: nil
)
```

--------------------------------

### Get Word Frequencies (Complete Vocabulary)

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/lda.md

Retrieves the frequencies of words in the complete vocabulary, before any filtering.

```ruby
freqs = model.vocab_freq
```

--------------------------------

### Get Word Frequencies (Filtered Vocabulary)

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/lda.md

Retrieves the frequencies of words within the filtered vocabulary.

```ruby
freqs = model.used_vocab_freq
```

--------------------------------

### Get Complete Vocabulary

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/lda.md

Retrieves the complete list of all unique words encountered before any vocabulary filtering.

```ruby
vocab = model.vocabs
```

--------------------------------

### Initialize CT Model with Default Parameters

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/ct.md

Creates a new CT model instance with default parameters. Adjust parameters like `k` (number of topics), `alpha` (prior mean), and `eta` (prior Dirichlet) for your specific needs.

```ruby
Tomoto::CT.new(
  tw: :one,
  min_cf: 0,
  min_df: 0,
  rm_top: 0,
  k: 1,
  alpha: 0.1,
  eta: 0.01,
  seed: nil
)
```

--------------------------------

### Get Vocabulary Size (Before Filtering)

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/lda.md

Retrieves the total size of the vocabulary, including words that may have been filtered out.

```ruby
size = model.num_vocabs
```

--------------------------------

### Get Total Number of Words

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/lda.md

Retrieves the total count of all word tokens across all documents in the model.

```ruby
count = model.num_words
```

--------------------------------

### Initialize and Train SLDA Model

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/slda.md

Initializes an SLDA model with specified parameters, adds documents with associated response variables, and trains the model. Use this to build a supervised topic model.

```ruby
model = Tomoto::SLDA.new(k: 8, vars: "lb", alpha: 0.1, eta: 0.01, seed: 42)

reviews = [
  {text: ["great", "buy", "again"], rating: 9.0, purchased: 1},
  {text: ["okay", "decent"], rating: 6.0, purchased: 1},
  {text: ["awful", "regret"], rating: 2.0, purchased: 0},
  {text: ["amazing", "recommended"], rating: 10.0, purchased: 1}
]

reviews.each do |review|
  model.add_doc(review[:text], y: [review[:rating], review[:purchased]])
end

model.burn_in = 50
model.train(200)
```

--------------------------------

### Get Removed Words from Vocabulary Filtering

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/lda.md

Retrieves an array of the words that were removed from the vocabulary by the `rm_top` option.

```ruby
model = Tomoto::LDA.new(k: 10, rm_top: 20)
# ... train ...
removed = model.removed_top_words
puts "Removed: #{removed.inspect}"
```
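
To make the idea of removing the most frequent words concrete, the sketch below finds the N most frequent words in a tokenized corpus using plain Ruby. It is an illustration only, not the gem's implementation: `most_frequent_words` is a hypothetical helper, and breaking ties alphabetically is an assumption of this sketch.

```ruby
# Plain-Ruby illustration of "find the N most frequent words".
# Ties are broken alphabetically here, which is an assumption of this
# sketch and not necessarily how the gem orders words.
def most_frequent_words(docs, n)
  docs.flatten.tally
      .sort_by { |word, count| [-count, word] }
      .first(n)
      .map(&:first)
end

# Made-up corpus for illustration.
docs = [["the", "cat", "the"], ["the", "dog", "cat"]]
p most_frequent_words(docs, 2)
```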

--------------------------------

### Add Documents, Train, and Inspect Topics

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/lda.md

Adds two documents to the model, trains it, and then iterates over the documents to print each one's topic distribution.

```ruby
model.add_doc(["word1", "word2"])
model.add_doc(["word3", "word4"])
model.train(100)

docs = model.docs
docs.each_with_index do |doc, idx|
  puts "Document #{idx}: #{doc.topics.inspect}"
end
```

--------------------------------

### Get Log Likelihood Per Word

Source: https://github.com/ankane/tomoto-ruby/blob/master/README.md

Calculate and retrieve the log likelihood per word for the trained model.

```ruby
model.ll_per_word
```

--------------------------------

### Get Global Training Step

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/lda.md

Retrieves the current training iteration number (global step) of the model. Returns an integer.

```ruby
step = model.global_step
```

--------------------------------

### Get Document Frequency for All Vocabulary

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/lda.md

Retrieves the document frequency for all words in the model's vocabulary. Returns an array of integers.

```ruby
df = model.vocab_df
```
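
Document frequency is easy to confuse with plain word frequency. The plain-Ruby sketch below computes both for a tiny tokenized corpus: collection frequency counts every occurrence of a word, while document frequency counts each word at most once per document. Both helpers are hypothetical and return Hashes for readability, whereas the gem's methods return arrays.

```ruby
# Plain-Ruby illustration, independent of the tomoto gem.
# Collection frequency: total occurrences of each word across all documents.
def collection_frequency(docs)
  docs.flatten.tally
end

# Document frequency: number of documents that contain each word.
def document_frequency(docs)
  docs.flat_map(&:uniq).tally
end

# Made-up corpus for illustration.
docs = [["apple", "apple", "pear"], ["apple", "fig"]]
p collection_frequency(docs)
p document_frequency(docs)
```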

--------------------------------

### Set and Get Burn-in Iterations

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/lda.md

Sets or retrieves the number of initial training iterations whose statistics are discarded as burn-in.

```ruby
model.burn_in = 100
puts model.burn_in  # => 100
```

```ruby
model = Tomoto::LDA.new(k: 10)
model.add_doc(["word1", "word2"])
model.burn_in = 50
model.train(100)
```

--------------------------------

### MGLDA Initialization for Scientific Papers

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/mglda.md

Initializes an MGLDA model with parameters tailored for analyzing scientific papers, setting a higher number of global and local topics and a larger context window for grains. Adjust alpha priors for global and local topics as needed.

```ruby
model = Tomoto::MGLDA.new(
  k_g: 10,     # 10 global research topics
  k_l: 20,     # 20 local method topics
  t: 4,        # sentence window of size 4
  alpha_g: 0.1,
  alpha_l: 0.05
)
```

--------------------------------

### Perform Inference on Unseen Document

Source: https://github.com/ankane/tomoto-ruby/blob/master/README.md

Create a new document and perform inference to get its topic distribution and log likelihood.

```ruby
doc = model.make_doc(["unseen", "doc"])
topic_dist, ll = model.infer(doc)
```

--------------------------------

### Get Term Weighting Scheme

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/lda.md

Retrieves the term weighting scheme used by the model. Returns a symbol, which can be :one, :idf, or :pmi.

```ruby
weight = model.tw
```

--------------------------------

### Initialize PA Model with Custom Parameters

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/pa.md

Creates a new PA model instance with specified super-topics (k1), sub-topics per super-topic (k2), alpha, and a random seed for reproducibility.

```ruby
model = Tomoto::PA.new(k1: 5, k2: 10, alpha: 0.1, seed: 42)
```

--------------------------------

### Initialize and Train GDMR Model with Two Numeric Metadata Dimensions

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/gdmr.md

Initializes a GDMR model with 8 topics and linear relationships for two metadata dimensions (popularity and time). Documents are added with their respective metadata, and the model is trained.

```ruby
model = Tomoto::GDMR.new(
  k: 8,
  degrees: [1, 1],  # both linear
  sigma: 1.5,
  sigma0: 3.0
)

documents = [
  {text: ["viral", "trending", "social"], popularity: 9.5, time: 1.0},
  {text: ["viral", "popular", "share"], popularity: 8.8, time: 1.2},
  {text: ["niche", "obscure", "rare"], popularity: 2.3, time: 0.5},
  {text: ["niche", "unknown", "hidden"], popularity: 1.9, time: 0.3}
]

documents.each do |doc|
  model.add_doc(doc[:text], numeric_metadata: [doc[:popularity], doc[:time]])
end

model.train(100)

# Topics now reflect how popularity and time affect topic distributions
puts model.summary
```

--------------------------------

### Get Document Frequency for Used Vocabulary

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/lda.md

Retrieves the document frequency for words present in the filtered vocabulary. Returns an array of integers.

```ruby
df = model.used_vocab_df
```

--------------------------------

### Retrieving Topic Words

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/types.md

Get the top N words for a specific topic or for all topics. For a single topic, the result is a hash mapping words to their probabilities; for all topics, it is an array of such hashes, one per topic.

```ruby
model = Tomoto::LDA.new(k: 5)
# ... add documents and train ...

# Get top 10 words for topic 0
words = model.topic_words(0, top_n: 10)
words.each { |word, prob| puts "#{word}: #{prob}" }
```

```ruby
# Get all topics
all_topics = model.topic_words(top_n: 10)  # Array of Hashes
```
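
Because each topic is a hash of word to probability, a short label can be built from its highest-probability words. The sketch below is plain Ruby; `topic_label` is a hypothetical helper and the sample probabilities are made up. It sorts explicitly by probability so it does not rely on the order of the hash.

```ruby
# Hypothetical helper (not part of the tomoto gem): builds a short label
# from a Hash of word => probability, highest probability first.
def topic_label(word_probs, n = 3)
  word_probs.sort_by { |_word, prob| -prob }
            .first(n)
            .map(&:first)
            .join(", ")
end

# Made-up probabilities; real ones would come from model.topic_words(0).
sample = { "learning" => 0.12, "machine" => 0.2, "data" => 0.05 }
puts topic_label(sample, 2)
```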

--------------------------------

### Get Number of Documents for a Topic

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/hlda.md

Returns the count of documents assigned to a specific topic. This metric helps in assessing the prevalence of a topic.

```ruby
doc_count = model.num_docs_of_topic(0)
```

--------------------------------

### Train HLDA and Print Hierarchy

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/hlda.md

Initializes an HLDA model, adds documents, trains it, and then recursively prints the discovered topic hierarchy starting from topic 0, skipping topics that are not live.

```ruby
model = Tomoto::HLDA.new(depth: 3, min_cf: 2)

documents = [
  ["machine", "learning", "classification"],
  ["deep", "neural", "networks"],
  ["computer", "vision", "image"],
  ["natural", "language", "processing"],
  ["sports", "basketball", "team"],
  ["sports", "football", "game"],
  ["politics", "election", "vote"],
  ["politics", "government", "policy"]
]

documents.each { |doc| model.add_doc(doc) }

model.burn_in = 50
model.train(100)

puts "Discovered hierarchy with depth: #{model.depth}"

# Print hierarchy structure
def print_hierarchy(model, topic_id, indent = 0)
  return unless model.live_topic?(topic_id)
  
  words = model.topic_words(topic_id, top_n: 5).keys.join(", ")
  puts "#{"  " * indent}Topic #{topic_id} (Level #{model.level(topic_id)}): #{words}"
  
  model.children_topics(topic_id).sort.each do |child|
    print_hierarchy(model, child, indent + 1)
  end
end

print_hierarchy(model, 0)
```

--------------------------------

### HDP Constructor with Custom Parameters

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/hdp.md

Initializes an HDP model with specific parameters for controlling topic discovery and prior distributions. Set a random seed for reproducible results.

```ruby
model = Tomoto::HDP.new(initial_k: 5, alpha: 0.1, gamma: 0.1, seed: 42)
```