### Development Setup Source: https://github.com/ankane/tomoto-ruby/blob/master/README.md Steps to clone the repository, install dependencies, compile, and test the gem locally for development. ```sh git clone --recursive https://github.com/ankane/tomoto-ruby.git cd tomoto-ruby bundle install bundle exec rake compile bundle exec rake test ``` -------------------------------- ### Getting Started with Tomoto-Ruby Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/README.md Demonstrates the basic workflow of creating an LDA model, adding documents, training, and retrieving topic words. Ensure the 'tomoto' gem is required. ```ruby require "tomoto" # Create a model model = Tomoto::LDA.new(k: 10) # Add documents model.add_doc(["machine", "learning"]) model.add_doc(["deep", "networks"]) # Train model.train(100) # Get results puts model.topic_words(0, top_n: 10) ``` -------------------------------- ### Example: Listing Live Topics and Levels Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/hlda.md An example that iterates through all topics, checks if they are live, and prints their ID and level. This provides an overview of the active topics in the hierarchy. ```ruby model.k.times do |topic_id| if model.live_topic?(topic_id) level = model.level(topic_id) puts "Topic #{topic_id} is live (level #{level})" end end ``` -------------------------------- ### Example: Displaying Parent Topic Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/hlda.md An example showing how to get and display the parent topic for topic ID 5, including a check for whether the topic is live. This illustrates navigating upwards in the hierarchy. 
```ruby parent = model.parent_topic(5) if parent >= 0 puts "Topic 5 parent: Topic #{parent}" else puts "Topic 5 is not live" end ``` -------------------------------- ### Example: Displaying Document Counts per Topic Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/hlda.md An example that iterates through live topics and prints the number of documents assigned to each. This helps in understanding the distribution of documents across topics. ```ruby model.k.times do |topic_id| next unless model.live_topic?(topic_id) count = model.num_docs_of_topic(topic_id) puts "Topic #{topic_id}: #{count} documents" end ``` -------------------------------- ### Example: Displaying Child Topics Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/hlda.md An example demonstrating how to fetch and display the child topics for topic ID 0. This helps visualize the hierarchical relationships. ```ruby children = model.children_topics(0) puts "Topic 0 has children: #{children.inspect}" ``` -------------------------------- ### LLDA Usage Example Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/llda.md A comprehensive example demonstrating the creation, training, and inference process for an LLDA model. Includes adding labeled documents, training for a specified number of iterations, and analyzing results. 
```ruby # Create model with 5 topics model = Tomoto::LLDA.new(k: 5, min_cf: 2) # Add labeled documents model.add_doc(["machine", "learning", "algorithm"], labels: ["ml", "ai"]) model.add_doc(["deep", "neural", "network"], labels: ["ml", "ai"]) model.add_doc(["politics", "election", "vote"], labels: ["politics"]) model.add_doc(["government", "policy", "law"], labels: ["politics"]) # Train model.train(100) # View results puts model.summary(topic_word_top_n: 10) model.k.times do |k| puts "Topic ##{k}" model.topic_words(k).each { |word, prob| puts " #{word}: #{prob}" } end # Inference on new documents new_doc = model.make_doc(["learning", "algorithm"]) topics, ll = model.infer(new_doc) puts "Topic distribution: #{topics.inspect}" ``` -------------------------------- ### Check Tomoto Gem Installation Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/README.md Verify if the tomoto gem is installed on your system. This is a prerequisite for using the gem. ```bash gem list tomoto ``` -------------------------------- ### Example: Identifying Root-Level Topics Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/hlda.md An example that iterates through all topics to identify and display root-level topics (level 0) that are also live. This helps in understanding the top-level structure of the model. ```ruby root_topics = [] model.k.times do |topic_id| if model.live_topic?(topic_id) && model.level(topic_id) == 0 root_topics << topic_id end end puts "Root-level topics: #{root_topics.inspect}" ``` -------------------------------- ### Full PLDA Usage Example Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/plda.md A comprehensive example demonstrating the complete lifecycle of a PLDA model, including initialization, adding labeled and unlabeled documents, setting burn-in period, training, viewing summary, inspecting topics, and performing inference. 
This illustrates how PLDA learns from partially labeled data. ```ruby # Create model model = Tomoto::PLDA.new(latent_topics: 2, min_cf: 2) # Add documents - some with labels, some without labeled_docs = [ ["apple", "orange", "banana"], ["car", "truck", "bicycle"] ] model.add_doc(labeled_docs[0], labels: ["fruit"]) model.add_doc(labeled_docs[1], labels: ["vehicle"]) # Add unlabeled documents model.add_doc(["cherry", "mango", "grape"]) model.add_doc(["bus", "train", "subway"]) # Train model.burn_in = 50 model.train(100) # View results puts model.summary # Check inferred structure topics = model.k topics.times do |i| words = model.topic_words(i, top_n: 10) puts "Topic #{i}: #{words.keys.join(", ")}" end # Inference - the model learns to associate documents with labels doc = model.make_doc(["strawberry", "blueberry"]) dist, ll = model.infer(doc) puts "Topic distribution: #{dist.inspect}" ``` -------------------------------- ### Install tomoto Gem Source: https://github.com/ankane/tomoto-ruby/blob/master/README.md Add the tomoto gem to your application's Gemfile to install it. ```ruby gem "tomoto" ``` -------------------------------- ### Example HLDA Model Initialization Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/hlda.md An example of initializing an HLDA model with a specific depth, alpha, gamma, and a random seed for reproducibility. This demonstrates a common use case for setting up the model. ```ruby model = Tomoto::HLDA.new(depth: 3, alpha: 0.1, gamma: 0.1, seed: 42) ``` -------------------------------- ### GDMR Usage Example: Document Length and Publication Year Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/gdmr.md A comprehensive example demonstrating GDMR model creation, document addition with numeric metadata (length and year), model training, and summary output. It also shows how to analyze topic word distributions. 
```ruby # Create model with two numeric metadata dimensions # Degree 2 for document length (quadratic), degree 1 for year (linear) model = Tomoto::GDMR.new(k: 10, degrees: [2, 1], min_cf: 2) documents = [ {text: ["machine", "learning", "algorithm"], length: 150, year: 2020}, {text: ["deep", "neural", "network"], length: 200, year: 2021}, {text: ["classification", "prediction"], length: 80, year: 2019}, {text: ["regression", "statistical"], length: 120, year: 2020}, {text: ["clustering", "unsupervised"], length: 160, year: 2021} ] documents.each do |doc| model.add_doc(doc[:text], numeric_metadata: [doc[:length], doc[:year]]) end model.burn_in = 50 model.train(100) puts model.summary(topic_word_top_n: 10) # Analyze how metadata influences topics model.k.times do |k| words = model.topic_words(k, top_n: 5) puts "Topic #{k}: #{words.keys.join(", ")}" end ``` -------------------------------- ### Full MGLDA Training and Summary Example Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/mglda.md Demonstrates the complete workflow of initializing an MGLDA model, adding documents with sentence delimiters, training the model, and printing a summary of the topics. Use this as a template for your own MGLDA projects. 
```ruby
model = Tomoto::MGLDA.new(
  k_g: 3,    # 3 global topics
  k_l: 5,    # 5 local topics
  t: 3,      # sentence window size of 3
  min_cf: 2
)

# Documents with sentence-level structure
documents = [
  ["machine", "learning", "is", "important", ".", "deep", "networks", "are", "powerful", "."],
  ["natural", "language", "processing", "uses", "deep", "learning", ".", "text", "classification", "is", "useful", "."],
  ["computer", "vision", "tasks", "include", "classification", ".", "image", "segmentation", "is", "challenging", "."]
]

# The delimiter marks sentence boundaries within each document
documents.each { |doc| model.add_doc(doc, delimiter: ".") }

model.train(100)
puts model.summary(topic_word_top_n: 10)

# View topics (combines both global and local)
model.k.times do |topic_id|
  words = model.topic_words(topic_id, top_n: 8)
  puts "Topic #{topic_id}: #{words.keys.join(", ")}"
end
```

--------------------------------

### SLDA Usage Example: Single Response Variable (Product Rating)

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/slda.md

Trains an SLDA model with a single linear response variable (product rating), then displays the summary and the topics. The example covers the full process from initialization through training to output.
```ruby # Create model with one linear response variable (rating scale 1-10) model = Tomoto::SLDA.new(k: 5, vars: "l", min_cf: 2) # Add reviews with ratings model.add_doc(["excellent", "quality", "product"], y: [9.0]) model.add_doc(["good", "service", "value"], y: [8.0]) model.add_doc(["poor", "broken", "disappointed"], y: [2.0]) model.add_doc(["terrible", "waste", "money"], y: [1.0]) model.train(100) puts model.summary(topic_word_top_n: 10) # View topics model.k.times do |i| puts "Topic ##{i}" model.topic_words(i).each do |word, prob| puts " #{word}: #{prob}" end end ``` -------------------------------- ### Full HPA Model Usage Example Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/hpa.md Demonstrates the complete workflow for HPA, including model creation, document addition, training, and analysis of hierarchical topic distributions. Ensure documents are preprocessed and tokenized before adding. ```ruby # Create hierarchical structure with two levels # First level: 4 broad topics # Second level: 6 specific topics under each model = Tomoto::HPA.new(k1: 4, k2: 6, min_cf: 2) documents = [ # Machine Learning ["supervised", "classification", "regression"], ["unsupervised", "clustering", "dimensionality"], ["deep", "neural", "networks"], # Natural Language ["text", "processing", "tokenization"], ["language", "modeling", "prediction"], ["translation", "sequence", "sequence"], # Computer Vision ["image", "classification", "recognition"], ["object", "detection", "localization"], ["segmentation", "semantic", "instance"], # Other ["reinforcement", "learning", "agent"], ["graph", "neural", "networks"] ] documents.each { |doc| model.add_doc(doc) } model.burn_in = 50 model.train(100) puts model.summary # Analyze hierarchical structure puts "\nHierarchical topic distribution:" model.k.times do |topic_id| words = model.topic_words(topic_id, top_n: 5) count = model.count_by_topics[topic_id] puts "Topic #{topic_id} (#{count} tokens): 
#{words.keys.join(", ")}" end ``` -------------------------------- ### Author-Based Metadata Usage Example Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/dmr.md Demonstrates a complete workflow for DMR, including model initialization, adding documents with author metadata, training, and analyzing the influence of author on topics. ```ruby # Create model where topic distribution depends on author model = Tomoto::DMR.new(k: 10, min_cf: 2, sigma: 1.0) documents = [ {text: ["machine", "learning", "classification"], author: "alice"}, {text: ["deep", "neural", "networks"], author: "alice"}, {text: ["poetry", "emotion", "verse"], author: "bob"}, {text: ["literature", "novel", "fiction"], author: "bob"}, {text: ["quantum", "physics", "mechanics"], author: "charlie"}, {text: ["relativity", "space", "time"], author: "charlie"} ] documents.each do |doc| model.add_doc(doc[:text], metadata: doc[:author]) end model.burn_in = 50 model.train(100) puts model.summary(topic_word_top_n: 10) # Examine how metadata affects topics lambdas = model.lambdas lambdas.each_with_index do |topic_lambdas, topic_id| puts "Topic #{topic_id} - Author influence (lambdas): #{topic_lambdas.inspect}" end ``` -------------------------------- ### Initialize LLDA Model with Custom Parameters Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/llda.md Example of creating an LLDA model with a specific number of topics (k=20), alpha, eta, and a random seed for reproducibility. ```ruby model = Tomoto::LLDA.new(k: 20, alpha: 0.1, eta: 0.01, seed: 42) ``` -------------------------------- ### Initialize CT Model with Custom Parameters Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/ct.md Example of initializing a CT model with a specified number of topics, alpha, eta, and a random seed for reproducibility. 
```ruby model = Tomoto::CT.new(k: 15, alpha: 0.1, eta: 0.01, seed: 42) ``` -------------------------------- ### Train a Basic LDA Model Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/README.md Initializes an LDA model with specified parameters and adds documents for training. Ensure you have the 'tomoto' gem installed. ```ruby require "tomoto" model = Tomoto::LDA.new(k: 20, min_cf: 3, seed: 42) documents = [ ["machine", "learning", "classification"], ["deep", "neural", "networks"], ["supervised", "training", "data"] ] documents.each { |doc| model.add_doc(doc) } model.burn_in = 50 model.train(200) puts model.summary(topic_word_top_n: 10) ``` -------------------------------- ### PLDA Workflow: Add Documents and Train Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/plda.md Demonstrates the typical workflow for a PLDA model: initializing the model, adding both labeled and unlabeled documents, and then training the model. This example showcases how to handle mixed-label datasets. ```ruby model = Tomoto::PLDA.new(latent_topics: 3) # Labeled documents model.add_doc(["sports", "baseball", "team"], labels: ["sports"]) model.add_doc(["music", "concert", "song"], labels: ["music"]) # Unlabeled documents (model will infer labels) model.add_doc(["game", "score", "players"]) model.add_doc("tennis tournament match") # Train with partial labels model.train(100) ``` -------------------------------- ### Initialize SLDA Model with Custom Parameters Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/slda.md Model with one linear and one binary response variable. This example shows how to set the number of topics, specify response variable types, and set a random seed for reproducibility. 
```ruby # Model with one linear and one binary response variable model = Tomoto::SLDA.new(k: 10, vars: "lb", alpha: 0.1, seed: 42) ``` -------------------------------- ### Get Vocabulary Source: https://github.com/ankane/tomoto-ruby/blob/master/README.md Retrieve the entire vocabulary of words used in the trained model. ```ruby model.vocabs ``` -------------------------------- ### Get Model Summary Source: https://github.com/ankane/tomoto-ruby/blob/master/README.md Retrieve a summary of the trained topic model. ```ruby model.summary ``` -------------------------------- ### Compare Initial K with Final Topic Count Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/hdp.md This example illustrates how the HDP model automatically determines the final number of topics, which may differ from the initial capacity specified. It prints the initial, final, and live topic counts, along with the token count for each topic. ```ruby # HDP automatically grows topics as needed model = Tomoto::HDP.new(initial_k: 3, gamma: 0.1) documents = [ ["sports", "athlete", "competition"], ["sports", "game", "match"], ["politics", "election", "vote"], ["politics", "government", "policy"], ["technology", "software", "computer"], ["technology", "internet", "digital"], ["food", "recipe", "cooking"], ["food", "restaurant", "meal"] ] documents.each { |doc| model.add_doc(doc) } model.train(100) puts "Initial K specified: 3" puts "Final K value: #{model.k}" puts "Live (active) topics: #{model.live_k}" puts "Inactive topics: #{model.k - model.live_k}" # These inactive topics exist but have no document assignments model.k.times do |topic_id| status = model.live_topic?(topic_id) ? 
"ACTIVE" : "INACTIVE" count = model.count_by_topics[topic_id] puts "Topic #{topic_id}: #{status} (#{count} tokens)" end ``` -------------------------------- ### Analyze Quarterly Business Reports Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/dt.md This example shows how to analyze business topics across 8 quarters using the Tomoto DT model. It configures the model for 5 topics and 8 time periods, adds business-related documents for each quarter, trains the model after a burn-in period, and then prints the representative words for each identified business topic. ```ruby # Analyze business topics over 8 quarters model = Tomoto::DT.new( k: 5, t: 8, min_cf: 3, alpha_var: 0.1, eta_var: 0.1, phi_var: 0.1 ) quarters = [ {period: "Q1 2021", docs: [["revenue", "sales", "growth"], ["costs", "expenses", "reduction"]}}, {period: "Q2 2021", docs: [["acquisition", "expansion", "market"], ["profit", "margins", "strong"]]}, # ... more quarters ... ] quarters.each_with_index do |quarter, idx| quarter[:docs].each do |doc| model.add_doc(doc, timepoint: idx) end end model.burn_in = 30 model.train(100) # Analyze business themes puts "Business Topics Over Time:" model.k.times do |topic_id| words = model.topic_words(topic_id, top_n: 5) puts "\nTopic #{topic_id}: #{words.keys.join(", ")}" end ``` -------------------------------- ### Chinese Restaurant Process Metaphor Example Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/hdp.md This snippet demonstrates the Chinese Restaurant Process (CRP) metaphor used by HDP to determine topics. It shows how to initialize the model, add documents, train, and then report the total number of tables (CRP clusters), live topics, and initial capacity. 
```ruby model = Tomoto::HDP.new(initial_k: 10) documents.each { |doc| model.add_doc(doc) } model.train(200) puts "Total tables (CRP clusters): #{model.num_tables}" puts "Live topics: #{model.live_k}" puts "Initial capacity: #{model.k}" ``` -------------------------------- ### Create HPA Model with Custom Parameters Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/hpa.md Example of creating an HPA model with specific topic counts (k1, k2), alpha, and a random seed for reproducibility. ```ruby model = Tomoto::HPA.new(k1: 5, k2: 10, alpha: 0.1, seed: 42) ``` -------------------------------- ### Initialize MGLDA Model with Default Parameters Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/mglda.md Creates a new MGLDA model instance with default parameters. Useful for starting a new MGLDA analysis. ```ruby Tomoto::MGLDA.new( tw: :one, min_cf: 0, min_df: 0, rm_top: 0, k_g: 1, k_l: 1, t: 3, alpha_g: 0.1, alpha_l: 0.1, alpha_mg: 0.1, alpha_ml: 0.1, eta_g: 0.01 ) ``` -------------------------------- ### HDP Automatic Topic Discovery Example Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/hdp.md Demonstrates how to use the HDP model to automatically discover the number of topics from a collection of documents. It includes adding documents, training the model, and inspecting the results, including live topic counts and word distributions for active topics. 
```ruby # Create HDP model without specifying number of topics model = Tomoto::HDP.new(initial_k: 5, min_cf: 2, gamma: 0.1) # Add documents documents = [ ["machine", "learning", "classification"], ["deep", "neural", "networks"], ["sports", "basketball", "team"], ["football", "game", "score"], ["python", "programming", "code"], ["java", "software", "development"] ] documents.each { |doc| model.add_doc(doc) } puts "Initial K: 5" puts "Before training - K: #{model.k}, Live K: #{model.live_k}" # Train model.burn_in = 50 100.times do |i| model.train(10) if i % 20 == 0 puts "Iteration: #{(i + 1) * 10}, Live topics: #{model.live_k}, LL/word: #{model.ll_per_word}" end end puts "\nAfter training - K: #{model.k}, Live K: #{model.live_k}" # Only show words for active topics puts model.summary(topic_word_top_n: 10) # View only active topics model.k.times do |topic_id| next unless model.live_topic?(topic_id) puts "Topic ##{topic_id} (ACTIVE)" model.topic_words(topic_id, top_n: 8).each { |word, prob| puts " #{word}: #{prob}" } end ``` -------------------------------- ### Initialize LDA Model with Default Parameters Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/lda.md Creates a new LDA model instance using default parameters. Useful for quick setup when default values are suitable. ```ruby Tomoto::LDA.new( tw: :one, min_cf: 0, min_df: 0, rm_top: 0, k: 1, alpha: 0.1, eta: 0.01, seed: nil ) ``` -------------------------------- ### Explore Hierarchical Topic Structure Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/hlda.md Initializes an HLDA model and demonstrates how to find root topics, traverse the hierarchy to get descendants, and display direct children with their top words. Useful for detailed analysis of topic relationships. ```ruby model = Tomoto::HLDA.new(depth: 4, gamma: 0.1) # ... add documents and train ... 
# Collect all descendants of a topic (defined once, outside the loop)
def get_descendants(model, topic_id)
  children = model.children_topics(topic_id)
  all = children.dup
  children.each { |c| all.concat(get_descendants(model, c)) }
  all.uniq
end

# Find all root topics (level 0)
roots = (0...model.k).select { |t| model.live_topic?(t) && model.level(t) == 0 }
puts "Root topics: #{roots.inspect}"

# For each root, traverse hierarchy
roots.each do |root_id|
  puts "\nRoot Topic #{root_id}:"
  words = model.topic_words(root_id, top_n: 5)
  puts " Top words: #{words.keys.join(", ")}"

  # Get all descendants
  descendants = get_descendants(model, root_id)
  puts " #{descendants.length} descendant topics"

  # Show first level children
  children = model.children_topics(root_id)
  puts " Direct children: #{children.inspect}"
  children.each do |child|
    child_words = model.topic_words(child, top_n: 3)
    puts " Topic #{child}: #{child_words.keys.join(", ")}"
  end
end
```

--------------------------------

### Analyze Prior Covariance Matrix

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/ct.md

Example of accessing and interpreting the prior covariance matrix, including its dimensions and the variances of individual topics (diagonal elements).

```ruby
model.train(100)
cov = model.prior_cov
puts "Covariance matrix dimensions: #{cov.length} × #{cov[0].length}"

# Print diagonal elements (variances)
cov.each_with_index do |row, i|
  puts "Variance of topic #{i}: #{row[i].round(4)}"
end
```

--------------------------------

### Get Topic Correlations

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/ct.md

Retrieves the correlations between topics. Specify a `topic_id` to get correlations for that specific topic, or call without an argument to get all pairwise correlations.
```ruby
corr = model.correlations(0)
all_corr = model.correlations()
```

--------------------------------

### Topic Correlation Network Analysis

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/ct.md

A comprehensive example demonstrating how to build a CT model, train it on documents, and then analyze and print the correlations between topics, highlighting related topics and their key words.

```ruby
model = Tomoto::CT.new(k: 8, min_cf: 2)

documents = [
  ["machine", "learning", "algorithms"],
  ["deep", "neural", "networks"],
  ["natural", "language", "processing"],
  ["computer", "vision", "images"],
  ["reinforcement", "learning", "agent"],
  ["supervised", "training", "data"],
  ["text", "documents", "analysis"],
  ["image", "recognition", "classification"]
]

documents.each { |doc| model.add_doc(doc) }

model.burn_in = 50
model.train(100)

# Analyze topic correlations
puts "Topic Correlation Network:"
puts "=" * 50

model.k.times do |topic_id|
  corr = model.correlations(topic_id)
  # Keep the three correlations with the largest absolute value
  top_corr = corr.sort_by { |_, v| -v.abs }.first(3)
  words = model.topic_words(topic_id, top_n: 3).keys
  puts "\nTopic #{topic_id}: #{words.join(", ")}"

  top_corr.each do |other_topic, correlation|
    other_words = model.topic_words(other_topic, top_n: 3).keys
    status = correlation > 0 ? "+" : "-"
    puts " #{status} Topic #{other_topic} (#{correlation.round(3)}): #{other_words.join(", ")}"
  end
end
```

--------------------------------

### Create PLDA Model with Custom Parameters

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/plda.md

Instantiates a PLDA model with a specific number of latent topics, alpha value, and a random seed for reproducibility. This is a common starting point for experiments.
```ruby
model = Tomoto::PLDA.new(latent_topics: 5, alpha: 0.1, seed: 42)
```

--------------------------------

### Annual Topic Evolution Analysis

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/dt.md

Models and analyzes topic evolution over several years using the DT model. This example demonstrates adding documents per year, training the model, and then inspecting the top words of each topic.

```ruby
# Model topic evolution over 5 years
model = Tomoto::DT.new(k: 10, t: 5, min_cf: 2)

documents = [
  # 2019 documents (timepoint 0)
  {text: ["machine", "learning", "basics"], year: 0},
  {text: ["neural", "network", "simple"], year: 0},
  # 2020 documents (timepoint 1)
  {text: ["deep", "learning", "advanced"], year: 1},
  {text: ["transformer", "models", "popular"], year: 1},
  # 2021 documents (timepoint 2)
  {text: ["large", "language", "models"], year: 2},
  {text: ["transformer", "scale", "performance"], year: 2},
  # 2022 documents (timepoint 3)
  {text: ["diffusion", "models", "generation"], year: 3},
  {text: ["generative", "ai", "creative"], year: 3},
  # 2023 documents (timepoint 4)
  {text: ["multimodal", "learning", "vision"], year: 4},
  {text: ["vision", "language", "integration"], year: 4}
]

documents.each { |doc| model.add_doc(doc[:text], timepoint: doc[:year]) }

# count_by_topics reports token counts per topic, not documents per period
puts "Token counts per topic:"
model.k.times do |topic_id|
  count = model.count_by_topics[topic_id]
  puts " Topic #{topic_id}: #{count} tokens"
end

model.burn_in = 50
model.train(100)

puts model.summary(topic_word_top_n: 10)

# Analyze topic evolution over time
puts "\nTopic Evolution:"
model.k.times do |topic_id|
  puts "\nTopic #{topic_id}:"
  # In DT, topics are indexed by time-topic pairs
  # This is a simplified view; actual time-specific analysis requires
  # accessing the underlying C++ structures
  words = model.topic_words(topic_id, top_n: 5)
  puts " Top words: #{words.keys.join(", ")}"
end
```

--------------------------------

### PA Model Training and Summary

Source:
https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/pa.md Demonstrates creating a PA model, adding documents, training the model, and printing a summary. The total number of topics is k1 * k2. ```ruby # Create model with 5 super-topics and 8 sub-topics per super-topic # Total of 40 topics (5 * 8) model = Tomoto::PA.new(k1: 5, k2: 8, min_cf: 2) documents = [ ["machine", "learning", "classification"], ["deep", "neural", "networks"], ["supervised", "regression", "prediction"], ["unsupervised", "clustering", "kmeans"], ["sports", "basketball", "game"], ["football", "soccer", "team"], ["tennis", "racket", "match"], ["politics", "election", "voting"], ["government", "policy", "law"], ["legislation", "congress", "bill"] ] documents.each { |doc| model.add_doc(doc) } model.burn_in = 50 model.train(100) puts model.summary(topic_word_top_n: 10) # The model has k = k1 * k2 total topics puts "Total topics: #{model.k} (#{model.k1} super-topics × #{model.k2} sub-topics)" # View topics at different levels puts "\nTop-level structure:" model.k.times do |topic_id| # In PA, topic structure reflects hierarchy super_topic = topic_id / model.k2 sub_topic = topic_id % model.k2 words = model.topic_words(topic_id, top_n: 5) puts "Super-topic #{super_topic}, Sub-topic #{sub_topic}: #{words.keys.join(", ")}" end ``` -------------------------------- ### optim_interval Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/lda.md Gets or sets the parameter optimization interval for the LDA model. ```APIDOC ## optim_interval (getter/setter) ### Description Gets or sets the parameter optimization interval. ### Method ```ruby model.optim_interval = value puts model.optim_interval ``` ### Parameters #### Path Parameters - **value** (Integer) - The optimization interval to set. ### Returns Integer representing the optimization interval. 
``` -------------------------------- ### Get Topic Words Source: https://github.com/ankane/tomoto-ruby/blob/master/README.md Retrieve the words associated with each topic in the model. ```ruby model.topic_words ``` -------------------------------- ### Configure Parallel Training Options Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/README.md Demonstrates various options for parallel training, including disabling parallelization, using copy-merge or partition algorithms, and specifying the number of worker threads. ```ruby # Default: automatic parallelization model.train(100) # No parallelization (single-threaded) model.train(100, parallel: :none) # Copy-merge algorithm (good for small models) model.train(100, parallel: :copy_merge) # Partition algorithm (good for large models) model.train(100, parallel: :partition) # Specify worker count (0 = number of CPU cores) model.train(100, workers: 4) ``` -------------------------------- ### Configure SLDA Variables Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/errors.md Demonstrates valid and invalid configurations for the `vars` parameter in `Tomoto::SLDA.new()`, handling potential RuntimeErrors for unknown variable types. ```ruby begin # Valid: one linear variable model = Tomoto::SLDA.new(k: 10, vars: "l") # Valid: linear and binary model = Tomoto::SLDA.new(k: 10, vars: "lb") # Invalid: unknown variable type model = Tomoto::SLDA.new(k: 10, vars: "lx") rescue RuntimeError => e puts "Invalid variable configuration: #{e.message}" end ``` -------------------------------- ### Get Model Perplexity Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/lda.md Retrieves the perplexity score of the LDA model. Returns a float. 
```ruby perp = model.perplexity ``` -------------------------------- ### Get Filtered Vocabulary Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/lda.md Retrieves the list of words remaining in the vocabulary after any filtering has been applied. ```ruby vocab = model.used_vocabs puts "Vocabulary size: #{vocab.length}" puts "First 10 words: #{vocab.take(10).inspect}" ``` -------------------------------- ### Get Number of Documents Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/lda.md Retrieves the total count of documents added to the LDA model. ```ruby count = model.num_docs ``` -------------------------------- ### Get Number of Topics (k) Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/lda.md Retrieves the configured number of topics in the LDA model. ```ruby num_topics = model.k ``` -------------------------------- ### Initialize PA Model with Default Parameters Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/pa.md Creates a new PA model instance with default parameters. Adjust parameters like k1, k2, alpha, and eta for specific modeling needs. ```ruby Tomoto::PA.new( tw: :one, min_cf: 0, min_df: 0, rm_top: 0, k1: 1, k2: 1, alpha: 0.1, eta: 0.01, seed: nil ) ``` -------------------------------- ### Get Eta Value Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/lda.md Retrieves the Dirichlet prior value for per-topic word distributions. ```ruby eta_value = model.eta ``` -------------------------------- ### Get Alpha Values Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/lda.md Retrieves the Dirichlet prior values for per-document topic distributions. 
```ruby
alpha_values = model.alpha
puts alpha_values.inspect
```

```ruby
model = Tomoto::LDA.new(k: 10, alpha: 0.1)
alpha_values = model.alpha
puts alpha_values.inspect
```

--------------------------------

### Load Model from File

Source: https://github.com/ankane/tomoto-ruby/blob/master/README.md

Load a previously saved topic model from a binary file.

```ruby
model = Tomoto::LDA.load("model.bin")
```

--------------------------------

### Set and Get Optimization Interval

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/lda.md

Sets or retrieves the interval in iterations between parameter optimizations.

```ruby
model.optim_interval = 5
puts model.optim_interval # => 5
```

--------------------------------

### Count Words by Topic

Source: https://github.com/ankane/tomoto-ruby/blob/master/README.md

Get the total number of words assigned to each topic in the model.

```ruby
model.count_by_topics
```

--------------------------------

### Initialize and Train LDA Model

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/lda.md

Initializes an LDA model, adds documents, trains it, and then prints its summary.

```ruby
model = Tomoto::LDA.new(k: 5)
model.add_doc(["word1", "word2", "word3"])
model.train(100)
puts model.summary
```

--------------------------------

### Get Topic Probabilities for a Document

Source: https://github.com/ankane/tomoto-ruby/blob/master/README.md

Access a document from the model and retrieve its topic distribution.

```ruby
doc = model.docs[0]
doc.topics
```

--------------------------------

### Get All Documents

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/lda.md

Retrieves all documents currently stored in the model. Returns an array of Document objects.
```ruby
documents = model.docs
```

--------------------------------

### Initialize SLDA Model with Default Parameters

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/slda.md

Creates a new SLDA model instance with specified parameters. Use this to configure the model's behavior, including the number of topics and variable types.

```ruby
Tomoto::SLDA.new(
  tw: :one,
  min_cf: 0,
  min_df: 0,
  rm_top: 0,
  k: 1,
  vars: "l",
  alpha: 0.1,
  eta: 0.01,
  mu: [],
  nu_sq: [],
  glm_param: [],
  seed: nil
)
```

--------------------------------

### Get Word Frequencies (Complete Vocabulary)

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/lda.md

Retrieves the frequencies of words in the complete vocabulary, before any filtering.

```ruby
freqs = model.vocab_freq
```

--------------------------------

### Get Word Frequencies (Filtered Vocabulary)

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/lda.md

Retrieves the frequencies of words within the filtered vocabulary.

```ruby
freqs = model.used_vocab_freq
```

--------------------------------

### Get Complete Vocabulary

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/lda.md

Retrieves the complete list of all unique words encountered before any vocabulary filtering.

```ruby
vocab = model.vocabs
```

--------------------------------

### Initialize CT Model with Default Parameters

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/ct.md

Creates a new CT model instance with default parameters. Adjust parameters like `k` (number of topics), `alpha` (prior mean), and `eta` (prior Dirichlet) for your specific needs.
```ruby
Tomoto::CT.new(
  tw: :one,
  min_cf: 0,
  min_df: 0,
  rm_top: 0,
  k: 1,
  alpha: 0.1,
  eta: 0.01,
  seed: nil
)
```

--------------------------------

### Get Vocabulary Size (Before Filtering)

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/lda.md

Retrieves the total size of the vocabulary, including words that may have been filtered out.

```ruby
size = model.num_vocabs
```

--------------------------------

### Get Total Number of Words

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/lda.md

Retrieves the total count of all word tokens across all documents in the model.

```ruby
count = model.num_words
```

--------------------------------

### Initialize and Train SLDA Model

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/slda.md

Initializes an SLDA model with specified parameters, adds documents with associated response variables, and trains the model. Use this to build a supervised topic model.

```ruby
model = Tomoto::SLDA.new(k: 8, vars: "lb", alpha: 0.1, eta: 0.01, seed: 42)

reviews = [
  {text: ["great", "buy", "again"], rating: 9.0, purchased: 1},
  {text: ["okay", "decent"], rating: 6.0, purchased: 1},
  {text: ["awful", "regret"], rating: 2.0, purchased: 0},
  {text: ["amazing", "recommended"], rating: 10.0, purchased: 1}
]

reviews.each do |review|
  model.add_doc(review[:text], y: [review[:rating], review[:purchased]])
end

model.burn_in = 50
model.train(200)
```

--------------------------------

### Get Removed Words from Vocabulary Filtering

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/lda.md

Retrieves an array of words that were removed from the vocabulary due to filtering, such as `rm_top`.

```ruby
model = Tomoto::LDA.new(k: 10, rm_top: 20)
# ... train ...
removed = model.removed_top_words
puts "Removed: #{removed.inspect}"
```

--------------------------------

### Add Documents, Train, and Inspect Topics

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/lda.md

Adds two documents to the model, trains the model, and then iterates through each document to print its assigned topics. This demonstrates adding data and analyzing results.

```ruby
model.add_doc(["word1", "word2"])
model.add_doc(["word3", "word4"])
model.train(100)

docs = model.docs
docs.each_with_index do |doc, idx|
  puts "Document #{idx}: #{doc.topics.inspect}"
end
```

--------------------------------

### Get Log Likelihood Per Word

Source: https://github.com/ankane/tomoto-ruby/blob/master/README.md

Calculate and retrieve the log likelihood per word for the trained model.

```ruby
model.ll_per_word
```

--------------------------------

### Get Global Training Step

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/lda.md

Retrieves the current training iteration number (global step) of the model. Returns an integer.

```ruby
step = model.global_step
```

--------------------------------

### Get Document Frequency for All Vocabulary

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/lda.md

Retrieves the document frequency for all words in the model's vocabulary. Returns an array of integers.

```ruby
df = model.vocab_df
```

--------------------------------

### Set and Get Burn-in Iterations

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/lda.md

Sets or retrieves the number of initial training iterations whose statistics are discarded as burn-in.
```ruby
model.burn_in = 100
puts model.burn_in # => 100
```

```ruby
model = Tomoto::LDA.new(k: 10)
model.add_doc(["word1", "word2"])
model.burn_in = 50
model.train(100)
```

--------------------------------

### MGLDA Initialization for Scientific Papers

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/mglda.md

Initializes an MGLDA model with parameters tailored for analyzing scientific papers, setting a higher number of global and local topics and a larger context window for grains. Adjust alpha priors for global and local topics as needed.

```ruby
model = Tomoto::MGLDA.new(
  k_g: 10, # 10 global research topics
  k_l: 20, # 20 local method topics
  t: 4, # window of 4 sentences
  alpha_g: 0.1,
  alpha_l: 0.05
)
```

--------------------------------

### Perform Inference on Unseen Document

Source: https://github.com/ankane/tomoto-ruby/blob/master/README.md

Create a new document and perform inference to get its topic distribution and log likelihood.

```ruby
doc = model.make_doc(["unseen", "doc"])
topic_dist, ll = model.infer(doc)
```

--------------------------------

### Get Term Weighting Scheme

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/lda.md

Retrieves the term weighting scheme used by the model. Returns a symbol, which can be :one, :idf, or :pmi.

```ruby
weight = model.tw
```

--------------------------------

### Initialize PA Model with Custom Parameters

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/pa.md

Creates a new PA model instance with specified super-topics (k1), sub-topics per super-topic (k2), alpha, and a random seed for reproducibility.
```ruby
model = Tomoto::PA.new(k1: 5, k2: 10, alpha: 0.1, seed: 42)
```

--------------------------------

### Initialize and Train GDMR Model with Two Numeric Metadata Dimensions

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/gdmr.md

Initializes a GDMR model with 8 topics and linear relationships for two metadata dimensions (popularity and time). Documents are added with their respective metadata, and the model is trained.

```ruby
model = Tomoto::GDMR.new(
  k: 8,
  degrees: [1, 1], # both linear
  sigma: 1.5,
  sigma0: 3.0
)

documents = [
  {text: ["viral", "trending", "social"], popularity: 9.5, time: 1.0},
  {text: ["viral", "popular", "share"], popularity: 8.8, time: 1.2},
  {text: ["niche", "obscure", "rare"], popularity: 2.3, time: 0.5},
  {text: ["niche", "unknown", "hidden"], popularity: 1.9, time: 0.3}
]

documents.each do |doc|
  model.add_doc(doc[:text], numeric_metadata: [doc[:popularity], doc[:time]])
end

model.train(100)

# Topics now reflect how popularity and time affect topic distributions
puts model.summary
```

--------------------------------

### Get Document Frequency for Used Vocabulary

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/lda.md

Retrieves the document frequency for words present in the filtered vocabulary. Returns an array of integers.

```ruby
df = model.used_vocab_df
```

--------------------------------

### Retrieving Topic Words

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/types.md

Get the top N words for a specific topic or all topics. The result is a hash mapping words to their probabilities.

```ruby
model = Tomoto::LDA.new(k: 5)
# ... add documents and train ...
# Get top 10 words for topic 0
words = model.topic_words(0, top_n: 10)
words.each { |word, prob| puts "#{word}: #{prob}" }
```

```ruby
# Get all topics
all_topics = model.topic_words(top_n: 10) # Array of Hashes
```

--------------------------------

### Get Number of Documents for a Topic

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/hlda.md

Returns the count of documents assigned to a specific topic. This metric helps in assessing the prevalence of a topic.

```ruby
doc_count = model.num_docs_of_topic(0)
```

--------------------------------

### Train HLDA and Print Hierarchy

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/hlda.md

Initializes an HLDA model, adds documents, trains the model, and then recursively prints the discovered topic hierarchy. Use this to visualize the discovered topic structure.

```ruby
model = Tomoto::HLDA.new(depth: 3, min_cf: 2)

documents = [
  ["machine", "learning", "classification"],
  ["deep", "neural", "networks"],
  ["computer", "vision", "image"],
  ["natural", "language", "processing"],
  ["sports", "basketball", "team"],
  ["sports", "football", "game"],
  ["politics", "election", "vote"],
  ["politics", "government", "policy"]
]

documents.each { |doc| model.add_doc(doc) }

model.burn_in = 50
model.train(100)

puts "Discovered hierarchy with depth: #{model.depth}"

# Print hierarchy structure
def print_hierarchy(model, topic_id, indent = 0)
  return unless model.live_topic?(topic_id)

  words = model.topic_words(topic_id, top_n: 5).keys.join(", ")
  puts "#{"  " * indent}Topic #{topic_id} (Level #{model.level(topic_id)}): #{words}"

  model.children_topics(topic_id).sort.each do |child|
    print_hierarchy(model, child, indent + 1)
  end
end

print_hierarchy(model, 0)
```

--------------------------------

### HDP Constructor with Custom Parameters

Source: https://github.com/ankane/tomoto-ruby/blob/master/_autodocs/api-reference/hdp.md

Initializes an HDP model with specific parameters for controlling
topic discovery and prior distributions. Set a random seed for reproducible results.

```ruby
model = Tomoto::HDP.new(initial_k: 5, alpha: 0.1, gamma: 0.1, seed: 42)
```
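
--------------------------------

### Format Topic Words for Display

Once a model such as the HDP instance above has been trained, the per-topic word hashes are often easier to scan as one line per topic. The sketch below assumes the shape described under Retrieving Topic Words: an Array of Hashes, each mapping a word to its probability. `format_topics` is a hypothetical helper, not part of the tomoto gem, and `sample_topics` is made-up stand-in data rather than real model output.

```ruby
# Hypothetical helper (not part of the tomoto gem): turns per-topic word
# hashes of the shape { "word" => probability } into one string per topic.
def format_topics(all_topics, digits: 3)
  all_topics.each_with_index.map do |words, topic_id|
    # Sort by descending probability so the strongest words come first.
    ranked = words.sort_by { |_word, prob| -prob }
    terms = ranked.map { |word, prob| "#{word} (#{prob.round(digits)})" }
    "Topic #{topic_id}: #{terms.join(', ')}"
  end
end

# Illustrative stand-in data, not real model output.
sample_topics = [
  { "learning" => 0.25, "machine" => 0.5 },
  { "networks" => 0.75, "deep" => 0.125 }
]

puts format_topics(sample_topics)
```

With a trained model, the same helper could be called as `puts format_topics(model.topic_words(top_n: 10))`, assuming `topic_words` returns the Array of Hashes noted earlier.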