# NexusCart MBA Engine

Source: https://github.com/njdc/labrep2

NexusCart MBA Engine is a self-learning Market Basket Analysis system designed for e-commerce platforms. It uses the FP-Growth algorithm to discover frequent itemsets and association rules from transaction data, then automatically converts these patterns into actionable business recommendations, including homepage rankings, cross-sell suggestions, bundle deals, and promotional campaigns. The engine differentiates itself from traditional Apriori-based solutions by compressing the entire transaction database into an FP-Tree structure using only two database passes, making it significantly faster for dense e-commerce baskets.

The system features self-learning mechanisms that adapt over time: an AutoThresholdTuner that binary-searches for optimal support/confidence parameters, a RuleStabilityTracker that monitors which rules persist across data iterations (labeling them Stable, Emerging, or New), and a DriftDetector that flags significant changes in buying patterns. These components work together so the recommendation engine improves continuously without manual intervention, adjusting automatically to seasonal trends, new product launches, and shifting customer preferences.

## MBAEngine - Core Orchestrator

The MBAEngine class is the central orchestrator that wires together FP-Growth mining, the self-learning mechanisms, and recommendation generation into a single pipeline. It accepts a product catalog and processes transactions through multiple iterations, automatically tuning thresholds and tracking pattern stability across each run.

```python
from engine.mba_engine import MBAEngine
from engine.data_generator import GAMETECH_PRODUCTS, generate_all_batches

# Initialize engine with product catalog
products = GAMETECH_PRODUCTS
engine = MBAEngine(products)

# Generate transaction batches (simulating data growth over time)
batches = generate_all_batches("A")  # Returns {1: [...], 2: [...], 3: [...]}

# Run 3 iterations of self-learning
for iteration in range(1, 4):
    result = engine.run(batches[iteration], iteration=iteration)

    # Access key outputs
    print(f"Iteration {iteration}:")
    print(f"  Transactions: {result['stats']['total_transactions']}")
    print(f"  Auto-tuned min_support: {result['config']['min_support']}")
    print(f"  Auto-tuned min_confidence: {result['config']['min_confidence']}")
    print(f"  Frequent itemsets found: {result['stats']['frequent_itemsets']}")
    print(f"  Association rules: {result['stats']['rules_count']}")
    print(f"  Stability report: {result['stability_report']}")
    print(f"  Drift events: {len(result['drift_report'])}")

    # Access recommendations
    homepage = result['homepage_ranking'][:5]
    bundles = result['bundles'][:3]
    cross_sell = result['cross_sell']
    promos = result['promos']
    insights = result['insights']

# Output structure includes:
# - config: algorithm settings and tuning history
# - stats: transaction and pattern statistics
# - frequent_itemsets: top 40 itemsets with support
# - rules: association rules with all metrics
# - stability_report: {stable: N, emerging: N, new: N}
# - drift_report: rules with significant support changes
# - homepage_ranking, frequently_bought_together, cross_sell, bundles, promos, insights
```

## fp_growth - Frequent Pattern Mining

The fp_growth function implements the FP-Growth algorithm from scratch, mining frequent itemsets without candidate generation. It builds a compressed FP-Tree in two database passes and recursively extracts patterns through conditional pattern bases, achieving O(n × avg_basket × tree_depth) complexity compared to Apriori's O(n × 2^k).

```python
from engine.fp_growth import fp_growth, generate_rules

# Sample transaction data (each list = one basket)
transactions = [
    ["G001", "G002", "G003"],          # Laptop + Mouse + Keyboard
    ["G001", "G004"],                  # Laptop + Monitor
    ["G010", "G013"],                  # PS5 + Controller
    ["G007", "G008", "G009"],          # GPU + SSD + RAM
    ["G001", "G002", "G015"],          # Laptop + Mouse + USB Hub
    ["G010", "G006", "G013"],          # PS5 + Headset + Controller
    ["G003", "G002"],                  # Keyboard + Mouse
    ["G001", "G002", "G003", "G004"],  # Full PC setup
    # ... more transactions
]

# Mine frequent itemsets (min_support = 2%)
min_support = 0.02
frequent_itemsets = fp_growth(transactions, min_support)
# Returns dict: {frozenset({'G001', 'G002'}): 4, frozenset({'G010'}): 2, ...}

for itemset, count in sorted(frequent_itemsets.items(), key=lambda x: -x[1])[:10]:
    support = count / len(transactions)
    print(f"Itemset: {sorted(itemset)}, Count: {count}, Support: {support:.2%}")

# Generate association rules from frequent itemsets
n_transactions = len(transactions)
min_confidence = 0.5
rules = generate_rules(frequent_itemsets, n_transactions, min_confidence)

# Each rule includes: antecedents, consequents, support, confidence, lift, leverage, conviction
for rule in rules[:5]:
    print(f"Rule: {rule['antecedents']} -> {rule['consequents']}")
    print(f"  Support: {rule['support']:.4f}")
    print(f"  Confidence: {rule['confidence']:.4f}")
    print(f"  Lift: {rule['lift']:.4f}")
    print(f"  Leverage: {rule['leverage']:.4f}")
    print(f"  Conviction: {rule['conviction']:.4f}")
```
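The metrics attached to each rule follow the standard association-rule definitions. As a rough illustration of how they relate, the helper below recomputes them from raw co-occurrence counts; this is a sketch for intuition, and rule_metrics is a hypothetical function, not part of the engine.fp_growth API.

```python
# Illustrative only: standard association-rule metrics recomputed from
# raw counts. rule_metrics() is a hypothetical helper, not engine code.
def rule_metrics(count_ab, count_a, count_b, n):
    support_ab = count_ab / n            # P(A and B)
    support_a = count_a / n              # P(A)
    support_b = count_b / n              # P(B)
    confidence = support_ab / support_a  # P(B | A)
    lift = confidence / support_b        # > 1 means A and B co-occur more than chance
    leverage = support_ab - support_a * support_b  # observed minus expected co-occurrence
    conviction = (float("inf") if confidence == 1.0
                  else (1 - support_b) / (1 - confidence))
    return {"support": support_ab, "confidence": confidence,
            "lift": lift, "leverage": leverage, "conviction": conviction}

# Example: A and B co-occur in 4 of 8 baskets; A appears in 5, B in 6
print(rule_metrics(count_ab=4, count_a=5, count_b=6, n=8))
# confidence = 0.8, lift ~= 1.07, leverage ~= 0.03, conviction = 1.25
```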
## AutoThresholdTuner - Self-Tuning Support/Confidence

The AutoThresholdTuner automatically finds optimal min_support and min_confidence values using binary search to produce a target number of high-quality rules (8-25 rules with quality score >= 0.35). Quality is computed as 50% normalized lift + 30% confidence + 20% normalized support.

```python
from engine.self_learner import AutoThresholdTuner
from engine.fp_growth import fp_growth, generate_rules

# Initialize tuner with custom ranges
tuner = AutoThresholdTuner(
    sup_range=(0.01, 0.30),   # Support search range
    conf_range=(0.30, 0.90),  # Confidence search range
    max_steps=14              # Maximum binary search iterations
)

# Define mining function that tuner will call
# (uses the `transactions` list from the fp_growth example above)
def mine_fn(min_sup, min_conf):
    itemsets = fp_growth(transactions, min_sup)
    return generate_rules(itemsets, len(transactions), min_conf)

# Run auto-tuning
min_sup, min_conf, rules, history, reason = tuner.tune(mine_fn)

print(f"Optimal min_support: {min_sup}")
print(f"Optimal min_confidence: {min_conf}")
print(f"Rules found: {len(rules)}")
print(f"Tuning reason: {reason}")

# Inspect tuning history (each step of binary search)
for step in history:
    print(f"Step {step['step']}: sup={step['min_sup']:.4f}, conf={step['min_conf']:.4f}, "
          f"total_rules={step['total_rules']}, quality_rules={step['quality_rules']}")

# Get composite quality score for any rule
for rule in rules[:3]:
    score = tuner.composite_score(rule)
    print(f"Rule {rule['antecedents']} -> {rule['consequents']}: quality={score}")
```
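The 50/30/20 quality blend can be sketched directly. How lift and support are normalized is not documented, so the caps below (LIFT_CAP, SUP_CAP) are assumptions, and the real AutoThresholdTuner.composite_score() may normalize differently:

```python
# Minimal sketch of the documented 50/30/20 quality blend.
# LIFT_CAP and SUP_CAP are assumptions, not values from the engine.
LIFT_CAP = 5.0   # assumed: lift at or above 5 counts as maximally interesting
SUP_CAP = 0.30   # assumed: upper end of the default support search range

def composite_score_sketch(rule):
    norm_lift = min(rule["lift"] / LIFT_CAP, 1.0)
    norm_sup = min(rule["support"] / SUP_CAP, 1.0)
    return 0.5 * norm_lift + 0.3 * rule["confidence"] + 0.2 * norm_sup

rule = {"support": 0.15, "confidence": 0.70, "lift": 2.5}
print(f"quality ~= {composite_score_sketch(rule):.2f}")  # rules >= 0.35 count as "quality"
```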
## RuleStabilityTracker - Cross-Iteration Monitoring

The RuleStabilityTracker monitors which association rules appear consistently across multiple data iterations, labeling them as Stable (>=66% of iterations), Emerging (33-65%), or New (current iteration only). This helps identify reliable patterns versus transient correlations.

```python
from engine.self_learner import RuleStabilityTracker

# Initialize tracker
tracker = RuleStabilityTracker()

# Simulate 3 iterations of rule discovery
iteration_1_rules = [
    {"antecedents": ["G001"], "consequents": ["G002"], "support": 0.15, "confidence": 0.70, "lift": 2.5},
    {"antecedents": ["G010"], "consequents": ["G013"], "support": 0.12, "confidence": 0.80, "lift": 3.2},
]
iteration_2_rules = [
    {"antecedents": ["G001"], "consequents": ["G002"], "support": 0.18, "confidence": 0.72, "lift": 2.6},
    {"antecedents": ["G010"], "consequents": ["G013"], "support": 0.14, "confidence": 0.82, "lift": 3.4},
    {"antecedents": ["G007"], "consequents": ["G008"], "support": 0.10, "confidence": 0.65, "lift": 2.1},
]
iteration_3_rules = [
    {"antecedents": ["G001"], "consequents": ["G002"], "support": 0.20, "confidence": 0.75, "lift": 2.8},
    {"antecedents": ["G007"], "consequents": ["G008"], "support": 0.11, "confidence": 0.68, "lift": 2.3},
    {"antecedents": ["G005"], "consequents": ["G014"], "support": 0.08, "confidence": 0.60, "lift": 1.9},
]

# Process each iteration
for i, rules in enumerate([iteration_1_rules, iteration_2_rules, iteration_3_rules], 1):
    tracker.update(rules, iteration=i)
    annotated_rules = tracker.annotate(rules)
    print(f"\nIteration {i} Stability Report: {tracker.report()}")
    for rule in annotated_rules:
        print(f"  {rule['antecedents']} -> {rule['consequents']}: "
              f"{rule['stability']} (score: {rule['stability_score']})")

# Output shows rules progressing from New -> Emerging -> Stable across iterations
```
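The stability bands reduce to a small classification rule. The sketch below approximates the documented thresholds (the tracker's internal bookkeeping is more involved), treating a first sighting as New and otherwise banding by persistence ratio; stability_label is an illustrative helper, not the tracker's API:

```python
# Approximation of the documented stability bands; not the tracker's code.
def stability_label(appearances, iterations_seen):
    if appearances <= 1:
        return "New"  # seen in the current iteration only
    ratio = appearances / iterations_seen
    if ratio >= 0.66:
        return "Stable"
    return "Emerging"  # roughly the 33-65% band

print(stability_label(appearances=2, iterations_seen=3))  # Stable (2/3 >= 66%)
print(stability_label(appearances=2, iterations_seen=4))  # Emerging (50%)
print(stability_label(appearances=1, iterations_seen=3))  # New (first sighting)
```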
## DriftDetector - Pattern Change Detection

The DriftDetector identifies significant shifts in rule support between iterations, flagging rules as "Drift" (8-20% change) or "Major Drift" (>=20% change) with direction indicators (rising/falling). This enables the system to adapt to evolving customer behavior patterns.

```python
from engine.self_learner import DriftDetector

# Initialize detector
detector = DriftDetector()

# Iteration 1 rules (baseline)
rules_iter1 = [
    {"antecedents": ["G001"], "consequents": ["G002"], "support": 0.15, "confidence": 0.70, "lift": 2.5},
    {"antecedents": ["G016"], "consequents": ["G017"], "support": 0.10, "confidence": 0.65, "lift": 2.0},
]

# Iteration 2 rules (streaming gear popularity booms)
rules_iter2 = [
    {"antecedents": ["G001"], "consequents": ["G002"], "support": 0.16, "confidence": 0.72, "lift": 2.6},
    {"antecedents": ["G016"], "consequents": ["G017"], "support": 0.18, "confidence": 0.75, "lift": 2.8},  # +80% support!
]

# Detect and annotate drift
rules_iter1, report1 = detector.detect_and_annotate(rules_iter1, iteration=1)
print(f"Iteration 1 drift events: {len(report1)}")  # 0 (no prior data)

rules_iter2, report2 = detector.detect_and_annotate(rules_iter2, iteration=2)
print(f"Iteration 2 drift events: {len(report2)}")
for event in report2:
    print(f"  Rule: {event['rule_ant']} -> {event['rule_con']}")
    print(f"  Level: {event['level']}, Direction: {event['direction']}")
    print(f"  Change: {event['rel_change']:.1%} ({event['prev_support']:.4f} -> {event['curr_support']:.4f})")

# Access drift annotation on individual rules
for rule in rules_iter2:
    if rule.get('drift') and rule['drift'].get('level'):
        print(f"Drifting rule: {rule['antecedents']} -> {rule['consequents']}")
        print(f"  {rule['drift']['direction']} by {rule['drift']['rel_change']:.1%}")
```
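The drift bands themselves are easy to express. The following sketch mirrors the documented thresholds; classify_drift is illustrative and not part of the DriftDetector API:

```python
from typing import Optional

# Sketch of the documented drift bands (8-20% relative change = "Drift",
# >= 20% = "Major Drift"); illustrative only, not the detector's code.
def classify_drift(prev_support: float, curr_support: float) -> Optional[dict]:
    rel_change = (curr_support - prev_support) / prev_support
    magnitude = abs(rel_change)
    if magnitude < 0.08:
        return None  # below the 8% threshold: no drift event
    return {
        "level": "Major Drift" if magnitude >= 0.20 else "Drift",
        "direction": "rising" if rel_change > 0 else "falling",
        "rel_change": rel_change,
    }

# The streaming-gear rule from the example: support 0.10 -> 0.18
print(classify_drift(0.10, 0.18))  # Major Drift, rising, ~+80%
```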
## rank_homepage - Product Ranking Algorithm

The rank_homepage function scores every product based on its appearance in association rules and its individual popularity, prioritizing items that are strong consequents (pulled by popular anchors) for homepage placement.

```python
from engine.recommender import rank_homepage

# Product catalog
products = {
    "G001": {"name": "Gaming Laptop", "price": 1499.00, "category": "Computers"},
    "G002": {"name": "Wireless Mouse", "price": 79.00, "category": "Peripherals"},
    "G003": {"name": "Mechanical Keyboard", "price": 129.00, "category": "Peripherals"},
    "G010": {"name": "PS5 Console", "price": 499.00, "category": "Consoles"},
    "G013": {"name": "Controller", "price": 69.00, "category": "Peripherals"},
}

# Mined rules and itemsets
rules = [
    {"antecedents": ["G001"], "consequents": ["G002"], "support": 0.15, "lift": 2.5},
    {"antecedents": ["G001"], "consequents": ["G003"], "support": 0.12, "lift": 2.2},
    {"antecedents": ["G010"], "consequents": ["G013"], "support": 0.20, "lift": 3.5},
]
frequent_itemsets = {
    frozenset(["G001"]): 150,
    frozenset(["G002"]): 120,
    frozenset(["G010"]): 180,
    frozenset(["G013"]): 160,
}
n_transactions = 1000

# Generate homepage ranking
ranking = rank_homepage(products, rules, frequent_itemsets, n_transactions)

# Ranking formula: base + (lift × support × 12) for consequents
#                       + (support × 6) for antecedents
#                       + (item_support × 4) for popularity
for item in ranking[:5]:
    print(f"Rank {item['rank']}: {products[item['product_id']]['name']} (score: {item['score']})")
```
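The scoring formula in the comments above can be unpacked as plain Python. The base term is not documented, so BASE_SCORE below is an assumption, and the real rank_homepage may apply the terms differently; score_product is an illustrative helper only:

```python
# Sketch of the documented scoring terms. BASE_SCORE is an assumption
# (the base value is not documented); not the rank_homepage implementation.
BASE_SCORE = 1.0  # assumed starting score for every product

def score_product(pid, rules, frequent_itemsets, n_transactions):
    score = BASE_SCORE
    for rule in rules:
        if pid in rule["consequents"]:
            score += rule["lift"] * rule["support"] * 12  # strong consequents get pulled up
        if pid in rule["antecedents"]:
            score += rule["support"] * 6                  # anchors get a smaller boost
    item_count = frequent_itemsets.get(frozenset([pid]), 0)
    score += (item_count / n_transactions) * 4            # raw popularity term
    return score

# Reusing rules / frequent_itemsets / n_transactions from the example above:
for pid in ("G013", "G002"):
    print(f"{pid}: score ~= {score_product(pid, rules, frequent_itemsets, n_transactions):.2f}")
```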
## build_cross_sell_map - Cart Suggestions

The build_cross_sell_map function creates per-product recommendation lists for cart sidebar display. When a user adds product X, the system looks up cross_sell[X] to show complementary items ranked by confidence × lift.

```python
from engine.recommender import build_cross_sell_map

products = {
    "G001": {"name": "Gaming Laptop", "price": 1499.00, "category": "Computers"},
    "G002": {"name": "Wireless Mouse", "price": 79.00, "category": "Peripherals"},
    "G003": {"name": "Mechanical Keyboard", "price": 129.00, "category": "Peripherals"},
    "G015": {"name": "USB Hub", "price": 49.00, "category": "Accessories"},
}
rules = [
    {"antecedents": ["G001"], "consequents": ["G002"], "confidence": 0.75, "lift": 2.8},
    {"antecedents": ["G001"], "consequents": ["G003"], "confidence": 0.68, "lift": 2.5},
    {"antecedents": ["G001"], "consequents": ["G015"], "confidence": 0.55, "lift": 2.1},
    {"antecedents": ["G002"], "consequents": ["G003"], "confidence": 0.60, "lift": 2.0},
]

# Build cross-sell map (top 5 recommendations per product)
cross_sell = build_cross_sell_map(products, rules, top_per_item=5)

# When user adds Gaming Laptop to cart:
trigger_product = "G001"
recommendations = cross_sell.get(trigger_product, [])

print(f"Customer added: {products[trigger_product]['name']}")
print("Recommended cross-sells:")
for rec in recommendations:
    print(f"  - {products[rec['product_id']]['name']}")
    print(f"    Score: {rec['score']}, Confidence: {rec['confidence']:.0%}")
    print(f"    Reason: {rec['reason']}")
```

## generate_bundles - Dynamic Bundle Pricing

The generate_bundles function creates discount bundle deals from high-support multi-item frequent itemsets. Discounts scale with bundle size: 10% for 2-item, 15% for 3-item, 20% for 4+ item bundles.

```python
from engine.recommender import generate_bundles

products = {
    "G007": {"name": "Graphics Card", "price": 599.00, "category": "Components"},
    "G008": {"name": "NVMe SSD", "price": 129.00, "category": "Components"},
    "G009": {"name": "DDR5 RAM", "price": 199.00, "category": "Components"},
    "G001": {"name": "Gaming Laptop", "price": 1499.00, "category": "Computers"},
    "G002": {"name": "Wireless Mouse", "price": 79.00, "category": "Peripherals"},
}

# Frequent itemsets from FP-Growth
frequent_itemsets = {
    frozenset(["G007", "G008", "G009"]): 85,  # PC Build bundle
    frozenset(["G007", "G008"]): 120,         # GPU + SSD
    frozenset(["G001", "G002"]): 150,         # Laptop + Mouse
}
n_transactions = 1000

# Generate top 8 bundles
bundles = generate_bundles(products, frequent_itemsets, n_transactions, top_k=8)

for bundle in bundles:
    print(f"Bundle: {' + '.join(bundle['item_names'])}")
    print(f"  Support: {bundle['support']:.2%}")
    print(f"  Original: ${bundle['original_price']:.2f}")
    print(f"  Bundle Price: ${bundle['bundle_price']:.2f} ({bundle['discount_pct']}% off)")
    print(f"  Savings: ${bundle['savings']:.2f}")
```
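The discount tiers are simple to express on their own. The sketch below reproduces the documented 10/15/20% schedule; bundle_discount_pct and bundle_price are illustrative helpers, not part of the engine.recommender API:

```python
# Sketch of the documented discount schedule; not the engine's code.
def bundle_discount_pct(n_items):
    # 10% for 2-item, 15% for 3-item, 20% for 4+ item bundles
    if n_items >= 4:
        return 20
    if n_items == 3:
        return 15
    return 10

def bundle_price(prices):
    # Apply the tiered discount to the summed catalog prices
    original = sum(prices)
    pct = bundle_discount_pct(len(prices))
    return round(original * (1 - pct / 100), 2), pct

# The 3-item PC Build bundle from the example above: GPU + SSD + RAM
price, pct = bundle_price([599.00, 129.00, 199.00])
print(f"${price:.2f} at {pct}% off (was $927.00)")  # $787.95 at 15% off
```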
## generate_promos - Promotional Copy Generation

The generate_promos function creates dynamic promotional campaigns from high-lift association rules, generating bundle discounts for single-antecedent rules and "buy 2 get 1" deals for multi-antecedent rules.

```python
from engine.recommender import generate_promos

products = {
    "G001": {"name": "Gaming Laptop", "price": 1499.00, "category": "Computers"},
    "G002": {"name": "Wireless Mouse", "price": 79.00, "category": "Peripherals"},
    "G003": {"name": "Mechanical Keyboard", "price": 129.00, "category": "Peripherals"},
    "G015": {"name": "USB Hub", "price": 49.00, "category": "Accessories"},
}

# High-lift rules eligible for promotions (lift >= 2.0, confidence >= 0.45)
rules = [
    {"antecedents": ["G001"], "consequents": ["G002"], "lift": 3.2, "confidence": 0.75},
    {"antecedents": ["G002", "G003"], "consequents": ["G015"], "lift": 2.8, "confidence": 0.65},
    {"antecedents": ["G001"], "consequents": ["G003"], "lift": 2.5, "confidence": 0.60},
]

# Generate top 5 promotional campaigns
promos = generate_promos(products, rules, top_k=5)

for promo in promos:
    print(f"Promo: {promo['title']}")
    print(f"  Type: {promo['type']}")
    print(f"  Description: {promo['description']}")
    print(f"  Discount: {promo['discount']}%")
    print(f"  Trigger items: {promo['trigger_items']}")
    print(f"  Target items: {promo['target_items']}")

# Output examples:
# - "Gaming Laptop Bundle" (bundle_discount): "Buy Gaming Laptop and save 20% on Wireless Mouse"
# - "Combo Deal: Wireless Mouse + Mechanical Keyboard" (buy2get1): "Add ... and get USB Hub at 25% off!"
```

## generate_all_batches - Transaction Data Generation

The generate_all_batches function creates cumulative transaction batches for testing the self-learning pipeline. It uses product affinity groups with tunable probabilities and batch multipliers to simulate evolving buying patterns over time (e.g., holiday trends, seasonal shifts).

```python
from engine.data_generator import (
    generate_all_batches, save_to_csv, load_from_csv,
    compute_stats, GAMETECH_PRODUCTS, OFFICEPRO_PRODUCTS
)

# Generate 3 cumulative batches for GameTech Store (Dataset A)
# Batch 1: 1000 transactions (baseline)
# Batch 2: 1000 + 600 = 1600 transactions (streaming gear trend)
# Batch 3: 1600 + 400 = 2000 transactions (gaming room setup trend)
batches_a = generate_all_batches("A")

print("Dataset A - GameTech Store:")
print(f"  Iteration 1: {len(batches_a[1])} transactions")
print(f"  Iteration 2: {len(batches_a[2])} transactions (cumulative)")
print(f"  Iteration 3: {len(batches_a[3])} transactions (cumulative)")

# Generate for OfficePro Store (Dataset B)
batches_b = generate_all_batches("B")

# Save transactions to CSV
save_to_csv(batches_a[3], "data/dataset_a_full.csv")

# Load transactions from CSV
transactions = load_from_csv("data/dataset_a_full.csv")

# Compute dataset statistics
stats = compute_stats(transactions, GAMETECH_PRODUCTS)
print("\nDataset Statistics:")
print(f"  Total transactions: {stats['total_transactions']}")
print(f"  Unique items: {stats['unique_items']}")
print(f"  Avg basket size: {stats['avg_basket_size']}")
print(f"  Basket size range: {stats['min_basket_size']} - {stats['max_basket_size']}")

# Item frequencies show individual product popularity
for pid, freq in list(stats['item_frequencies'].items())[:5]:
    print(f"  {freq['name']}: {freq['count']} ({freq['support']:.1%})")
```

## Command Line Interface

The main.py script provides a CLI for running the full self-learning pipeline on both datasets, generating JSON outputs, and exporting data for the web frontend dashboard.

```bash
# Run full pipeline on both datasets (GameTech A + OfficePro B)
python main.py

# Process only Dataset A (GameTech Store)
python main.py --dataset A

# Process only Dataset B (OfficePro Store)
python main.py --dataset B

# Run with verbose output (default: enabled)
python main.py --verbose

# Output files generated:
# - data/dataset_a_iter1.csv, dataset_a_iter2.csv, dataset_a_iter3.csv
# - data/dataset_b_iter1.csv, dataset_b_iter2.csv, dataset_b_iter3.csv
# - output/dataset_a_iter1.json, dataset_a_iter2.json, dataset_a_iter3.json
# - output/dataset_b_iter1.json, dataset_b_iter2.json, dataset_b_iter3.json
# - frontend_data.js (for web dashboard)

# Then open index.html in a browser to view the interactive dashboard
```

## Summary

NexusCart MBA Engine serves e-commerce platforms that need intelligent product recommendations which adapt over time. Primary use cases include homepage product ranking based on association strength, cart cross-sell widgets showing "frequently bought together" items, automated bundle pricing with dynamic discounts, promotional campaign generation from high-lift rules, and business insights for shelf placement and category synergy. The self-learning mechanisms (threshold auto-tuning, rule stability tracking, drift detection) let the system improve continuously without manual intervention.

Integration follows a simple pattern: initialize MBAEngine with your product catalog, feed it transaction batches through the run() method across multiple iterations, and consume the structured output containing rankings, cross-sell maps, bundles, promos, and insights. The engine exports both JSON files for API consumption and a JavaScript module for direct frontend integration.

For custom deployments, the individual components (fp_growth, AutoThresholdTuner, RuleStabilityTracker, DriftDetector, and the recommendation generators) can be imported and composed independently to fit specific pipeline architectures.