# NexusCart MBA Engine

Source: https://github.com/njdc/labrep2

NexusCart MBA Engine is a self-learning Market Basket Analysis system designed for e-commerce platforms. It uses the FP-Growth algorithm to discover frequent itemsets and association rules from transaction data, then automatically converts these patterns into actionable business recommendations, including homepage rankings, cross-sell suggestions, bundle deals, and promotional campaigns. The engine differentiates itself from traditional Apriori-based solutions by compressing the entire transaction database into an FP-Tree structure using only two database passes, making it significantly faster for dense e-commerce baskets.

The system features self-learning mechanisms that adapt over time: an AutoThresholdTuner that binary-searches for optimal support/confidence parameters, a RuleStabilityTracker that monitors which rules persist across data iterations (labeling them Stable, Emerging, or New), and a DriftDetector that flags significant changes in buying patterns. These components work together so the recommendation engine improves continuously without manual intervention, adjusting automatically to seasonal trends, new product launches, and shifting customer preferences.

## MBAEngine - Core Orchestrator

The MBAEngine class is the central orchestrator that wires together FP-Growth mining, the self-learning mechanisms, and recommendation generation into a single pipeline. It accepts a product catalog and processes transactions through multiple iterations, automatically tuning thresholds and tracking pattern stability across each run.

```python
from engine.mba_engine import MBAEngine
from engine.data_generator import GAMETECH_PRODUCTS, generate_all_batches

# Initialize engine with product catalog
products = GAMETECH_PRODUCTS
engine = MBAEngine(products)

# Generate transaction batches (simulating data growth over time)
batches = generate_all_batches("A")  # Returns {1: [...], 2: [...], 3: [...]}

# Run 3 iterations of self-learning
for iteration in range(1, 4):
    result = engine.run(batches[iteration], iteration=iteration)

    # Access key outputs
    print(f"Iteration {iteration}:")
    print(f"  Transactions: {result['stats']['total_transactions']}")
    print(f"  Auto-tuned min_support: {result['config']['min_support']}")
    print(f"  Auto-tuned min_confidence: {result['config']['min_confidence']}")
    print(f"  Frequent itemsets found: {result['stats']['frequent_itemsets']}")
    print(f"  Association rules: {result['stats']['rules_count']}")
    print(f"  Stability report: {result['stability_report']}")
    print(f"  Drift events: {len(result['drift_report'])}")

    # Access recommendations
    homepage = result['homepage_ranking'][:5]
    bundles = result['bundles'][:3]
    cross_sell = result['cross_sell']
    promos = result['promos']
    insights = result['insights']

# Output structure includes:
# - config: algorithm settings and tuning history
# - stats: transaction and pattern statistics
# - frequent_itemsets: top 40 itemsets with support
# - rules: association rules with all metrics
# - stability_report: {stable: N, emerging: N, new: N}
# - drift_report: rules with significant support changes
# - homepage_ranking, frequently_bought_together, cross_sell, bundles, promos, insights
```

## fp_growth - Frequent Pattern Mining

The fp_growth function implements the FP-Growth algorithm from scratch, mining frequent itemsets without candidate generation. It builds a compressed FP-Tree in two database passes and recursively extracts patterns through conditional pattern bases, achieving O(n × avg_basket × tree_depth) complexity compared to Apriori's O(n × 2^k).

```python
from engine.fp_growth import fp_growth, generate_rules

# Sample transaction data (each list = one basket)
transactions = [
    ["G001", "G002", "G003"],          # Laptop + Mouse + Keyboard
    ["G001", "G004"],                  # Laptop + Monitor
    ["G010", "G013"],                  # PS5 + Controller
    ["G007", "G008", "G009"],          # GPU + SSD + RAM
    ["G001", "G002", "G015"],          # Laptop + Mouse + USB Hub
    ["G010", "G006", "G013"],          # PS5 + Headset + Controller
    ["G003", "G002"],                  # Keyboard + Mouse
    ["G001", "G002", "G003", "G004"],  # Full PC setup
    # ... more transactions
]

# Mine frequent itemsets (min_support = 2%)
min_support = 0.02
frequent_itemsets = fp_growth(transactions, min_support)
# Returns dict: {frozenset({'G001', 'G002'}): 4, frozenset({'G010'}): 2, ...}

for itemset, count in sorted(frequent_itemsets.items(), key=lambda x: -x[1])[:10]:
    support = count / len(transactions)
    print(f"Itemset: {sorted(itemset)}, Count: {count}, Support: {support:.2%}")

# Generate association rules from frequent itemsets
n_transactions = len(transactions)
min_confidence = 0.5
rules = generate_rules(frequent_itemsets, n_transactions, min_confidence)

# Each rule includes: antecedents, consequents, support, confidence, lift, leverage, conviction
for rule in rules[:5]:
    print(f"Rule: {rule['antecedents']} -> {rule['consequents']}")
    print(f"  Support: {rule['support']:.4f}")
    print(f"  Confidence: {rule['confidence']:.4f}")
    print(f"  Lift: {rule['lift']:.4f}")
    print(f"  Leverage: {rule['leverage']:.4f}")
    print(f"  Conviction: {rule['conviction']:.4f}")
```
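The metrics attached to each rule follow the standard association-rule definitions. As a rough illustration of how they relate, the helper below recomputes them from raw co-occurrence counts; this is a sketch for intuition, and rule_metrics is a hypothetical function, not part of the engine.fp_growth API.

```python
# Illustrative only: standard association-rule metrics recomputed from
# raw counts. rule_metrics() is a hypothetical helper, not engine code.
def rule_metrics(count_ab, count_a, count_b, n):
    support_ab = count_ab / n            # P(A and B)
    support_a = count_a / n              # P(A)
    support_b = count_b / n              # P(B)
    confidence = support_ab / support_a  # P(B | A)
    lift = confidence / support_b        # > 1 means A and B co-occur more than chance
    leverage = support_ab - support_a * support_b  # observed minus expected co-occurrence
    conviction = (float("inf") if confidence == 1.0
                  else (1 - support_b) / (1 - confidence))
    return {"support": support_ab, "confidence": confidence,
            "lift": lift, "leverage": leverage, "conviction": conviction}

# Example: A and B co-occur in 4 of 8 baskets; A appears in 5, B in 6
print(rule_metrics(count_ab=4, count_a=5, count_b=6, n=8))
# confidence = 0.8, lift ~= 1.07, leverage ~= 0.03, conviction = 1.25
```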
## AutoThresholdTuner - Self-Tuning Support/Confidence

The AutoThresholdTuner automatically finds optimal min_support and min_confidence values using binary search to produce a target number of high-quality rules (8-25 rules with quality score >= 0.35). Quality is computed as 50% normalized lift + 30% confidence + 20% normalized support.

```python
from engine.self_learner import AutoThresholdTuner
from engine.fp_growth import fp_growth, generate_rules

# Initialize tuner with custom ranges
tuner = AutoThresholdTuner(
    sup_range=(0.01, 0.30),   # Support search range
    conf_range=(0.30, 0.90),  # Confidence search range
    max_steps=14              # Maximum binary search iterations
)

# Define mining function that tuner will call
# (uses the `transactions` list from the fp_growth example above)
def mine_fn(min_sup, min_conf):
    itemsets = fp_growth(transactions, min_sup)
    return generate_rules(itemsets, len(transactions), min_conf)

# Run auto-tuning
min_sup, min_conf, rules, history, reason = tuner.tune(mine_fn)

print(f"Optimal min_support: {min_sup}")
print(f"Optimal min_confidence: {min_conf}")
print(f"Rules found: {len(rules)}")
print(f"Tuning reason: {reason}")

# Inspect tuning history (each step of binary search)
for step in history:
    print(f"Step {step['step']}: sup={step['min_sup']:.4f}, conf={step['min_conf']:.4f}, "
          f"total_rules={step['total_rules']}, quality_rules={step['quality_rules']}")

# Get composite quality score for any rule
for rule in rules[:3]:
    score = tuner.composite_score(rule)
    print(f"Rule {rule['antecedents']} -> {rule['consequents']}: quality={score}")
```
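The 50/30/20 quality blend can be sketched directly. How lift and support are normalized is not documented, so the caps below (LIFT_CAP, SUP_CAP) are assumptions, and the real AutoThresholdTuner.composite_score() may normalize differently:

```python
# Minimal sketch of the documented 50/30/20 quality blend.
# LIFT_CAP and SUP_CAP are assumptions, not values from the engine.
LIFT_CAP = 5.0   # assumed: lift at or above 5 counts as maximally interesting
SUP_CAP = 0.30   # assumed: upper end of the default support search range

def composite_score_sketch(rule):
    norm_lift = min(rule["lift"] / LIFT_CAP, 1.0)
    norm_sup = min(rule["support"] / SUP_CAP, 1.0)
    return 0.5 * norm_lift + 0.3 * rule["confidence"] + 0.2 * norm_sup

rule = {"support": 0.15, "confidence": 0.70, "lift": 2.5}
print(f"quality ~= {composite_score_sketch(rule):.2f}")  # rules >= 0.35 count as "quality"
```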
## RuleStabilityTracker - Cross-Iteration Monitoring

The RuleStabilityTracker monitors which association rules appear consistently across multiple data iterations, labeling them as Stable (>=66% of iterations), Emerging (33-65%), or New (current iteration only). This helps identify reliable patterns versus transient correlations.

```python
from engine.self_learner import RuleStabilityTracker

# Initialize tracker
tracker = RuleStabilityTracker()

# Simulate 3 iterations of rule discovery
iteration_1_rules = [
    {"antecedents": ["G001"], "consequents": ["G002"], "support": 0.15, "confidence": 0.70, "lift": 2.5},
    {"antecedents": ["G010"], "consequents": ["G013"], "support": 0.12, "confidence": 0.80, "lift": 3.2},
]
iteration_2_rules = [
    {"antecedents": ["G001"], "consequents": ["G002"], "support": 0.18, "confidence": 0.72, "lift": 2.6},
    {"antecedents": ["G010"], "consequents": ["G013"], "support": 0.14, "confidence": 0.82, "lift": 3.4},
    {"antecedents": ["G007"], "consequents": ["G008"], "support": 0.10, "confidence": 0.65, "lift": 2.1},
]
iteration_3_rules = [
    {"antecedents": ["G001"], "consequents": ["G002"], "support": 0.20, "confidence": 0.75, "lift": 2.8},
    {"antecedents": ["G007"], "consequents": ["G008"], "support": 0.11, "confidence": 0.68, "lift": 2.3},
    {"antecedents": ["G005"], "consequents": ["G014"], "support": 0.08, "confidence": 0.60, "lift": 1.9},
]

# Process each iteration
for i, rules in enumerate([iteration_1_rules, iteration_2_rules, iteration_3_rules], 1):
    tracker.update(rules, iteration=i)
    annotated_rules = tracker.annotate(rules)
    print(f"\nIteration {i} Stability Report: {tracker.report()}")
    for rule in annotated_rules:
        print(f"  {rule['antecedents']} -> {rule['consequents']}: "
              f"{rule['stability']} (score: {rule['stability_score']})")

# Output shows rules progressing from New -> Emerging -> Stable across iterations
```
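The stability bands reduce to a small classification rule. The sketch below approximates the documented thresholds (the tracker's internal bookkeeping is more involved), treating a first sighting as New and otherwise banding by persistence ratio; stability_label is an illustrative helper, not the tracker's API:

```python
# Approximation of the documented stability bands; not the tracker's code.
def stability_label(appearances, iterations_seen):
    if appearances <= 1:
        return "New"  # seen in the current iteration only
    ratio = appearances / iterations_seen
    if ratio >= 0.66:
        return "Stable"
    return "Emerging"  # roughly the 33-65% band

print(stability_label(appearances=2, iterations_seen=3))  # Stable (2/3 >= 66%)
print(stability_label(appearances=2, iterations_seen=4))  # Emerging (50%)
print(stability_label(appearances=1, iterations_seen=3))  # New (first sighting)
```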
## DriftDetector - Pattern Change Detection

The DriftDetector identifies significant shifts in rule support between iterations, flagging rules as "Drift" (8-20% change) or "Major Drift" (>=20% change) with direction indicators (rising/falling). This enables the system to adapt to evolving customer behavior patterns.

```python
from engine.self_learner import DriftDetector

# Initialize detector
detector = DriftDetector()

# Iteration 1 rules (baseline)
rules_iter1 = [
    {"antecedents": ["G001"], "consequents": ["G002"], "support": 0.15, "confidence": 0.70, "lift": 2.5},
    {"antecedents": ["G016"], "consequents": ["G017"], "support": 0.10, "confidence": 0.65, "lift": 2.0},
]

# Iteration 2 rules (streaming gear popularity booms)
rules_iter2 = [
    {"antecedents": ["G001"], "consequents": ["G002"], "support": 0.16, "confidence": 0.72, "lift": 2.6},
    {"antecedents": ["G016"], "consequents": ["G017"], "support": 0.18, "confidence": 0.75, "lift": 2.8},  # +80% support!
]

# Detect and annotate drift
rules_iter1, report1 = detector.detect_and_annotate(rules_iter1, iteration=1)
print(f"Iteration 1 drift events: {len(report1)}")  # 0 (no prior data)

rules_iter2, report2 = detector.detect_and_annotate(rules_iter2, iteration=2)
print(f"Iteration 2 drift events: {len(report2)}")
for event in report2:
    print(f"  Rule: {event['rule_ant']} -> {event['rule_con']}")
    print(f"  Level: {event['level']}, Direction: {event['direction']}")
    print(f"  Change: {event['rel_change']:.1%} ({event['prev_support']:.4f} -> {event['curr_support']:.4f})")

# Access drift annotation on individual rules
for rule in rules_iter2:
    if rule.get('drift') and rule['drift'].get('level'):
        print(f"Drifting rule: {rule['antecedents']} -> {rule['consequents']}")
        print(f"  {rule['drift']['direction']} by {rule['drift']['rel_change']:.1%}")
```
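The drift bands themselves are easy to express. The following sketch mirrors the documented thresholds; classify_drift is illustrative and not part of the DriftDetector API:

```python
from typing import Optional

# Sketch of the documented drift bands (8-20% relative change = "Drift",
# >= 20% = "Major Drift"); illustrative only, not the detector's code.
def classify_drift(prev_support: float, curr_support: float) -> Optional[dict]:
    rel_change = (curr_support - prev_support) / prev_support
    magnitude = abs(rel_change)
    if magnitude < 0.08:
        return None  # below the 8% threshold: no drift event
    return {
        "level": "Major Drift" if magnitude >= 0.20 else "Drift",
        "direction": "rising" if rel_change > 0 else "falling",
        "rel_change": rel_change,
    }

# The streaming-gear rule from the example: support 0.10 -> 0.18
print(classify_drift(0.10, 0.18))  # Major Drift, rising, ~+80%
```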
## rank_homepage - Product Ranking Algorithm

The rank_homepage function scores every product based on its appearance in association rules and its individual popularity, prioritizing items that are strong consequents (pulled by popular anchors) for homepage placement.

```python
from engine.recommender import rank_homepage

# Product catalog
products = {
    "G001": {"name": "Gaming Laptop", "price": 1499.00, "category": "Computers"},
    "G002": {"name": "Wireless Mouse", "price": 79.00, "category": "Peripherals"},
    "G003": {"name": "Mechanical Keyboard", "price": 129.00, "category": "Peripherals"},
    "G010": {"name": "PS5 Console", "price": 499.00, "category": "Consoles"},
    "G013": {"name": "Controller", "price": 69.00, "category": "Peripherals"},
}

# Mined rules and itemsets
rules = [
    {"antecedents": ["G001"], "consequents": ["G002"], "support": 0.15, "lift": 2.5},
    {"antecedents": ["G001"], "consequents": ["G003"], "support": 0.12, "lift": 2.2},
    {"antecedents": ["G010"], "consequents": ["G013"], "support": 0.20, "lift": 3.5},
]
frequent_itemsets = {
    frozenset(["G001"]): 150,
    frozenset(["G002"]): 120,
    frozenset(["G010"]): 180,
    frozenset(["G013"]): 160,
}
n_transactions = 1000

# Generate homepage ranking
ranking = rank_homepage(products, rules, frequent_itemsets, n_transactions)

# Ranking formula: base + (lift × support × 12) for consequents
#                       + (support × 6) for antecedents
#                       + (item_support × 4) for popularity
for item in ranking[:5]:
    print(f"Rank {item['rank']}: {products[item['product_id']]['name']} (score: {item['score']})")
```
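The scoring formula in the comments above can be unpacked as plain Python. The base term is not documented, so BASE_SCORE below is an assumption, and the real rank_homepage may apply the terms differently; score_product is an illustrative helper only:

```python
# Sketch of the documented scoring terms. BASE_SCORE is an assumption
# (the base value is not documented); not the rank_homepage implementation.
BASE_SCORE = 1.0  # assumed starting score for every product

def score_product(pid, rules, frequent_itemsets, n_transactions):
    score = BASE_SCORE
    for rule in rules:
        if pid in rule["consequents"]:
            score += rule["lift"] * rule["support"] * 12  # strong consequents get pulled up
        if pid in rule["antecedents"]:
            score += rule["support"] * 6                  # anchors get a smaller boost
    item_count = frequent_itemsets.get(frozenset([pid]), 0)
    score += (item_count / n_transactions) * 4            # raw popularity term
    return score

# Reusing rules / frequent_itemsets / n_transactions from the example above:
for pid in ("G013", "G002"):
    print(f"{pid}: score ~= {score_product(pid, rules, frequent_itemsets, n_transactions):.2f}")
```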
## build_cross_sell_map - Cart Suggestions

The build_cross_sell_map function creates per-product recommendation lists for cart sidebar display. When a user adds product X, the system looks up cross_sell[X] to show complementary items ranked by confidence × lift.

```python
from engine.recommender import build_cross_sell_map

products = {
    "G001": {"name": "Gaming Laptop", "price": 1499.00, "category": "Computers"},
    "G002": {"name": "Wireless Mouse", "price": 79.00, "category": "Peripherals"},
    "G003": {"name": "Mechanical Keyboard", "price": 129.00, "category": "Peripherals"},
    "G015": {"name": "USB Hub", "price": 49.00, "category": "Accessories"},
}
rules = [
    {"antecedents": ["G001"], "consequents": ["G002"], "confidence": 0.75, "lift": 2.8},
    {"antecedents": ["G001"], "consequents": ["G003"], "confidence": 0.68, "lift": 2.5},
    {"antecedents": ["G001"], "consequents": ["G015"], "confidence": 0.55, "lift": 2.1},
    {"antecedents": ["G002"], "consequents": ["G003"], "confidence": 0.60, "lift": 2.0},
]

# Build cross-sell map (top 5 recommendations per product)
cross_sell = build_cross_sell_map(products, rules, top_per_item=5)

# When user adds Gaming Laptop to cart:
trigger_product = "G001"
recommendations = cross_sell.get(trigger_product, [])

print(f"Customer added: {products[trigger_product]['name']}")
print("Recommended cross-sells:")
for rec in recommendations:
    print(f"  - {products[rec['product_id']]['name']}")
    print(f"    Score: {rec['score']}, Confidence: {rec['confidence']:.0%}")
    print(f"    Reason: {rec['reason']}")
```

## generate_bundles - Dynamic Bundle Pricing

The generate_bundles function creates discount bundle deals from high-support multi-item frequent itemsets. Discounts scale with bundle size: 10% for 2-item, 15% for 3-item, 20% for 4+ item bundles.

```python
from engine.recommender import generate_bundles

products = {
    "G007": {"name": "Graphics Card", "price": 599.00, "category": "Components"},
    "G008": {"name": "NVMe SSD", "price": 129.00, "category": "Components"},
    "G009": {"name": "DDR5 RAM", "price": 199.00, "category": "Components"},
    "G001": {"name": "Gaming Laptop", "price": 1499.00, "category": "Computers"},
    "G002": {"name": "Wireless Mouse", "price": 79.00, "category": "Peripherals"},
}

# Frequent itemsets from FP-Growth
frequent_itemsets = {
    frozenset(["G007", "G008", "G009"]): 85,  # PC Build bundle
    frozenset(["G007", "G008"]): 120,         # GPU + SSD
    frozenset(["G001", "G002"]): 150,         # Laptop + Mouse
}
n_transactions = 1000

# Generate top 8 bundles
bundles = generate_bundles(products, frequent_itemsets, n_transactions, top_k=8)

for bundle in bundles:
    print(f"Bundle: {' + '.join(bundle['item_names'])}")
    print(f"  Support: {bundle['support']:.2%}")
    print(f"  Original: ${bundle['original_price']:.2f}")
    print(f"  Bundle Price: ${bundle['bundle_price']:.2f} ({bundle['discount_pct']}% off)")
    print(f"  Savings: ${bundle['savings']:.2f}")
```
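The discount tiers are simple to express on their own. The sketch below reproduces the documented 10/15/20% schedule; bundle_discount_pct and bundle_price are illustrative helpers, not part of the engine.recommender API:

```python
# Sketch of the documented discount schedule; not the engine's code.
def bundle_discount_pct(n_items):
    # 10% for 2-item, 15% for 3-item, 20% for 4+ item bundles
    if n_items >= 4:
        return 20
    if n_items == 3:
        return 15
    return 10

def bundle_price(prices):
    # Apply the tiered discount to the summed catalog prices
    original = sum(prices)
    pct = bundle_discount_pct(len(prices))
    return round(original * (1 - pct / 100), 2), pct

# The 3-item PC Build bundle from the example above: GPU + SSD + RAM
price, pct = bundle_price([599.00, 129.00, 199.00])
print(f"${price:.2f} at {pct}% off (was $927.00)")  # $787.95 at 15% off
```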
## generate_promos - Promotional Copy Generation

The generate_promos function creates dynamic promotional campaigns from high-lift association rules, generating bundle discounts for single-antecedent rules and "buy 2 get 1" deals for multi-antecedent rules.

```python
from engine.recommender import generate_promos

products = {
    "G001": {"name": "Gaming Laptop", "price": 1499.00, "category": "Computers"},
    "G002": {"name": "Wireless Mouse", "price": 79.00, "category": "Peripherals"},
    "G003": {"name": "Mechanical Keyboard", "price": 129.00, "category": "Peripherals"},
    "G015": {"name": "USB Hub", "price": 49.00, "category": "Accessories"},
}

# High-lift rules eligible for promotions (lift >= 2.0, confidence >= 0.45)
rules = [
    {"antecedents": ["G001"], "consequents": ["G002"], "lift": 3.2, "confidence": 0.75},
    {"antecedents": ["G002", "G003"], "consequents": ["G015"], "lift": 2.8, "confidence": 0.65},
    {"antecedents": ["G001"], "consequents": ["G003"], "lift": 2.5, "confidence": 0.60},
]

# Generate top 5 promotional campaigns
promos = generate_promos(products, rules, top_k=5)

for promo in promos:
    print(f"Promo: {promo['title']}")
    print(f"  Type: {promo['type']}")
    print(f"  Description: {promo['description']}")
    print(f"  Discount: {promo['discount']}%")
    print(f"  Trigger items: {promo['trigger_items']}")
    print(f"  Target items: {promo['target_items']}")

# Output examples:
# - "Gaming Laptop Bundle" (bundle_discount): "Buy Gaming Laptop and save 20% on Wireless Mouse"
# - "Combo Deal: Wireless Mouse + Mechanical Keyboard" (buy2get1): "Add ... and get USB Hub at 25% off!"
```

## generate_all_batches - Transaction Data Generation

The generate_all_batches function creates cumulative transaction batches for testing the self-learning pipeline. It uses product affinity groups with tunable probabilities and batch multipliers to simulate evolving buying patterns over time (e.g., holiday trends, seasonal shifts).

```python
from engine.data_generator import (
    generate_all_batches, save_to_csv, load_from_csv,
    compute_stats, GAMETECH_PRODUCTS, OFFICEPRO_PRODUCTS
)

# Generate 3 cumulative batches for GameTech Store (Dataset A)
# Batch 1: 1000 transactions (baseline)
# Batch 2: 1000 + 600 = 1600 transactions (streaming gear trend)
# Batch 3: 1600 + 400 = 2000 transactions (gaming room setup trend)
batches_a = generate_all_batches("A")

print("Dataset A - GameTech Store:")
print(f"  Iteration 1: {len(batches_a[1])} transactions")
print(f"  Iteration 2: {len(batches_a[2])} transactions (cumulative)")
print(f"  Iteration 3: {len(batches_a[3])} transactions (cumulative)")

# Generate for OfficePro Store (Dataset B)
batches_b = generate_all_batches("B")

# Save transactions to CSV
save_to_csv(batches_a[3], "data/dataset_a_full.csv")

# Load transactions from CSV
transactions = load_from_csv("data/dataset_a_full.csv")

# Compute dataset statistics
stats = compute_stats(transactions, GAMETECH_PRODUCTS)
print("\nDataset Statistics:")
print(f"  Total transactions: {stats['total_transactions']}")
print(f"  Unique items: {stats['unique_items']}")
print(f"  Avg basket size: {stats['avg_basket_size']}")
print(f"  Basket size range: {stats['min_basket_size']} - {stats['max_basket_size']}")

# Item frequencies show individual product popularity
for pid, freq in list(stats['item_frequencies'].items())[:5]:
    print(f"  {freq['name']}: {freq['count']} ({freq['support']:.1%})")
```

## Command Line Interface

The main.py script provides a CLI for running the full self-learning pipeline on both datasets, generating JSON outputs, and exporting data for the web frontend dashboard.

```bash
# Run full pipeline on both datasets (GameTech A + OfficePro B)
python main.py

# Process only Dataset A (GameTech Store)
python main.py --dataset A

# Process only Dataset B (OfficePro Store)
python main.py --dataset B

# Run with verbose output (default: enabled)
python main.py --verbose

# Output files generated:
# - data/dataset_a_iter1.csv, dataset_a_iter2.csv, dataset_a_iter3.csv
# - data/dataset_b_iter1.csv, dataset_b_iter2.csv, dataset_b_iter3.csv
# - output/dataset_a_iter1.json, dataset_a_iter2.json, dataset_a_iter3.json
# - output/dataset_b_iter1.json, dataset_b_iter2.json, dataset_b_iter3.json
# - frontend_data.js (for web dashboard)

# Then open index.html in a browser to view the interactive dashboard
```

## Summary

NexusCart MBA Engine serves e-commerce platforms that need intelligent product recommendations which adapt over time. Primary use cases include homepage product ranking based on association strength, cart cross-sell widgets showing "frequently bought together" items, automated bundle pricing with dynamic discounts, promotional campaign generation from high-lift rules, and business insights for shelf placement and category synergy. The self-learning mechanisms (threshold auto-tuning, rule stability tracking, drift detection) let the system improve continuously without manual intervention.

Integration follows a simple pattern: initialize MBAEngine with your product catalog, feed it transaction batches through the run() method across multiple iterations, and consume the structured output containing rankings, cross-sell maps, bundles, promos, and insights. The engine exports both JSON files for API consumption and a JavaScript module for direct frontend integration.

For custom deployments, the individual components (fp_growth, AutoThresholdTuner, RuleStabilityTracker, DriftDetector, and the recommendation generators) can be imported and composed independently to fit specific pipeline architectures.