GMBA 621 / FINC 332 — Interactive Study Guide
Anomaly Detection & Market Basket Analysis
Finding what doesn't belong and what goes together. Work through the activities below to test yourself on the concepts you need to know cold.
01
Anomaly Detection
Your bank sends you a "Was this you?" text after an unusual purchase. That's anomaly detection — a model learned what your "normal" spending looks like (typical amounts, usual locations, common times) and flagged something that broke the pattern.
The goal is simple: learn what normal looks like, then flag anything that deviates. This is used everywhere — fraud detection, healthcare monitoring, cybersecurity, manufacturing quality control.
🔑 Key Concept
Point anomaly detection using simple statistical rules (Z-scores, IQR) is the most commonly deployed method in industry. You don't need fancy algorithms to catch a $9,000 charge on a $75-average card. Start simple — it works.
Type | What It Means | Example
Point | A single value way outside the normal range | $9,500 on a card averaging $50–$200
Contextual | Normal in one context, weird in another | 95°F in January in Erie, PA
Collective | Each point is individually fine, but the group is unusual | 8 small $9.99 charges in 4 min from 6 countries
⚡ Test Yourself
For each scenario, pick the anomaly type. You get 2 points for each correct answer.
1. A factory sensor normally reads 180–220 psi. Today it spiked to 847 psi.
2. A customer buys 3 TVs at once. Unusual, but it's Black Friday.
3. An employee badges into the building at 3 AM, accesses a server room they never use, and copies 12 GB of files — each action is individually permissible.
4. A patient's resting heart rate is 110 bpm. The normal range is 60–100.
Detection Methods — ranked by how often they're actually deployed:
Method | How It Works | Industry Usage
Statistical Rules | Z-score beyond ±2 to ±3? Flag. Outside IQR fences? Flag. | Most common. Start here.
Distance-Based | How far is this point from its nearest neighbors or cluster centroid? | Reuse your cluster model as an anomaly screener.
Density-Based | Is this point in a sparse area while others are in dense areas? | LOF, DBSCAN: good for irregular shapes.
Model-Based | Train a model on "normal" data; anything outside the learned boundary is flagged. | Isolation Forest, One-Class SVM: for complex patterns.
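As a minimal sketch of the statistical-rules row, the snippet below scores a new charge against a card's history using a Z-score and IQR fences. The history values, the function names, the 3.0 Z-score threshold, and the 1.5 × IQR fence multiplier are all illustrative assumptions, not values from this guide.

```python
import statistics

def z_score(value, history):
    """Number of sample standard deviations between value and the history mean."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    return (value - mean) / stdev

def iqr_fences(history, k=1.5):
    """Lower and upper fences: Q1 - k*IQR and Q3 + k*IQR (k=1.5 is a common default)."""
    q1, _, q3 = statistics.quantiles(history, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def is_anomaly(value, history, z_threshold=3.0):
    """Flag the value if either the Z-score rule or the IQR rule fires."""
    low, high = iqr_fences(history)
    outside_fences = not (low <= value <= high)
    return abs(z_score(value, history)) > z_threshold or outside_fences

# Hypothetical card history that averages $75 (made-up values).
history = [60, 70, 75, 80, 90, 65, 85, 75, 72, 78]

print(is_anomaly(9000, history))  # the $9,000 charge from the Key Concept box
print(is_anomaly(77, history))    # an ordinary purchase
```

The baseline is computed from past charges only, and the new charge is scored against it. Including the suspect value in the mean and standard deviation would inflate both and can hide the very outlier you are looking for.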
⚖️ The Tradeoff
Every anomaly system balances two errors: False Positives (flagged something normal — annoys customers) vs. False Negatives (missed a real anomaly — fraud slips through). Lower your threshold → catch more fraud BUT more false alarms. There is no "right" answer — it depends on the cost of each error in your specific business context.
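To make the tradeoff concrete, here is a small sketch that counts both error types at two thresholds. The anomaly scores and fraud labels are invented for illustration, and `confusion_counts` is a hypothetical helper name.

```python
def confusion_counts(scores, labels, threshold):
    """Return (false_positives, false_negatives) when flagging score > threshold."""
    pairs = list(zip(scores, labels))
    false_positives = sum(1 for s, is_fraud in pairs if s > threshold and not is_fraud)
    false_negatives = sum(1 for s, is_fraud in pairs if s <= threshold and is_fraud)
    return false_positives, false_negatives

# Made-up anomaly scores (e.g., absolute Z-scores) with made-up ground truth.
scores = [0.3, 0.8, 1.2, 2.1, 2.4, 2.7, 3.2, 3.6, 4.5, 5.0]
labels = [False, False, False, False, True, False, False, True, True, True]

for threshold in (3.0, 2.0):
    fp, fn = confusion_counts(scores, labels, threshold)
    print(f"threshold={threshold}: false positives={fp}, false negatives={fn}")
```

On this toy data, dropping the threshold from 3.0 to 2.0 catches the one missed fraud case but raises false alarms from 1 to 3. Which setting is better depends on what each error costs the business.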
02
Market Basket Analysis
When Amazon says "Customers who bought this also bought…" — that's association rule mining at work. The algorithm scans millions of transactions to find items that appear together more often than random chance predicts.
The famous example (often retold, and possibly apocryphal): a retailer discovered that men buying diapers on Friday evenings also bought beer (Lift: 2.1). Few people would have guessed that, but the algorithm searched the item combinations systematically and found it. That's the power of MBA: it surfaces the surprising rules, not just the obvious ones.
🔑 Critical Rule
Only analyze small baskets. A transaction with 50 items creates 1,225 possible item pairs, and most of them are noise. Filter out large baskets (typically >10–15 items) before running the algorithm. Small baskets produce cleaner, more meaningful association rules.
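The number of item pairs grows roughly with the square of basket size, which a short sketch can show. The `max_items=15` cutoff below is an assumption taken from the upper end of the range above, and both function names are hypothetical.

```python
from math import comb

def pair_count(n_items):
    """Number of distinct item pairs in a basket of n_items (n choose 2)."""
    return comb(n_items, 2)

def filter_small_baskets(transactions, max_items=15):
    """Keep only transactions with at most max_items items."""
    return [t for t in transactions if len(t) <= max_items]

for size in (5, 10, 15, 50):
    print(f"{size} items -> {pair_count(size)} pairs")

# Toy data: two small baskets and one 50-item basket (item names are made up).
baskets = [
    {"Bread", "Butter"},
    {"Milk", "Eggs", "Bread"},
    {f"item_{i}" for i in range(50)},
]
print(len(filter_small_baskets(baskets)))  # the 50-item basket is dropped
```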
⚡ Test Yourself
Click each card to reveal the metric definition. You get 1 point per card. Try to recall the definition BEFORE you flip.
Support
How popular is this combo?
P(A ∩ B)
Joint probability
Bread AND Butter appear in 30% of all transactions. Low support = rare combo — but may still be interesting if lift is high.
Confidence
If A, how likely B?
P(B | A)
Conditional probability
75% of bread buyers also buy butter. High confidence alone isn't enough — you need lift to know if it's better than chance.
Lift
Better than random chance?
Confidence / P(B)
THE metric that matters
Lift > 1 = positive association (the items appear together more often than chance predicts).
Lift = 1 = independent.
Lift < 1 = items avoid each other.
⚡ Practice Problem
Look at these 10 transactions. Find the rule {Bread} → {Butter} and try to calculate Support, Confidence, and Lift before revealing the answers.
Txn | Items
T1 | Bread, Butter, Milk
T2 | Bread, Butter
T3 | Milk, Eggs, Bread
T4 | Bread, Butter, Eggs
T5 | Butter, Milk
T6 | Bread, Butter, Milk
T7 | Eggs, Milk
T8 | Bread, Milk
T9 | Bread, Butter
T10 | Butter, Eggs
Answer: {Bread} → {Butter}
Bread appears in: T1, T2, T3, T4, T6, T8, T9 = 7 transactions
Butter appears in: T1, T2, T4, T5, T6, T9, T10 = 7 transactions
Both appear in: T1, T2, T4, T6, T9 = 5 transactions

Support = 5/10 = 50% (half of all transactions have both)
Confidence = 5/7 = 71.4% (of Bread buyers, 71% also buy Butter)
Lift = 0.714 / 0.70 = 1.02 (barely above 1: Bread and Butter are essentially independent, so the rule is much weaker than the 71% confidence suggests)

This is why lift matters — high confidence can be misleading if the consequent is already very popular.
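The worked answer above can be checked with a few lines of Python. This is a sketch using the ten practice transactions; the function names are illustrative and not taken from any particular library.

```python
# The ten practice transactions, T1 through T10.
transactions = [
    {"Bread", "Butter", "Milk"},   # T1
    {"Bread", "Butter"},           # T2
    {"Milk", "Eggs", "Bread"},     # T3
    {"Bread", "Butter", "Eggs"},   # T4
    {"Butter", "Milk"},            # T5
    {"Bread", "Butter", "Milk"},   # T6
    {"Eggs", "Milk"},              # T7
    {"Bread", "Milk"},             # T8
    {"Bread", "Butter"},           # T9
    {"Butter", "Eggs"},            # T10
]

def support(itemset, transactions):
    """Fraction of transactions containing every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """P(consequent | antecedent) = support(A and B) / support(A)."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

def lift(antecedent, consequent, transactions):
    """Confidence divided by the consequent's baseline support."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)

a, b = {"Bread"}, {"Butter"}
print(f"support    = {support(a | b, transactions):.3f}")
print(f"confidence = {confidence(a, b, transactions):.3f}")
print(f"lift       = {lift(a, b, transactions):.3f}")
```

Running it reproduces the 50% support, 71.4% confidence, and lift of about 1.02 from the answer.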
The Apriori Principle: If an item is infrequent, any itemset containing it is also infrequent. This lets the algorithm prune early — it doesn't waste time checking combinations that can't possibly meet the minimum support threshold. This is what makes MBA computationally feasible across millions of transactions.
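A compact, unoptimized sketch of this pruning idea is shown below, run on the ten practice transactions. The `min_support=0.5` threshold and the function name `frequent_itemsets` are assumptions for illustration; production libraries use more efficient data structures.

```python
from itertools import combinations

transactions = [
    {"Bread", "Butter", "Milk"}, {"Bread", "Butter"}, {"Milk", "Eggs", "Bread"},
    {"Bread", "Butter", "Eggs"}, {"Butter", "Milk"}, {"Bread", "Butter", "Milk"},
    {"Eggs", "Milk"}, {"Bread", "Milk"}, {"Bread", "Butter"}, {"Butter", "Eggs"},
]

def frequent_itemsets(transactions, min_support):
    """Level-wise search: build size-k candidates only from frequent (k-1)-itemsets."""
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {}
    k = 1
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: discard a candidate if any (k-1)-subset is not frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        # Only the surviving candidates are counted against the data.
        current = {c for c in candidates if support(c) >= min_support}
    return frequent

result = frequent_itemsets(transactions, min_support=0.5)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

With this threshold, Eggs (4 of 10 transactions) falls below minimum support at the first level, so no pair or triple containing Eggs is ever generated or counted.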
03
The Unsupervised Learning Pipeline
CLUSTER (group similar things) → DETECT (flag what doesn't fit) → ASSOCIATE (find what goes together)
The power move: run MBA within each cluster for segment-specific association rules. A recommendation that works for College Students (Cluster 1) might be completely different from Retirees (Cluster 3). Both techniques are unsupervised — no target variable needed — and both require human judgment to interpret. The model finds patterns; you decide what to do about them.
PAIR Framework
Every model connects to a business decision: Prediction → Action → Impact → Risk. If you can't fill in all four, the model isn't ready for deployment.
04
Rapid-Fire Knowledge Check
Each question is worth 1 point. Try to answer from memory before clicking. These are the concepts you should know off the top of your head.
1. Lift = 0.7. Is this a positive association?
2. Z-score = 4.2. Normal or anomaly?
3. A basket has 47 items. Include in MBA analysis?
4. What's the most commonly deployed anomaly detection method?
5. Support = 2%, Confidence = 90%, Lift = 6.3. Worth investigating?
6. 95°F reading in January in Erie, PA. What type of anomaly?
7. What does the "A" in PAIR stand for?
8. In MBA, which metric tells you if the association is better than random chance?
9. If an item is infrequent, can a set containing it be frequent? (Apriori principle)
10. You lower your anomaly threshold. What happens to false positives?