Your bank sends you a "Was this you?" text after an unusual purchase. That's anomaly detection — a model learned what your "normal" spending looks like (typical amounts, usual locations, common times) and flagged something that broke the pattern.
The goal is simple: learn what normal looks like, then flag anything that deviates. This is used everywhere — fraud detection, healthcare monitoring, cybersecurity, manufacturing quality control.
🔑 Key Concept
Point anomaly detection using simple statistical rules (Z-scores, IQR) is the most commonly deployed method in industry. You don't need fancy algorithms to catch a $9,000 charge on a $75-average card. Start simple — it works.
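Here is a minimal sketch of both statistical rules in Python, using only the standard library. The charge history is made up so that it averages roughly $75. The cutoffs (|z| > 3, Tukey fences at k = 1.5) are conventional choices, not values from this text. The baseline is built from past charges only. If you include a single extreme value in its own baseline, it inflates the standard deviation and can hide itself.

```python
import statistics

def zscore(value, history):
    """Z-score of `value` against a baseline built from past 'normal' values."""
    mu = statistics.mean(history)
    sigma = statistics.stdev(history)  # sample standard deviation
    return (value - mu) / sigma

def iqr_fences(history, k=1.5):
    """Tukey fences: values below Q1 - k*IQR or above Q3 + k*IQR get flagged."""
    q1, _, q3 = statistics.quantiles(history, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Hypothetical past charges on a card averaging about $75
history = [52.10, 80.00, 64.35, 95.20, 71.80, 58.99, 88.40, 77.25, 69.90, 92.00]
low, high = iqr_fences(history)

for charge in (80.00, 9000.00):
    z = zscore(charge, history)
    z_flag = abs(z) > 3                     # Z-score rule
    iqr_flag = not (low <= charge <= high)  # IQR rule
    print(f"${charge:,.2f}: z = {z:.1f}, z-rule flag = {z_flag}, IQR flag = {iqr_flag}")
```

Both rules agree here. The IQR rule is often preferred when the history itself is skewed, because quartiles are less affected by a few extreme values than the mean and standard deviation are.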
| Type | What It Means | Example |
|---|---|---|
| Point | A single value way outside the normal range | $9,500 on a card averaging $50–$200 |
| Contextual | Normal in one context, weird in another | 95°F in January in Erie, PA |
| Collective | Each point is individually fine, but the group is unusual | 8 small $9.99 charges in 4 min from 6 countries |
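A contextual anomaly needs a separate baseline for each context. The sketch below scores the same 95°F reading against January-only history and against July-only history. The `history_by_month` readings and the |z| > 3 cutoff are illustrative assumptions, not real Erie data.

```python
import statistics

# Hypothetical daily high temperatures (°F); the month is the "context"
history_by_month = {
    "January": [28, 31, 25, 34, 30, 27, 33, 29],
    "July": [79, 84, 91, 81, 93, 86, 88, 85],
}

def contextual_z(value, context):
    """Z-score of `value` measured only against readings from the same context."""
    baseline = history_by_month[context]
    return (value - statistics.mean(baseline)) / statistics.stdev(baseline)

for month in ("January", "July"):
    z = contextual_z(95, month)
    print(f"95°F in {month}: z = {z:.1f} -> {'ANOMALY' if abs(z) > 3 else 'normal'}")
```

If you pooled all months into one baseline, the spread would widen and the January reading might not be flagged at all. That is why the context has to be part of the model.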
⚡ Test Yourself
For each scenario, pick the anomaly type.
1. A factory sensor normally reads 180–220 psi. Today it spiked to 847 psi.
2. A customer buys 3 TVs at once. Unusual, but it's Black Friday.
3. An employee badges into the building at 3 AM, accesses a server room they never use, and copies 12 GB of files — each action is individually permissible.
4. A patient's resting heart rate is 110 bpm. The normal range is 60–100.
Detection Methods — ranked by how often they're actually deployed:
| Method | How It Works | Industry Usage |
|---|---|---|
| Statistical Rules | Z-score > 2–3? Flag. Outside IQR fences? Flag. | Most common. Start here. |
| Distance-Based | How far is this point from its nearest neighbors or cluster centroid? | Reuse your cluster model as an anomaly screener. |
| Density-Based | Is this point in a sparse area while others are in dense areas? | LOF (Local Outlier Factor), DBSCAN; good for irregularly shaped clusters. |
| Model-Based | Train a model on "normal" data; anything outside the learned boundary is flagged. | Isolation Forest, One-Class SVM — for complex patterns. |
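You can reuse a cluster model as a screener by measuring each point's distance to its nearest centroid. In this sketch, the centroids, the two features, and the cutoff of 50 are all hypothetical. In practice you would standardize the features first, so that dollar amounts don't dominate the distance.

```python
import math

# Hypothetical centroids from an already-fitted clustering model.
# Features: (average purchase in $, purchases per week), left unscaled for readability.
centroids = [(40.0, 5.0), (150.0, 1.5), (20.0, 12.0)]
threshold = 50.0  # hypothetical cutoff; tune it on your own data

def anomaly_score(point):
    """Euclidean distance from `point` to its nearest centroid (larger = more unusual)."""
    return min(math.dist(point, c) for c in centroids)

for customer in [(45.0, 4.0), (900.0, 0.5)]:
    score = anomaly_score(customer)
    print(customer, round(score, 1), "FLAG" if score > threshold else "ok")
```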
⚖️ The Tradeoff
Every anomaly system balances two errors: False Positives (flagged something normal — annoys customers) vs. False Negatives (missed a real anomaly — fraud slips through). Lower your threshold → catch more fraud BUT more false alarms. There is no "right" answer — it depends on the cost of each error in your specific business context.
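To see the tradeoff concretely, this sketch sweeps a flagging threshold over a handful of made-up anomaly scores with known labels, then prices each type of error. The scores, labels, and both costs are illustrative assumptions. The point is only that lowering the threshold trades false negatives for false positives.

```python
# Hypothetical anomaly scores paired with ground truth (True = confirmed fraud)
scored = [(0.20, False), (0.30, False), (0.40, False), (0.50, False), (0.60, True),
          (0.65, False), (0.70, False), (0.80, True), (0.90, True), (0.95, True)]

COST_FP = 5    # hypothetical cost of annoying a legitimate customer
COST_FN = 500  # hypothetical cost of letting one fraud through

def error_counts(threshold):
    """(false positives, false negatives) when flagging every score >= threshold."""
    fp = sum(1 for score, is_fraud in scored if score >= threshold and not is_fraud)
    fn = sum(1 for score, is_fraud in scored if score < threshold and is_fraud)
    return fp, fn

for t in (0.90, 0.75, 0.55):  # progressively lower thresholds
    fp, fn = error_counts(t)
    print(f"threshold {t:.2f}: FP = {fp}, FN = {fn}, cost = {fp * COST_FP + fn * COST_FN}")
```

With these assumed costs, the lowest threshold wins. If false alarms were expensive relative to missed fraud, the ranking could flip.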
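When Amazon says "Customers who bought this also bought…", it's surfacing items that tend to be purchased together. That's the same idea behind association rule mining, also called market basket analysis (MBA). The algorithm scans millions of transactions to find items that appear together more often than random chance predicts.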
The famous example: a retailer reportedly discovered that men buying diapers on Friday evenings also bought beer (Lift: 2.1). The story has been retold so often that its details are likely embellished, but the lesson holds. No human would have guessed that pairing, yet an algorithm that systematically checks item combinations can find it. That's the power of MBA: it surfaces the surprising rules, not just the obvious ones.
🔑 Critical Rule
Only analyze small baskets. A transaction with 50 items contains 1,225 distinct item pairs (and 19,600 item triples), and most of them are noise. Filter out large baskets (typically more than 10–15 items) before running the algorithm. Small baskets produce cleaner, more meaningful association rules.
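The number of pairs grows quadratically with basket size, which is easy to check with `math.comb`. The filter below is a minimal sketch. `MAX_BASKET_SIZE = 12` is an assumed value inside the 10–15 item range, and the baskets are made up.

```python
from math import comb

# Distinct pairs and triples hiding inside a single 50-item basket
print(comb(50, 2), comb(50, 3))  # 1225 19600

MAX_BASKET_SIZE = 12  # hypothetical cutoff within the 10-15 item range

def filter_baskets(transactions, max_size=MAX_BASKET_SIZE):
    """Keep only baskets small enough to give meaningful co-occurrences."""
    return [basket for basket in transactions if len(basket) <= max_size]

baskets = [
    {"bread", "butter"},
    {"milk", "eggs", "bread"},
    {f"item{i}" for i in range(47)},  # a 47-item bulk order
]
kept = filter_baskets(baskets)
print(len(baskets), "->", len(kept))
```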
⚡ Test Yourself
Try to recall each metric's definition before reading it.
| Metric | Question It Answers | Formula | What to Know |
|---|---|---|---|
| Support | How popular is this combo? | P(A ∩ B), the joint probability | Bread AND Butter appear in 30% of all transactions. Low support = rare combo, but it may still be interesting if lift is high. |
| Confidence | If A, how likely is B? | P(B \| A), the conditional probability | 75% of bread buyers also buy butter. High confidence alone isn't enough: you need lift to know if it's better than chance. |
| Lift | Is it better than random chance? | Confidence / P(B) | THE metric that matters. Lift > 1 = positive association; Lift = 1 = independent; Lift < 1 = items avoid each other. |
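Here are the three metrics written out as formulas for a rule A → B. Expanding confidence shows why lift compares the observed co-occurrence with what independence would predict. If A and B are independent, then P(A ∩ B) = P(A) P(B), and lift equals exactly 1.

```latex
\begin{aligned}
\text{Support}(A \rightarrow B) &= P(A \cap B) \\
\text{Confidence}(A \rightarrow B) &= P(B \mid A) = \frac{P(A \cap B)}{P(A)} \\
\text{Lift}(A \rightarrow B) &= \frac{\text{Confidence}(A \rightarrow B)}{P(B)} = \frac{P(A \cap B)}{P(A)\,P(B)}
\end{aligned}
```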
⚡ Practice Problem
Look at these 10 transactions. For the rule {Bread} → {Butter}, try to calculate Support, Confidence, and Lift before reading the answer.
| Txn | Items | Txn | Items |
|---|---|---|---|
| T1 | Bread, Butter, Milk | T6 | Bread, Butter, Milk |
| T2 | Bread, Butter | T7 | Eggs, Milk |
| T3 | Milk, Eggs, Bread | T8 | Bread, Milk |
| T4 | Bread, Butter, Eggs | T9 | Bread, Butter |
| T5 | Butter, Milk | T10 | Butter, Eggs |
Answer: {Bread} → {Butter}
Bread appears in: T1, T2, T3, T4, T6, T8, T9 = 7 transactions
Butter appears in: T1, T2, T4, T5, T6, T9, T10 = 7 transactions
Both appear in: T1, T2, T4, T6, T9 = 5 transactions
Support = 5/10 = 50% (half of all transactions have both)
Confidence = 5/7 = 71.4% (of Bread buyers, 71% also buy Butter)
Lift = Confidence / P(Butter) = 0.714 / 0.70 = 1.02. Butter is in 7 of 10 transactions, so P(Butter) = 0.70. A lift barely above 1 means the items are essentially independent. Not as strong as you'd think!
This is why lift matters — high confidence can be misleading if the consequent is already very popular.
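You can use the same calculation in Python to check your hand arithmetic. Transactions are modeled as Python sets. `support` and `rule_metrics` are illustrative helper names, not a library API.

```python
transactions = [
    {"Bread", "Butter", "Milk"},  # T1
    {"Bread", "Butter"},          # T2
    {"Milk", "Eggs", "Bread"},    # T3
    {"Bread", "Butter", "Eggs"},  # T4
    {"Butter", "Milk"},           # T5
    {"Bread", "Butter", "Milk"},  # T6
    {"Eggs", "Milk"},             # T7
    {"Bread", "Milk"},            # T8
    {"Bread", "Butter"},          # T9
    {"Butter", "Eggs"},           # T10
]

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def rule_metrics(antecedent, consequent):
    """Support, confidence, and lift for the rule antecedent -> consequent."""
    both = support(antecedent | consequent)
    confidence = both / support(antecedent)
    lift = confidence / support(consequent)
    return both, confidence, lift

s, c, l = rule_metrics({"Bread"}, {"Butter"})
print(f"Support = {s:.2f}, Confidence = {c:.3f}, Lift = {l:.2f}")
# Support = 0.50, Confidence = 0.714, Lift = 1.02
```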
The Apriori Principle: If an item is infrequent, any itemset containing it is also infrequent. This lets the algorithm prune early — it doesn't waste time checking combinations that can't possibly meet the minimum support threshold. This is what makes MBA computationally feasible across millions of transactions.
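Below is a compact, unoptimized sketch of Apriori's level-wise search on the practice transactions. It builds k-item candidates from the frequent (k-1)-itemsets. Before counting support, it discards any candidate that has an infrequent subset. The `min_support = 0.3` value is an assumption for illustration. Production implementations add many optimizations that this sketch omits.

```python
from itertools import combinations

# The ten practice transactions T1..T10
transactions = [
    {"Bread", "Butter", "Milk"}, {"Bread", "Butter"}, {"Milk", "Eggs", "Bread"},
    {"Bread", "Butter", "Eggs"}, {"Butter", "Milk"}, {"Bread", "Butter", "Milk"},
    {"Eggs", "Milk"}, {"Bread", "Milk"}, {"Bread", "Butter"}, {"Butter", "Eggs"},
]

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset with support >= min_support."""
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {s: support(s) for s in current}

    k = 2
    while current:
        # Join: combine frequent (k-1)-itemsets into candidate k-itemsets
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune (Apriori principle): a candidate with any infrequent (k-1)-subset can't be frequent
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        # Count support only for the candidates that survived pruning
        current = {c for c in candidates if support(c) >= min_support}
        frequent.update({c: support(c) for c in current})
        k += 1
    return frequent

result = apriori(transactions, min_support=0.3)
for itemset, sup in sorted(result.items(), key=lambda kv: (len(kv[0]), -kv[1])):
    print(sorted(itemset), round(sup, 2))
```

On these ten transactions, the sketch keeps all four single items and three pairs: {Bread, Butter}, {Bread, Milk}, and {Butter, Milk}. The only 3-item candidate, {Bread, Butter, Milk}, passes the subset check, but it appears in just 2 of 10 transactions.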
CLUSTER (group similar things) → DETECT (flag what doesn't fit) → ASSOCIATE (find what goes together)
The power move: run MBA within each cluster to get segment-specific association rules. A recommendation that works for College Students (Cluster 1) might be completely different from one that works for Retirees (Cluster 3). Clustering and MBA are both unsupervised (no target variable needed), and both require human judgment to interpret. The model finds patterns; you decide what to do about them.
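Here is a minimal sketch of running the analysis per segment. It groups transactions by the label a clustering step assigned, then computes pairwise lift inside each group. The segment names echo the example above, but the baskets and item names are made up.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical baskets, each tagged with the segment its customer was clustered into
tagged = [
    ("College Students", {"ramen", "energy drink"}),
    ("College Students", {"ramen", "energy drink", "chips"}),
    ("College Students", {"chips", "soda"}),
    ("College Students", {"chips", "soda"}),
    ("College Students", {"chips"}),
    ("Retirees", {"tea", "crossword"}),
    ("Retirees", {"tea", "crossword", "crackers"}),
    ("Retirees", {"crackers", "ramen"}),
    ("Retirees", {"crackers", "ramen"}),
    ("Retirees", {"crackers"}),
]

# Split transactions by segment so each one gets its own rules
segments = defaultdict(list)
for label, basket in tagged:
    segments[label].append(basket)

def pair_lifts(baskets):
    """Lift for every item pair that co-occurs at least once within `baskets`."""
    n = len(baskets)

    def p(items):
        return sum(1 for b in baskets if items <= b) / n

    items = sorted({i for b in baskets for i in b})
    lifts = {}
    for a, b in combinations(items, 2):
        both = p({a, b})
        if both > 0:  # skip pairs never bought together (also avoids dividing by zero)
            lifts[(a, b)] = both / (p({a}) * p({b}))
    return lifts

for label, baskets in segments.items():
    lifts = pair_lifts(baskets)
    best = max(lifts, key=lifts.get)
    print(f"{label}: strongest pair {best}, lift {lifts[best]:.2f}")
```

Ramen appears in both segments, but its strongest partner differs: energy drinks for students, crackers for retirees. The top rule in each segment is different.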
PAIR Framework
Every model connects to a business decision: Prediction → Action → Impact → Risk. If you can't fill in all four, the model isn't ready for deployment.
Try to answer each question from memory. These are the concepts you should know off the top of your head.
1. Lift = 0.7. Is this a positive association?
2. Z-score = 4.2. Normal or anomaly?
3. A basket has 47 items. Should it be included in MBA?
4. What's the most commonly deployed anomaly detection method?
5. Support = 2%, Confidence = 90%, Lift = 6.3. Worth investigating?
6. 95°F reading in January in Erie, PA. What type of anomaly?
7. What does the "A" in PAIR stand for?
8. In MBA, which metric tells you if the association is better than random chance?
9. If an item is infrequent, can a set containing it be frequent? (Apriori principle)
10. You lower your anomaly threshold. What happens to false positives?