Anomaly Detection & Market Basket Analysis

Your bank sends you a "Was this you?" text after an unusual purchase. That's anomaly detection — a model learned what your "normal" spending looks like (typical amounts, usual locations, common times) and flagged something that broke the pattern.

The goal is simple: learn what normal looks like, then flag anything that deviates. This is used everywhere — fraud detection, healthcare monitoring, cybersecurity, manufacturing quality control.

🔑 Key Concept

Point anomaly detection using simple statistical rules (Z-scores, IQR) is the most commonly deployed method in industry. You don't need fancy algorithms to catch a $9,000 charge on a $75-average card. Start simple — it works.
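Here is a minimal sketch of both statistical rules, assuming you keep a history of "normal" charges and score each new charge against it. The charge amounts, the function names, and the cutoffs (3 for the Z-score, 1.5 × IQR for the fences) are illustrative assumptions, not values from any real system.

```python
import statistics

# Hypothetical past charges on a card that averages about $75.
history = [62, 80, 71, 90, 55, 78, 84, 69, 75, 88, 73]

def zscore(value, baseline):
    """How many standard deviations is value from the baseline mean?"""
    mean = statistics.mean(baseline)
    stdev = statistics.stdev(baseline)
    return (value - mean) / stdev

def iqr_fences(baseline, k=1.5):
    """Return (low, high) fences at Q1 - k*IQR and Q3 + k*IQR."""
    q1, _median, q3 = statistics.quantiles(baseline, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

low, high = iqr_fences(history)

# For each new charge: (flagged by Z-score rule, flagged by IQR rule).
flags = {
    charge: (abs(zscore(charge, history)) > 3, not (low <= charge <= high))
    for charge in [82, 9000]
}
print(flags)
```

Both rules leave the $82 charge alone and flag the $9,000 one. Scoring against history (rather than a batch that already contains the outlier) keeps one extreme value from inflating the standard deviation it is being compared against.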

Type	What It Means	Example
Point	A single value way outside the normal range	$9,500 on a card averaging $50–$200
Contextual	Normal in one context, weird in another	95°F in January in Erie, PA
Collective	Each point is individually fine, but the group is unusual	8 small $9.99 charges in 4 min from 6 countries
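A contextual anomaly can be sketched by keeping a separate baseline per context. The temperatures below are made-up numbers for illustration (not real Erie data), and `is_contextual_anomaly` is a hypothetical helper: the same 95°F reading is scored against January history and against July history.

```python
import statistics

# Hypothetical daily highs (°F), grouped by month. Made up for illustration.
history_by_month = {
    "Jan": [28, 31, 35, 25, 33, 30, 36, 27, 32, 29],
    "Jul": [78, 82, 85, 88, 91, 80, 84, 90, 86, 83],
}

def is_contextual_anomaly(reading, month, threshold=3.0):
    """Flag a reading whose |Z-score| within its own month exceeds the threshold."""
    baseline = history_by_month[month]
    mean = statistics.mean(baseline)
    stdev = statistics.stdev(baseline)
    return abs(reading - mean) / stdev > threshold

print(is_contextual_anomaly(95, "Jan"))  # scored against winter readings
print(is_contextual_anomaly(95, "Jul"))  # scored against summer readings
```

The value never changes; only the baseline it is compared against does. That is the difference between a contextual anomaly and a point anomaly.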

⚡ Test Yourself

For each scenario, pick the anomaly type. You get 2 points for each correct answer.

1. A factory sensor normally reads 180–220 psi. Today it spiked to 847 psi.

2. A customer buys 3 TVs at once. Unusual, but it's Black Friday.

3. An employee badges into the building at 3 AM, accesses a server room they never use, and copies 12 GB of files — each action is individually permissible.

4. A patient's resting heart rate is 110 bpm. The normal range is 60–100.

Detection Methods — ranked by how often they're actually deployed:

Method	How It Works	Industry Usage
Statistical Rules	|Z-score| above a cutoff (commonly 2–3)? Flag. Outside the IQR fences? Flag.	Most common. Start here.
Distance-Based	How far is this point from its nearest neighbors or cluster centroid?	Reuse your cluster model as an anomaly screener.
Density-Based	Is this point in a sparse area while others are in dense areas?	LOF, DBSCAN — good for irregular shapes.
Model-Based	Train a model on "normal" data; anything outside the learned boundary is flagged.	Isolation Forest, One-Class SVM — for complex patterns.
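The distance-based row ("reuse your cluster model") might look like the sketch below. It assumes you already have centroids from a clustering step; the centroids, the training points, and the mean + 3 standard deviations cutoff are all hypothetical choices made for illustration.

```python
import math
import statistics

# Hypothetical centroids from an earlier clustering step
# (features: average spend, purchases per month).
centroids = [(50.0, 2.0), (200.0, 10.0)]

def nearest_centroid_distance(point):
    """Euclidean distance from a point to its closest centroid."""
    return min(math.dist(point, c) for c in centroids)

# Distances for known-normal points define what "close enough" means.
training = [(48, 2), (55, 3), (52, 1), (195, 9), (205, 11), (198, 10)]
train_distances = [nearest_centroid_distance(p) for p in training]
cutoff = statistics.mean(train_distances) + 3 * statistics.stdev(train_distances)

def is_anomaly(point):
    """Flag points that sit unusually far from every cluster center."""
    return nearest_centroid_distance(point) > cutoff

print(is_anomaly((51, 2)))   # sits next to the first centroid
print(is_anomaly((120, 6)))  # sits between clusters, far from both
```

In practice you would scale the features first so that one large-valued column does not dominate the distance; that step is left out here to keep the sketch short.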

⚖️ The Tradeoff

Every anomaly system balances two errors: False Positives (flagged something normal — annoys customers) vs. False Negatives (missed a real anomaly — fraud slips through). Lower your threshold → catch more fraud BUT more false alarms. There is no "right" answer — it depends on the cost of each error in your specific business context.
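To see the tradeoff in numbers, here is a small sketch that counts both error types at two thresholds. The anomaly scores and fraud labels are invented for illustration, and `error_counts` is a hypothetical helper.

```python
# Hypothetical (anomaly score, was it really fraud?) pairs.
scored = [
    (0.5, False), (1.1, False), (1.8, False), (2.2, False),
    (2.6, True), (2.9, False), (3.4, True), (4.8, True),
]

def error_counts(scored, threshold):
    """Return (false positives, false negatives) when flagging score > threshold."""
    false_pos = sum(1 for score, fraud in scored if score > threshold and not fraud)
    false_neg = sum(1 for score, fraud in scored if score <= threshold and fraud)
    return false_pos, false_neg

for threshold in (3.0, 2.0):
    fp, fn = error_counts(scored, threshold)
    print(f"threshold={threshold}: false positives={fp}, false negatives={fn}")
```

At a threshold of 3.0 one fraud case slips through with no false alarms; at 2.0 every fraud case is caught but two legitimate charges get flagged. Which row is better depends on what each error costs you.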

When Amazon says "Customers who bought this also bought…", that's association rule mining at work, the technique behind market basket analysis (MBA). The algorithm scans millions of transactions to find items that appear together more often than random chance predicts.

The famous example: a retailer discovered that men buying diapers on Friday evenings also bought beer (Lift: 2.1). No human would have guessed that. But the algorithm checked every possible combination and found it. That's the power of MBA — it surfaces the surprising rules, not the obvious ones.

🔑 Critical Rule

Only analyze small baskets. A single transaction with 50 items creates 1,225 possible item pairs, most of them noise. Filter out large baskets (typically >10–15 items) before running the algorithm. Small baskets produce cleaner, more meaningful association rules.
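A quick sketch of the filtering step, assuming transactions are stored as lists of item names. The cutoff of 10 items and the helper name `filter_small_baskets` are illustrative choices, and the baskets are made up.

```python
from math import comb

def filter_small_baskets(transactions, max_items=10):
    """Keep only baskets with at most max_items distinct items."""
    return [t for t in transactions if len(set(t)) <= max_items]

baskets = [
    ["bread", "butter"],
    ["milk", "eggs", "bread"],
    [f"item{i}" for i in range(50)],  # one very large basket
]

small = filter_small_baskets(baskets)
print(len(small))   # baskets kept for analysis
print(comb(50, 2))  # item pairs the 50-item basket alone would contribute
print(comb(3, 2))   # item pairs from a 3-item basket
```

The pair count grows roughly with the square of the basket size, which is why one oversized basket can drown out the signal from many small ones.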

⚡ Test Yourself

Click each card to reveal the metric definition. You get 1 point per card. Try to recall the definition BEFORE you flip.

Support

How popular is this combo?

P(A ∩ B)

Joint probability

Bread AND Butter appear in 30% of all transactions. Low support = rare combo — but may still be interesting if lift is high.

Confidence

If A, how likely B?

P(B | A)

Conditional probability

75% of bread buyers also buy butter. High confidence alone isn't enough — you need lift to know if it's better than chance.

Lift

Better than random chance?

Confidence / P(B) = P(A ∩ B) / (P(A) × P(B))

THE metric that matters

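Lift > 1 = positive association (the items appear together more often than chance predicts).
Lift = 1 = independent.
Lift < 1 = negative association (the items appear together less often than chance predicts).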

⚡ Practice Problem

Look at these 10 transactions. Find the rule {Bread} → {Butter} and try to calculate Support, Confidence, and Lift before revealing the answers.

Txn	Items	Txn	Items
T1	Bread, Butter, Milk	T6	Bread, Butter, Milk
T2	Bread, Butter	T7	Eggs, Milk
T3	Milk, Eggs, Bread	T8	Bread, Milk
T4	Bread, Butter, Eggs	T9	Bread, Butter
T5	Butter, Milk	T10	Butter, Eggs

Answer: {Bread} → {Butter}

Bread appears in: T1, T2, T3, T4, T6, T8, T9 = 7 transactions
Butter appears in: T1, T2, T4, T5, T6, T9, T10 = 7 transactions
Both appear in: T1, T2, T4, T6, T9 = 5 transactions

Support = 5/10 = 50% (half of all transactions have both)
Confidence = 5/7 = 71.4% (of Bread buyers, 71% also buy Butter)
Lift = 0.714 / 0.70 = 1.02 (barely above 1 — this is essentially independent! Not as strong as you'd think.)

This is why lift matters — high confidence can be misleading if the consequent is already very popular.
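You can check the hand calculation with a few lines of code. This is a minimal sketch, assuming each transaction is stored as a set of item names; `rule_metrics` is a hypothetical helper, not a library function.

```python
# The 10 transactions from the practice problem (T1 through T10).
transactions = [
    {"Bread", "Butter", "Milk"}, {"Bread", "Butter"}, {"Milk", "Eggs", "Bread"},
    {"Bread", "Butter", "Eggs"}, {"Butter", "Milk"}, {"Bread", "Butter", "Milk"},
    {"Eggs", "Milk"}, {"Bread", "Milk"}, {"Bread", "Butter"}, {"Butter", "Eggs"},
]

def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    n = len(transactions)
    count_a = sum(1 for t in transactions if antecedent <= t)
    count_b = sum(1 for t in transactions if consequent <= t)
    count_both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    support = count_both / n             # P(A and B)
    confidence = count_both / count_a    # P(B | A)
    lift = confidence / (count_b / n)    # confidence / P(B)
    return support, confidence, lift

support, confidence, lift = rule_metrics(transactions, {"Bread"}, {"Butter"})
print(f"support={support:.2f} confidence={confidence:.3f} lift={lift:.2f}")
# support=0.50 confidence=0.714 lift=1.02
```

Swap in a different antecedent or consequent to practice on other rules from the same table.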

The Apriori Principle: If an item is infrequent, any itemset containing it is also infrequent. This lets the algorithm prune early — it doesn't waste time checking combinations that can't possibly meet the minimum support threshold. This is what makes MBA computationally feasible across millions of transactions.
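The pruning idea can be sketched as a level-by-level search. This is a simplified teaching version written for this lesson, not a production implementation: it reuses the 10 practice transactions and assumes a minimum support of 50%.

```python
from itertools import combinations

transactions = [
    {"Bread", "Butter", "Milk"}, {"Bread", "Butter"}, {"Milk", "Eggs", "Bread"},
    {"Bread", "Butter", "Eggs"}, {"Butter", "Milk"}, {"Bread", "Butter", "Milk"},
    {"Eggs", "Milk"}, {"Bread", "Milk"}, {"Bread", "Butter"}, {"Butter", "Eggs"},
]

def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    items = {item for t in transactions for item in t}
    # Level 1: keep only single items that are frequent on their own.
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {}
    k = 1
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset)
        k += 1
        # Build size-k candidates only from frequent size-(k-1) itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune: drop any candidate that has an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent

frequent = apriori(transactions, min_support=0.5)
for itemset, sup in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```

Eggs appears in only 4 of 10 transactions, so it is dropped at level 1 and no pair or triple containing Eggs is ever counted. That skipped work is the saving the Apriori principle buys you.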

CLUSTER (group similar things) → DETECT (flag what doesn't fit) → ASSOCIATE (find what goes together)

The power move: run MBA within each cluster for segment-specific association rules. A recommendation that works for College Students (Cluster 1) might be completely different from one that works for Retirees (Cluster 3). All of these techniques are unsupervised (no target variable needed), and all require human judgment to interpret. The model finds patterns; you decide what to do about them.
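A rough sketch of per-cluster MBA, assuming each basket already carries a cluster label from an earlier clustering step. The segment names, the baskets, and the `lift` helper are all hypothetical and exist only to show the mechanics.

```python
from collections import defaultdict

def lift(transactions, a, b):
    """Lift of the rule {a} -> {b}, or None if either item never appears."""
    n = len(transactions)
    count_a = sum(1 for t in transactions if a in t)
    count_b = sum(1 for t in transactions if b in t)
    count_both = sum(1 for t in transactions if a in t and b in t)
    if count_a == 0 or count_b == 0:
        return None
    return (count_both / count_a) / (count_b / n)

# Hypothetical (cluster label, basket) pairs.
labeled = [
    ("students", {"ramen", "energy drink"}),
    ("students", {"ramen", "energy drink", "chips"}),
    ("students", {"chips"}),
    ("students", {"ramen", "energy drink"}),
    ("retirees", {"tea", "biscuits"}),
    ("retirees", {"tea", "biscuits", "ramen"}),
    ("retirees", {"energy drink"}),
    ("retirees", {"tea"}),
]

# Split the baskets by segment, then score the same rule inside each one.
by_cluster = defaultdict(list)
for cluster, basket in labeled:
    by_cluster[cluster].append(basket)

segment_lift = {
    cluster: lift(baskets, "ramen", "energy drink")
    for cluster, baskets in by_cluster.items()
}
print(segment_lift)
```

In this toy data the rule has lift above 1 for one segment and lift of 0 for the other, so a single store-wide number would hide the difference.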

PAIR Framework

Every model connects to a business decision: Prediction → Action → Impact → Risk. If you can't fill in all four, the model isn't ready for deployment.

Each question is worth 1 point. Try to answer from memory before clicking. These are the concepts you should know off the top of your head.

1. Lift = 0.7. Is this a positive association?

2. Z-score = 4.2. Normal or anomaly?

3. A basket has 47 items. Include in MBA analysis?

4. What's the most commonly deployed anomaly detection method?

5. Support = 2%, Confidence = 90%, Lift = 6.3. Worth investigating?

6. 95°F reading in January in Erie, PA. What type of anomaly?

7. What does the "A" in PAIR stand for?

8. In MBA, which metric tells you if the association is better than random chance?

9. If an item is infrequent, can a set containing it be frequent? (Apriori principle)

10. You lower your anomaly threshold. What happens to false positives?