Everything derives from four cells. Learn the 2x2 table and the rest falls out.
This is the foundation. Every screening metric — sensitivity, specificity, PPV, NPV — is just a ratio of cells in this table.
| | Disease + | Disease − |
|---|---|---|
| Test + | True Positive | False Positive |
| Test − | False Negative | True Negative |
Try this: set TP=90, FP=80, FN=10, TN=820. Sensitivity is 90% but PPV is only 53%. That's the prevalence trap in action — when disease is rare, even good tests generate tons of false positives.
Stare at this table before moving on. Every metric below is just a ratio of these four cells.
| Metric | Formula | What It Answers | Prevalence Changes It? | Board Rule |
|---|---|---|---|---|
| Sensitivity | TP / (TP + FN) | Of everyone sick, how many did the test catch? | NO — fixed property of the test | SnNOut — high sens + negative result = rules OUT disease |
| Specificity | TN / (TN + FP) | Of everyone healthy, how many did the test clear correctly? | NO — fixed property of the test | SpPIn — high spec + positive result = rules IN disease |
| PPV | TP / (TP + FP) | If test is positive, what's the chance they're actually sick? | YES — prevalence UP = PPV UP | Low prevalence → tons of false positives → low PPV even with great test |
| NPV | TN / (TN + FN) | If test is negative, what's the chance they're actually healthy? | YES — prevalence UP = NPV DOWN | High prevalence → more disease missed → NPV drops |
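The formulas in the table can be checked with a few lines of Python. This is a sketch (the function name is mine) using the worked numbers from earlier: TP=90, FP=80, FN=10, TN=820.

```python
def screening_metrics(tp, fp, fn, tn):
    """Compute the four screening metrics from the cells of a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),  # of everyone sick, fraction caught
        "specificity": tn / (tn + fp),  # of everyone healthy, fraction cleared
        "ppv": tp / (tp + fp),          # P(actually sick | positive test)
        "npv": tn / (tn + fn),          # P(actually healthy | negative test)
    }

# The prevalence-trap numbers: TP=90, FP=80, FN=10, TN=820
m = screening_metrics(90, 80, 10, 820)
print(f"Sensitivity: {m['sensitivity']:.0%}")  # 90%
print(f"PPV: {m['ppv']:.0%}")                  # 53%
```

Same test, very different answers: the test catches 90% of the sick, yet barely half of the positives are real.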
The one-sentence version: Sensitivity and specificity describe the test. PPV and NPV describe the patient's result. The test doesn't change — but what the result means depends on who you're testing.
Sensitivity = how good is the test at catching people who HAVE the disease?
Formula: TP / (TP + FN). Think: "of everyone who's SICK, how many did the test catch?"
High sensitivity means few false negatives. If you test negative, you're probably clear.
Specificity = how good is the test at correctly clearing people who DON'T have the disease?
Formula: TN / (TN + FP). Think: "of everyone who's HEALTHY, how many did the test get right?"
High specificity means few false positives. If you test positive, it's probably real.
SnNOut — Sensitivity rules out. A highly sensitive test, when negative, rules out disease. 🔑 S-N-N-Out: Sniff it out — a sensitive nose catches everything. If it smells nothing, nothing's there.
SpPIn — Specificity rules in. A highly specific test, when positive, rules in disease. 🔑 S-P-P-In: Specific means picky. If a picky test says yes, it's in. It doesn't say yes to just anyone.
Boards LOVE this trap: "A screening test has 99% specificity. A patient tests positive. Does he have the disease?"
The answer depends on prevalence. Even 99% specificity gives tons of false positives in a low-prevalence population. SpPIn only works when pre-test probability isn't rock-bottom.
PPV (Positive Predictive Value) = if the test is positive, what's the chance the patient actually has the disease?
NPV (Negative Predictive Value) = if the test is negative, what's the chance the patient is actually healthy?
Here's the key: sensitivity and specificity are fixed properties of the test. They don't change with prevalence. But PPV and NPV change dramatically with prevalence.
Watch what happens to PPV as prevalence drops. Test sensitivity = 95%, specificity = 95%, population = 10,000.
At 1% prevalence with 95%/95% test: PPV drops to ~16%. That means 84% of positives are false. This is why you don't screen everyone for rare diseases.
Prevalence up = PPV up, NPV down.
Prevalence down = PPV down, NPV up.
Think: if everyone has the disease, a positive test is almost certainly right (high PPV). If nobody has it, a positive test is probably wrong (low PPV).
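That intuition can be sketched in Python (the helper is hypothetical; note the population size actually cancels out of PPV):

```python
def ppv_at_prevalence(sens, spec, prevalence, population=10_000):
    """Rebuild the 2x2 table at a given prevalence and return PPV."""
    diseased = population * prevalence
    healthy = population - diseased
    tp = sens * diseased          # true positives among the sick
    fp = (1 - spec) * healthy     # false positives among the healthy
    return tp / (tp + fp)

# Same 95%/95% test as the example above
for prev in (0.50, 0.10, 0.01):
    print(f"prevalence {prev:>4.0%}: PPV = {ppv_at_prevalence(0.95, 0.95, prev):.0%}")
```

With the 95%/95% test, PPV is 95% at 50% prevalence, ~68% at 10%, and only ~16% at 1%.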
The evidence hierarchy: higher rung = stronger evidence. But each study type has its own measure of association and its own weaknesses.
RR vs OR: If you can follow people forward and count who gets sick = RR (RCT, cohort). If you start with sick people and look backward = OR (case-control). Boards ask this constantly. Case-control = OR. Always.
This is the table that makes the elimination game easy. Learn the columns and you can ID any study in one read.
| Study Type | Direction | Starts With | Measure | Causation? | Best For | Key Weakness |
|---|---|---|---|---|---|---|
| RCT | Forward | Random assignment to groups | RR | YES | Testing treatments | Expensive, ethical limits |
| Cohort | Forward (or retro) | Exposure status | RR | Association + temporality | Common exposures, incidence | Slow, expensive, attrition |
| Case-Control | BACKWARD | Disease status | OR only | No | Rare diseases | Recall bias, can't calc incidence |
| Cross-Sectional | Snapshot (no direction) | Both at once | Prevalence, OR | No | Prevalence, hypothesis generating | No temporality at all |
| Case Series | Descriptive | Interesting cases | None | No | New/rare conditions | No comparison group |
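The RR/OR distinction is easy to see in code. A sketch with hypothetical function names and made-up 2x2 numbers:

```python
def risk_ratio(exposed_sick, exposed_total, unexposed_sick, unexposed_total):
    """RR: ratio of incidence in exposed vs unexposed (cohort/RCT)."""
    return (exposed_sick / exposed_total) / (unexposed_sick / unexposed_total)

def odds_ratio(a, b, c, d):
    """OR from a 2x2 table: a=exposed cases, b=exposed controls,
    c=unexposed cases, d=unexposed controls (case-control)."""
    return (a * d) / (b * c)

# Hypothetical cohort: 20/100 exposed get sick vs 10/100 unexposed
print(risk_ratio(20, 100, 10, 100))  # 2.0
print(odds_ratio(20, 80, 10, 90))    # 2.25
```

On the same table the OR (2.25) overstates the RR (2.0); the two only converge when the disease is rare, which is exactly why OR is acceptable in case-control studies of rare diseases.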
These are the ways studies lie to you.
The board question gives you a scenario. Match the pattern to the bias.
| Bias | The Pattern | Classic Study Type | The Fix |
|---|---|---|---|
| Selection | Sample isn't representative of the population | Any | Randomization, representative sampling |
| Recall | Sick people remember exposures better than healthy people | Case-control | Use medical records, not patient memory |
| Observer | Researcher sees what they expect to see | Any unblinded | Blinding |
| Hawthorne | Subjects act different because they know they're watched | Any | Control group (both groups are watched) |
| Lead-time | Earlier detection looks like longer survival (same death age) | Screening studies | Measure mortality rate, not survival time |
| Length-time | Screening catches slow tumors more than aggressive ones | Screening studies | Account for tumor growth rates |
| Berkson | Hospital patients make 2 diseases look correlated | Hospital-based case-control | Use community-based samples |
| Confounding | Hidden 3rd variable linked to BOTH exposure and outcome | Any observational | Randomization (best), stratification, regression |
| Attrition | Dropouts aren't random — sicker people leave treatment group | RCT, cohort | Intention-to-treat analysis |
ARR (Absolute Risk Reduction) = risk in control − risk in treatment. The raw difference.
RRR (Relative Risk Reduction) = ARR / risk in control. The percentage drop. Sounds more impressive than it is.
NNT = 1 / ARR. The number of patients you need to treat for ONE to benefit.
NNH = 1 / (risk of harm in treatment − risk in control). Same idea, but for harm.
Drug ads love RRR because it sounds dramatic. "50% reduction in heart attacks!" But if risk went from 2% to 1%, the ARR is just 1% and NNT = 100. You'd treat 100 people for 1 to benefit. Always ask for the absolute numbers.
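The drug-ad example, worked in Python (a sketch; the function name is mine):

```python
def treatment_effects(control_risk, treatment_risk):
    """ARR, RRR, and NNT from two event rates."""
    arr = control_risk - treatment_risk  # raw difference
    rrr = arr / control_risk             # percentage drop, the ad number
    nnt = 1 / arr                        # patients treated per one benefit
    return arr, rrr, nnt

# Risk drops from 2% to 1%
arr, rrr, nnt = treatment_effects(0.02, 0.01)
print(f"RRR = {rrr:.0%}, ARR = {arr:.0%}, NNT = {nnt:.0f}")  # RRR = 50%, ARR = 1%, NNT = 100
```

The same trial yields a dramatic "50% reduction" and a sobering "treat 100 for 1 to benefit."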
Choosing a test is a short decision tree: categorical or continuous, how many groups, normal or not. Here's the whole thing in one table.
| Scenario | Test | Board Clue |
|---|---|---|
| Categorical, 2 groups, large sample | Chi-square (χ²) | All expected counts ≥ 5 |
| Categorical, 2 groups, small sample | Fisher exact | "small cell counts" or any expected count < 5 |
| Categorical, 3+ groups | Chi-square | Same test, bigger table |
| Continuous, normal, 2 groups | t-test | Paired = before/after same subjects. Unpaired = different subjects |
| Continuous, normal, 3+ groups | ANOVA | If significant, use post-hoc (Tukey/Bonferroni) to find which groups differ |
| Continuous, non-normal, 2 groups | Mann-Whitney U | "non-normally distributed" + "2 groups" = Mann-Whitney |
| Continuous, non-normal, 3+ groups | Kruskal-Wallis | Non-parametric ANOVA |
| Correlation between 2 continuous variables | Pearson (normal) / Spearman (non-normal) | "association between two measurements" |
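The table collapses into a small decision function. A sketch in Python (the name and argument encoding are mine, not any standard API):

```python
def choose_test(data_type, groups, normal=True, sample="large"):
    """Pick a statistical test from data type, group count,
    distribution, and sample size, following the table above."""
    if data_type == "categorical":
        if groups == 2 and sample == "small":
            return "Fisher exact"       # any expected count < 5
        return "Chi-square"             # same test for 2 or 3+ groups
    if data_type == "continuous":
        if normal:
            return "t-test" if groups == 2 else "ANOVA"
        return "Mann-Whitney U" if groups == 2 else "Kruskal-Wallis"

print(choose_test("continuous", 3, normal=False))  # Kruskal-Wallis
```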
You rejected the null hypothesis, but it was actually true. You said "there IS a difference" when there isn't. A false positive of the study itself.
Standard threshold: alpha = 0.05 (5% chance of this error)
You failed to reject the null, but the alternative was true. You said "no difference" when there IS one. A false negative of the study itself.
Standard threshold: beta = 0.20 (20% chance of this error)
The probability of correctly detecting a real effect. Standard = 0.80 (80%).
How to increase power: increase sample size (most common), increase effect size, increase alpha (rarely done), decrease variability.
| | Type I Error (alpha) | Type II Error (beta) |
|---|---|---|
| What happened | Said "there IS a difference" — but there isn't | Said "no difference" — but there IS one |
| Analogy | False positive of the study | False negative of the study |
| Threshold | alpha = 0.05 | beta = 0.20 |
| Related to | p-value (p < alpha = significant) | Power = 1 - beta = 0.80 |
| Board clue | "False alarm" — rejected null when it was true | "Missed it" — small sample, underpowered study |
| Fix | Lower alpha (stricter threshold) | Increase sample size |
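Alpha is a promise about false alarms, and a quick simulation shows the machinery: generate studies where the null really IS true and count how often a test still rejects at alpha = 0.05. A pure-Python sketch with hypothetical names (z-test with known variance, for simplicity):

```python
import math
import random

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def false_positive_rate(trials=2000, n=50, alpha=0.05, seed=1):
    """Simulate studies where both groups are identical (null is TRUE)
    and count how often a two-sample z-test declares significance."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        m1 = sum(rng.gauss(0, 1) for _ in range(n)) / n
        m2 = sum(rng.gauss(0, 1) for _ in range(n)) / n
        z = (m1 - m2) / math.sqrt(2 / n)
        p = 2 * (1 - phi(abs(z)))  # two-sided p-value
        if p < alpha:
            rejections += 1
    return rejections / trials

print(false_positive_rate())  # close to alpha = 0.05
```

The rejection rate under a true null lands near 5%: that is what "alpha = 0.05" means operationally.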
p-value: the probability of seeing results this extreme (or more) IF the null hypothesis were true. It is NOT the probability that the null is true. This is the most commonly misinterpreted stat on boards. p < 0.05 = statistically significant, but that says nothing about clinical significance.
95% Confidence Interval: if you repeated the study 100 times, ~95 of those intervals would contain the true value. If the CI for an RR or OR includes 1.0, the result is NOT statistically significant. If the CI for a mean difference includes 0, NOT significant.
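The CI rule reduces to a single check, sketched in Python (the function is hypothetical):

```python
def ci_significant(lower, upper, null_value):
    """A CI is statistically significant iff it excludes the null value:
    1.0 for ratios (RR, OR), 0 for differences."""
    return not (lower <= null_value <= upper)

print(ci_significant(0.8, 1.3, 1.0))  # False: CI for an RR crosses 1.0
print(ci_significant(1.1, 1.9, 1.0))  # True: excludes 1.0, significant
```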