A screening test has 95% sensitivity. A patient tests negative. Can you rule out the disease?

Biostats: Sensitivity, Specificity & Study Design

Everything derives from four cells. Learn the 2x2 table and the rest falls out.

The 2x2 Table

This is the foundation. Every screening metric — sensitivity, specificity, PPV, NPV — is just a ratio of cells in this table. Change the numbers and watch everything update.

              Disease +          Disease −
Test +        True Positive      False Positive
Test −        False Negative     True Negative
Sensitivity = TP / (TP + FN)
Specificity = TN / (TN + FP)
PPV = TP / (TP + FP)
NPV = TN / (TN + FN)
Prevalence = (TP + FN) / Total
Accuracy = (TP + TN) / Total

Try this: set TP=90, FP=80, FN=10, TN=820. Sensitivity is 90% but PPV is only 53%. That's the prevalence trap in action — when disease is rare, even good tests generate tons of false positives.
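If you'd rather check those numbers in code, here's a minimal Python sketch (the metrics function is ours, not something from this page) that reproduces them:

    # All six metrics from the four cells; numbers match the "try this" above.
    def metrics(tp, fp, fn, tn):
        total = tp + fp + fn + tn
        return {
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
            "ppv": tp / (tp + fp),
            "npv": tn / (tn + fn),
            "prevalence": (tp + fn) / total,
            "accuracy": (tp + tn) / total,
        }

    for name, value in metrics(tp=90, fp=80, fn=10, tn=820).items():
        print(f"{name}: {value:.1%}")
    # sensitivity: 90.0%, but ppv: 52.9%; nearly half the positives are false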

The Cheat Sheet — All 4 Metrics at a Glance

Stare at this table before doing the quiz. Every question below tests one of these rows.

Metric | Formula | What It Answers | Prevalence Changes It? | Board Rule
Sensitivity | TP / (TP + FN) | Of everyone sick, how many did the test catch? | NO — fixed property of the test | SnNOut — high sens + negative result = rules OUT disease
Specificity | TN / (TN + FP) | Of everyone healthy, how many did the test clear correctly? | NO — fixed property of the test | SpPIn — high spec + positive result = rules IN disease
PPV | TP / (TP + FP) | If the test is positive, what's the chance they're actually sick? | YES — prevalence UP = PPV UP | Low prevalence → tons of false positives → low PPV even with a great test
NPV | TN / (TN + FN) | If the test is negative, what's the chance they're actually healthy? | YES — prevalence UP = NPV DOWN | High prevalence → more disease missed → NPV drops

The one-sentence version: Sensitivity and specificity describe the test. PPV and NPV describe the patient's result. The test doesn't change — but what the result means depends on who you're testing.

Sensitivity vs Specificity

Sensitivity = how good is the test at catching people who HAVE the disease?

Formula: TP / (TP + FN). Think: "of everyone who's SICK, how many did the test catch?"

High sensitivity means few false negatives. If you test negative, you're probably clear.

Specificity = how good is the test at correctly clearing people who DON'T have the disease?

Formula: TN / (TN + FP). Think: "of everyone who's HEALTHY, how many did the test get right?"

High specificity means few false positives. If you test positive, it's probably real.

SnNOut: Sensitivity rules out. A highly sensitive test, when negative, rules out disease. 🔑 S-N-N-Out: Sniff it out — a sensitive nose catches everything. If it smells nothing, nothing's there.

SpPIn: Specificity rules in. A highly specific test, when positive, rules in disease. 🔑 S-P-P-In: Specific means picky. If a picky test says yes, it's in. It doesn't say yes to just anyone.

Boards LOVE this trap: "A screening test has 99% specificity. A patient tests positive. Does he have the disease?"

The answer depends on prevalence. Even 99% specificity gives tons of false positives in a low-prevalence population. SpPIn only works when pre-test probability isn't rock-bottom.
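A quick back-of-the-envelope in Python makes the trap concrete. Assume a 99%-sensitive, 99%-specific test and 0.1% prevalence (illustrative numbers, not from the vignette):

    # Bayes on one line: PPV from prevalence, sensitivity, and specificity.
    prevalence, sens, spec = 0.001, 0.99, 0.99
    ppv = (sens * prevalence) / (sens * prevalence + (1 - spec) * (1 - prevalence))
    print(f"PPV = {ppv:.1%}")  # PPV = 9.0%
    # Even at 99% specificity, roughly 91% of positives are false at this prevalence.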

You want a screening test for a deadly cancer in the general population. Which do you prioritize?

PPV, NPV & the Prevalence Trap

PPV (Positive Predictive Value) = if the test is positive, what's the chance the patient actually has the disease?

NPV (Negative Predictive Value) = if the test is negative, what's the chance the patient is actually healthy?

Here's the key: sensitivity and specificity are fixed properties of the test. They don't change with prevalence. But PPV and NPV change dramatically with prevalence.

Prevalence Slider

Watch what happens to PPV as prevalence drops. Test sensitivity = 95%, specificity = 95%, population = 10,000.


At 1% prevalence with 95%/95% test: PPV drops to ~16%. That means 84% of positives are false. This is why you don't screen everyone for rare diseases.

Prevalence up = PPV up, NPV down.

Prevalence down = PPV down, NPV up.

Think: if everyone has the disease, a positive test is almost certainly right (high PPV). If nobody has it, a positive test is probably wrong (low PPV).
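If you don't have the slider handy, a short Python sketch sweeps the same 95%/95% test across prevalences:

    # A rough stand-in for the slider: same 95%/95% test, several prevalences.
    # Population size cancels out of the predictive values, so only prevalence matters.
    sens, spec = 0.95, 0.95
    for prev in (0.01, 0.05, 0.10, 0.25, 0.50):
        ppv = sens * prev / (sens * prev + (1 - spec) * (1 - prev))
        npv = spec * (1 - prev) / (spec * (1 - prev) + (1 - sens) * prev)
        print(f"prevalence {prev:>4.0%}: PPV {ppv:6.1%}, NPV {npv:6.1%}")
    # prevalence   1%: PPV  16.1%, NPV  99.9%
    # prevalence  50%: PPV  95.0%, NPV  95.0%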

Study Types — The Evidence Ladder

Higher rung = stronger evidence. But each type has its own measure of association and its own weaknesses.

1. Meta-Analysis / Systematic Review
Pools data from multiple studies. The gold standard of evidence.
Measure: pooled effect size (OR, RR, etc.)

2. Randomized Controlled Trial (RCT)
Random assignment to treatment vs control. Gold standard for individual studies. Can establish causation.
Measure: Relative Risk (RR). RR = risk in exposed / risk in unexposed. RR = 2 means twice the risk. Only valid when you can measure true incidence (prospective studies).

3. Cohort Study
Follow exposed vs unexposed groups forward in time (prospective) or look back (retrospective). Can show association + temporality.
Measure: Relative Risk (RR). Cohort studies track incidence over time, so you can calculate true risk ratios. Prospective = strongest. A retrospective cohort still uses RR because you're following groups forward through records.

4. Case-Control Study
Start with disease (cases) and no disease (controls), look BACKWARD for exposures. Good for rare diseases.
Measure: Odds Ratio (OR). OR = (a*d)/(b*c) from the 2x2 table. You CAN'T use RR because you selected on outcome, not exposure — you don't know true incidence. OR approximates RR when disease is rare.

5. Cross-Sectional Study
Snapshot in time. Measures exposure AND disease simultaneously. Good for prevalence, bad for causation.
Measure: Prevalence, OR. Since exposure and outcome are measured at the same time, you can't establish which came first. Gives you prevalence (how common) and can calculate an OR, but cannot prove causation.

6. Case Series / Case Report
Description of individual cases. No comparison group. Lowest evidence but can identify new conditions.
Measure: descriptive only — no statistical comparison

RR vs OR: If you can follow people forward and count who gets sick = RR (RCT, cohort). If you start with sick people and look backward = OR (case-control). Boards ask this constantly. Case-control = OR. Always.
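The same arithmetic as a short Python sketch, with made-up cell counts:

    # The usual epi 2x2 layout, counts are illustrative:
    #              disease+   disease-
    # exposed          a          b
    # unexposed        c          d
    a, b, c, d = 30, 70, 10, 90
    rr = (a / (a + b)) / (c / (c + d))  # risk ratio: needs true incidence (cohort/RCT)
    odds_ratio = (a * d) / (b * c)      # odds ratio: all a case-control can give you
    print(f"RR = {rr:.2f}, OR = {odds_ratio:.2f}")  # RR = 3.00, OR = 3.86
    # With rare disease the OR sits much closer to the RR (the rare-disease assumption).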

Study Types — Side-by-Side Comparison

This is the table that makes the elimination game easy. Learn the columns and you can ID any study in one read.

Study Type | Direction | Starts With | Measure | Causation? | Best For | Key Weakness
RCT | Forward | Random assignment to groups | RR | YES | Testing treatments | Expensive, ethical limits
Cohort | Forward (or retro) | Exposure status | RR | Association + temporality | Common exposures, incidence | Slow, expensive, attrition
Case-Control | BACKWARD | Disease status | OR only | No | Rare diseases | Recall bias, can't calc incidence
Cross-Sectional | Snapshot (no direction) | Both at once | Prevalence, OR | No | Prevalence, hypothesis generation | No temporality at all
Case Series | Descriptive | Interesting cases | None | No | New/rare conditions | No comparison group

Bias Types — The Villain Gallery

Tap any villain to reveal their full description. These are the ways studies lie to you.

Selection Bias
"The sample was rigged from the start."
The study population isn't representative. Example: studying depression by recruiting at a hospital — you've already selected for sicker patients. Berkson bias is a specific type: using hospital patients makes it look like two diseases are associated just because both cause hospitalization.
Recall Bias
"Sick people remember more."
People with disease recall exposures more carefully than healthy controls. Classic in case-control studies. Example: mothers of babies with birth defects remember every medication they took. Healthy-baby mothers forget.
Observer / Hawthorne
"People act different when watched."
Observer bias: researcher sees what they expect (fix: blinding). Hawthorne effect: subjects change behavior because they know they're being observed, regardless of treatment group. Both inflate the treatment effect.
Lead-Time Bias
"Earlier detection looks like longer survival."
Screening catches disease earlier, so survival TIME from diagnosis appears longer — even if the patient dies at the same age. You didn't extend life; you just moved the starting line. Fix: measure mortality rate, not survival time.
Length-Time Bias
"Screening catches the slow ones."
Slow-growing tumors are present longer → more likely to be caught by screening. Aggressive tumors kill before the next screen. So screening populations look like they have better outcomes — but you're just selecting less aggressive disease.
Confounding
"A hidden third variable fooled everyone."
An unmeasured variable is associated with BOTH the exposure and the outcome. Example: coffee drinking seems to cause lung cancer — but coffee drinkers smoke more. Smoking is the confounder. Fix: randomization (best) or stratification/regression.
Berkson Bias
"Hospital patients aren't normal."
A type of selection bias specific to hospital-based studies. Two conditions appear correlated because both independently increase hospitalization probability. The association is an artifact of WHERE you sampled, not biology.
Attrition Bias
"The ones who left weren't random."
People drop out of studies for reasons related to the outcome. If sick patients in the treatment group quit due to side effects, the remaining treatment group looks healthier. Fix: intention-to-treat analysis — count everyone in their assigned group, even dropouts.

Bias Quick-Reference

The board question gives you a scenario. Match the pattern to the bias.

Bias | The Pattern | Classic Study Type | The Fix
Selection | Sample isn't representative of the population | Any | Randomization, representative sampling
Recall | Sick people remember exposures better than healthy people | Case-control | Use medical records, not patient memory
Observer | Researcher sees what they expect to see | Any unblinded | Blinding
Hawthorne | Subjects act different because they know they're watched | Any | Control group (both groups are watched)
Lead-time | Earlier detection looks like longer survival (same death age) | Screening studies | Measure mortality rate, not survival time
Length-time | Screening catches slow tumors more than aggressive ones | Screening studies | Account for tumor growth rates
Berkson | Hospital patients make 2 diseases look correlated | Hospital-based case-control | Use community-based samples
Confounding | Hidden 3rd variable linked to BOTH exposure and outcome | Any observational | Randomization (best), stratification, regression
Attrition | Dropouts aren't random — sicker people leave treatment group | RCT, cohort | Intention-to-treat analysis

NNT, NNH & Risk Reduction

ARR (Absolute Risk Reduction) = risk in control − risk in treatment. The raw difference.

RRR (Relative Risk Reduction) = ARR / risk in control. The percentage drop. Sounds more impressive than it is.

NNT = 1 / ARR. The number of patients you need to treat for ONE to benefit.

NNH = 1 / (risk of harm in treatment − risk in control). Same idea, but for harm.

NNT Calculator

Enter event rates to see NNT.


Drug ads love RRR because it sounds dramatic. "50% reduction in heart attacks!" But if risk went from 2% to 1%, the ARR is just 1% and NNT = 100. You'd treat 100 people for 1 to benefit. Always ask for the absolute numbers.
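Here's that example verified in a few lines of Python (the 4%/1% harm rates for NNH are made up):

    # The ad example in code: risk falls from 2% (control) to 1% (treatment).
    control_risk, treatment_risk = 0.02, 0.01
    arr = control_risk - treatment_risk  # absolute risk reduction
    rrr = arr / control_risk             # relative risk reduction
    nnt = 1 / arr                        # number needed to treat
    print(f"ARR = {arr:.1%}, RRR = {rrr:.0%}, NNT = {nnt:.0f}")  # ARR = 1.0%, RRR = 50%, NNT = 100
    # NNH is the same arithmetic on harms; say the drug causes a side effect
    # in 4% vs 1% on placebo (made-up rates):
    nnh = 1 / (0.04 - 0.01)
    print(f"NNH = {nnh:.0f}")  # NNH = 33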

Which Statistical Test? — Decision Tree

Work through the branches. Each step challenges you before revealing the answer.

What type of data is your outcome variable?

Statistical Tests — Quick-Reference

Don't want to click through the tree? Here's the whole thing in one table.

Scenario | Test | Board Clue
Categorical, 2 groups, large sample | Chi-square (χ²) | All expected counts ≥ 5
Categorical, 2 groups, small sample | Fisher exact | "small cell counts" or any expected count < 5
Categorical, 3+ groups | Chi-square | Same test, bigger table
Continuous, normal, 2 groups | t-test | Paired = before/after, same subjects. Unpaired = different subjects
Continuous, normal, 3+ groups | ANOVA | If significant, use post-hoc (Tukey/Bonferroni) to find which groups differ
Continuous, non-normal, 2 groups | Mann-Whitney U | "non-normally distributed" + "2 groups" = Mann-Whitney
Continuous, non-normal, 3+ groups | Kruskal-Wallis | Non-parametric ANOVA
Correlation between 2 continuous variables | Pearson (normal) / Spearman (non-normal) | "association between two measurements"
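The whole table compresses into a toy Python function. A sketch under the table's assumptions, not an exhaustive tree, and the argument names are ours:

    def choose_test(outcome, groups=2, normal=True, small_cells=False, paired=False):
        if outcome == "categorical":
            return "Fisher exact" if small_cells else "Chi-square"
        if outcome == "continuous":
            if normal:
                if groups >= 3:
                    return "ANOVA (then post-hoc Tukey/Bonferroni)"
                return "Paired t-test" if paired else "Unpaired t-test"
            return "Kruskal-Wallis" if groups >= 3 else "Mann-Whitney U"
        if outcome == "correlation":
            return "Pearson" if normal else "Spearman"

    print(choose_test("continuous", groups=2, normal=False))  # Mann-Whitney U
    print(choose_test("categorical", small_cells=True))       # Fisher exact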

Power, Error & p-Values

Type I Error (alpha)

You rejected the null hypothesis, but it was actually true. You said "there IS a difference" when there isn't. A false positive of the study itself.

Standard threshold: alpha = 0.05 (5% chance of this error)

Type II Error (beta)

You failed to reject the null, but the alternative was true. You said "no difference" when there IS one. A false negative of the study itself.

Standard threshold: beta = 0.20 (20% chance of this error)

Power = 1 − beta

The probability of correctly detecting a real effect. Standard = 0.80 (80%).

How to increase power: increase sample size (most common), study a larger effect (you can't change the true effect size, but bigger effects are easier to detect), increase alpha (rarely done), decrease variability.
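Power is easiest to feel by simulation. A sketch, assuming numpy and scipy are available; the 0.5-SD effect and sample sizes are illustrative:

    # Power by Monte Carlo: how often does a two-sample t-test detect a true
    # 0.5-SD difference at alpha = 0.05?
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    effect, alpha, trials = 0.5, 0.05, 2000
    for n in (20, 50, 100):
        hits = sum(
            stats.ttest_ind(rng.normal(0.0, 1.0, n),       # control group
                            rng.normal(effect, 1.0, n)).pvalue < alpha
            for _ in range(trials)
        )
        print(f"n = {n:>3} per group: power ~ {hits / trials:.0%}")
    # Power climbs from ~34% at n=20 to ~94% at n=100: sample size is the
    # standard fix for Type II error.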

Type I vs Type II — Stop Confusing Them

 | Type I Error (alpha) | Type II Error (beta)
What happened | Said "there IS a difference" — but there isn't | Said "no difference" — but there IS one
Analogy | False positive of the study | False negative of the study
Threshold | alpha = 0.05 | beta = 0.20
Related to | p-value (p < alpha = significant) | Power = 1 − beta = 0.80
Board clue | "False alarm" — rejected null when it was true | "Missed it" — small sample, underpowered study
Fix | Lower alpha (stricter threshold) | Increase sample size
A study with 50 patients finds no significant difference between drug A and placebo (p = 0.12). The investigator concludes the drug doesn't work. What's the most likely problem?

p-Value and Confidence Intervals

p-value: the probability of seeing results this extreme (or more extreme) IF the null hypothesis were true. It is NOT the probability that the null is true. This is the most commonly misinterpreted stat on boards. p < 0.05 = statistically significant, but it says nothing about clinical significance.

95% Confidence Interval: if you repeated the study 100 times, ~95 of those intervals would contain the true value. If the CI for an RR or OR includes 1.0, the result is NOT statistically significant. If the CI for a mean difference includes 0, NOT significant.
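Here's a worked interval in Python, using the standard log-RR standard error with made-up cell counts:

    # 95% CI for a relative risk via the log-RR standard error.
    import math
    a, b, c, d = 30, 70, 20, 80  # exposed: a sick, b well; unexposed: c sick, d well
    rr = (a / (a + b)) / (c / (c + d))
    se = math.sqrt(1/a - 1/(a + b) + 1/c - 1/(c + d))
    lo = math.exp(math.log(rr) - 1.96 * se)
    hi = math.exp(math.log(rr) + 1.96 * se)
    print(f"RR = {rr:.2f}, 95% CI ({lo:.2f}, {hi:.2f})")  # RR = 1.50, 95% CI (0.92, 2.46)
    # The interval crosses 1.0, so despite RR = 1.5 this result is NOT significant.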

Elimination Game — Which Study Type?

Five scenarios. Each clue eliminates options until one remains. Think before each reveal.

Clinical Vignettes

4 board-style questions pulled from a rotating pool. Different every time you load the page.

built for Mo | bone wizardry