A screening test has 95% sensitivity. A patient tests negative. Can you rule out the disease?

Biostats: Sensitivity, Specificity & Study Design

Everything derives from four cells. Learn the 2x2 table and the rest falls out.

The 2x2 Table

This is the foundation. Every screening metric — sensitivity, specificity, PPV, NPV — is just a ratio of cells in this table. Change the numbers and watch everything update.

              Disease +          Disease −
Test +        True Positive      False Positive
Test −        False Negative     True Negative
Sensitivity = TP / (TP + FN)
Specificity = TN / (TN + FP)
PPV = TP / (TP + FP)
NPV = TN / (TN + FN)
Prevalence = (TP + FN) / Total
Accuracy = (TP + TN) / Total

Try this: set TP=90, FP=80, FN=10, TN=820. Sensitivity is 90% but PPV is only 53%. That's the prevalence trap in action — when disease is rare, even good tests generate tons of false positives.
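If you'd rather check those numbers in code, here's a minimal Python sketch (the metrics function is ours, not something from this page) that reproduces them:

    # All six metrics from the four cells; numbers match the "try this" above.
    def metrics(tp, fp, fn, tn):
        total = tp + fp + fn + tn
        return {
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
            "ppv": tp / (tp + fp),
            "npv": tn / (tn + fn),
            "prevalence": (tp + fn) / total,
            "accuracy": (tp + tn) / total,
        }

    for name, value in metrics(tp=90, fp=80, fn=10, tn=820).items():
        print(f"{name}: {value:.1%}")
    # sensitivity: 90.0%, but ppv: 52.9%; nearly half the positives are false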

The Cheat Sheet — All 4 Metrics at a Glance

Stare at this table before doing the quiz. Every question below tests one of these rows.

Metric | Formula | What It Answers | Prevalence Changes It? | Board Rule
Sensitivity | TP / (TP + FN) | Of everyone sick, how many did the test catch? | NO — fixed property of the test | SnNOut — high sens + negative result = rules OUT disease
Specificity | TN / (TN + FP) | Of everyone healthy, how many did the test clear correctly? | NO — fixed property of the test | SpPIn — high spec + positive result = rules IN disease
PPV | TP / (TP + FP) | If the test is positive, what's the chance they're actually sick? | YES — prevalence UP = PPV UP | Low prevalence → tons of false positives → low PPV even with a great test
NPV | TN / (TN + FN) | If the test is negative, what's the chance they're actually healthy? | YES — prevalence UP = NPV DOWN | High prevalence → more disease missed → NPV drops

The one-sentence version: Sensitivity and specificity describe the test. PPV and NPV describe the patient's result. The test doesn't change — but what the result means depends on who you're testing.

Sensitivity vs Specificity

Sensitivity = how good is the test at catching people who HAVE the disease?

Formula: TP / (TP + FN). Think: "of everyone who's SICK, how many did the test catch?"

High sensitivity means few false negatives. If you test negative, you're probably clear.

Specificity = how good is the test at correctly clearing people who DON'T have the disease?

Formula: TN / (TN + FP). Think: "of everyone who's HEALTHY, how many did the test get right?"

High specificity means few false positives. If you test positive, it's probably real.

SnNOut: Sensitivity rules out. A highly sensitive test, when negative, rules out disease. 🔑 S-N-N-Out: Sniff it out — a sensitive nose catches everything. If it smells nothing, nothing's there.

SpPIn: Specificity rules in. A highly specific test, when positive, rules in disease. 🔑 S-P-P-In: Specific means picky. If a picky test says yes, it's in. It doesn't say yes to just anyone.

Boards LOVE this trap: "A screening test has 99% specificity. A patient tests positive. Does he have the disease?"

The answer depends on prevalence. Even 99% specificity gives tons of false positives in a low-prevalence population. SpPIn only works when pre-test probability isn't rock-bottom.
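A quick back-of-the-envelope in Python makes the trap concrete. Assume a 99%-sensitive, 99%-specific test and 0.1% prevalence (illustrative numbers, not from the vignette):

    # Bayes on one line: PPV from prevalence, sensitivity, and specificity.
    prevalence, sens, spec = 0.001, 0.99, 0.99
    ppv = (sens * prevalence) / (sens * prevalence + (1 - spec) * (1 - prevalence))
    print(f"PPV = {ppv:.1%}")  # PPV = 9.0%
    # Even at 99% specificity, roughly 91% of positives are false at this prevalence.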

You want a screening test for a deadly cancer in the general population. Which do you prioritize?

PPV, NPV & the Prevalence Trap

PPV (Positive Predictive Value) = if the test is positive, what's the chance the patient actually has the disease?

NPV (Negative Predictive Value) = if the test is negative, what's the chance the patient is actually healthy?

Here's the key: sensitivity and specificity are fixed properties of the test. They don't change with prevalence. But PPV and NPV change dramatically with prevalence.

Prevalence Slider

Watch what happens to PPV as prevalence drops. Test sensitivity = 95%, specificity = 95%, population = 10,000.


At 1% prevalence with 95%/95% test: PPV drops to ~16%. That means 84% of positives are false. This is why you don't screen everyone for rare diseases.

Prevalence up = PPV up, NPV down.

Prevalence down = PPV down, NPV up.

Think: if everyone has the disease, a positive test is almost certainly right (high PPV). If nobody has it, a positive test is probably wrong (low PPV).
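If you don't have the slider handy, a short Python sketch sweeps the same 95%/95% test across prevalences:

    # A rough stand-in for the slider: same 95%/95% test, several prevalences.
    # Population size cancels out of the predictive values, so only prevalence matters.
    sens, spec = 0.95, 0.95
    for prev in (0.01, 0.05, 0.10, 0.25, 0.50):
        ppv = sens * prev / (sens * prev + (1 - spec) * (1 - prev))
        npv = spec * (1 - prev) / (spec * (1 - prev) + (1 - sens) * prev)
        print(f"prevalence {prev:>4.0%}: PPV {ppv:6.1%}, NPV {npv:6.1%}")
    # prevalence   1%: PPV  16.1%, NPV  99.9%
    # prevalence  50%: PPV  95.0%, NPV  95.0%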

Study Types — The Evidence Ladder

Higher rung = stronger evidence. But each type has its own measure of association and its own weaknesses.

1. Meta-Analysis / Systematic Review
Pools data from multiple studies. The gold standard of evidence.
Measure: pooled effect size (OR, RR, etc.)

2. Randomized Controlled Trial (RCT)
Random assignment to treatment vs control. Gold standard for individual studies. Can establish causation.
Measure: Relative Risk (RR). RR = risk in exposed / risk in unexposed. RR = 2 means twice the risk. Only valid when you can measure true incidence (prospective studies).

3. Cohort Study
Follow exposed vs unexposed groups forward in time (prospective) or look back (retrospective). Can show association + temporality.
Measure: Relative Risk (RR). Cohort studies track incidence over time, so you can calculate true risk ratios. Prospective = strongest. A retrospective cohort still uses RR because you're following groups forward through records.

4. Case-Control Study
Start with disease (cases) and no disease (controls), look BACKWARD for exposures. Good for rare diseases.
Measure: Odds Ratio (OR). OR = (a*d)/(b*c) from the 2x2 table. You CAN'T use RR because you selected on outcome, not exposure — you don't know true incidence. OR approximates RR when disease is rare.

5. Cross-Sectional Study
Snapshot in time. Measures exposure AND disease simultaneously. Good for prevalence, bad for causation.
Measure: Prevalence, OR. Since exposure and outcome are measured at the same time, you can't establish which came first. Gives you prevalence (how common) and can calculate an OR, but cannot prove causation.

6. Case Series / Case Report
Description of individual cases. No comparison group. Lowest evidence but can identify new conditions.
Measure: descriptive only — no statistical comparison

RR vs OR: If you can follow people forward and count who gets sick = RR (RCT, cohort). If you start with sick people and look backward = OR (case-control). Boards ask this constantly. Case-control = OR. Always.
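The same arithmetic as a short Python sketch, with made-up cell counts:

    # The usual epi 2x2 layout, counts are illustrative:
    #              disease+   disease-
    # exposed          a          b
    # unexposed        c          d
    a, b, c, d = 30, 70, 10, 90
    rr = (a / (a + b)) / (c / (c + d))  # risk ratio: needs true incidence (cohort/RCT)
    odds_ratio = (a * d) / (b * c)      # odds ratio: all a case-control can give you
    print(f"RR = {rr:.2f}, OR = {odds_ratio:.2f}")  # RR = 3.00, OR = 3.86
    # With rare disease the OR sits much closer to the RR (the rare-disease assumption).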

Study Types — Side-by-Side Comparison

This is the table that makes the elimination game easy. Learn the columns and you can ID any study in one read.

Study Type | Direction | Starts With | Measure | Causation? | Best For | Key Weakness
RCT | Forward | Random assignment to groups | RR | YES | Testing treatments | Expensive, ethical limits
Cohort | Forward (or retro) | Exposure status | RR | Association + temporality | Common exposures, incidence | Slow, expensive, attrition
Case-Control | BACKWARD | Disease status | OR only | No | Rare diseases | Recall bias, can't calc incidence
Cross-Sectional | Snapshot (no direction) | Both at once | Prevalence, OR | No | Prevalence, hypothesis generation | No temporality at all
Case Series | Descriptive | Interesting cases | None | No | New/rare conditions | No comparison group

Bias Types — The Villain Gallery

Tap any villain to reveal their full description. These are the ways studies lie to you.

Selection Bias
"The sample was rigged from the start."
The study population isn't representative. Example: studying depression by recruiting at a hospital — you've already selected for sicker patients. Berkson bias is a specific type: using hospital patients makes it look like two diseases are associated just because both cause hospitalization.
Recall Bias
"Sick people remember more."
People with disease recall exposures more carefully than healthy controls. Classic in case-control studies. Example: mothers of babies with birth defects remember every medication they took. Healthy-baby mothers forget.
Observer / Hawthorne
"People act different when watched."
Observer bias: researcher sees what they expect (fix: blinding). Hawthorne effect: subjects change behavior because they know they're being observed, regardless of treatment group. Both inflate the treatment effect.
Lead-Time Bias
"Earlier detection looks like longer survival."
Screening catches disease earlier, so survival TIME from diagnosis appears longer — even if the patient dies at the same age. You didn't extend life; you just moved the starting line. Fix: measure mortality rate, not survival time.
Length-Time Bias
"Screening catches the slow ones."
Slow-growing tumors are present longer → more likely to be caught by screening. Aggressive tumors kill before the next screen. So screening populations look like they have better outcomes — but you're just selecting less aggressive disease.
Confounding
"A hidden third variable fooled everyone."
An unmeasured variable is associated with BOTH the exposure and the outcome. Example: coffee drinking seems to cause lung cancer — but coffee drinkers smoke more. Smoking is the confounder. Fix: randomization (best) or stratification/regression.
Berkson Bias
"Hospital patients aren't normal."
A type of selection bias specific to hospital-based studies. Two conditions appear correlated because both independently increase hospitalization probability. The association is an artifact of WHERE you sampled, not biology.
Attrition Bias
"The ones who left weren't random."
People drop out of studies for reasons related to the outcome. If sick patients in the treatment group quit due to side effects, the remaining treatment group looks healthier. Fix: intention-to-treat analysis — count everyone in their assigned group, even dropouts.

Bias Quick-Reference

The board question gives you a scenario. Match the pattern to the bias.

Bias | The Pattern | Classic Study Type | The Fix
Selection | Sample isn't representative of the population | Any | Randomization, representative sampling
Recall | Sick people remember exposures better than healthy people | Case-control | Use medical records, not patient memory
Observer | Researcher sees what they expect to see | Any unblinded | Blinding
Hawthorne | Subjects act different because they know they're watched | Any | Control group (both groups are watched)
Lead-time | Earlier detection looks like longer survival (same death age) | Screening studies | Measure mortality rate, not survival time
Length-time | Screening catches slow tumors more than aggressive ones | Screening studies | Account for tumor growth rates
Berkson | Hospital patients make 2 diseases look correlated | Hospital-based case-control | Use community-based samples
Confounding | Hidden 3rd variable linked to BOTH exposure and outcome | Any observational | Randomization (best), stratification, regression
Attrition | Dropouts aren't random — sicker people leave treatment group | RCT, cohort | Intention-to-treat analysis

NNT, NNH & Risk Reduction

ARR (Absolute Risk Reduction) = risk in control − risk in treatment. The raw difference.

RRR (Relative Risk Reduction) = ARR / risk in control. The percentage drop. Sounds more impressive than it is.

NNT = 1 / ARR. The number of patients you need to treat for ONE to benefit.

NNH = 1 / (risk of harm in treatment − risk in control). Same idea, but for harm.

NNT Calculator

Enter event rates to see NNT.


Drug ads love RRR because it sounds dramatic. "50% reduction in heart attacks!" But if risk went from 2% to 1%, the ARR is just 1% and NNT = 100. You'd treat 100 people for 1 to benefit. Always ask for the absolute numbers.
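Here's that example verified in a few lines of Python (the 4%/1% harm rates for NNH are made up):

    # The ad example in code: risk falls from 2% (control) to 1% (treatment).
    control_risk, treatment_risk = 0.02, 0.01
    arr = control_risk - treatment_risk  # absolute risk reduction
    rrr = arr / control_risk             # relative risk reduction
    nnt = 1 / arr                        # number needed to treat
    print(f"ARR = {arr:.1%}, RRR = {rrr:.0%}, NNT = {nnt:.0f}")  # ARR = 1.0%, RRR = 50%, NNT = 100
    # NNH is the same arithmetic on harms; say the drug causes a side effect
    # in 4% vs 1% on placebo (made-up rates):
    nnh = 1 / (0.04 - 0.01)
    print(f"NNH = {nnh:.0f}")  # NNH = 33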

Which Statistical Test? — Decision Tree

Work through the branches. Each step challenges you before revealing the answer.

What type of data is your outcome variable?

Statistical Tests — Quick-Reference

Don't want to click through the tree? Here's the whole thing in one table.

Scenario | Test | Board Clue
Categorical, 2 groups, large sample | Chi-square (χ²) | All expected counts ≥ 5
Categorical, 2 groups, small sample | Fisher exact | "small cell counts" or any expected count < 5
Categorical, 3+ groups | Chi-square | Same test, bigger table
Continuous, normal, 2 groups | t-test | Paired = before/after, same subjects. Unpaired = different subjects
Continuous, normal, 3+ groups | ANOVA | If significant, use post-hoc (Tukey/Bonferroni) to find which groups differ
Continuous, non-normal, 2 groups | Mann-Whitney U | "non-normally distributed" + "2 groups" = Mann-Whitney
Continuous, non-normal, 3+ groups | Kruskal-Wallis | Non-parametric ANOVA
Correlation between 2 continuous variables | Pearson (normal) / Spearman (non-normal) | "association between two measurements"
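The whole table compresses into a toy Python function. A sketch under the table's assumptions, not an exhaustive tree, and the argument names are ours:

    def choose_test(outcome, groups=2, normal=True, small_cells=False, paired=False):
        if outcome == "categorical":
            return "Fisher exact" if small_cells else "Chi-square"
        if outcome == "continuous":
            if normal:
                if groups >= 3:
                    return "ANOVA (then post-hoc Tukey/Bonferroni)"
                return "Paired t-test" if paired else "Unpaired t-test"
            return "Kruskal-Wallis" if groups >= 3 else "Mann-Whitney U"
        if outcome == "correlation":
            return "Pearson" if normal else "Spearman"

    print(choose_test("continuous", groups=2, normal=False))  # Mann-Whitney U
    print(choose_test("categorical", small_cells=True))       # Fisher exact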

Power, Error & p-Values

Type I Error (alpha)

You rejected the null hypothesis, but it was actually true. You said "there IS a difference" when there isn't. A false positive of the study itself.

Standard threshold: alpha = 0.05 (5% chance of this error)

Type II Error (beta)

You failed to reject the null, but the alternative was true. You said "no difference" when there IS one. A false negative of the study itself.

Standard threshold: beta = 0.20 (20% chance of this error)

Power = 1 − beta

The probability of correctly detecting a real effect. Standard = 0.80 (80%).

How to increase power: increase sample size (most common), study a larger effect (you can't change the true effect size, but bigger effects are easier to detect), increase alpha (rarely done), decrease variability.
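Power is easiest to feel by simulation. A sketch, assuming numpy and scipy are available; the 0.5-SD effect and sample sizes are illustrative:

    # Power by Monte Carlo: how often does a two-sample t-test detect a true
    # 0.5-SD difference at alpha = 0.05?
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    effect, alpha, trials = 0.5, 0.05, 2000
    for n in (20, 50, 100):
        hits = sum(
            stats.ttest_ind(rng.normal(0.0, 1.0, n),       # control group
                            rng.normal(effect, 1.0, n)).pvalue < alpha
            for _ in range(trials)
        )
        print(f"n = {n:>3} per group: power ~ {hits / trials:.0%}")
    # Power climbs from ~34% at n=20 to ~94% at n=100: sample size is the
    # standard fix for Type II error.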

Type I vs Type II — Stop Confusing Them

 | Type I Error (alpha) | Type II Error (beta)
What happened | Said "there IS a difference" — but there isn't | Said "no difference" — but there IS one
Analogy | False positive of the study | False negative of the study
Threshold | alpha = 0.05 | beta = 0.20
Related to | p-value (p < alpha = significant) | Power = 1 − beta = 0.80
Board clue | "False alarm" — rejected null when it was true | "Missed it" — small sample, underpowered study
Fix | Lower alpha (stricter threshold) | Increase sample size
A study with 50 patients finds no significant difference between drug A and placebo (p = 0.12). The investigator concludes the drug doesn't work. What's the most likely problem?

p-Value and Confidence Intervals

p-value: the probability of seeing results this extreme (or more extreme) IF the null hypothesis were true. It is NOT the probability that the null is true. This is the most commonly misinterpreted stat on boards. p < 0.05 = statistically significant, but it says nothing about clinical significance.

95% Confidence Interval: if you repeated the study 100 times, ~95 of those intervals would contain the true value. If the CI for an RR or OR includes 1.0, the result is NOT statistically significant. If the CI for a mean difference includes 0, NOT significant.
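Here's a worked interval in Python, using the standard log-RR standard error with made-up cell counts:

    # 95% CI for a relative risk via the log-RR standard error.
    import math
    a, b, c, d = 30, 70, 20, 80  # exposed: a sick, b well; unexposed: c sick, d well
    rr = (a / (a + b)) / (c / (c + d))
    se = math.sqrt(1/a - 1/(a + b) + 1/c - 1/(c + d))
    lo = math.exp(math.log(rr) - 1.96 * se)
    hi = math.exp(math.log(rr) + 1.96 * se)
    print(f"RR = {rr:.2f}, 95% CI ({lo:.2f}, {hi:.2f})")  # RR = 1.50, 95% CI (0.92, 2.46)
    # The interval crosses 1.0, so despite RR = 1.5 this result is NOT significant.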

Elimination Game — Which Study Type?

Five scenarios. Each clue eliminates options until one remains. Think before each reveal.

Clinical Vignettes

4 board-style questions pulled from a rotating pool. Different every time you load the page.

built for Mo | bone wizardry