Statistics Memory Tricks | Math

Proven Mnemonics & Acronyms — fast to learn, hard to forget.

📊 Statistics

SOCS: Shape · Outliers · Center · Spread

Describing Distributions — S=Shape · O=Outliers · C=Center · S=Spread

Never miss a component when describing a distribution

When asked to describe any distribution, always address all four. Leaving one out is a common way to lose points on a statistics exam.

Shape — symmetric, skewed left, skewed right, bimodal, or uniform

Outliers — any unusual values? Use 1.5×IQR rule to identify

Center — mean for symmetric distributions, median for skewed

Spread — standard deviation (with mean) or IQR (with median)

📊 Statistics · Hypothesis Testing

PHANTOMS

P=Parameter · H=Hypotheses · A=Assumptions · N=Name test · T=Test statistic · O=Obtain p-value · M=Make decision · S=State conclusion

Never skip a step in a hypothesis test again

Every complete hypothesis test needs all 8 steps. Skipping a step can cost points on AP Statistics and college exams.

Parameter — define the population parameter in context

Hypotheses — state H₀ and Hₐ with correct notation and direction

Assumptions — verify conditions: random, normal, independent

Name test — identify the specific test (z-test, t-test, chi-square)

Test statistic — calculate and show all work

Obtain p-value — find using table or calculator

Make decision — compare p-value to α, reject or fail to reject H₀

State conclusion — in context; never say "accept H₀"

📊 Statistics · Normal Distribution

68 − 95 − 99.7 Rule

Empirical Rule — 68% within 1σ · 95% within 2σ · 99.7% within 3σ of the mean

Know exactly what percentage falls within any standard deviation

In any normal distribution, about 68% of values fall within 1 standard deviation of the mean, about 95% within 2, and about 99.7% within 3. These three numbers appear on almost every statistics exam.
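The three percentages can be checked numerically. The following is a minimal sketch using Python's standard-library `NormalDist`; the helper name `share_within` is made up for illustration.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0.0, sigma=1.0)

def share_within(k: float) -> float:
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} sigma: {share_within(k):.4f}")
```

The rounded results are close to 68%, 95%, and 99.7%, which is why the rule is stated as an approximation.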

📊 Statistics · Errors

TYPE I = False Positive (α). TYPE II = False Negative (β).

Hypothesis Test Errors — Type I: reject true H₀ (false alarm) · Type II: fail to reject false H₀ (miss the effect)

Never confuse Type I and Type II errors again

Type I (α): crying wolf — H₀ IS true but you reject it anyway. Type II (β): missing the wolf — H₀ IS false but you fail to reject it. Power = 1−β = correctly detecting a real effect.

Type I (α)

Rejecting H₀ when H₀ is actually TRUE — false positive, crying wolf

Type II (β)

Failing to reject H₀ when H₀ is actually FALSE — false negative, missing the effect

Power

1−β = probability of correctly rejecting a false H₀ (catching the wolf)
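To make α, β, and power concrete, here is a small sketch for a one-sided z-test of a mean. All numbers (null mean 100, true mean 105, σ = 15, n = 36, α = 0.05) are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist

# Hypothetical setup: H0: mu = 100 versus Ha: mu > 100, with sigma known.
mu0, mu_true = 100.0, 105.0
sigma, n, alpha = 15.0, 36, 0.05
se = sigma / n ** 0.5  # standard error of the sample mean (2.5 here)

# Reject H0 when the sample mean exceeds this cutoff.
cutoff = NormalDist(mu0, se).inv_cdf(1 - alpha)

# Type I rate: chance of rejecting when H0 is true (equals alpha by construction).
type_i_rate = 1 - NormalDist(mu0, se).cdf(cutoff)

# Type II rate: chance of failing to reject when the true mean is mu_true.
beta = NormalDist(mu_true, se).cdf(cutoff)
power = 1 - beta

print(f"cutoff={cutoff:.2f}  alpha={type_i_rate:.3f}  beta={beta:.3f}  power={power:.3f}")
```

Lowering `alpha` in this sketch moves the cutoff to the right, which raises `beta` and lowers `power` when n stays the same.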

📊 Statistics · Scatterplots

DUFUS: Direction · Unusual points · Form · Unusual clusters · Strength

Describing Scatterplots — D=Direction · U=Unusual points · F=Form · U=Unusual clusters · S=Strength

Describe any scatterplot completely and earn full credit

Direction (positive/negative), Unusual points (outliers or influential points), Form (linear or curved), Unusual clusters, Strength (strong/moderate/weak). Always reference the actual variables in context!

📊 Statistics · Confidence Intervals

PANIC: Parameter · Assumptions · Name · Interval · Conclusion

Confidence Interval Steps — P=Parameter · A=Assumptions · N=Name interval · I=Interval (calculate) · C=Conclusion

Build and interpret any confidence interval correctly

CI = point estimate ± (critical value × standard error). A 95% CI means: if you repeated the procedure many times, 95% of intervals would capture the true parameter. Never say "95% probability the parameter is in this interval."

Parameter — define the population parameter being estimated

Assumptions — check random, normal, and independence conditions

Name — state the specific interval procedure (one-sample t-interval, etc.)

Interval — calculate: point estimate ± (critical value × SE)

Conclusion — interpret in context with the confidence level stated
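The Interval step can be sketched in code. This example assumes a one-proportion z-interval (chosen because the critical value is available in the standard library) and a hypothetical survey in which 540 of 1000 respondents say yes.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical survey data.
successes, n = 540, 1000
confidence = 0.95

p_hat = successes / n                          # point estimate
se = sqrt(p_hat * (1 - p_hat) / n)             # standard error of p_hat
z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # critical value, about 1.96

margin = z_star * se
lower, upper = p_hat - margin, p_hat + margin

print(f"{confidence:.0%} CI for p: ({lower:.4f}, {upper:.4f})")
```

A conclusion in context would read: "We are 95% confident that the true proportion of yes responses in the population is between about 0.51 and 0.57."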

📊 Statistics · Probability

AND = Multiply. OR = Add (subtract overlap).

Probability Rules — P(A and B)=P(A)·P(B|A) · P(A or B)=P(A)+P(B)−P(A and B)

Never mix up AND and OR probability rules

P(A AND B) = P(A)×P(B|A). If independent: P(A)×P(B). P(A OR B) = P(A)+P(B)−P(A AND B). If mutually exclusive: P(A)+P(B). Remember: AND=multiply, OR=add and subtract overlap.
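Both rules can be worked through with a standard 52-card deck. The sketch below uses exact fractions; the two scenarios are illustrative examples, not taken from the lesson.

```python
from fractions import Fraction

# AND (multiply): drawing two aces in a row without replacement.
p_first_ace = Fraction(4, 52)
p_second_ace_given_first = Fraction(3, 51)   # one ace is already gone
p_two_aces = p_first_ace * p_second_ace_given_first

# OR (add, subtract overlap): drawing a king or a heart in one draw.
p_king = Fraction(4, 52)
p_heart = Fraction(13, 52)
p_king_and_heart = Fraction(1, 52)           # the king of hearts is counted twice
p_king_or_heart = p_king + p_heart - p_king_and_heart

print(p_two_aces)        # 1/221
print(p_king_or_heart)   # 4/13
```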

📊 Statistics · Regression

r is direction+strength (−1 to +1). r² is % of variation explained.

Correlation and Determination — r=correlation coefficient · r²=coefficient of determination

Interpret r and r² correctly — always tested on exams

r = direction and strength of linear relationship (−1 to +1). r=1 or −1 = perfect linear, r=0 = no linear relationship. r² = proportion of variation in y explained by x. Always interpret r² as a percentage.
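As a worked illustration, r and r² can be computed from the definition. The five data points are hypothetical, and the function name `pearson_r` is made up for this sketch.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from deviations about the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
r_squared = r ** 2

print(f"r = {r:.4f}, r^2 = {r_squared:.2%}")
```

Here r is about 0.77 (positive, moderately strong) and r² is 60%, so about 60% of the variation in y is explained by the linear relationship with x.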

📊 Statistics · Outliers

IQR Fence Rule: Outlier if < Q1−1.5×IQR or > Q3+1.5×IQR

Outlier Detection — IQR=Q3−Q1 · Lower fence=Q1−1.5×IQR · Upper fence=Q3+1.5×IQR

Identify outliers objectively using the 1.5×IQR fence

IQR = Q3−Q1. Lower fence = Q1−1.5×IQR. Upper fence = Q3+1.5×IQR. Any value outside these fences is an outlier. This is the standard rule used in box plots and statistical reports.
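The fence rule can be sketched as follows. The data set is hypothetical, and the quartiles come from `statistics.quantiles` with `method="inclusive"`; textbooks and calculators use slightly different quartile conventions, so Q1 and Q3 may differ a little from a hand calculation.

```python
from statistics import quantiles

# Hypothetical data set with one suspiciously large value.
data = [2, 4, 5, 6, 7, 8, 9, 10, 30]

# Quartiles under the "inclusive" convention (an assumption of this sketch).
q1, _median, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(f"Q1={q1}, Q3={q3}, IQR={iqr}")
print(f"fences: ({lower_fence}, {upper_fence}), outliers: {outliers}")
```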

📊 Statistics · Sampling

RSVCS: Random · Stratified · Voluntary · Cluster · Systematic

Sampling Methods — R=Random (SRS) · S=Stratified · V=Voluntary response · C=Cluster · S=Systematic

Identify and distinguish all five sampling methods

Random (SRS): every possible sample of size n is equally likely. Stratified: sample randomly from each stratum. Cluster: randomly select whole clusters. Systematic: every kth individual. Voluntary response: self-selected, so most prone to bias.

Random (SRS) — Simple Random Sample: every possible sample of size n has an equal chance of selection, so every individual does too

Stratified — divide into groups (strata), randomly sample from each stratum

Voluntary response — people self-select; most biased method — strong opinions over-represented

Cluster — divide into clusters, randomly select entire clusters to survey

Systematic — every kth individual from a list (e.g. every 10th person)
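The four probability-based methods above can be sketched with Python's `random` module. The population (a roster of 100 ID numbers), the strata, and the cluster size are all hypothetical. Voluntary response is left out because it involves no random selection.

```python
import random

rng = random.Random(42)               # fixed seed so the sketch is repeatable
population = list(range(1, 101))      # hypothetical roster of 100 ID numbers

# Simple random sample: every group of 10 is equally likely.
srs = rng.sample(population, 10)

# Stratified: split into strata, then take a random sample from each stratum.
strata = {"lower": population[:50], "upper": population[50:]}
stratified = [x for group in strata.values() for x in rng.sample(group, 5)]

# Cluster: split into clusters of 10, then survey 2 whole clusters.
clusters = [population[i:i + 10] for i in range(0, 100, 10)]
cluster_sample = [x for chosen in rng.sample(clusters, 2) for x in chosen]

# Systematic: random starting point, then every k-th individual.
k = 10
start = rng.randrange(k)
systematic = population[start::k]

print(srs, stratified, cluster_sample, systematic, sep="\n")
```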

📊 Statistics · Inference

If p is low, H₀ must go!

P-value Decision Rule — p<α: reject H₀ · p≥α: fail to reject H₀ · never "accept H₀"

Interpret p-values correctly — the most tested concept in inference

The p-value is the probability of getting results at least as extreme as those observed, ASSUMING H₀ is true. If p<α (usually 0.05): reject H₀. If p≥α: fail to reject H₀. Never say "accept H₀"; say "fail to reject H₀." A low p-value means the observed data would be unlikely if H₀ were true.
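The decision rule can be traced through a small example. This sketch assumes a one-proportion z-test with hypothetical data (60 successes in 100 trials, H₀: p = 0.5, Hₐ: p > 0.5).

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical data and hypotheses: H0: p = 0.5, Ha: p > 0.5.
successes, n = 60, 100
p0, alpha = 0.5, 0.05

p_hat = successes / n
se_null = sqrt(p0 * (1 - p0) / n)     # standard error computed assuming H0 is true
z = (p_hat - p0) / se_null            # test statistic

# Right-tailed p-value: probability of a z at least this large under H0.
p_value = 1 - NormalDist().cdf(z)

decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"z = {z:.2f}, p-value = {p_value:.4f}, decision: {decision}")
```

With these numbers the p-value is about 0.023, which is below 0.05, so H₀ is rejected: "if p is low, H₀ must go."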

📊 Statistics · Measures

Mean is sensitive to outliers. Median is resistant.

Center Measures — Mean=sum÷n (affected by outliers) · Median=middle value (resistant to outliers)

Choose the right measure of center for skewed data

Mean = sum of all values ÷ n. Sensitive to outliers — one extreme value pulls it dramatically. Median = middle value when sorted. Resistant to outliers. For skewed distributions: report the median. For symmetric: either works.
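A short sketch shows how one extreme value moves the mean but barely moves the median. The values are hypothetical.

```python
from statistics import mean, median

# Hypothetical data set, roughly symmetric.
values = [30, 32, 35, 38, 40]
mean_before, median_before = mean(values), median(values)

# Add a single extreme value.
with_outlier = values + [400]
mean_after, median_after = mean(with_outlier), median(with_outlier)

print(f"without outlier: mean={mean_before}, median={median_before}")
print(f"with outlier:    mean={mean_after:.2f}, median={median_after}")
```

The mean jumps from 35 to about 95.83, while the median only moves from 35 to 36.5.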

Combinations vs Permutations

Order matters = Permutation. Order does not matter = Combination. nPr = n!/(n-r)! and nCr = n!/[r!(n-r)!]

When to use permutations versus combinations — the most common counting mistake

Ask: does rearranging the selection give a different outcome? Yes means permutation. No means combination.

Permutation nPr: ordered arrangements — passwords, race results, seating. Formula: n factorial divided by (n minus r) factorial. Combination nCr: unordered selections — committees, card hands, pizza toppings. Formula: n factorial divided by (r factorial times (n minus r) factorial). nCr equals nPr divided by r factorial — you divide out the ways to rearrange each group.

Order matters = P

Passwords, rankings, arrangements — use nPr

Order does not matter = C

Committees, hands, groups — use nCr

C = P divided by r!

Combinations divide out the redundant arrangements
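The relationship between the two formulas can be checked with `math.perm` and `math.comb` (available in Python 3.8 and later). The choice of n = 5 and r = 3 is just an example.

```python
from math import comb, factorial, perm

n, r = 5, 3

ordered = perm(n, r)      # n! / (n - r)!         -> 60 ordered arrangements
unordered = comb(n, r)    # n! / (r! (n - r)!)    -> 10 unordered selections

# Each unordered group of r items can be rearranged in r! ways.
assert unordered == ordered // factorial(r)

print(f"{n}P{r} = {ordered}, {n}C{r} = {unordered}")
```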

Binomial Distribution

BINS — Binary outcomes, Independent trials, fixed Number n, Same probability p each trial.

Four conditions required for a binomial distribution to apply

Mean = np. Standard deviation = square root of np(1-p). All four BINS conditions must hold.

BINS: Binary (success or failure only), Independent trials, fixed Number n, Same probability p. Formula: P(X equals k) = nCk times p to the k times (1 minus p) to the (n minus k). Mean equals np. Standard deviation equals square root of np(1 minus p). Use normal approximation when np and n(1 minus p) are both at least 10.

BINS

Binary, Independent, fixed N, Same p — all four must hold

Mean = np

Expected number of successes in n trials

SD = sqrt(np(1-p))

Spread of the distribution around the mean
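The binomial formulas can be sketched directly. The setting (n = 10 fair coin flips, probability of exactly 3 heads) is a hypothetical example, and `binomial_pmf` is a name made up for illustration.

```python
from math import comb, sqrt

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for a binomial distribution with n trials and success probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Hypothetical example: 10 fair coin flips.
n, p = 10, 0.5

prob_3 = binomial_pmf(3, n, p)       # 120 / 1024
mean = n * p
sd = sqrt(n * p * (1 - p))

# Normal approximation check from the text: np and n(1 - p) both at least 10.
normal_ok = n * p >= 10 and n * (1 - p) >= 10

print(f"P(X=3)={prob_3:.4f}, mean={mean}, sd={sd:.4f}, normal approx ok: {normal_ok}")
```

With only 10 flips, np is 5, so the normal approximation condition is not met in this example.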

Z-Score

z = (x minus mu) divided by sigma. Positive means above mean. Negative means below. Absolute value over 2 is unusual.

Standardize any value to see how many standard deviations it is from the mean

Convert to z-score then use the z-table. The table always gives area to the LEFT.

z equals (x minus mu) divided by sigma. Positive z is above mean, negative is below, zero is at the mean. For probability P(X less than x): convert to z then look up left area. P(X greater than x) equals 1 minus P(X less than x). To find x from percentile: x equals mu plus z times sigma. For sample means use sigma divided by square root of n as the denominator.

z = (x-mu)/sigma

Standardize any value to its z-score

Table = left area

Always gives area to the LEFT — adjust for right or between

Reverse: x = mu+z*sigma

Find the x-value at a given percentile
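The three moves (standardize, look up the left area, reverse to find x) can be sketched with `NormalDist` standing in for the z-table. The test-score distribution with mean 70 and standard deviation 10 is a hypothetical example.

```python
from statistics import NormalDist

# Hypothetical test scores: normal with mean 70 and standard deviation 10.
mu, sigma = 70.0, 10.0
x = 85.0

z = (x - mu) / sigma                  # standardize: 1.5 SDs above the mean
std_normal = NormalDist()             # plays the role of the z-table

left_area = std_normal.cdf(z)         # P(X < 85), the "table" value
right_area = 1 - left_area            # P(X > 85)

# Reverse: score at the 90th percentile.
z_90 = std_normal.inv_cdf(0.90)
x_90 = mu + z_90 * sigma

print(f"z={z}, P(X<85)={left_area:.4f}, P(X>85)={right_area:.4f}, 90th pct={x_90:.1f}")
```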

Central Limit Theorem

CLT: for random samples from a population with finite standard deviation, sample means are approximately Normal with mean mu and standard error sigma divided by square root of n once n is large (rule of thumb: n at least 30).

Why we can use normal calculations for sample means even when the population is not normal

The CLT justifies z and t procedures for sample means, which is the foundation of much of introductory inference

For random samples from a population with mean mu and finite standard deviation sigma, the sampling distribution of x-bar is approximately normal with mean mu and standard error sigma divided by square root of n when n is large. The usual rule of thumb is n at least 30, though strongly skewed populations may need more. Larger n gives smaller standard error. The distribution of x-bar always centers at the true population mean. Doubling n reduces standard error by a factor of square root of 2.

Shape

Approximately Normal for n at least 30 (rule of thumb), even when the population itself is not normal

Center

Mean of sampling distribution equals population mean mu

Spread

Standard error equals sigma divided by square root of n
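A small simulation illustrates the center and spread claims. This sketch assumes a right-skewed exponential population with mean 1 and standard deviation 1; the seed, sample size, and number of repetitions are arbitrary choices.

```python
import random
from statistics import mean, stdev

rng = random.Random(0)    # fixed seed so the run is repeatable
n = 30                    # size of each sample
num_samples = 2000        # number of repeated samples

# Exponential population with rate 1: mean 1, SD 1, strongly right-skewed.
sample_means = [
    mean(rng.expovariate(1.0) for _ in range(n))
    for _ in range(num_samples)
]

center = mean(sample_means)        # should be close to mu = 1
spread = stdev(sample_means)       # should be close to sigma / sqrt(n)
predicted_se = 1.0 / n ** 0.5      # about 0.1826

print(f"center={center:.3f}, spread={spread:.3f}, predicted SE={predicted_se:.3f}")
```

Plotting a histogram of `sample_means` would show a roughly bell-shaped curve, even though the population itself is far from normal.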

Chi-Square Test

Chi-square = sum of (Observed minus Expected) squared divided by Expected. Always right-tailed. For a test of independence, Expected = row total times column total divided by grand total.

Testing whether observed categorical data fits a claimed distribution or shows association

All expected counts must be at least 5. Larger chi-square means stronger evidence against the null.

Goodness of Fit tests if counts match a claimed distribution: expected count equals sample size times the claimed proportion for that category, and df equals categories minus 1. Independence test checks association between two categorical variables: expected count equals row total times column total divided by grand total, and df equals (rows minus 1) times (columns minus 1). Always right-tailed. Reject null if chi-square exceeds critical value or p-value is less than alpha.

Formula

Sum of (O minus E) squared divided by E over all cells

Expected count

Independence: row total times column total divided by grand total. Goodness of fit: sample size times claimed proportion

Always right-tailed

Chi-square is never negative, so evidence against the null shows up as large values in the right tail
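The expected-count and chi-square formulas for a test of independence can be sketched on a hypothetical 2×2 table. The critical value 3.841 is the standard table value for df = 1 at α = 0.05.

```python
# Hypothetical 2x2 table of observed counts (rows: groups, columns: outcomes).
observed = [[30, 20],
            [20, 30]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count for each cell: row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Chi-square statistic: sum of (O - E)^2 / E over all cells.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

df = (len(observed) - 1) * (len(observed[0]) - 1)
critical_value = 3.841    # chi-square table, df = 1, alpha = 0.05
reject_null = chi_square > critical_value

print(f"expected={expected}, chi-square={chi_square}, df={df}, reject H0: {reject_null}")
```

All expected counts here are 25, which satisfies the "at least 5" condition, and the statistic of 4.0 falls just beyond the critical value.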

t vs z Distribution

Know sigma? Use z. Do not know sigma? Use t. t has heavier tails and approaches z as degrees of freedom increase.

When to use t-procedures versus z-procedures for inference about means

In practice sigma is almost never known — use t for nearly all real-world inference about means

Use z when population sigma is known (rare). Use t when sigma is unknown — use sample standard deviation s instead. Degrees of freedom equals n minus 1 for one-sample t. The t-distribution is bell-shaped and symmetric but with heavier tails than z. As degrees of freedom increase, t approaches z. t-interval: x-bar plus or minus t-star times s divided by square root of n.

sigma known = z

Rare in practice — only when population SD is given

sigma unknown = t

Use sample SD s — df equals n minus 1

Heavier tails

t-star is larger than z-star for the same confidence level
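The effect of the heavier tails shows up in the margin of error. This sketch uses a hypothetical sample (n = 10, x-bar = 50, s = 5) and hard-codes t-star = 2.262, the standard table value for 95% confidence with df = 9, because Python's standard library has no t-distribution.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical sample summary.
n, x_bar, s = 10, 50.0, 5.0
se = s / sqrt(n)

t_star = 2.262                          # t-table value: 95% confidence, df = n - 1 = 9
z_star = NormalDist().inv_cdf(0.975)    # about 1.960

t_margin = t_star * se
z_margin = z_star * se                  # would only be valid if sigma were known

t_interval = (x_bar - t_margin, x_bar + t_margin)

print(f"t margin={t_margin:.3f}, z margin={z_margin:.3f}")
print(f"95% t-interval: ({t_interval[0]:.2f}, {t_interval[1]:.2f})")
```

The t-interval is wider than the z-interval would be, reflecting the extra uncertainty from estimating sigma with s.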

🎓 Common Exam Questions

Q: What does SOCS stand for and what should you include for each component?

A: SOCS: Shape · Outliers · Center · Spread. Shape: symmetric, skewed left (tail pulls left), skewed right (tail pulls right), bimodal, or uniform. Outliers: note unusual values using 1.5×IQR rule; identify them by value. Center: mean for symmetric distributions, median for skewed (resistant to outliers). Spread: standard deviation paired with mean, or IQR paired with median. Always reference the actual variable name in context.

Q: What does PHANTOMS stand for and what are the 8 steps of a complete hypothesis test?

A: PHANTOMS: Parameter (define in context using μ, p, etc.) · Hypotheses (state H₀ and Hₐ with correct symbols and direction) · Assumptions (random sampling, normality, independence/10% condition) · Name test (one-sample t-test, two-proportion z-test, etc.) · Test statistic (calculate, show formula and work) · Obtain p-value (from table or calculator, state direction) · Make decision (p<α: reject H₀; p≥α: fail to reject H₀) · State conclusion (in context — never say "accept H₀"; say "there is/is not sufficient evidence...").

Q: Explain Type I and Type II errors with an example.

A: Type I Error (α): Rejecting H₀ when it is actually true. Example: concluding a drug works when it actually does not (false positive). α is the probability of making a Type I error when H₀ is true, and it is set in advance (usually 0.05). Type II Error (β): Failing to reject H₀ when it is actually false. Example: concluding a drug does not work when it actually does (false negative). Power=1−β. Trade-off: for a fixed sample size, lowering α reduces Type I errors but increases Type II errors.

Q: What does PANIC stand for and how do you correctly interpret a confidence interval?

A: PANIC: Parameter · Assumptions · Name interval · Interval · Conclusion. Formula: point estimate ± (critical value × standard error). Correct interpretation: "We are 95% confident the true [parameter] is between [lower] and [upper]." Common mistake: "There is a 95% probability the parameter is in this interval" — WRONG after the interval is computed. The 95% refers to the long-run capture rate of the procedure, not probability about the specific interval.

Q: What is the 68-95-99.7 Rule and how is it applied?

A: In a normal distribution: about 68% within μ±1σ, about 95% within μ±2σ, about 99.7% within μ±3σ. Application: if test scores are N(70,10), meaning mean 70 and standard deviation 10, then about 68% scored 60–80, 95% scored 50–90, and 99.7% scored 40–100. To find percentage above/below a value, use symmetry: 95% within 2σ means 5% outside, so 2.5% in each tail. This rule applies ONLY to normal distributions.
