Statistics
Mean = Add and Divide. Median = Middle. Mode = Most.
Measures of Central Tendency
Mean, median, and mode: locked in 3 seconds
Mean: add all values, divide by count. Median: the middle value when sorted. Mode: the value that appears most often. One sentence each.
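A quick sketch of the three definitions above using Python's standard `statistics` module. The `scores` list is made-up illustration data, not from the card.

```python
from statistics import mean, median, mode

scores = [2, 3, 3, 5, 7]  # hypothetical data, already sorted for readability

avg = mean(scores)       # add all values (20), divide by count (5)
middle = median(scores)  # middle value of the sorted list
most = mode(scores)      # value that appears most often

print(avg, middle, most)  # 4 3 3
```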
Statistics
"Reject if p is less than alpha"
Hypothesis Testing Decision Rule
When to reject the null hypothesis, every time
If your p-value < α (usually 0.05), reject H₀. If p ≥ α, fail to reject. You never "accept" H₀; you only fail to reject it.
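The decision rule can be written as a tiny function. This is a minimal sketch: the name `decide` and the returned strings are illustrative choices, and the default of 0.05 is only the usual convention.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Apply the decision rule: reject H0 only when p is strictly below alpha."""
    if p_value < alpha:
        return "reject H0"
    # Note the wording: we never "accept" H0.
    return "fail to reject H0"


print(decide(0.03))  # reject H0
print(decide(0.20))  # fail to reject H0
```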
Statistics
68 · 95 · 99.7
Empirical Rule (Normal Distribution)
The empirical rule: the three numbers every stats student memorizes
About 68% of data falls within 1 SD of the mean, 95% within 2 SD, and 99.7% within 3 SD. These three numbers cover virtually every normal distribution question.
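One way to check the three numbers is to ask the standard normal distribution directly. A small sketch using `statistics.NormalDist` (Python 3.8+); the dictionary name `within` is just for illustration.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, SD 1

# Share of the distribution within k standard deviations of the mean.
within = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}

for k, share in within.items():
    print(f"within {k} SD: {share:.1%}")
# within 1 SD: 68.3%
# within 2 SD: 95.4%
# within 3 SD: 99.7%
```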
Statistics
Type I = False Alarm. Type II = Missed Call.
Error Types
Type I and Type II errors: impossible to mix up
Type I (α): reject H₀ when it's true, a false alarm. Type II (β): fail to reject H₀ when it's false, a missed call. Think: crying wolf vs. ignoring the wolf.
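The four possible outcomes can be laid out as a small lookup. This is only an illustrative sketch; `classify_outcome` is a hypothetical helper, and in practice you never know for sure whether H0 is true.

```python
def classify_outcome(h0_is_true: bool, rejected_h0: bool) -> str:
    """Name the outcome of a test given the (usually unknown) truth about H0."""
    if h0_is_true and rejected_h0:
        return "Type I error (false alarm)"
    if not h0_is_true and not rejected_h0:
        return "Type II error (missed call)"
    return "correct decision"


print(classify_outcome(h0_is_true=True, rejected_h0=True))    # Type I error (false alarm)
print(classify_outcome(h0_is_true=False, rejected_h0=False))  # Type II error (missed call)
```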
Statistics
r closer to ±1 = stronger
Correlation Coefficient
Reading correlation: closer to 1 or −1 is stronger
r = +1 is perfect positive. r = −1 is perfect negative. r = 0 means no linear relationship. The closer to either extreme, the stronger the correlation.
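A hand-rolled Pearson r makes the three landmark values concrete. The function name `pearson_r` and the data sets are assumptions made up for this sketch. The last example is a perfect curve (y = x²) that still gives r = 0, which is why the card says "no linear relationship" rather than "no relationship".

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation: co-variation divided by the product of the spreads."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


r_up = pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])       # perfect positive
r_down = pearson_r([1, 2, 3, 4, 5], [10, 8, 6, 4, 2])     # perfect negative
r_curve = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])   # y = x**2, no linear trend

print(r_up, r_down, r_curve)
```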
Standard Deviation
Standard deviation = spread of data. Small SD = data clustered near mean. Large SD = spread out.
Standard Deviation
How much the data typically varies from the mean
Variance = average squared deviation from mean. SD = √variance. Low SD: data points cluster tightly around the mean. High SD: data is spread widely. About 68% of data falls within 1 SD of the mean in a normal distribution (68-95-99.7 rule).
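A short sketch of the variance-then-square-root recipe, on made-up data. It uses the population versions (`pvariance`, `pstdev`), which match "average squared deviation" literally. The sample versions (`variance`, `stdev`) divide by n − 1 instead.

```python
from statistics import mean, pstdev, pvariance

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data with mean 5

# By hand: average of the squared deviations from the mean.
m = mean(data)
var_by_hand = sum((x - m) ** 2 for x in data) / len(data)

var = pvariance(data)  # same quantity from the standard library
sd = pstdev(data)      # square root of the variance

print(var_by_hand, var, sd)  # 4.0 4 2.0
```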
Basic Probability Rules
Probability: P(A and B) = P(A) × P(B) if independent. P(A or B) = P(A) + P(B) - P(A and B).
Basic Probability Rules
Two essential probability formulas: AND and OR
AND (both events occur): multiply probabilities if independent. P(heads AND heads) = 0.5 × 0.5 = 0.25. OR (at least one occurs): add probabilities, subtract the overlap. P(A or B) = P(A) + P(B) - P(A∩B). For mutually exclusive events: P(A or B) = P(A) + P(B).
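Both rules can be checked with exact fractions. The coin example comes from the card. The fair-die events (`even`, `high`) are hypothetical additions for illustration.

```python
from fractions import Fraction

# AND rule for independent events: two fair coin flips.
p_heads = Fraction(1, 2)
p_two_heads = p_heads * p_heads  # 1/4

# OR rule on one roll of a fair die (assumed example).
die = {1, 2, 3, 4, 5, 6}
even = {2, 4, 6}
high = {4, 5, 6}  # greater than 3


def prob(event):
    """Probability of an event as favorable outcomes over total outcomes."""
    return Fraction(len(event), len(die))


# Add, then subtract the overlap {4, 6} so it is not counted twice.
p_even_or_high = prob(even) + prob(high) - prob(even & high)  # 2/3

# Mutually exclusive events have no overlap, so the rule is a plain sum.
p_one_or_two = prob({1}) + prob({2})  # 1/3

print(p_two_heads, p_even_or_high, p_one_or_two)  # 1/4 2/3 1/3
```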
Confidence Intervals
Confidence interval: estimate ± margin of error. Wider CI = less precise but more confident.
Confidence Intervals
A range of plausible values for a population parameter
95% CI means: if you repeated the study 100 times, about 95 of the intervals would contain the true population parameter. Wider interval = more confident but less precise. Increasing sample size narrows the interval without sacrificing confidence.
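A rough sketch of "estimate ± margin of error" for a mean, assuming a normal (z) interval with a known standard deviation. Small samples would normally call for a t interval instead. The function `mean_ci` and all the numbers are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist


def mean_ci(sample_mean, sd, n, confidence=0.95):
    """z-based confidence interval for a mean: estimate +/- z * sd / sqrt(n)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    margin = z * sd / sqrt(n)
    return sample_mean - margin, sample_mean + margin


ci_small_n = mean_ci(50, 10, n=100)               # baseline
ci_large_n = mean_ci(50, 10, n=400)               # 4x the sample: half the width
ci_99 = mean_ci(50, 10, n=100, confidence=0.99)   # more confident: wider

for low, high in (ci_small_n, ci_large_n, ci_99):
    print(f"({low:.2f}, {high:.2f})")
```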
Correlation Coefficient
Correlation vs causation: r measures linear relationship strength, NOT cause and effect
Correlation Coefficient
What r tells you β and what it doesn't
r ranges from -1 to +1. r = 1: perfect positive linear relationship. r = -1: perfect negative. r = 0: no linear relationship. Strong correlation does NOT mean one variable causes the other. Always look for lurking variables (confounders).
Normal Distribution
Normal distribution: symmetric, bell-shaped. Mean = median = mode. Described by μ and σ.
Normal Distribution
The bell curve: the most important distribution in statistics
Perfectly symmetric around the mean. 68% of data within 1σ, 95% within 2σ, 99.7% within 3σ. Z-score = (x - μ)/σ converts any normal distribution to standard normal (μ=0, σ=1). Use z-table to find probabilities.
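A sketch of the z-score conversion, with `statistics.NormalDist` standing in for the z-table. The values μ = 100, σ = 15, x = 130 are invented for the example.

```python
from statistics import NormalDist

mu, sigma = 100, 15  # hypothetical population mean and SD
x = 130

z = (x - mu) / sigma  # 2.0: x sits two SDs above the mean

# Table lookup: P(Z < z) under the standard normal.
p_below = NormalDist().cdf(z)

# Same answer without standardizing first.
p_direct = NormalDist(mu, sigma).cdf(x)

print(z, round(p_below, 4))  # 2.0 0.9772
```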
Chi-Square Test
Chi-square test: tests whether observed frequencies differ from expected frequencies
Chi-Square Test
Testing whether categorical data fits a pattern or shows an association
χ² = Σ(observed - expected)²/expected. A large χ² means the observed data are far from expected, which is more evidence against the null hypothesis. Two uses: goodness-of-fit (does data fit a distribution?) and test of independence (are two categorical variables related?).
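A minimal goodness-of-fit sketch that computes only the statistic. The die-roll counts are invented. Turning the statistic into a p-value needs a chi-square table or a statistics library, which this sketch leaves out.

```python
# Hypothetical counts from 60 rolls of a die we suspect is fair.
observed = [8, 12, 10, 10, 9, 11]
expected = [10] * 6  # a fair die predicts 60 / 6 = 10 per face

# Sum of (observed - expected)^2 / expected over all categories.
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
degrees_of_freedom = len(observed) - 1

print(chi_square, degrees_of_freedom)
# The statistic is about 1.0 with 5 degrees of freedom, far below the
# 0.05 critical value of roughly 11.07, so there is little evidence
# against a fair die.
```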
Linear Regression
Regression line: ŷ = b₀ + b₁x. Slope b₁ = change in y per unit change in x. Intercept b₀ = predicted y when x=0.
Linear Regression
The line of best fit: predicting one variable from another
The regression line minimizes the sum of squared residuals (least squares). Slope: for each 1-unit increase in x, y changes by b₁ units. Only predict within the range of your data (don't extrapolate). R² = proportion of variation in y explained by x.
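A bare-bones least-squares fit, written out by hand to show where the slope, intercept, and R² come from. The data and variable names are assumptions for illustration.

```python
xs = [1, 2, 3, 4, 5]  # hypothetical predictor values
ys = [2, 4, 5, 4, 5]  # hypothetical responses

n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n

# Least-squares slope and intercept.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
b1 = sxy / sxx            # change in predicted y per 1-unit increase in x
b0 = mean_y - b1 * mean_x  # predicted y when x = 0

# R^2 = 1 - (unexplained variation / total variation).
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
sst = sum((y - mean_y) ** 2 for y in ys)
r_squared = 1 - sse / sst

print(round(b1, 2), round(b0, 2), round(r_squared, 2))  # 0.6 2.2 0.6
```

Here x runs from 1 to 5, so a prediction at x = 50 would be extrapolation.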
Statistics
"Pie à la Mode": Mode is Most Popular
Mode
The tastiest way to remember what mode means
"Pie à la mode": "à la mode" means fashionable in French. MODE = MOST POPULAR number in the dataset.
Statistics
Median = Middle of the Highway
Median
The median strip runs down the center
Median strip = CENTER of highway. Median = CENTER value when numbers are in order. Resistant to outliers unlike mean.
Statistics
"Bill Gates walks into a bar..."
Outliers: Mean vs Median
Why income data uses median not mean
Bill Gates walks in: the average shoots up but no one feels richer. Outliers pull the MEAN, not the MEDIAN.
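The bar story in numbers. All incomes here are made up for the sketch, including the billion-dollar figure for the newcomer.

```python
from statistics import mean, median

bar_incomes = [30_000, 40_000, 50_000, 60_000, 70_000]  # hypothetical patrons

mean_before = mean(bar_incomes)      # 50000
median_before = median(bar_incomes)  # 50000

# One extreme outlier walks in (illustrative figure only).
with_outlier = bar_incomes + [1_000_000_000]

mean_after = mean(with_outlier)      # jumps past 166 million
median_after = median(with_outlier)  # 55000.0, barely moves

print(mean_before, median_before)
print(round(mean_after), median_after)
```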
Statistics
68 · 95 · 99.7: "The Radio Station Rule"
Empirical Rule
Three numbers that describe all normal distributions
68% within 1 SD. 95% within 2. 99.7% within 3. Think: "68.95 FM, the 99.7!"