See also Probability, where we try to predict future events based on statistics.
Parent Statistics and Probability.
Tell a story: good or bad. Stretch or squeeze the axes.
sports stats
nba.com
Data
Variability
Statistical Questions: to answer, you need to collect data with variability
Population and Sample
Bias
Correlation vs causality
types of statistical studies
“margin of error”
Experiment Design
\begin{align} r = \frac{1}{n-1} \sum \left ( \frac{x_i - \overline{x} }{s_x} \right ) \left ( \frac{y_i - \overline{y} }{s_y} \right ) \end{align}
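The formula above can be sketched directly in Python. This is a minimal illustration, assuming `s_x` and `s_y` are sample standard deviations (the n - 1 divisor); the function name `pearson_r` and the data are made up for the example.

```python
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Sample correlation coefficient r, term by term as in the formula."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (n - 1 divisor)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)

# Made-up illustration data, not taken from the notes.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)
print(round(r, 4))
```

Each term is the product of two z-scores, so r is positive when x and y tend to sit on the same side of their means.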
\begin{align} y &= \text{actual } y \\ \hat{y} &= \text{predicted } y \\ \overline{y} &= \text{mean of all } y\text{'s} \end{align}
Represented by r-squared.
What percentage of the total variation in y is described by the variation in x?
What percentage of the variation is described by the line?
How good is the line as a predictor?
Total variation is the squared error of y from the mean of y.
\begin{align} r^2 &= 1 - \frac{SE_{LINE}}{SE_{\bar{y}}} \\ SE_{LINE} &= \text{variation not described by the line (squared error from the line)} \\ SE_{\bar{y}} &= \text{total variation in y from the mean} \end{align}
Squared error for the line, the lower the value, the better the fit.
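The coefficient of determination normalizes the squared error by expressing it as a fraction of the total variation, so r-squared is a proportion between 0 and 1 (often quoted as a percentage), not a probability.
The smaller the squared error of the line, the higher the coefficient and the better the line fits the data.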
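As a rough sketch of the calculation, the code below fits a least-squares line, then compares the squared error from the line with the squared error from the mean of y. The least-squares fit and all names (`r_squared`, `se_line`, `se_mean`) are assumptions made for illustration; the data is invented.

```python
from statistics import mean

def r_squared(xs, ys):
    """Coefficient of determination: 1 - SE_line / SE_mean."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Least-squares slope and intercept (assumed fitting method).
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    # Squared error of the actual y values from the line's predictions.
    se_line = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    # Total variation: squared error of y from the mean of y.
    se_mean = sum((y - y_bar) ** 2 for y in ys)
    return 1 - se_line / se_mean

# Made-up illustration data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(round(r_squared(xs, ys), 4))
```

For a least-squares line this value equals the square of the correlation coefficient r computed from the same data.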
Khan Unit: Confidence Intervals
95% confidence corresponds to roughly 2 standard deviations (more precisely 1.96) of the sampling distribution of $\hat{p}$
the margin of error is then about 2 of those standard deviations
the confidence interval is the $\hat{p}$ plus or minus the margin of error
we want to know the proportion of the population that favors a candidate
sample the population
\begin{align} \sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} \\ \end{align}
calculate $\hat{p}$ as the proportion of the sample that favors your candidate
from that $\hat{p}$, n and 95%, calculate the margin of error
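The steps above can be sketched as follows. Since the true p is unknown, this sketch assumes $\hat{p}$ is substituted for p in the standard deviation formula, and it uses the 2-standard-deviation rule of thumb for 95%. The function name and the poll numbers (54 of 100) are hypothetical.

```python
from math import sqrt

def proportion_confidence_interval(successes, n, z=2.0):
    """Return (p_hat, margin_of_error, (low, high)) for a sample proportion."""
    p_hat = successes / n
    # Assumption: p_hat stands in for the unknown population proportion p.
    standard_error = sqrt(p_hat * (1 - p_hat) / n)
    margin = z * standard_error
    return p_hat, margin, (p_hat - margin, p_hat + margin)

# Hypothetical poll: 54 of 100 sampled voters favor the candidate.
p_hat, margin, (low, high) = proportion_confidence_interval(54, 100)
print(f"p_hat={p_hat:.2f}, margin={margin:.3f}, interval=({low:.3f}, {high:.3f})")
```

In this made-up poll the interval still includes 0.5, so the sample alone would not settle whether the candidate has majority support.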
ratio - comparison of two quantities
part-to-part ratio
proportion - equality of two ratios, can be used for interpolation
percentage - a fraction with 100 in the denominator
percentage proportion - an equality of two ratios where the second ratio has a denominator of 100
binomial - the binomial distribution with parameters n and p is the discrete probability distribution of the number of successes in a sequence of n independent experiments, each asking a yes–no question, and each with its own Boolean-valued outcome:
A single success/failure experiment is also called a Bernoulli trial or Bernoulli experiment,
and a sequence of outcomes is called a Bernoulli process;
for a single trial, i.e., n = 1, the binomial distribution is a Bernoulli distribution.
The binomial distribution is the basis for the popular binomial test of statistical significance.
two-point - a random variable which can take 1 of 2 possible values
bernoulli - a two-point distribution where the two possible outcomes are 0 and 1
variable x = a single value
random variable X = a number of values, results of experiments, to calculate P(X)
median, range, IQR
box-and-whisker plot
box plot vs dot plot
dot plot
mean absolute deviation (MAD)
central tendency
spread
with a tight distribution, mean and standard deviation work best
with a skewed distribution, median and interquartile range work best
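A small sketch of these measures on a deliberately skewed, made-up data set. Textbooks differ on how quartiles are computed; this uses the default method of Python's `statistics.quantiles`, so the IQR may differ slightly from a hand calculation.

```python
from statistics import mean, median, pstdev, quantiles

# Made-up data with one extreme value to create a long right tail.
data = [1, 2, 3, 4, 100]

center_mean = mean(data)
center_median = median(data)

# Mean absolute deviation (MAD): average distance from the mean.
mad = sum(abs(x - center_mean) for x in data) / len(data)

# Interquartile range: Q3 - Q1 (quartile convention is an assumption).
q1, _, q3 = quantiles(data, n=4)
iqr = q3 - q1

print("mean:", center_mean, "median:", center_median)
print("population std dev:", round(pstdev(data), 2), "MAD:", mad, "IQR:", iqr)
```

The single outlier pulls the mean far from the bulk of the data while the median stays put, which is why median and IQR are preferred for skewed distributions.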
do algebra on the variance formula
\begin{align} \mu &= \frac{\sum_{i=1}^{N} x_i}{N} &&\text{mean formula}\\ \sigma^2 &= \frac{\sum_{i=1}^{N}(x_i-\mu)^2}{N} &&\text{variance formula 1}\\ \sigma^2 &= \frac{\sum_{i=1}^{N} x_i^2}{N} - \mu^2 &&\text{variance formula 2}\\ \sigma^2 &= \frac{\sum_{i=1}^{N} x_i^2}{N} - \frac{\left [\sum_{i=1}^{N} x_i\right ]^2}{N^2} &&\text{variance formula 3}\\ \end{align}
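A quick numeric check that the three population variance formulas agree. The function name and data are made up for the example.

```python
from statistics import pvariance

def variance_three_ways(xs):
    """Population variance computed with each of the three formulas."""
    n = len(xs)
    mu = sum(xs) / n
    v1 = sum((x - mu) ** 2 for x in xs) / n               # formula 1
    v2 = sum(x * x for x in xs) / n - mu ** 2             # formula 2
    v3 = sum(x * x for x in xs) / n - sum(xs) ** 2 / n ** 2  # formula 3
    return v1, v2, v3

# Made-up illustration data with mean 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]
v1, v2, v3 = variance_three_ways(data)
print(v1, v2, v3, pvariance(data))
```

Formulas 2 and 3 need only the running sums of x and x squared, so they can be computed in a single pass over the data.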
We can work with an entire population.
Or we can work with a sample from the population.
The math is the same except for two things.
y axis is percent instead of raw frequency count
Make histogram bars more and more narrow until the top becomes a line.
The data points can take on any value in a continuum, as opposed to being lumped into coarse buckets.
Area under the curve is 100%.
The curve will never go negative.
Measure the area under the density curve between two values, to get the percentage of data points falling between those two values. This can sometimes be estimated by calculating the area of the rectangle.
You cannot calculate the percent of a single value, because there is no rectangle. The line has no width.
mean and median are equal, both cut the area under the curve in half
in the bell curve, the mode is also equal to the mean and median
picture a bimodal symmetric curve
the area under the left half is equal to the area under the right half
aka skewed
mean will be towards the long tail from the median
long tail to the right ⇒ right-skewed distribution
and vice versa
Distribution
Frequency Distribution
Binomial Distribution
Percentile
Density Curve
Probability Distribution
Cumulative Probability Distribution
frequency - table of actual results
relative frequency - division to create percentage
probability = theoretical probability = relative frequency of the entire population
$$ \text{probability} = \frac{\text{number of successful outcomes}}{\text{number of possible outcomes}}$$
$$ \text{relative frequency} = \frac{\text{number of successful outcomes}}{\text{number of trials}}$$
each trial has boolean result, ie, success or failure
trial results are independent
fixed number of trials
same probability in each trial
number of trials until success
uses factorial in the formula
is a discrete function
right-skewed
| | Binomial | Geometric |
| each trial has boolean result | x | x |
| trial results are independent | x | x |
| same probability in each trial | x | x |
| fixed number of trials | x | not |
Examples
Binomial: How many sixes in 12 rolls of the die?
Geometric: How many rolls of a die until one six?
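The two die examples can be worked with the standard probability mass functions. This is a sketch assuming a fair six-sided die (p = 1/6); the function names are made up for illustration.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def geometric_pmf(k, p):
    """P(first success occurs on trial k), for k = 1, 2, 3, ..."""
    return (1 - p) ** (k - 1) * p

p_six = 1 / 6  # assumption: fair die

# Binomial: chance of exactly 2 sixes in 12 rolls.
print(round(binomial_pmf(2, 12, p_six), 4))

# Geometric: chance the first six appears on the 3rd roll.
print(round(geometric_pmf(3, p_six), 4))
```

The binomial count is bounded by the fixed number of trials, while the geometric count has no upper limit; its probabilities shrink with every extra roll, which gives the right-skewed shape.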
Look at past events and organize them into patterns which tell a story and allow us to understand how, when, and why things happen.
The graph of a distribution is a curve.
There are several kinds of distributions.
discrete vs continuous
distributions
Naive Bayes Classifier in AI is based on Bayes' Theorem in probability theory.
aka Bell Curve or Gaussian distribution.
The basic shape of the _normal distribution_ is given by the equation $$e^{\frac{-x^2}{2}}$$
When we actually have input data with mean $\mu$ and standard deviation $\sigma$, we will use this equation. $$y = \frac{1}{\sigma \sqrt{2\pi }}e^{-\frac{(x-\mu )^2}{2\sigma ^2}}$$
Characteristics:
Salman Khan uses problems from the ck12.org open source FlexBook: AP Statistics
Empirical Rule: 68 - 95 - 99.7
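The three percentages can be checked numerically as the share of a normal distribution lying within 1, 2, and 3 standard deviations of the mean. A minimal sketch using Python's `statistics.NormalDist`; the helper name `share_within` is made up.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} sd: {share_within(k) * 100:.1f}%")
```

Because the result depends only on the number of standard deviations, the same percentages hold for any mean and standard deviation.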
a distribution where
$$\mu = 0 \text{ and } \sigma = 1$$
z-score = number of standard deviations away from the mean
$$ \text{z-score} = \frac{x - \mu}{\sigma}$$
allows comparison of values on different scales and distributions, like LSAT and MCAT
Standard Normal Table, aka z-table - a table based on the Standard Normal Distribution
gives cumulative probability for any z-score
where cumulative probability is the area under the curve to the left of the z-score
How to find the cumulative probability of a value in the distribution.
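One way to sketch this without a printed z-table: convert the value to a z-score, then evaluate the standard normal cumulative distribution with the error function. The example numbers (a score of 130 on a scale with mean 100 and standard deviation 15) are hypothetical.

```python
from math import erf, sqrt

def z_score(x, mu, sigma):
    """Number of standard deviations x lies from the mean."""
    return (x - mu) / sigma

def cumulative_probability(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# Hypothetical example values.
z = z_score(130, mu=100, sigma=15)
area_left = cumulative_probability(z)
print(z, round(area_left, 4))
```

The result plays the role of a z-table lookup: it is the fraction of the distribution that falls at or below the given value.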
Walter Scheidel: The Great Leveler https://press.princeton.edu/books/paperback/9780691183251/the-great-leveler
The only things that can level wealth inequality are: war, revolution, state collapse, plague.
normal: IQ, conscientiousness, openness
Pareto: productivity, creative output
Price's Law
Derek de Solla Price, 1960s study: a vanishingly small proportion of scientists operating in a given domain produce half the output.
A tiny number of people produce almost all of everything.
aka the square root law: the square root of the number of people in a domain produces half the output.
if you have 10 employees, 3 of them produce half the output
if you have 100 employees, 10 of them produce half the output
if you have 10,000 employees, 100 of them produce half the output
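The arithmetic behind the square root rule can be sketched in a few lines. This only reproduces the rule of thumb as stated in these notes; rounding to the nearest whole person is an assumption, and the function name is made up.

```python
from math import sqrt

def half_output_headcount(n_people):
    """People producing half the output under the square root rule of thumb."""
    return round(sqrt(n_people))  # assumption: round to the nearest person

for n in (10, 100, 10_000):
    k = half_output_headcount(n)
    print(f"{n} people: {k} produce half the output ({k / n:.1%} of the group)")
```

The share of the group shrinks as the group grows, which is the sense in which output becomes more concentrated in larger domains.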
The Matthew Principle
To those who have everything, more will be given. From those who have nothing, all will be taken.
(An economic principle, copying language from the New Testament.)
IQ, conscientiousness, and openness are normally distributed and are good predictors of long-term performance, but creative output is NOT normally distributed.
2017 Personality 21: Biology & Traits: Performance Prediction https://www.youtube.com/watch?v=Q7GKmznaqsQ&t=0s
2017 Personality 18: Biology & Traits: Openness, Intelligence, Creativity https://www.youtube.com/watch?v=D7Kn5p7TP_Y&t=5393s
YouTube: Jordan Peterson - IQ and the Job Market
YouTube: Jordan Peterson - Controversial Facts about IQ
YouTube: Jordan Peterson - The Big IQ Controversy
The SAT, GRE, the LSAT, all of those are IQ tests. They are more crystallized than fluid.
IQ
145 to 160 - to be the best, 1 in 10,000
116 to 130 - 95-86 percentile
115-110 - 85-73 percentile
below 87 - no jobs
below 83 - 10% of pop., cannot join the army
YouTube: Udacity: IQ score distribution