[Lec 2] Bayes Rule / Probability

P(y|x) = P(x|y) P(y) / P(x)
P(x|y)
↓
"generative distribution"
This describes what the data x should look like, given the class y.
* Gaussian
- Single sample:
Probability density of one data point, x
- Multiple Samples :
Assume I.I.D. (Independent and Identically Distributed)
- Joint Probability Density;
Because the variables are assumed independent, we multiply the individual probabilities:
p(x_1, ..., x_N) = p(x_1) p(x_2) ... p(x_N) = ∏ p(x_i)
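A minimal sketch of this in Python (the data and parameters below are made up for illustration):

  import numpy as np
  from scipy.stats import norm

  # hypothetical i.i.d. Gaussian samples and parameters
  x = np.array([1.2, 0.8, 1.5, 0.9])
  mu, sigma = 1.0, 0.5

  # joint density = product of the individual densities (by independence)
  joint = np.prod(norm.pdf(x, loc=mu, scale=sigma))
  # in practice, sum log-densities instead to avoid numerical underflow
  log_joint = np.sum(norm.logpdf(x, loc=mu, scale=sigma))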
* Data Likelihood
· The probability of the data, given the parameters.
E.g. p(data|parameter)
· The parameters depend on the model.
e.g. Gaussian, Beta, Gamma, etc.
· L(θ) = p(data | θ)
Which model is best? The one that maximizes the likelihood.
· For a Gaussian, the likelihood-maximizing mean is the sample mean: μ_ML = (1/N) Σ x_i
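A quick numerical check (made-up data; σ assumed known for simplicity):

  import numpy as np
  from scipy.stats import norm

  # hypothetical i.i.d. Gaussian data: true mu = 5, sigma = 2
  x = 5.0 + 2.0 * np.random.randn(100)

  mu_ml = x.mean()   # maximum-likelihood estimate of mu = sample mean

  # the log-likelihood is highest at mu_ml
  for mu in (mu_ml - 1, mu_ml, mu_ml + 1):
      print(mu, norm.logpdf(x, loc=mu, scale=2.0).sum())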
* Maximum Likelihood Click-Through Rate
- CTR = click-through rate; similarly, conversion rate
ex) e-commerce, advertising
- 2 possible outcomes: click / don't click
- Just Bernoulli distribution
· MLE: p̂ = (# clicks) / (# impressions) = (1/N) Σ x_i
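A minimal sketch (hypothetical click data):

  import numpy as np

  # 1 = click, 0 = no click
  clicks = np.array([0, 1, 0, 0, 1, 0, 0, 0, 1, 0])

  # maximum-likelihood estimate of the Bernoulli parameter (the CTR)
  p_hat = clicks.mean()   # = (# clicks) / (# impressions)
  print(p_hat)            # 0.3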
* Confidence Intervals
Non-Bayesian/Frequentist method of dealing with the uncertainty in parameter estimates
* Sum of Random Variables
- A sum of I.I.D. random variables converges (in distribution) to a Gaussian (the Central Limit Theorem, CLT)
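A small simulation illustrating this (uniform variables chosen arbitrarily):

  import numpy as np

  # CLT demo: sums of 30 i.i.d. uniform(0, 1) variables look Gaussian
  sums = np.random.rand(10000, 30).sum(axis=1)
  print(sums.mean())  # ~15   (= 30 * 0.5)
  print(sums.std())   # ~1.58 (= sqrt(30 / 12)); a histogram would look bell-shaped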
* Distribution of estimate
· As we collect more samples (N), variance decreases
· Mu and sigma refer to mean/std dev of X
· Mu-estimate should have the same mean as X
· More variance in X should lead to more variance in mu-estimate
* Sum of Random Variables
· The estimate's variance grows proportionally to the variance of X
· Its standard deviation only decreases by the square root of N
· Therefore, we need to collect many more samples to account for larger variance
· var(μ̂) = σ² / N, so std(μ̂) = σ / √N
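A simulation to confirm the scaling (σ and N chosen arbitrarily):

  import numpy as np

  # std of the sample mean shrinks like sigma / sqrt(N)
  sigma, N = 2.0, 100
  means = (sigma * np.random.randn(5000, N)).mean(axis=1)
  print(means.std())          # ~0.2
  print(sigma / np.sqrt(N))   # 0.2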
* Confidence Intervals
· We want to know the range of values that is likely to contain the true μ
· Shade in the middle 95% of the Gaussian; then we can (almost) say "μ is probably here"
· Note: a "95% CI" doesn't tell us "μ is in this interval with probability 95%"
· In reality, all we can say is: if we did many experiments to calculate the sample
mean, 95% of the time, those confidence intervals would contain the true μ
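That interpretation can be checked by simulation (made-up true parameters):

  import numpy as np

  # how often does the 95% CI contain the true mu?
  mu, sigma, N = 5.0, 2.0, 100
  z = 1.96   # the 95% z-value (derived later in these notes)
  hits = 0
  for _ in range(2000):
      x = mu + sigma * np.random.randn(N)
      half = z * x.std(ddof=1) / np.sqrt(N)
      hits += (x.mean() - half <= mu <= x.mean() + half)
  print(hits / 2000)   # ~0.95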

* Confidence Level/Significance Level
· We call 1 - α the confidence level
· We call α the significance level
· We'll see significance level again later with statistical testing

* Confidence interval limits
· We want the min/max values of the range where μ should lie
· Let's call them x_L and x_R
· We want to find the limits such that the area under the Gaussian is 0.95
· Again, calculus provides the tools - integral
∫_{x_L}^{x_R} N(x; μ, σ²) dx = 0.95
* Confidence Interval Limits
· Standardize the normal and rescale
· New limits: z_L = (x_L - μ) / σ and z_R = (x_R - μ) / σ
∫_{z_L}^{z_R} N(z; 0, 1) dz = 0.95
* Cumulative Distribution Function (CDF)
· Can we make use of this?
Φ(z) = P(Z ≤ z) = ∫_{-∞}^{z} N(t; 0, 1) dt
· Gaussian is symmetric
· So if we want 5% on the tail ends, then we want each tail to be 2.5%
* CDF
· In other words, Φ(z_R) should give us an area of 1 - 0.05/2 = 0.975
Φ(z_R) = 0.975, so z_R = Φ^{-1}(0.975)
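A quick numerical check of that area (scipy's norm is the standard normal by default):

  from scipy.stats import norm

  print(norm.cdf(1.96))   # ~0.975: area under the standard normal up to 1.96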
* Inverse CDF
· Scipy has a function to do this
· scipy.stats.norm.ppf
· ppf = percent point function, because statisticians like crazy names
· Since Gaussian is symmetric:
z_L = -z_R = -Φ^{-1}(0.975)
OR
z_L = Φ^{-1}(0.025)
* Inverse CDF
· These are pretty standard calculations, so in many textbooks they'll just
use rounded-off numbers, e.g. z_R ≈ 1.96 for a 95% CI
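Checking with scipy:

  from scipy.stats import norm

  z_right = norm.ppf(0.975)   # ~1.96
  z_left  = norm.ppf(0.025)   # ~-1.96, = -z_right by symmetry
  print(z_left, z_right)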
* Confidence Interval
[μ̂ - 1.96 σ/√N, μ̂ + 1.96 σ/√N]
· We don't actually know σ
· But plugging in the sample estimate σ̂ is a valid approximation
* Confidence Interval Approximation
μ̂ ± 1.96 σ̂ / √N, where σ̂ is the sample standard deviation
· To find the non-approximated version, we would use the inverse CDF of
the t-distribution, which is outside the scope of this course
· We are fine with Gaussian approximation because it will help with Bernoulli
· In fact, we'll use the exact same interval:
μ̂ ± 1.96 σ̂ / √N
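A minimal end-to-end sketch for Gaussian data (all numbers made up):

  import numpy as np
  from scipy.stats import norm

  x = 5.0 + 2.0 * np.random.randn(500)   # hypothetical data
  mu_hat = x.mean()
  sigma_hat = x.std(ddof=1)
  z = norm.ppf(0.975)
  half = z * sigma_hat / np.sqrt(len(x))
  print(mu_hat - half, mu_hat + half)    # approximate 95% CI for mu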
* Bernoulli Confidence Interval
· We've just replaced the Gaussian symbols with Bernoulli symbols,
it's the same formula
p̂ ± 1.96 √(p̂ (1 - p̂) / N)   (for Bernoulli, μ̂ = p̂ and σ̂² = p̂(1 - p̂))
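The same sketch for the CTR example (hypothetical clicks, true rate 0.3):

  import numpy as np
  from scipy.stats import norm

  clicks = (np.random.rand(1000) < 0.3).astype(int)
  p_hat = clicks.mean()
  z = norm.ppf(0.975)
  half = z * np.sqrt(p_hat * (1 - p_hat) / len(clicks))
  print(p_hat - half, p_hat + half)   # approximate 95% CI for the CTR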
* Summary
· Apply the CLT to show that the maximum likelihood estimate of μ is Gaussian-distributed
· Find left/right limits that capture 95% of the probability for where μ could be
· Scale it to standard normal (mean 0, var 1)
· Rescale it back: X = mean + Z*stddev
· Remember: the stddev of the μ-estimate scales proportionally to stddev(X), but
inversely proportionally to sqrt(N)
· Later, we'll see that Bayesian methods of quantifying this uncertainty are
more systematic and elegant.
* Bayesian Paradigm
· We call the above "Frequentist statistics"
· Parameters of the distribution are fixed, we just don't know what they are
· Data is then randomly generated via those distributions/parameters
· "Bayesian statistics" - opposite situation
· Parameters are random variables that have distributions
· Data is fixed
· Does this better reflect reality?
· In this way, we can model p(param | data)
Frequentist (Point estimate):
θ̂ = argmax_θ p(data | θ)
Bayesian (distribution):
p(θ | data) = p(data | θ) p(θ) / p(data)
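A forward-looking sketch of the Bayesian version of the CTR example (the Beta prior and its conjugacy are assumptions here, covered later in the course):

  import numpy as np
  from scipy.stats import beta

  # Bernoulli likelihood + Beta prior -> Beta posterior (conjugacy)
  clicks, impressions = 30, 100
  a, b = 1, 1   # Beta(1, 1) = uniform prior (an arbitrary choice)
  post = beta(a + clicks, b + impressions - clicks)

  print(post.mean())           # posterior mean of the CTR
  print(post.interval(0.95))   # 95% credible interval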