[Lec 2] Bayes Rule / Probability

P(y|x) = P(x|y) P(y) / P(x)
P(x|y)
↓
"generative distribution"
This describes what the data x should look like, given the class y.
* Gaussian
- Single sample:
Probability density of one data point, x
- Multiple Samples :
Assume I.I.D. (Independent and Identically Distributed)
- Joint Probability Density;
Because the variables are assumed independent, we multiply the individual probabilities:
p(x_1, ..., x_N) = p(x_1) p(x_2) ... p(x_N) = ∏ p(x_i)
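A minimal sketch of this in Python (the data and parameters below are made up for illustration):

  import numpy as np
  from scipy.stats import norm

  # hypothetical i.i.d. Gaussian samples and parameters
  x = np.array([1.2, 0.8, 1.5, 0.9])
  mu, sigma = 1.0, 0.5

  # joint density = product of the individual densities (by independence)
  joint = np.prod(norm.pdf(x, loc=mu, scale=sigma))
  # in practice, sum log-densities instead to avoid numerical underflow
  log_joint = np.sum(norm.logpdf(x, loc=mu, scale=sigma))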
* Data Likelihood
· The probability of the data, given the parameters.
E.g. p(data|parameter)
· The parameters depend on the model.
e.g. Gaussian, Beta, Gamma, etc.
· L(θ) = p(data | θ)
Which model is best? The one that maximizes the likelihood.
· For a Gaussian, the likelihood-maximizing mean is the sample mean: μ_ML = (1/N) Σ x_i
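A quick numerical check (made-up data; σ assumed known for simplicity):

  import numpy as np
  from scipy.stats import norm

  # hypothetical i.i.d. Gaussian data: true mu = 5, sigma = 2
  x = 5.0 + 2.0 * np.random.randn(100)

  mu_ml = x.mean()   # maximum-likelihood estimate of mu = sample mean

  # the log-likelihood is highest at mu_ml
  for mu in (mu_ml - 1, mu_ml, mu_ml + 1):
      print(mu, norm.logpdf(x, loc=mu, scale=2.0).sum())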
* Maximum Likelihood Click-Through Rate
- CTR = click-through rate; similarly, conversion rate
ex) e-commerce, advertising
- 2 possible outcomes: click / don't click
- Just Bernoulli distribution
· MLE: p̂ = (# clicks) / (# impressions) = (1/N) Σ x_i
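A minimal sketch (hypothetical click data):

  import numpy as np

  # 1 = click, 0 = no click
  clicks = np.array([0, 1, 0, 0, 1, 0, 0, 0, 1, 0])

  # maximum-likelihood estimate of the Bernoulli parameter (the CTR)
  p_hat = clicks.mean()   # = (# clicks) / (# impressions)
  print(p_hat)            # 0.3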
* Confidence Intervals
Non-Bayesian/Frequentist method of dealing with the uncertainty in parameter estimates
* Sum of Random Variables
- A sum of I.I.D. random variables converges (in distribution) to a Gaussian (the Central Limit Theorem, CLT)
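A small simulation illustrating this (uniform variables chosen arbitrarily):

  import numpy as np

  # CLT demo: sums of 30 i.i.d. uniform(0, 1) variables look Gaussian
  sums = np.random.rand(10000, 30).sum(axis=1)
  print(sums.mean())  # ~15   (= 30 * 0.5)
  print(sums.std())   # ~1.58 (= sqrt(30 / 12)); a histogram would look bell-shaped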
* Distribution of estimate
· As we collect more samples (N), variance decreases
· Mu and sigma refer to mean/std dev of X
· Mu-estimate should have the same mean as X
· More variance in X should lead to more variance in mu-estimate
* Sum of Random Variables
· The estimate's variance grows proportionally to the variance of X
· Its standard deviation only decreases by the square root of N
· Therefore, we need to collect many more samples to account for larger variance
· var(μ̂) = σ² / N, so std(μ̂) = σ / √N
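A simulation to confirm the scaling (σ and N chosen arbitrarily):

  import numpy as np

  # std of the sample mean shrinks like sigma / sqrt(N)
  sigma, N = 2.0, 100
  means = (sigma * np.random.randn(5000, N)).mean(axis=1)
  print(means.std())          # ~0.2
  print(sigma / np.sqrt(N))   # 0.2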
* Confidence Intervals
· We want to know the range of values that is likely to contain the true μ
· Shade in the middle 95% of the Gaussian; then we can (almost) say "μ is probably here"
· Note: a "95% CI" doesn't tell us "μ is in this interval with probability 95%"
· In reality, all we can say is: if we did many experiments to calculate the sample
mean, 95% of the time, those confidence intervals would contain the true μ
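That interpretation can be checked by simulation (made-up true parameters):

  import numpy as np

  # how often does the 95% CI contain the true mu?
  mu, sigma, N = 5.0, 2.0, 100
  z = 1.96   # the 95% z-value (derived later in these notes)
  hits = 0
  for _ in range(2000):
      x = mu + sigma * np.random.randn(N)
      half = z * x.std(ddof=1) / np.sqrt(N)
      hits += (x.mean() - half <= mu <= x.mean() + half)
  print(hits / 2000)   # ~0.95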

* Confidence Level/Significance Level
· We call 1 - α the confidence level
· We call α the significance level
· We'll see significance level again later with statistical testing

* Confidence interval limits
· We want the min/max values of the range where μ should lie
· Let's call them x_L and x_R
· We want to find the limits such that the area under the Gaussian is 0.95
· Again, calculus provides the tools - integral
∫_{x_L}^{x_R} N(x; μ, σ²) dx = 0.95
* Confidence Interval Limits
· Standardize the normal and rescale
· New limits: z_L = (x_L - μ) / σ and z_R = (x_R - μ) / σ
∫_{z_L}^{z_R} N(z; 0, 1) dz = 0.95
* Cumulative Distribution Function (CDF)
· Can we make use of this?
Φ(z) = P(Z ≤ z) = ∫_{-∞}^{z} N(t; 0, 1) dt
· Gaussian is symmetric
· So if we want 5% on the tail ends, then we want each tail to be 2.5%
* CDF
· In other words, Φ(z_R) should give us an area of 1 - 0.05/2 = 0.975
Φ(z_R) = 0.975, so z_R = Φ^{-1}(0.975)
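A quick numerical check of that area (scipy's norm is the standard normal by default):

  from scipy.stats import norm

  print(norm.cdf(1.96))   # ~0.975: area under the standard normal up to 1.96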
* Inverse CDF
· Scipy has a function to do this
· scipy.stats.norm.ppf
· ppf = percent point function, because statisticians like crazy names
· Since Gaussian is symmetric:
z_L = -z_R = -Φ^{-1}(0.975)
OR
z_L = Φ^{-1}(0.025)
* Inverse CDF
· These are pretty standard calculations, so in many textbooks they'll just
use rounded-off numbers, e.g. z_R ≈ 1.96 for a 95% CI
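Checking with scipy:

  from scipy.stats import norm

  z_right = norm.ppf(0.975)   # ~1.96
  z_left  = norm.ppf(0.025)   # ~-1.96, = -z_right by symmetry
  print(z_left, z_right)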
* Confidence Interval
[μ̂ - 1.96 σ/√N, μ̂ + 1.96 σ/√N]
· We don't actually know σ
· But plugging in the sample estimate σ̂ is a valid approximation
* Confidence Interval Approximation
μ̂ ± 1.96 σ̂ / √N, where σ̂ is the sample standard deviation
· To find the non-approximated version, we would use the inverse CDF of
the t-distribution, which is outside the scope of this course
· We are fine with Gaussian approximation because it will help with Bernoulli
· In fact, we'll use the exact same interval:
μ̂ ± 1.96 σ̂ / √N
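A minimal end-to-end sketch for Gaussian data (all numbers made up):

  import numpy as np
  from scipy.stats import norm

  x = 5.0 + 2.0 * np.random.randn(500)   # hypothetical data
  mu_hat = x.mean()
  sigma_hat = x.std(ddof=1)
  z = norm.ppf(0.975)
  half = z * sigma_hat / np.sqrt(len(x))
  print(mu_hat - half, mu_hat + half)    # approximate 95% CI for mu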
* Bernoulli Confidence Interval
· We've just replaced the Gaussian symbols with Bernoulli symbols,
it's the same formula
p̂ ± 1.96 √(p̂ (1 - p̂) / N)   (for Bernoulli, μ̂ = p̂ and σ̂² = p̂(1 - p̂))
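The same sketch for the CTR example (hypothetical clicks, true rate 0.3):

  import numpy as np
  from scipy.stats import norm

  clicks = (np.random.rand(1000) < 0.3).astype(int)
  p_hat = clicks.mean()
  z = norm.ppf(0.975)
  half = z * np.sqrt(p_hat * (1 - p_hat) / len(clicks))
  print(p_hat - half, p_hat + half)   # approximate 95% CI for the CTR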
* Summary
· Apply the CLT to show that the maximum likelihood estimate of μ is Gaussian-distributed
· Find left/right limits that capture 95% of the probability for where μ could be
· Scale it to standard normal (mean 0, var 1)
· Rescale it back: X = mean + Z*stddev
· Remember: the stddev of the μ-estimate scales proportionally to stddev(X), but
inversely proportionally to sqrt(N)
· Later, we'll see that Bayesian methods of quantifying this uncertainty are
more systematic and elegant.
* Bayesian Paradigm
· We call the above "Frequentist statistics"
· Parameters of the distribution are fixed, we just don't know what they are
· Data is then randomly generated via those distributions/parameters
· "Bayesian statistics" - opposite situation
· Parameters are random variables that have distributions
· Data is fixed
· Does this better reflect reality?
· In this way, we can model p(param | data)
Frequentist (Point estimate):
θ̂ = argmax_θ p(data | θ)
Bayesian (distribution):
p(θ | data) = p(data | θ) p(θ) / p(data)
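A forward-looking sketch of the Bayesian version of the CTR example (the Beta prior and its conjugacy are assumptions here, covered later in the course):

  import numpy as np
  from scipy.stats import beta

  # Bernoulli likelihood + Beta prior -> Beta posterior (conjugacy)
  clicks, impressions = 30, 100
  a, b = 1, 1   # Beta(1, 1) = uniform prior (an arbitrary choice)
  post = beta(a + clicks, b + impressions - clicks)

  print(post.mean())           # posterior mean of the CTR
  print(post.interval(0.95))   # 95% credible interval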