
[Lec 2] Bayes Rule / Probability

Minwoo 2020. 1. 18.

 P(y|x) = P(x|y) P(y) / P(x)

 

P(x|y)

"generative distribution"

Given a class y, this tells us what the data x looks like.

* Gaussian

 

- Single sample:
      probability density of one data point x:
      p(x) = 1/√(2πσ²) · exp( -(x-μ)² / (2σ²) )

- Multiple samples:
      assume i.i.d. (Independent and Identically Distributed)

- Joint probability density:
      since the samples are assumed independent, we multiply the individual densities:
      p(x_1, ..., x_N) = ∏_{i=1..N} p(x_i | μ, σ²)
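A minimal numeric sketch of the joint density under the i.i.d. assumption (made-up data points, with an assumed standard normal N(0, 1)):

```python
import numpy as np
from scipy.stats import norm

# hypothetical example: three data points, assumed drawn from N(0, 1)
x = np.array([-0.5, 0.1, 0.7])

# density of each individual point
densities = norm.pdf(x, loc=0.0, scale=1.0)

# joint density under the i.i.d. assumption = product of individual densities
print(densities.prod())
```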

* Data Likelihood

 

 

· The probability of the data given the parameters.
E.g. p(data | parameter)

 

 

· The parameters depend on the model.
E.g. Gaussian, Beta, Gamma, etc.

 

 

· Likelihood: L(θ) = p(data | θ)
Which model M is best? The one that maximizes the likelihood.

 

 

 

 

· Maximum likelihood estimate: θ_ML = argmax_θ L(θ)
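A small sketch of this idea on synthetic data (for a Gaussian, the ML estimates are just the sample mean and std; other parameter values give a lower log-likelihood):

```python
import numpy as np
from scipy.stats import norm

# hypothetical synthetic data: N i.i.d. Gaussian samples
rng = np.random.default_rng(0)
X = rng.normal(5.0, 2.0, size=100)

def log_likelihood(mu, sigma):
    # log p(X | mu, sigma): sum of log densities, by the i.i.d. assumption
    return norm.logpdf(X, loc=mu, scale=sigma).sum()

# for a Gaussian, the ML estimates are the sample mean and (ddof=0) std
mu_ml, sigma_ml = X.mean(), X.std()
print(log_likelihood(mu_ml, sigma_ml))      # highest
print(log_likelihood(mu_ml + 1, sigma_ml))  # lower
```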

 

 


 

 

 

* Maximum Likelihood Click-Through Rate

 

- CTR = click-through rate; conversion rate
ex) e-commerce, advertising

- 2 possible outcomes: click / don't click

- Just a Bernoulli distribution

 

· Bernoulli likelihood: L(p) = ∏_{i=1..N} p^(x_i) (1-p)^(1-x_i)
· Maximizing it gives the ML estimate: p̂ = (1/N) Σ x_i = #clicks / #impressions
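A minimal sketch with a made-up click log:

```python
import numpy as np

# hypothetical click log: 1 = click, 0 = no click
clicks = np.array([1, 0, 0, 1, 0, 0, 0, 1, 0, 0])

# the Bernoulli ML estimate of the CTR is just the sample mean
ctr_hat = clicks.mean()
print(ctr_hat)  # 0.3
```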

 

 

 


 

 

* Confidence Intervals

 

A non-Bayesian/frequentist method of dealing with the uncertainty in the measurement of parameters.

 

 

 


 

* Sum of Random Variables

- The sum of i.i.d. random variables approaches a Gaussian distribution (Central Limit Theorem)

 

 

 


 

* Distribution of estimate

 

· As we collect more samples (N), the variance of the estimate decreases
· μ and σ refer to the mean / std dev of X
· The μ-estimate should have the same mean as X
· More variance in X should lead to more variance in the μ-estimate
· Putting it together: μ̂ ~ N(μ, σ²/N)
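A quick simulation of this claim (the true μ and σ here are made up): the std dev of the sample mean shrinks like σ/√N.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 5.0, 2.0

# simulate many sample means for increasing N;
# their std dev should shrink like sigma / sqrt(N)
for N in [10, 100, 1000]:
    means = rng.normal(mu, sigma, size=(10000, N)).mean(axis=1)
    print(N, means.std(), sigma / np.sqrt(N))
```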

 

 

 

 


 

 

* Sum of Random Variables

 

· The std dev of the estimate grows proportionally to the std dev of X
· But it only decreases by the square root of N
· Therefore, we need to collect many more samples to account for a larger variance

· std(μ̂) = σ / √N

 

 

 


 

* Confidence Intervals

 

· We want to know the range of values that is likely to contain the true μ
· Shading in the middle 95% of the Gaussian, we can (almost) say "μ is probably here"
· Note: a "95% CI" doesn't tell us "μ is in this interval with probability 95%"
· In reality, all we can say is: if we did many experiments and calculated the sample
  mean each time, 95% of those confidence intervals would contain the true μ
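A quick simulation of this interpretation, with made-up μ, σ, and N: across many repeated experiments, roughly 95% of the computed intervals contain the true μ.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
mu, sigma, N = 5.0, 2.0, 100
z = norm.ppf(0.975)  # ≈ 1.96

covered = 0
trials = 10_000
for _ in range(trials):
    x = rng.normal(mu, sigma, size=N)
    half = z * x.std() / np.sqrt(N)
    covered += (x.mean() - half) <= mu <= (x.mean() + half)
print(covered / trials)  # ≈ 0.95
```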

 

 


 

* Confidence Level/Significance Level

 

· We call 1 - α the confidence level
· We call α the significance level
· We'll see the significance level again later with statistical testing

 


 

* Confidence interval limits

 

· We want the min/max values for the range where μ should lie
· Let's call them x_left and x_right
· We want to find the limits such that the area under the Gaussian is 0.95
· Again, calculus provides the tool: an integral

∫ from x_left to x_right of N(x; μ̂, σ̂²/N) dx = 0.95

 

 

 

 


 

* Confidence Interval Limits

 

· Standardize the normal and rescale
· New limits: z_left = (x_left - μ̂) / (σ̂/√N) and z_right = (x_right - μ̂) / (σ̂/√N)

∫ from z_left to z_right of N(z; 0, 1) dz = 0.95

 

 

 


 

* Cumulative Distribution Function (CDF)

 

· Can we make use of this?

Φ(z) = ∫ from -∞ to z of N(t; 0, 1) dt    (the standard normal CDF)

· The Gaussian is symmetric
· So if we want 5% in the tail ends, then we want each tail to be 2.5%

                                                              

 

 

 


 

     

* CDF

 

· In other words, Φ(z_right) should give us an area of 1 - 0.05/2 = 0.975

z_right = Φ⁻¹(0.975)

 

 

 


 

* Inverse CDF

 

· Scipy has a function to do this
· scipy.stats.norm.ppf
· ppf = percent point function, because statisticians like crazy names
· Since the Gaussian is symmetric:

z_right = norm.ppf(0.975)
OR
z_left = norm.ppf(0.025) = -z_right
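A minimal sketch of these two calls, with a sanity check via the CDF:

```python
from scipy.stats import norm

alpha = 0.05
z_right = norm.ppf(1 - alpha / 2)  # ≈ 1.96
z_left = norm.ppf(alpha / 2)       # ≈ -1.96
print(z_left, z_right)

# sanity check: 95% of the probability mass lies between the two limits
print(norm.cdf(z_right) - norm.cdf(z_left))  # 0.95
```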

 

 

 


 

 

* Inverse CDF

 

· These are pretty standard calculations, so many textbooks just use
  rounded-off numbers, e.g. z_right ≈ 1.96 for a 95% interval

 

 

 

 

 


 

* Confidence Interval

                                                                         

 

[ μ̂ - 1.96 σ/√N , μ̂ + 1.96 σ/√N ]

· We don't actually know σ
· But plugging in the sample estimate σ̂ is a valid approximation

 

 

 


                                   

* Confidence Interval Approximation

                                                                         

 

[ μ̂ - z_{α/2} σ̂/√N , μ̂ + z_{α/2} σ̂/√N ]

· To find the non-approximated version, we would use the inverse CDF of
  the t-distribution, which is outside the scope of this course
· We are fine with the Gaussian approximation because it will help with the Bernoulli
· In fact, we'll use the exact same interval:

[ p̂ - z_{α/2} σ̂/√N , p̂ + z_{α/2} σ̂/√N ]

 

 

 


 

* Bernoulli Confidence Interval

                                                                         

 

· We've just replaced the Gaussian symbols with Bernoulli symbols;
  it's the same formula
· For a Bernoulli, the variance estimate is σ̂² = p̂(1 - p̂)

[ p̂ - z_{α/2} √(p̂(1-p̂)/N) , p̂ + z_{α/2} √(p̂(1-p̂)/N) ]
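Putting it together for a CTR, with made-up click data (a sketch; the true CTR of 0.3 is an assumption of the example):

```python
import numpy as np
from scipy.stats import norm

# hypothetical click data: 1000 impressions with a true CTR of 0.3
clicks = np.random.default_rng(0).binomial(1, 0.3, size=1000)

N = len(clicks)
p_hat = clicks.mean()                     # Bernoulli ML estimate of the CTR
sigma_hat = np.sqrt(p_hat * (1 - p_hat))  # Bernoulli std dev estimate

z = norm.ppf(0.975)  # ≈ 1.96 for a 95% interval
half_width = z * sigma_hat / np.sqrt(N)
print(p_hat - half_width, p_hat + half_width)
```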

 

 

 


 

* Summary

 

· Apply the CLT to show that the maximum likelihood estimate of μ is Gaussian-distributed
· Find the left/right limits that capture 95% of the probability for where μ could be
· Scale it to the standard normal (mean 0, var 1)
· Rescale it back: X = mean + Z * stddev
· Remember: the stddev of the μ-estimate scales proportionally to stddev(X), but
  inversely proportionally to sqrt(N)
· Later, we'll see that Bayesian methods of quantifying this uncertainty are
  more systematic and elegant.

 

 


 

* Bayesian Paradigm

 

· We call the above "frequentist statistics"
· The parameters of the distribution are fixed; we just don't know what they are
· The data is then randomly generated via those distributions/parameters

 

 

· "Bayesian statistics" - opposite situation                                      

· Parameters are random variables that have distributions                  

· Data is fixed                                                                             

· Does this better reflect reality?                                                    

· In this way, we can model p(param | data)                                    

 

Frequentist (point estimate):
θ_ML = argmax_θ p(data | θ)

Bayesian (distribution):
p(θ | data) = p(data | θ) p(θ) / p(data)
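As a small illustration (an assumption of this sketch, not something stated in the notes above): with a Beta prior, which is conjugate to the Bernoulli, the posterior over the CTR is available in closed form.

```python
import numpy as np
from scipy.stats import beta

# hypothetical click data
clicks = np.array([1, 0, 0, 1, 0, 0, 0, 1, 0, 0])

# Beta(1, 1) prior (uniform) is conjugate to the Bernoulli,
# so the posterior over the CTR is also a Beta distribution
a_post = 1 + clicks.sum()
b_post = 1 + (len(clicks) - clicks.sum())
posterior = beta(a_post, b_post)

print(posterior.mean())          # posterior mean of the CTR
print(posterior.interval(0.95))  # 95% credible interval
```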

 

 
