diff --git a/main.org b/main.org
index 0d4178f..e41abfe 100644
--- a/main.org
+++ b/main.org
@@ -1,5 +1,6 @@
 #+TITLE: Probability and Statistics ( BTech CSE )
 #+AUTHOR: Anmol Nawani
+#+LATEX_CLASS: article
 #+LATEX_HEADER: \usepackage{amsmath}
 
 # *Statistics
@@ -410,7 +411,7 @@ To get probability from a to b (inclusive and exclusive doesn't matter in contin
 + Mean
 \[ E(aX) = aE(X) \]
 \[ E(a) = a \]
-\[ E(X + Y) = E(X) + E(Y) ]
+\[ E(X + Y) = E(X) + E(Y) \]
 
 + Variance
 If
@@ -576,3 +577,159 @@ Given,
 \[ X_2 \sim N(\mu_2, \sigma_2) \]
 Then,
 \[ a X_1 + b X_2 \sim N \left( a \mu_1 + b \mu_2, \sqrt{ a^2 \sigma_1^2 + b^2 \sigma_2^2} \right) \]
+* Standard Normal Distribution
+
+The normal distribution with mean 0 and variance 1 is called the standard normal distribution.
+
+\[ Z \sim N(0,1) \]
+
+To calculate the area under a given normal distribution, we can use the standard normal distribution. For that we need to calculate the corresponding values in the standard distribution from our given distribution, using the formula
+
+\[ For\ X \sim N(\mu, \sigma) \]
+\[ z = \frac{x - \mu}{\sigma} \]
+\[ x \rightarrow value\ in\ our\ normal\ distribution \]
+\[ \mu \rightarrow mean\ of\ our\ distribution \]
+\[ \sigma \rightarrow standard\ deviation\ of\ our\ distribution \]
+\[ z \rightarrow corresponding\ value\ in\ standard\ normal\ distribution \]
+
+Example,
+
+Suppose X \sim N(\mu, \sigma) and we want to calculate the probability P(a < X < b). The range carrying the same probability in the Z distribution will be,
+
+\[ z_1 = \frac{a - \mu}{\sigma} \]
+\[ z_2 = \frac{b - \mu}{\sigma} \]
+Now the probability in the Z distribution is,
+\[ P(z_1 < Z < z_2) \]
+\[ P \left( \frac{a - \mu}{\sigma} < Z < \frac{b - \mu}{\sigma} \right) \]
+
+So we need the area under the Z curve from z_1 to z_2.
+\\
+Then, we use the standard normal table to get the area.
+
++ *Note* : The standard normal distribution is symmetric about the y axis. This fact can be used when calculating the area under the Z curve.
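+
+A quick numerical check of this z-transform (a minimal sketch, assuming Python with only the standard library; \Phi is built from math.erf, and the values of \mu, \sigma, a, b are made-up examples):
+
+#+BEGIN_SRC python
+import math
+
+def phi(z):
+    # CDF of the standard normal: Phi(z) = (1 + erf(z / sqrt(2))) / 2
+    return (1 + math.erf(z / math.sqrt(2))) / 2
+
+# Assumed example: X ~ N(mu = 50, sigma = 10), find P(40 < X < 65)
+mu, sigma = 50, 10
+a, b = 40, 65
+
+z1 = (a - mu) / sigma  # value corresponding to a in the Z distribution
+z2 = (b - mu) / sigma  # value corresponding to b in the Z distribution
+
+# P(a < X < b) = P(z1 < Z < z2) = Phi(z2) - Phi(z1)
+print(phi(z2) - phi(z1))  # ~0.7745
+#+END_SRC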
+
+* Joint Probability Mass Function
+The joint probability mass function of two random variables X and Y is given by
+
+\[ f(x,y) = P(X=x, Y=y) \]
+
+(For continuous random variables, f(x,y) is the joint probability density function.)
+
++ For discrete
+\[ \Sigma_x \Sigma_y f(x,y) = 1 \]
+
++ For continuous
+\[ \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x,y)\ dx\ dy = 1 \]
+
+To get the probabilities,
+
+\[ P(a \le X \le b, c \le Y \le d ) = \int_c^d \int_a^b f(x,y)\ dx\ dy \]
+
+** Marginal probability distribution (from joint PMF)
+
++ For discrete
+\[ P(X=x) = f(x) = \Sigma_y f(x,y) \]
+\[ P(Y=y) = f(y) = \Sigma_x f(x,y) \]
+
++ For continuous
+\[ f(x) = \int_{-\infty}^{\infty} f(x,y)\ dy \]
+\[ f(y) = \int_{-\infty}^{\infty} f(x,y)\ dx \]
+
+** Conditional Probability for Joint PMF
+
+\[ P(X=x \mid Y=y) = f(x \mid y) = \frac{ P(X=x, Y=y) }{ P(Y=y) } \]
+\[ P(X=x \mid Y=y) = f(x \mid y) = \frac{ f(x,y) }{ f(y) } \]
+
+** Independent Random Variables
+
+The random variables X and Y are independent if,
+\[ f(x,y) = f(x) f(y) \]
+
+** Moment of Joint Variables
+
+\[ E(X,Y) = E(XY) = \int_{-\infty}^\infty \int_{-\infty}^\infty x y f(x,y)\ dx\ dy \]
+
+** Covariance
+The covariance of two random variables X and Y is given by,
+
+\[ cov(X,Y) = E(XY) - E(X)E(Y) \]
+
+*** Properties of covariance
+
++ If X and Y are independent
+\[ cov(X,Y) = 0 \]
+
++ If the variance of a random variable X is written var(X), then
+\[ cov(X+Y, X-Y) = var(X) - var(Y) \]
+
++ Generalization of the previous case
+\[ cov(aX + bY, cX + dY) = ac \cdot var(X) + bd \cdot var(Y) + (ad + bc) \cdot cov(X,Y) \]
+
+*** Variance of two random variables
+
+\[ var(aX + bY) = a^2 \cdot var(X) + b^2 \cdot var(Y) + 2ab \cdot cov(X,Y) \]
+
+** Correlation
+
+The standard deviation of X is \sigma_X and the standard deviation of Y is \sigma_Y. Then the correlation is given by,
+
+\[ \gamma(X,Y) = \rho_{XY} = \frac{cov(X,Y)}{\sigma_X \sigma_Y} \]
+
+here, \rho_{XY} lies between -1 and 1
+\[ -1 \le \rho_{XY} \le 1 \]
+
+** Conditional moments
+\[ E(X \mid Y=y) = \int_{-\infty}^{\infty} x f(x \mid y)\ dx \]
+which will be a function of y.
+
+# #+BEGIN_COMMENT
+
+* Useful equation
+\[ n! = \int_0^\infty x^n e^{-x}\ dx \]
+# #+END_COMMENT
+
+* Covariance in discrete data
+
+Suppose we have two sets of discrete data,
+
+\[ X : x_1, x_2, x_3 \ldots x_n \]
+\[ Y : y_1, y_2, y_3 \ldots y_n \]
+
+\[ cov(X,Y) = \frac{1}{n} \left( \sum_{i=1}^n x_i y_i \right) - [ mean(x) \cdot mean(y) ] \]
+
+\[ n \rightarrow number\ of\ items \]
+
+* Regression
+
+Regression is a technique to relate a dependent variable to one or more independent variables.
+
+** Lines of regression
+
+Both lines pass through the point *(mean(x), mean(y))*.
+
+*** y on x
+Equation of line,
+\[ \frac{y - mean(y)}{x - mean(x)} = b_{yx} \]
+Where,
+\[ b_{yx} = \frac{cov(X,Y)}{var(X)} \]
+
+*** x on y
+Equation of line,
+\[ \frac{x - mean(x)}{y - mean(y)} = b_{xy} \]
+Where,
+\[ b_{xy} = \frac{cov(X,Y)}{var(Y)} \]
+
+b_{xy} and b_{yx} are called regression coefficients.
+
++ *Note* : If one of the regression coefficients is greater than 1 in absolute value, then the other must be less than 1, since b_{xy} b_{yx} = \rho^2 \le 1 (see the numerical sketch at the end of this section).
+
+*** Correlation
+
+\[ \gamma(X,Y) = \rho_{XY} = \pm \sqrt{b_{xy} b_{yx}} \]
+
+The sign of the regression coefficients (b_{xy} and b_{yx}) and of the correlation coefficient is the same.
+
+** Angle between lines of regression
+
+\[ \tan \theta = \left( \frac{1 - \rho^2}{\rho} \right) \left( \frac{\sigma_X \sigma_Y}{var(X) + var(Y)} \right) \]
+
+Here \sigma is the standard deviation.
+
++ If $\rho = 0$ then $\theta = \frac{\pi}{2}$
++ If $\rho = \pm 1$ then $\theta = 0$
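+
+A minimal end-to-end sketch of the last few sections (assuming Python with only the standard library; the paired data values are made up): it computes cov(X,Y) with the discrete-data formula, the regression coefficients b_{yx} and b_{xy}, and checks that \rho = \pm \sqrt{b_{xy} b_{yx}}.
+
+#+BEGIN_SRC python
+# Made-up paired data
+xs = [1.0, 2.0, 3.0, 4.0, 5.0]
+ys = [2.1, 3.9, 6.2, 8.1, 9.8]
+
+n = len(xs)
+mean_x = sum(xs) / n
+mean_y = sum(ys) / n
+
+# cov(X,Y) = (1/n) * sum(x_i * y_i) - mean(x) * mean(y)
+cov_xy = sum(x * y for x, y in zip(xs, ys)) / n - mean_x * mean_y
+
+# Population variances: mean of squares minus square of the mean
+var_x = sum(x * x for x in xs) / n - mean_x ** 2
+var_y = sum(y * y for y in ys) / n - mean_y ** 2
+
+# Regression coefficients: b_yx = cov/var(X), b_xy = cov/var(Y)
+b_yx = cov_xy / var_x  # ~1.96 (> 1, so b_xy must be < 1)
+b_xy = cov_xy / var_y  # ~0.51
+
+# Correlation two ways: cov/(sigma_X * sigma_Y) and sqrt(b_xy * b_yx)
+rho = cov_xy / (var_x ** 0.5 * var_y ** 0.5)
+print(rho, (b_xy * b_yx) ** 0.5)  # both ~0.9988, positive like the coefficients
+#+END_SRC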