#+TITLE: Probability and Statistics ( BTech CSE )
#+AUTHOR: Anmol Nawani
#+LATEX_CLASS: article
#+LATEX_HEADER: \usepackage{amsmath}
# *Statistics
+ Mean
\[ E(aX) = aE(X) \]
\[ E(a) = a \]
\[ E(X + Y) = E(X) + E(Y) \]
+ Variance
If
Given,
\[ X_1 \sim N(\mu_1, \sigma_1) \]
\[ X_2 \sim N(\mu_2, \sigma_2) \]
Then,
\[ a X_1 + b X_2 \sim N \left( a \mu_1 + b \mu_2, \sqrt{ a^2 \sigma_1^2 + b^2 \sigma_2^2} \right) \]
* Standard Normal Distribution
The normal distribution with Mean 0 and Variance 1 is called the standard normal distribution.
\[ Z \sim N(0,1) \]
To calculate the area under a given normal distribution, we can use the standard normal distribution. We first convert values from our given distribution into the corresponding values in the standard normal distribution, using the formula
\[ For\ X \sim N(\mu, \sigma) \]
\[ z = \frac{x - \mu}{\sigma} \]
\[ x \rightarrow value\ in\ our\ normal\ distribution \]
\[ \mu \rightarrow mean\ of\ our\ distribution \]
\[ \sigma \rightarrow standard\ deviation\ of\ our\ distribution \]
\[ z \rightarrow corresponding\ value\ in\ standard\ normal\ distribution \]
Example,
Suppose X \sim N(\mu, \sigma) and we want to calculate the probability P(a < X < b). The limits for the same probability in the Z distribution will be,
\[ z_1 = \frac{a - \mu}{\sigma} \]
\[ z_2 = \frac{b - \mu}{\sigma} \]
Now the probability in the Z distribution is,
\[ P(z_1 < Z < z_2) \]
\[ P( \frac{a - \mu}{\sigma} < Z < \frac{b - \mu}{\sigma} ) \]
So we need the area under the Z curve from z_1 to z_2.
\\
Then, we use the standard normal table to get the area.
+ *Note* : The standard normal distribution is symmetric about the y axis. This fact can be used when calculating area under Z curve.
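As a quick sketch of this conversion (all numbers here are made up for illustration), the standard normal CDF can be computed with the error function instead of a table lookup:
#+BEGIN_SRC python
from math import erf, sqrt

def phi(z):
    # CDF of the standard normal: area under the Z curve to the left of z
    return 0.5 * (1 + erf(z / sqrt(2)))

# Made-up example: X ~ N(50, 10), find P(40 < X < 65)
mu, sigma = 50, 10
a, b = 40, 65

z1 = (a - mu) / sigma   # corresponding value in the standard normal
z2 = (b - mu) / sigma

print(phi(z2) - phi(z1))  # P(z1 < Z < z2) = P(-1 < Z < 1.5) ~ 0.7745
#+END_SRC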
* Joint Probability Mass Function
The joint probability mass function of two random variables X and Y is given by
\[ f(x,y) = P(X=x, Y=y) \]
+ For discrete
\[ \Sigma_x \Sigma_y f(x,y) = 1 \]
+ For continuous
\[ \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x,y)\ dx\ dy = 1 \]
To get the probabilities,
\[ P(a \le X \le b, c \le Y \le d ) = \int_c^d \int_a^b f(x,y)\ dx\ dy\]
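A small sketch with a made-up joint PMF table for discrete X and Y, checking that it sums to 1 and computing a probability over a region:
#+BEGIN_SRC python
# Made-up joint PMF of discrete X in {0,1,2} and Y in {0,1}
f = {(0,0): 0.10, (0,1): 0.15,
     (1,0): 0.20, (1,1): 0.25,
     (2,0): 0.10, (2,1): 0.20}

# The probabilities over all (x, y) must sum to 1
assert abs(sum(f.values()) - 1.0) < 1e-12

# P(0 <= X <= 1, Y = 1): sum f(x,y) over the region
p = sum(v for (x, y), v in f.items() if 0 <= x <= 1 and y == 1)
print(p)  # 0.15 + 0.25 = 0.40
#+END_SRC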
** Marginal probability distribution (from joint PMF)
+ For discrete
\[ P(X=x) = f(x) = \Sigma_y f(x,y) \]
\[ P(Y=y) = f(y) = \Sigma_x f(x,y) \]
+ For continuous
\[ f(x) = \int_{-\infty}^{\infty} f(x,y)\ dy \]
\[ f(y) = \int_{-\infty}^{\infty} f(x,y)\ dx \]
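Continuing with the same made-up table from above, the marginals fall out by summing over the other variable:
#+BEGIN_SRC python
# Same made-up joint PMF as before
f = {(0,0): 0.10, (0,1): 0.15, (1,0): 0.20,
     (1,1): 0.25, (2,0): 0.10, (2,1): 0.20}
xs = {x for x, _ in f}
ys = {y for _, y in f}

f_X = {x: sum(f[(x, y)] for y in ys) for x in xs}  # f(x) = sum over y of f(x,y)
f_Y = {y: sum(f[(x, y)] for x in xs) for y in ys}  # f(y) = sum over x of f(x,y)

print(f_X)  # {0: 0.25, 1: 0.45, 2: 0.30}
print(f_Y)  # {0: 0.40, 1: 0.60}
#+END_SRC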
** Conditional Probability for Joint PMF
\[ P(X=x \mid Y=y) = f(x \mid y ) = \frac{ P(X=x, Y=y) }{ P(Y=y) } \]
\[ P(X=x \mid Y=y) = f(x \mid y) = \frac{ f(x,y) }{ f(y) } \]
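Dividing the joint PMF by a marginal gives the conditional PMF; a sketch on the same made-up table, conditioning on Y = 1:
#+BEGIN_SRC python
f = {(0,0): 0.10, (0,1): 0.15, (1,0): 0.20,
     (1,1): 0.25, (2,0): 0.10, (2,1): 0.20}

# Marginal f(y) for y = 1
f_y1 = sum(v for (x, y), v in f.items() if y == 1)  # 0.60

# Conditional PMF f(x | y=1) = f(x,1) / f(1)
f_cond = {x: v / f_y1 for (x, y), v in f.items() if y == 1}
print(f_cond)  # {0: 0.25, 1: 0.4166..., 2: 0.3333...}; sums to 1
#+END_SRC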
** Independent Random Variables
The random variables X and Y are independent if,
\[ f(x,y) = f(x) f(y) \]
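Independence can be checked cell by cell; a sketch testing f(x,y) = f(x) f(y) on the same made-up table (which turns out not to be independent):
#+BEGIN_SRC python
f = {(0,0): 0.10, (0,1): 0.15, (1,0): 0.20,
     (1,1): 0.25, (2,0): 0.10, (2,1): 0.20}
xs, ys = {x for x, _ in f}, {y for _, y in f}
f_X = {x: sum(f[(x, y)] for y in ys) for x in xs}
f_Y = {y: sum(f[(x, y)] for x in xs) for y in ys}

# X and Y are independent only if f(x,y) = f(x) f(y) in every cell
independent = all(abs(f[(x, y)] - f_X[x] * f_Y[y]) < 1e-12 for (x, y) in f)
print(independent)  # False: f(1,0) = 0.20 but f(1) f(0) = 0.45 * 0.40 = 0.18
#+END_SRC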
** Moment of Joint Variables
\[ E(X,Y) = E(XY) = \int_{-\infty}^\infty \int_{-\infty}^\infty xyf(x,y) dx\ dy \]
** Covariance
The covariance of two random variables X and Y is given by,
\[ cov(X,Y) = E(XY) - E(X)E(Y) \]
*** Properties of covariance
+ If X and Y are independent
\[ cov(X,Y) = 0 \]
+ If the variance of a random variable X is written var(X), then
\[ cov(X+Y, X-Y) = var(X) - var(Y) \]
+ Generalisation of the previous case
\[ cov(aX + bY, cX + dY) = ac \cdot var(X) + bd \cdot var(Y) + (ad + bc) \cdot cov(X,Y) \]
*** Variance of two random variables
\[ var(aX + bY) = a^2 \cdot var(X) + b^2 \cdot var(Y) + 2ab \cdot cov(X,Y) \]
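These identities are easy to verify numerically. A sketch using the same made-up joint PMF, checking var(aX + bY) against the right-hand side for a = 2, b = 3:
#+BEGIN_SRC python
f = {(0,0): 0.10, (0,1): 0.15, (1,0): 0.20,
     (1,1): 0.25, (2,0): 0.10, (2,1): 0.20}

def E(g):
    # Expectation of g(x, y) under the joint PMF
    return sum(g(x, y) * v for (x, y), v in f.items())

cov   = E(lambda x, y: x * y) - E(lambda x, y: x) * E(lambda x, y: y)
var_X = E(lambda x, y: x * x) - E(lambda x, y: x) ** 2
var_Y = E(lambda x, y: y * y) - E(lambda x, y: y) ** 2

# var(aX + bY) computed directly vs. via the identity
a, b = 2, 3
lhs = E(lambda x, y: (a*x + b*y) ** 2) - E(lambda x, y: a*x + b*y) ** 2
rhs = a*a * var_X + b*b * var_Y + 2*a*b * cov
print(abs(lhs - rhs) < 1e-9)  # True
#+END_SRC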
** Correlation
Let the standard deviation of X be \sigma_X and the standard deviation of Y be \sigma_Y. Then the correlation is given by,
\[ \gamma(X,Y) = \rho_{XY} = \frac{cov(X,Y)}{\sigma_X \sigma_Y } \]
Here, \rho_{XY} always lies between -1 and 1,
\[ -1 \le \rho_{XY} \le 1 \]
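A sketch computing the correlation for the same made-up table from cov(X,Y) and the two standard deviations:
#+BEGIN_SRC python
f = {(0,0): 0.10, (0,1): 0.15, (1,0): 0.20,
     (1,1): 0.25, (2,0): 0.10, (2,1): 0.20}

EX  = sum(x * v for (x, y), v in f.items())
EY  = sum(y * v for (x, y), v in f.items())
cov = sum(x * y * v for (x, y), v in f.items()) - EX * EY

sd_X = (sum(x * x * v for (x, y), v in f.items()) - EX ** 2) ** 0.5
sd_Y = (sum(y * y * v for (x, y), v in f.items()) - EY ** 2) ** 0.5

rho = cov / (sd_X * sd_Y)
print(rho)  # ~0.055, always between -1 and 1
#+END_SRC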
** Conditional moments
\[ E(X \mid Y=y) = \int_{-\infty}^{\infty} x f(x \mid y)\ dx \]
This is a function of y.
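For a discrete joint PMF the integral becomes a sum; a sketch of E(X | Y=1) on the same made-up table:
#+BEGIN_SRC python
f = {(0,0): 0.10, (0,1): 0.15, (1,0): 0.20,
     (1,1): 0.25, (2,0): 0.10, (2,1): 0.20}

# E(X | Y=1) = sum of x * f(x | y=1)
f_y1 = sum(v for (x, y), v in f.items() if y == 1)               # f(1) = 0.60
e_x_given_y1 = sum(x * v / f_y1 for (x, y), v in f.items() if y == 1)
print(e_x_given_y1)  # (0*0.15 + 1*0.25 + 2*0.20) / 0.60 = 1.0833...
#+END_SRC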
# #+BEGIN_COMMENT
* Useful equation
\[ n! = \int_0^\infty x^n e^{-x} dx \]
# #+END_COMMENT
* Covariance in discrete data
Suppose we have two paired sets of discrete data,
\[ X : x_1, x_2, x_3... x_n \]
\[ Y : y_1, y_2, y_3... y_n \]
\[ cov(X,Y) = \frac{1}{n} \left( \sum_{i=1}^n x_i y_i \right) - mean(x) \cdot mean(y) \]
\[ n \rightarrow number\ of\ items \]
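A sketch of this formula on two small made-up data sets:
#+BEGIN_SRC python
# Made-up paired data
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
n = len(X)

mean_x = sum(X) / n   # 3.0
mean_y = sum(Y) / n   # 4.0

# cov(X,Y) = (1/n) * sum of x_i * y_i - mean(x) * mean(y)
cov = sum(x * y for x, y in zip(X, Y)) / n - mean_x * mean_y
print(cov)  # 66/5 - 3*4 = 1.2
#+END_SRC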
* Regression
Regression is a technique to relate a dependent variable to one or more independent variables.
** Lines of regression
Both lines will pass through the point *(mean(x), mean(y))*.
*** y on x
Equation of line,
\[ \frac{y - mean(y)}{x - mean(x)} = b_{yx} \]
Where,
\[ b_{yx} = \frac{cov(X,Y)}{var(X)} \]
*** x on y
Equation of line,
\[ \frac{x - mean(x)}{y - mean(y)} = b_{xy} \]
Where,
\[ b_{xy} = \frac{cov(X,Y)}{var(Y)} \]
b_{xy} and b_{yx} are called regression coefficients.
+ *Note* : if one of the regression coefficients is greater than 1, then the other must be less than 1.
*** Correlation
\[ \gamma(X,Y) = \rho_{XY} = \pm \sqrt{b_{xy} b_{yx}} \]
The regression coefficients (b_{xy} and b_{yx}) and the correlation coefficient all have the same sign.
** Angle between lines of regression
\[ \tan \theta = \frac{1 - \rho^2}{\rho} \cdot \frac{\sigma_X \sigma_Y}{var(X) + var(Y)} \]
Here \sigma_X and \sigma_Y are the standard deviations of X and Y.
+ If $\rho = 0$ then $\theta = \frac{\pi}{2}$
+ If $\rho = \pm 1$ then $\theta = 0$
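Putting the section together, a sketch that fits both lines of regression to the same made-up data as above, recovers the correlation from the two coefficients, and evaluates the angle formula:
#+BEGIN_SRC python
from math import atan

# Made-up paired data
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
n = len(X)

mx, my = sum(X) / n, sum(Y) / n               # both lines pass through (mx, my)
var_x = sum((x - mx) ** 2 for x in X) / n     # 2.0
var_y = sum((y - my) ** 2 for y in Y) / n     # 1.2
cov   = sum((x - mx) * (y - my) for x, y in zip(X, Y)) / n  # 1.2

b_yx = cov / var_x    # y on x: 0.6
b_xy = cov / var_y    # x on y: 1.0

# Correlation: magnitude from sqrt(b_xy * b_yx), sign from the coefficients
rho = (b_xy * b_yx) ** 0.5 * (1 if cov >= 0 else -1)
print(rho)  # ~0.7746

# Angle between the two lines of regression
tan_theta = (1 - rho ** 2) / rho * (var_x ** 0.5 * var_y ** 0.5) / (var_x + var_y)
print(atan(tan_theta))  # ~0.245 rad
#+END_SRC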
