#+TITLE: Probability and Statistics ( BTech CSE )
#+AUTHOR: Anmol Nawani
#+LATEX_CLASS: article
#+LATEX_HEADER: \usepackage{amsmath}
# *Statistics
+ Mean
\[ E(aX) = aE(X) \]
\[ E(a) = a \]
\[ E(X + Y) = E(X) + E(Y) \]
+ Variance
If
Given,
\[ X_1 \sim N(\mu_1, \sigma_1) \]
\[ X_2 \sim N(\mu_2, \sigma_2) \]
Then,
\[ a X_1 + b X_2 \sim N \left( a \mu_1 + b \mu_2, \sqrt{ a^2 \sigma_1^2 + b^2 \sigma_2^2} \right) \]
* Standard Normal Distribution
The normal distribution with Mean 0 and Variance 1 is called the standard normal distribution.
\[ Z \sim N(0,1) \]
To calculate the area under a given normal distribution, we can use the standard normal distribution. We first convert values from our given distribution into the corresponding values in the standard normal distribution, using the formula
\[ For\ X \sim N(\mu, \sigma) \]
\[ z = \frac{x - \mu}{\sigma} \]
\[ x \rightarrow value\ in\ our\ normal\ distribution \]
\[ \mu \rightarrow mean\ of\ our\ distribution \]
\[ \sigma \rightarrow standard\ deviation\ of\ our\ distribution \]
\[ z \rightarrow corresponding\ value\ in\ standard\ normal\ distribution \]
Example,
Suppose X \sim N(\mu, \sigma) and we want to calculate the probability P(a < X < b). The limits for the same probability in the Z distribution will be,
\[ z_1 = \frac{a - \mu}{\sigma} \]
\[ z_2 = \frac{b - \mu}{\sigma} \]
Now the probability in the Z distribution is,
\[ P(z_1 < Z < z_2) \]
\[ P( \frac{a - \mu}{\sigma} < Z < \frac{b - \mu}{\sigma} ) \]
So we need the area under the Z curve from z_1 to z_2.
\\
Then, we use the standard normal table to get the area.
+ *Note* : The standard normal distribution is symmetric about the y axis. This fact can be used when calculating area under Z curve.
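As a quick sketch of this conversion (all numbers here are made up for illustration), the standard normal CDF can be computed with the error function instead of a table lookup:
#+BEGIN_SRC python
from math import erf, sqrt

def phi(z):
    # CDF of the standard normal: area under the Z curve to the left of z
    return 0.5 * (1 + erf(z / sqrt(2)))

# Made-up example: X ~ N(50, 10), find P(40 < X < 65)
mu, sigma = 50, 10
a, b = 40, 65

z1 = (a - mu) / sigma   # corresponding value in the standard normal
z2 = (b - mu) / sigma

print(phi(z2) - phi(z1))  # P(z1 < Z < z2) = P(-1 < Z < 1.5) ~ 0.7745
#+END_SRC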
* Joint Probability Mass Function
The joint probability mass function of two random variables X and Y is given by
\[ f(x,y) = P(X=x, Y=y) \]
+ For discrete
\[ \Sigma_x \Sigma_y f(x,y) = 1 \]
+ For continuous
\[ \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x,y)\ dx\ dy = 1 \]
To get the probabilities,
\[ P(a \le X \le b, c \le Y \le d ) = \int_c^d \int_a^b f(x,y)\ dx\ dy\]
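A small sketch with a made-up joint PMF table for discrete X and Y, checking that it sums to 1 and computing a probability over a region:
#+BEGIN_SRC python
# Made-up joint PMF of discrete X in {0,1,2} and Y in {0,1}
f = {(0,0): 0.10, (0,1): 0.15,
     (1,0): 0.20, (1,1): 0.25,
     (2,0): 0.10, (2,1): 0.20}

# The probabilities over all (x, y) must sum to 1
assert abs(sum(f.values()) - 1.0) < 1e-12

# P(0 <= X <= 1, Y = 1): sum f(x,y) over the region
p = sum(v for (x, y), v in f.items() if 0 <= x <= 1 and y == 1)
print(p)  # 0.15 + 0.25 = 0.40
#+END_SRC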
** Marginal probability distribution (from joint PMF)
+ For discrete
\[ P(X=x) = f(x) = \Sigma_y f(x,y) \]
\[ P(Y=y) = f(y) = \Sigma_x f(x,y) \]
+ For continuous
\[ f(x) = \int_{-\infty}^{\infty} f(x,y)\ dy \]
\[ f(y) = \int_{-\infty}^{\infty} f(x,y)\ dx \]
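Continuing with the same made-up table from above, the marginals fall out by summing over the other variable:
#+BEGIN_SRC python
# Same made-up joint PMF as before
f = {(0,0): 0.10, (0,1): 0.15, (1,0): 0.20,
     (1,1): 0.25, (2,0): 0.10, (2,1): 0.20}
xs = {x for x, _ in f}
ys = {y for _, y in f}

f_X = {x: sum(f[(x, y)] for y in ys) for x in xs}  # f(x) = sum over y of f(x,y)
f_Y = {y: sum(f[(x, y)] for x in xs) for y in ys}  # f(y) = sum over x of f(x,y)

print(f_X)  # {0: 0.25, 1: 0.45, 2: 0.30}
print(f_Y)  # {0: 0.40, 1: 0.60}
#+END_SRC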
** Conditional Probability for Joint PMF
\[ P(X=x \mid Y=y) = f(x \mid y ) = \frac{ P(X=x, Y=y) }{ P(Y=y) } \]
\[ P(X=x \mid Y=y) = f(x \mid y) = \frac{ f(x,y) }{ f(y) } \]
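Dividing the joint PMF by a marginal gives the conditional PMF; a sketch on the same made-up table, conditioning on Y = 1:
#+BEGIN_SRC python
f = {(0,0): 0.10, (0,1): 0.15, (1,0): 0.20,
     (1,1): 0.25, (2,0): 0.10, (2,1): 0.20}

# Marginal f(y) for y = 1
f_y1 = sum(v for (x, y), v in f.items() if y == 1)  # 0.60

# Conditional PMF f(x | y=1) = f(x,1) / f(1)
f_cond = {x: v / f_y1 for (x, y), v in f.items() if y == 1}
print(f_cond)  # {0: 0.25, 1: 0.4166..., 2: 0.3333...}; sums to 1
#+END_SRC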
** Independent Random Variables
The random variables X and Y are independent if,
\[ f(x,y) = f(x) f(y) \]
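Independence can be checked cell by cell; a sketch testing f(x,y) = f(x) f(y) on the same made-up table (which turns out not to be independent):
#+BEGIN_SRC python
f = {(0,0): 0.10, (0,1): 0.15, (1,0): 0.20,
     (1,1): 0.25, (2,0): 0.10, (2,1): 0.20}
xs, ys = {x for x, _ in f}, {y for _, y in f}
f_X = {x: sum(f[(x, y)] for y in ys) for x in xs}
f_Y = {y: sum(f[(x, y)] for x in xs) for y in ys}

# X and Y are independent only if f(x,y) = f(x) f(y) in every cell
independent = all(abs(f[(x, y)] - f_X[x] * f_Y[y]) < 1e-12 for (x, y) in f)
print(independent)  # False: f(1,0) = 0.20 but f(1) f(0) = 0.45 * 0.40 = 0.18
#+END_SRC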
** Moment of Joint Variables
\[ E(X,Y) = E(XY) = \int_{-\infty}^\infty \int_{-\infty}^\infty xyf(x,y) dx\ dy \]
** Covariance
The covariance of two random variables X and Y is given by,
\[ cov(X,Y) = E(XY) - E(X)E(Y) \]
*** Properties of covariance
+ If X and Y are independent
\[ cov(X,Y) = 0 \]
+ If the variance of a random variable X is written var(X), then
\[ cov(X+Y, X-Y) = var(X) - var(Y) \]
+ Generalisation of the previous case
\[ cov(aX + bY, cX + dY) = ac \cdot var(X) + bd \cdot var(Y) + (ad + bc) \cdot cov(X,Y) \]
*** Variance of two random variables
\[ var(aX + bY) = a^2 \cdot var(X) + b^2 \cdot var(Y) + 2ab \cdot cov(X,Y) \]
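These identities are easy to verify numerically. A sketch using the same made-up joint PMF, checking var(aX + bY) against the right-hand side for a = 2, b = 3:
#+BEGIN_SRC python
f = {(0,0): 0.10, (0,1): 0.15, (1,0): 0.20,
     (1,1): 0.25, (2,0): 0.10, (2,1): 0.20}

def E(g):
    # Expectation of g(x, y) under the joint PMF
    return sum(g(x, y) * v for (x, y), v in f.items())

cov   = E(lambda x, y: x * y) - E(lambda x, y: x) * E(lambda x, y: y)
var_X = E(lambda x, y: x * x) - E(lambda x, y: x) ** 2
var_Y = E(lambda x, y: y * y) - E(lambda x, y: y) ** 2

# var(aX + bY) computed directly vs. via the identity
a, b = 2, 3
lhs = E(lambda x, y: (a*x + b*y) ** 2) - E(lambda x, y: a*x + b*y) ** 2
rhs = a*a * var_X + b*b * var_Y + 2*a*b * cov
print(abs(lhs - rhs) < 1e-9)  # True
#+END_SRC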
** Correlation
Let the standard deviation of X be \sigma_X and the standard deviation of Y be \sigma_Y. Then the correlation is given by,
\[ \gamma(X,Y) = \rho_{XY} = \frac{cov(X,Y)}{\sigma_X \sigma_Y } \]
Here, \rho_{XY} always lies between -1 and 1,
\[ -1 \le \rho_{XY} \le 1 \]
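A sketch computing the correlation for the same made-up table from cov(X,Y) and the two standard deviations:
#+BEGIN_SRC python
f = {(0,0): 0.10, (0,1): 0.15, (1,0): 0.20,
     (1,1): 0.25, (2,0): 0.10, (2,1): 0.20}

EX  = sum(x * v for (x, y), v in f.items())
EY  = sum(y * v for (x, y), v in f.items())
cov = sum(x * y * v for (x, y), v in f.items()) - EX * EY

sd_X = (sum(x * x * v for (x, y), v in f.items()) - EX ** 2) ** 0.5
sd_Y = (sum(y * y * v for (x, y), v in f.items()) - EY ** 2) ** 0.5

rho = cov / (sd_X * sd_Y)
print(rho)  # ~0.055, always between -1 and 1
#+END_SRC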
** Conditional moments
\[ E(X \mid Y=y) = \int_{-\infty}^{\infty} x f(x \mid y)\ dx \]
This is a function of y.
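For a discrete joint PMF the integral becomes a sum; a sketch of E(X | Y=1) on the same made-up table:
#+BEGIN_SRC python
f = {(0,0): 0.10, (0,1): 0.15, (1,0): 0.20,
     (1,1): 0.25, (2,0): 0.10, (2,1): 0.20}

# E(X | Y=1) = sum of x * f(x | y=1)
f_y1 = sum(v for (x, y), v in f.items() if y == 1)               # f(1) = 0.60
e_x_given_y1 = sum(x * v / f_y1 for (x, y), v in f.items() if y == 1)
print(e_x_given_y1)  # (0*0.15 + 1*0.25 + 2*0.20) / 0.60 = 1.0833...
#+END_SRC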
# #+BEGIN_COMMENT
* Useful equation
\[ n! = \int_0^\infty x^n e^{-x} dx \]
# #+END_COMMENT
* Covariance in discrete data
Suppose we have two paired sets of discrete data,
\[ X : x_1, x_2, x_3... x_n \]
\[ Y : y_1, y_2, y_3... y_n \]
\[ cov(X,Y) = \frac{1}{n} \left( \sum_{i=1}^n x_i y_i \right) - mean(x) \cdot mean(y) \]
\[ n \rightarrow number\ of\ items \]
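A sketch of this formula on two small made-up data sets:
#+BEGIN_SRC python
# Made-up paired data
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
n = len(X)

mean_x = sum(X) / n   # 3.0
mean_y = sum(Y) / n   # 4.0

# cov(X,Y) = (1/n) * sum of x_i * y_i - mean(x) * mean(y)
cov = sum(x * y for x, y in zip(X, Y)) / n - mean_x * mean_y
print(cov)  # 66/5 - 3*4 = 1.2
#+END_SRC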
* Regression
Regression is a technique to relate a dependent variable to one or more independent variables.
** Lines of regression
Both lines will pass through the point *(mean(x), mean(y))*.
*** y on x
Equation of line,
\[ \frac{y - mean(y)}{x - mean(x)} = b_{yx} \]
Where,
\[ b_{yx} = \frac{cov(X,Y)}{var(X)} \]
*** x on y
Equation of line,
\[ \frac{x - mean(x)}{y - mean(y)} = b_{xy} \]
Where,
\[ b_{xy} = \frac{cov(X,Y)}{var(Y)} \]
b_{xy} and b_{yx} are called regression coefficients.
+ *Note* : if one of the regression coefficients is greater than 1, then the other must be less than 1.
*** Correlation
\[ \gamma(X,Y) = \rho_{XY} = \pm \sqrt{b_{xy} b_{yx}} \]
The regression coefficients (b_{xy} and b_{yx}) and the correlation coefficient all have the same sign.
** Angle between lines of regression
\[ \tan \theta = \frac{1 - \rho^2}{\rho} \cdot \frac{\sigma_X \sigma_Y}{var(X) + var(Y)} \]
Here \sigma_X and \sigma_Y are the standard deviations of X and Y.
+ If $\rho = 0$ then $\theta = \frac{\pi}{2}$
+ If $\rho = \pm 1$ then $\theta = 0$
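Putting the section together, a sketch that fits both lines of regression to the same made-up data as above, recovers the correlation from the two coefficients, and evaluates the angle formula:
#+BEGIN_SRC python
from math import atan

# Made-up paired data
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
n = len(X)

mx, my = sum(X) / n, sum(Y) / n               # both lines pass through (mx, my)
var_x = sum((x - mx) ** 2 for x in X) / n     # 2.0
var_y = sum((y - my) ** 2 for y in Y) / n     # 1.2
cov   = sum((x - mx) * (y - my) for x, y in zip(X, Y)) / n  # 1.2

b_yx = cov / var_x    # y on x: 0.6
b_xy = cov / var_y    # x on y: 1.0

# Correlation: magnitude from sqrt(b_xy * b_yx), sign from the coefficients
rho = (b_xy * b_yx) ** 0.5 * (1 if cov >= 0 else -1)
print(rho)  # ~0.7746

# Angle between the two lines of regression
tan_theta = (1 - rho ** 2) / rho * (var_x ** 0.5 * var_y ** 0.5) / (var_x + var_y)
print(atan(tan_theta))  # ~0.245 rad
#+END_SRC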
