#+TITLE: Probability and Statistics ( BTech CSE )
#+AUTHOR: Anmol Nawani
#+LATEX_CLASS: article
#+LATEX_HEADER: \usepackage{amsmath}

# *Statistics
* Ungrouped Data
Ungrouped data is data that has not been arranged in any way. It is just a list of observations
\[ x_1, x_2, x_3, ... x_n \]
** Mean
\[ \bar{x} = \frac{x_1 + x_2 + x_3 + ... + x_n}{n} \]
\[ \bar{x} = \frac{ \sum_{i = 1}^{n} x_i }{n} \]
** Mode
The observation which occurs the highest number of times, i.e. the x_i with the highest count in the observation list.
** Median
The median is the middle-most observation. After ordering the n observations in either ascending or descending order (either order works), the median is:
+ n is even
\[ Median = \frac{ x_\frac{n}{2} + x_{(\frac{n}{2}+1)} }{2} \]
+ n is odd
\[ Median = x_\frac{n+1}{2} \]
** Variance and Standard Deviation
\[ Variance = \sigma^2 \]
\[ Standard\ deviation = \sigma \]
\[ \sigma^2 = \frac{\sum_{i=1}^{n} (x_i - Mean)^2 }{n} \]
\[ \sigma^2 = \frac{\sum_{i=1}^n x_i^2}{n} - (Mean)^2 \]
** Moments
*** About some constant A
\[ r^{th}\ moment = \frac{1}{n} \Sigma(x_i - A)^r \]
*** About Mean (Central Moment)
When A = Mean, the moment is called a central moment.
\[ \mu_r = \frac{1}{n} \Sigma(x_i - Mean)^r \]
*** About Zero (Raw Moment)
When A = 0, the moment is called a raw moment.
\[ \mu_r^{'} = \frac{1}{n} \Sigma x_i^r \]
* Grouped Data
Data which is grouped based on the frequency at which it occurs. So if 9 appears 5 times in our observations, we group it as x (observation) = 9 and f (frequency) = 5.
#+attr_latex: :align |c|c|
|------------------+---------------|
| x (observations) | f (frequency) |
|------------------+---------------|
| 2                | 5             |
| 1                | 3             |
| 4                | 5             |
| 8                | 9             |
|------------------+---------------|
If the data is stored as class intervals, i.e. the observations are of the form 10-20, 20-30, 30-40 ..., then we get $x_i$ as the class mid-point
\[ x_i = \frac{lower\ limit + upper\ limit}{2} \]
i.e. $x_i$ for 20-30 is $\frac{20 + 30}{2}$. So for the data
#+attr_latex: :align |c|c|
|-------+---------------|
|       | f (frequency) |
|-------+---------------|
| 0-20  | 2             |
| 20-40 | 6             |
| 40-60 | 1             |
| 60-80 | 3             |
|-------+---------------|
the $x_i$'s become
#+attr_latex: :align |c|c|c|
|-------+-----+-----|
|       | f_i | x_i |
|-------+-----+-----|
| 0-20  | 2   | 10  |
| 20-40 | 6   | 30  |
| 40-60 | 1   | 50  |
| 60-80 | 3   | 70  |
|-------+-----+-----|
** Mean
\[ \bar{x} = \frac{ \Sigma f_i x_i}{\Sigma f_i } \]
** Mode
The *modal class* is the class (row) with the highest f_i.
\[ Mode = l + (\frac{f_1 - f_0}{2f_1 - f_0 - f_2}) \times h \]
In the formula : \\
l \rightarrow lower limit of the modal class \\
f_1 \rightarrow frequency (f_i) of the modal class \\
f_0 \rightarrow frequency of the class preceding the modal class \\
f_2 \rightarrow frequency of the class after the modal class \\
h \rightarrow size of the class interval (upper limit - lower limit)
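As a quick illustration of the grouped-data mean and mode formulas above, here is a minimal sketch in plain Python. It reuses the class table above; the variable names are my own.
#+BEGIN_SRC python
# Grouped-data mean and mode for the class table above (0-20, 20-40, 40-60, 60-80).
classes = [(0, 20), (20, 40), (40, 60), (60, 80)]
f = [2, 6, 1, 3]

# class mid-points x_i = (lower limit + upper limit) / 2
x = [(lo + hi) / 2 for lo, hi in classes]

# mean = sum(f_i * x_i) / sum(f_i)
mean = sum(fi * xi for fi, xi in zip(f, x)) / sum(f)

# modal class = class with the highest frequency
m = f.index(max(f))
l, h = classes[m][0], classes[m][1] - classes[m][0]
f1 = f[m]
f0 = f[m - 1] if m > 0 else 0            # frequency of the preceding class
f2 = f[m + 1] if m + 1 < len(f) else 0   # frequency of the following class
mode = l + (f1 - f0) / (2 * f1 - f0 - f2) * h

print(mean, mode)   # roughly 38.33 and 28.89 for this table
#+END_SRC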
** Median
The median for grouped data is calculated with the help of the *cumulative frequency*. The cumulative frequency (cf_i) is given by:
\[ cf_i = f_1 + f_2 + f_3 + ... + f_i \]
The *median class* is the class whose cf_i is just greater than or equal to $\frac{\Sigma f}{2}$
\[ Median = l + (\frac{(n/2) - cf}{f}) \times h \]
In the formula : \\
l \rightarrow lower limit of the median class \\
h \rightarrow size of the class interval (upper limit - lower limit) \\
n \rightarrow number of observations (\Sigma f_i) \\
cf \rightarrow cumulative frequency of the class preceding the median class \\
f \rightarrow frequency of the median class
** Variance and Standard Deviation
\[ Variance = \sigma^2 \]
\[ Standard\ deviation = \sigma \]
\[ \sigma^2 = \frac{\sum_{i=1}^{n} f_i(x_i - Mean)^2 }{\Sigma f_i} \]
\[ \sigma^2 = \frac{\sum_{i=1}^n f_ix_i^2}{\Sigma f_i} - (Mean)^2 \]
** Moments
*** About some constant A
\[ r^{th}\ moment = \frac{1}{\Sigma f_i} [\Sigma f_i (x_i - A)^r] \]
*** About Mean (Central Moment)
When A = Mean, the moment is called a central moment.
\[ \mu_r = \frac{1}{\Sigma f_i} [\Sigma f_i (x_i - Mean)^r] \]
*** About Zero (Raw Moment)
When A = 0, the moment is called a raw moment.
\[ \mu_r^{'} = \frac{1}{\Sigma f_i} [\Sigma f_i x_i^r] \]
* Relation between Mean, Median and Mode
\[ 3Median = 2Mean + Mode \]
* Relation between raw and central moments
\[ \mu_0 = \mu_0^{'} = 1 \]
\[ \mu_1 = 0 \]
\[ \mu_2 = \mu_2^{'} - \mu_1^{'2} \]
\[ \mu_3 = \mu_3^{'} - 3\mu_1^{'}\mu_2^{'} + 2\mu_1^{'3} \]
\[ \mu_4 = \mu_4^{'} - 4\mu_3^{'}\mu_1^{'} + 6\mu_2^{'}\mu_1^{'2} - 3\mu_1^{'4} \]
* Skewness and Kurtosis
** Skewness
+ If Mean > Mode, the skewness is positive
+ If Mean = Mode, the skewness is zero (the graph is symmetric)
+ If Mean < Mode, the skewness is negative
[[./skewness.PNG]]
*** Pearson's coefficient of skewness
Pearson's coefficient of skewness is denoted by S_{KP}
\[ S_{KP} = \frac{Mean - Mode}{Standard\ Deviation} \]
+ If S_{KP} is zero, the distribution is symmetrical
+ If S_{KP} is positive, the distribution is positively skewed
+ If S_{KP} is negative, the distribution is negatively skewed
*** Moment based coefficient of skewness
The moment based coefficient of skewness is denoted by \beta_1. The \mu here is the central moment.
\[ \beta_1 = \frac{\mu_3^2}{\mu_2^3} \]
The drawback of using \beta_1 as a coefficient of skewness is that it *can only tell whether the distribution is symmetrical or not* (when $\beta_1 = 0$). It can't tell us the direction of skewness, i.e. positive or negative.
+ If \beta_1 is zero, the distribution is symmetrical
*** Karl Pearson's \gamma_1
To remove the drawback of \beta_1, we use Karl Pearson's \gamma_1
\[ \gamma_1 = \sqrt{\beta_1} \]
\[ \gamma_1 = \frac{\mu_3}{\mu_2^{3/2}} \]
+ If \mu_3 is positive, the distribution has positive skewness
+ If \mu_3 is negative, the distribution has negative skewness
+ If \mu_3 is zero, the distribution is symmetrical
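A minimal sketch in plain Python (the observations are made up) showing how the central moments feed into \beta_1 and \gamma_1:
#+BEGIN_SRC python
# Central moments and the moment-based skewness measures beta_1 and gamma_1.
data = [2, 3, 3, 4, 4, 4, 5, 5, 9, 12]   # made-up observations
n = len(data)
mean = sum(data) / n

def central_moment(r):
    return sum((x - mean) ** r for x in data) / n

mu2 = central_moment(2)        # variance
mu3 = central_moment(3)

beta1 = mu3 ** 2 / mu2 ** 3    # always >= 0, so no direction information
gamma1 = mu3 / mu2 ** 1.5      # carries the sign of mu3

print(beta1, gamma1)           # gamma1 > 0 here, so the data is positively skewed
#+END_SRC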
** Kurtosis
Kurtosis is a measure of the peakedness of the curve and the "fatness" of its tails.
# https://www.analyticsvidhya.com/blog/2021/05/shape-of-data-skewness-and-kurtosis/
[[./kurtosis.PNG]]
# https://www.bogleheads.org/wiki/Excess_kurtosis
[[./kurtosis2.PNG]]
The kurtosis is calculated using \beta_2
\[ \beta_2 = \frac{\mu_4}{\mu_2^2} \]
The value of \beta_2 tells us about the type of curve
+ Leptokurtic (High Peak) when \beta_2 > 3
+ Mesokurtic (Normal Peak) when \beta_2 = 3
+ Platykurtic (Low Peak) when \beta_2 < 3
*** Karl Pearson's \gamma_2
\gamma_2 is defined as:
\[ \gamma_2 = \beta_2 - 3 \]
+ Leptokurtic when \gamma_2 > 0
+ Mesokurtic when \gamma_2 = 0
+ Platykurtic when \gamma_2 < 0

# *Probability
* Basic Probability
** Conditional Probability
If some event B has already occurred, then the probability of event A is:
\[ P(A \mid B) = \frac{P(A \cap B)}{P(B)} \]
$P(A \mid B)$ is read as "A given B". We are given that B has occurred, and this is the probability of A now occurring.
** Law of Total Probability
The law of total probability is used to find the probability of some event A whose sample space has been partitioned into several different parts B_1, B_2, ... B_i.
\[ P(A) = P(A|B_1)P(B_1) + P(A|B_2)P(B_2) + P(A|B_3)P(B_3) + ... + P(A|B_i)P(B_i) \]
\[ P(A) = \Sigma P(A|B_i)P(B_i) \]
*Example*, suppose we have 2 bags with marbles
+ Bag 1 : 7 red marbles and 3 green marbles
+ Bag 2 : 2 red marbles and 8 green marbles
Now we select one bag at random (i.e. the probability of choosing either of the two bags is equal, so 0.5). If we draw a marble, what is the probability that it is a green marble?
*Sol.* The green marbles are split between bag 1 and bag 2. \\
Let G be the event of drawing a green marble. \\
Let B_1 be the event of choosing bag 1 \\
Let B_2 be the event of choosing bag 2 \\
Then, $P(G|B_1) = \frac{3}{7 + 3}$ and $P(G|B_2) = \frac{8}{2 + 8}$ \\
Now, we can use the law of total probability to get
\[ P(G) = P(G|B_1)P(B_1) + P(G|B_2)P(B_2) = (0.3)(0.5) + (0.8)(0.5) = 0.55 \]
*Example 2*, suppose there are 3 forests in a park.
+ Forest A occupies 50% of the land and 20% of the plants in it are poisonous
+ Forest B occupies 30% of the land and 40% of the plants in it are poisonous
+ Forest C occupies 20% of the land and 70% of the plants in it are poisonous
What is the probability of a random plant from the park being poisonous?
*Sol.* Since a plant is equally likely to come from anywhere in the park, the probability of each forest equals its share of the land. Event A is the plant being from Forest A, event B is the plant being from Forest B and event C is the plant being from Forest C. If event P is the plant being poisonous, then using the law of total probability,
\[ P(P) = P(P|A)P(A) + P(P|B)P(B) + P(P|C)P(C) \]
And we know P(A) = 0.5, P(B) = 0.3 and P(C) = 0.2. Also P(P|A) = 0.20, P(P|B) = 0.40 and P(P|C) = 0.70. So
\[ P(P) = (0.20)(0.5) + (0.40)(0.3) + (0.70)(0.2) = 0.36 \]
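A small simulation sketch in plain Python (my own encoding of the two-bag example above) that checks the answer given by the law of total probability:
#+BEGIN_SRC python
# Simulate the two-bag marble example: bag 1 has 7 red + 3 green,
# bag 2 has 2 red + 8 green, and each bag is picked with probability 0.5.
import random

def draw_is_green():
    bag = random.choice([1, 2])            # pick a bag at random
    if bag == 1:
        return random.random() < 3 / 10    # P(green | bag 1)
    return random.random() < 8 / 10        # P(green | bag 2)

trials = 100_000
greens = sum(draw_is_green() for _ in range(trials))
print(greens / trials)   # should be close to 0.3*0.5 + 0.8*0.5 = 0.55
#+END_SRC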
** Some basic identities
+ Probabilities follow the law of inclusion and exclusion
\[ P(A \cup B) = P(A) + P(B) - P(A \cap B) \]
+ DeMorgan's Theorem
\[ P(\overline{A \cap B }) = P(\overline{A} \cup \overline{B}) \]
\[ P(\overline{A \cup B }) = P(\overline{A} \cap \overline{B}) \]
+ Some other identities
\[ P(\overline{A} \cap B) + P(A \cap B) = P(B) \]
\[ P(A \cap \overline{B}) + P(A \cap B) = P(A) \]
* Probability Function
It is a mathematical function that gives the probability of occurrence of the different possible outcomes. We use variables, called *random variables*, to represent these possible outcomes. These are represented by capital letters, for example $X$, $Y$, etc. We use these random variables as follows. \\
Suppose X is the outcome of flipping two coins.
\[ X = \{HH, HT, TT, TH\} \]
We can represent it as,
\[ X = \{0, 1, 2, 3\} \]
Now we can write a probability function $P(X=x)$ for flipping two coins as:
#+attr_latex: :align |c|c|
|-----+----------|
| $x$ | $P(X=x)$ |
|-----+----------|
| 0   | 0.25     |
| 1   | 0.25     |
| 2   | 0.25     |
| 3   | 0.25     |
|-----+----------|
Another example is throwing two dice, with the random variable $X$ being the sum of the two dice.
#+attr_latex: :align |c|c|
|-----+----------|
| $x$ | $P(X=x)$ |
|-----+----------|
| 2   | $1/36$   |
| 3   | $2/36$   |
| 4   | $3/36$   |
| 5   | $4/36$   |
| 6   | $5/36$   |
| 7   | $6/36$   |
| 8   | $5/36$   |
| 9   | $4/36$   |
| 10  | $3/36$   |
| 11  | $2/36$   |
| 12  | $1/36$   |
|-----+----------|
** Types of probability functions (Continuous and Discrete random variables)
Based on the range of the random variable, the probability function has two different names.
+ For discrete random variables it is called a Probability Distribution function.
+ For continuous random variables it is called a Probability Density function.
* Probability Mass Function
If we can get a function such that,
\[ f(x) = P(X=x) \]
then $f(x)$ is called a *Probability Mass Function* (PMF).
** Properties of Probability Mass Function
Suppose a PMF
\[ f(x) = P(X=x) \]
Then,
*** For discrete variables
\[ \Sigma f(x) = 1 \]
\[ E(X^n) = \Sigma x^n f(x) \]
For $E(X)$, the summation is over all possible values of x.
\[ Mean = E(X) = \Sigma x f(x) \]
\[ Variance = E(X^2) - (E(X))^2 = \Sigma x^2 f(x) - ( \Sigma x f(x) )^2 \]
To get probabilities
\[ P(a \le X \le b) = \sum_{a}^{b} f(x) \]
\[ P(a < X \le b) = (\sum_{a}^{b} f(x)) - f(a) \]
\[ P(a \le X < b) = (\sum_{a}^{b} f(x)) - f(b) \]
Basically, we just add the $f(x)$ values over the range of values we need.
*** For continuous variables
\[ \int_{-\infty}^{\infty} f(x) dx = 1 \]
\[ E(X^n) = \int_{-\infty}^{\infty} x^n f(x) dx \]
We only integrate over the possible values of x; elsewhere f(x) is taken to be 0.
\[ Mean = E(X) = \int_{-\infty}^{\infty} x f(x) dx \]
\[ Variance = E(X^2) - (E(X))^2 = \int_{-\infty}^{\infty} x^2 f(x) dx - ( \int_{-\infty}^{\infty} x f(x) dx )^2 \]
To get the probability from a to b (inclusive or exclusive doesn't matter in the continuous case),
\[ P(a < X < b) = \int_{a}^{b} f(x) dx \]
** Some properties of mean and variance
+ Mean
\[ E(aX) = aE(X) \]
\[ E(a) = a \]
\[ E(X + Y) = E(X) + E(Y) \]
+ Variance
Variance is
\[ V(X) = E(X^2) - (E(X))^2 \]
Properties of variance are
\[ V(aX) = a^2 V(X) \]
\[ V(a) = 0 \]
* Moment Generating Function
The moment generating function is given by
\[ M(t) = E(e^{tX}) \]
** For discrete
\[ M(t) = \sum_{x} e^{tx} f(x) \]
** For continuous
\[ M(t) = \int_{-\infty}^{\infty} e^{tx} f(x) dx \]
** Calculation of Moments (E(X^n)) using MGF
\[ E(X^n) = \left(\frac{d^n}{dt^n} M(t)\right)_{t=0} \]
* Binomial Distribution
The binomial distribution is used when a trial with a known probability of success is repeated *n* times, i.e. we do *n* trials. A binomial distribution deals with discrete random variables.
\[ X = \{ 0,1,2, .... n \} \]
where *n* is the number of trials.
\[ P(X=x) = \ ^nC_x\ (p)^x(q)^{n-x} \]
Here
\[ n \rightarrow number\ of\ trials \]
\[ x \rightarrow number\ of\ successes \]
\[ p \rightarrow probability\ of\ success \]
\[ q \rightarrow probability\ of\ failure \]
\[ p = 1 - q \]
+ Mean
\[ Mean = np \]
+ Variance
\[ Variance = npq \]
+ Moment Generating Function
\[ M(t) = (q + pe^t)^n \]
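A short sketch (assuming the =sympy= library is available) that recovers the binomial mean and variance by differentiating the MGF, as in the moment formula above. The numbers are made up.
#+BEGIN_SRC python
# Moments of a binomial distribution from its MGF M(t) = (q + p e^t)^n,
# checked against the formulas Mean = np and Variance = npq.
from sympy import Rational, symbols, exp, diff, simplify

t = symbols('t')
n, p = 5, Rational(1, 6)
q = 1 - p

M = (q + p * exp(t))**n            # binomial MGF

# E(X^r) = d^r/dt^r M(t) evaluated at t = 0
EX = diff(M, t, 1).subs(t, 0)
EX2 = diff(M, t, 2).subs(t, 0)

mean = simplify(EX)
variance = simplify(EX2 - EX**2)

print(mean, n * p)                 # 5/6 and 5/6
print(variance, n * p * q)         # 25/36 and 25/36
#+END_SRC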
** Additive Property of Binomial Distribution
For a random variable $X$, the binomial distribution is represented as
\[ X \sim B(n,p) \]
Here,
\[ n \rightarrow number\ of\ trials \]
\[ p \rightarrow probability\ of\ success \]
+ Property
If given,
\[ X_1 \sim B(n_1, p) \]
\[ X_2 \sim B(n_2, p) \]
Then,
\[ X_1 + X_2 \sim B(n_1 + n_2, p) \]
+ *NOTE*
If
\[ X_1 \sim B(n_1, p_1) \]
\[ X_2 \sim B(n_2, p_2) \]
Then $X_1 + X_2$ is not a binomial distribution.
** Using a binomial distribution
We can use the binomial distribution to easily calculate the probability over multiple trials if the probability of one trial is known. For example, the probability of a duplet (both dice showing the same number) when two dice are thrown is $\frac{6}{36}$. \\
Suppose we now want the probability of 3 duplets when a pair of dice is thrown 5 times. In this case:
\[ number\ of\ trials\ (n) = 5 \]
\[ number\ of\ duplets\ we\ want\ the\ probability\ for\ (x) = 3 \]
\[ probability\ of\ a\ duplet\ (p) = \frac{6}{36} \]
\[ q = 1 - p = 1 - \frac{6}{36} \]
So using the binomial distribution,
\[ P(probability\ of\ 3\ duplets) = P(X=3) = \ ^5C_3 \left(\frac{6}{36}\right)^3 \left(\frac{30}{36}\right)^{5-3} \]
* Poisson Distribution
A limiting case of the binomial distribution where *n* is indefinitely large, *p* is very small and *$\lambda = np$* is finite.
\[ P(X=x) = \frac{e^{-\lambda}\lambda^x}{x!}\ if\ x = 0, 1, 2 ..... \]
\[ P(X=x) = 0\ otherwise \]
\[ \lambda = np \]
+ Mean
\[ Mean = \lambda \]
+ Variance
\[ Variance = \lambda \]
+ Moment Generating Function
\[ M(t) = e^{\lambda\left(e^{t}-1\right)} \]
** Additive property
If X_1, X_2, X_3, ... X_n follow Poisson distributions with parameters \lambda_1, \lambda_2, \lambda_3, .... \lambda_n \\
Then,
\[ X_1 + X_2 + X_3 + ... + X_n \sim Poisson(\lambda_1 + \lambda_2 + \lambda_3 + ... + \lambda_n) \]
* Exponential Distribution
A continuous distribution whose probability density function is
\[ f(x) = \lambda e^{-\lambda x}\ ,\ when\ x \ge 0 \]
\[ f(x) = 0 \ ,\ otherwise \]
\[ where\ \lambda > 0 \]
+ Mean
\[ Mean = \frac{1}{\lambda} \]
+ Variance
\[ Variance = \frac{1}{\lambda^2} \]
+ Moment Generating Function
\[ M(t) = \frac{\lambda}{\lambda - t}\ ,\ t < \lambda \]
** Memoryless Property
\[ P[X > (s + t) \mid X > t] = P(X > s) \]
* Normal Distribution
Suppose a random variable X has mean \mu and variance \sigma^2. We denote the normal distribution by $X \sim N(\mu,\sigma)$ \\
The probability density function is
\[ f(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}\right) \]
\[ -\infty < x < \infty \]
\[ -\infty < \mu < \infty \]
\[ \sigma > 0 \]
Here, $exp(x) = e^x$
+ Moment Generating Function
\[ M(t) = exp\left( \mu t + \frac{\sigma^2 t^2}{2} \right) \]
** Odd Moments (about the mean)
\[ E[(X - \mu)^{2n + 1}] = 0 \ , \ n = 0, 1, 2, ... \]
** Even Moments (about the mean)
\[ E[(X - \mu)^{2n}] = 1 \cdot 3 \cdot 5 .... (2n-3)(2n-1)\ \sigma^{2n} \ , \ n = 0, 1, 2, ... \]
** Properties
+ In a normal distribution
\[ Mean = Mode = Median \]
+ For a normal distribution, the mean deviation about the mean is
\[ \sigma \sqrt{ \frac{2}{\pi} } \]
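A minimal sampling sketch in plain Python (\mu and \sigma are made up) that checks the mean-deviation property stated above:
#+BEGIN_SRC python
# Draw samples from N(mu, sigma) and check that the mean deviation
# about the mean is close to sigma * sqrt(2/pi).
import random, math

mu, sigma = 10.0, 2.0
samples = [random.gauss(mu, sigma) for _ in range(200_000)]

sample_mean = sum(samples) / len(samples)
mean_dev = sum(abs(x - mu) for x in samples) / len(samples)

print(sample_mean)                                # close to mu = 10
print(mean_dev, sigma * math.sqrt(2 / math.pi))   # both close to 1.5958
#+END_SRC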
** Additive property
Suppose the independent distributions X_1, X_2, X_3, ... X_n have means \mu_1, \mu_2, \mu_3, ... \mu_n and variances \sigma_1^2, \sigma_2^2, \sigma_3^2, ..... \sigma_n^2 respectively. \\
Then X_1 + X_2 + X_3 + ... + X_n has mean *( \mu_1 + \mu_2 + \mu_3 + ... + \mu_n )* and variance *( \sigma_1^2 + \sigma_2^2 + \sigma_3^2 + ..... + \sigma_n^2 )*
+ Additive Case
Given,
\[ X_1 \sim N(\mu_1, \sigma_1) \]
\[ X_2 \sim N(\mu_2, \sigma_2) \]
Then,
\[ a X_1 + b X_2 \sim N \left( a \mu_1 + b \mu_2, \sqrt{ a^2 \sigma_1^2 + b^2 \sigma_2^2} \right) \]
* Standard Normal Distribution
The normal distribution with mean 0 and variance 1 is called the standard normal distribution.
\[ Z \sim N(0,1) \]
To calculate the area under a given normal distribution, we can use the standard normal distribution. For that we need to convert values of our distribution into the corresponding values of the standard distribution, using the formula
\[ For\ X \sim N(\mu, \sigma) \]
\[ z = \frac{x - \mu}{\sigma} \]
\[ x \rightarrow value\ in\ our\ normal\ distribution \]
\[ \mu \rightarrow mean\ of\ our\ distribution \]
\[ \sigma \rightarrow standard\ deviation\ of\ our\ distribution \]
\[ z \rightarrow corresponding\ value\ in\ the\ standard\ normal\ distribution \]
For example, suppose X \sim N(\mu, \sigma) and we want the probability P(a < X < b). The range giving the same probability in the Z distribution is
\[ z_1 = \frac{a - \mu}{\sigma} \]
\[ z_2 = \frac{b - \mu}{\sigma} \]
Now the probability in the Z distribution is
\[ P(z_1 < Z < z_2) \]
\[ P( \frac{a - \mu}{\sigma} < Z < \frac{b - \mu}{\sigma} ) \]
So we need the area under the Z curve from z_1 to z_2. \\
Then, we use the standard normal table to get the area.
+ *Note* : The standard normal distribution is symmetric about the y axis. This fact can be used when calculating areas under the Z curve.
* Joint Probability Mass Function
The joint probability mass function of two random variables X and Y is given by
\[ f(x,y) = P(X=x, Y=y) \]
+ For discrete
\[ \Sigma_x \Sigma_y f(x,y) = 1 \]
+ For continuous
\[ \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x,y)\ dx\ dy = 1 \]
To get the probabilities,
\[ P(a \le X \le b, c \le Y \le d ) = \int_c^d \int_a^b f(x,y)\ dx\ dy \]
** Marginal probability distribution (from joint PMF)
+ For discrete
\[ P(X=x) = f(x) = \Sigma_y f(x,y) \]
\[ P(Y=y) = f(y) = \Sigma_x f(x,y) \]
+ For continuous
\[ f(x) = \int_{-\infty}^{\infty} f(x,y) dy \]
\[ f(y) = \int_{-\infty}^{\infty} f(x,y) dx \]
** Conditional Probability for Joint PMF
\[ P(X=x \mid Y=y) = f(x \mid y ) = \frac{ P(X=x, Y=y) }{ P(Y=y) } \]
\[ P(X=x \mid Y=y) = f(x \mid y) = \frac{ f(x,y) }{ f(y) } \]
** Independent Random Variables
The random variables X and Y are independent if,
\[ f(x,y) = f(x) f(y) \]
** Moment of Joint Variables
\[ E(X,Y) = E(XY) = \int_{-\infty}^\infty \int_{-\infty}^\infty xyf(x,y)\ dx\ dy \]
** Covariance
The covariance of two random variables X and Y is given by,
\[ cov(X,Y) = E(XY) - E(X)E(Y) \]
*** Properties of covariance
+ If X and Y are independent
\[ cov(X,Y) = 0 \]
+ If the variance of some random variable X is written var(X), then
\[ cov(X+Y, X-Y) = var(X) - var(Y) \]
+ Generalisation of the previous case
\[ cov(aX + bY, cX + dY) = ac \cdot var(X) + bd \cdot var(Y) + (ad + bc) \cdot cov(X,Y) \]
*** Variance of a combination of two random variables
\[ var(aX + bY) = a^2 \cdot var(X) + b^2 \cdot var(Y) + 2ab \cdot cov(X,Y) \]
** Correlation
If the standard deviation of X is \sigma_X and the standard deviation of Y is \sigma_Y, then the correlation is given by,
\[ \gamma(X,Y) = \rho_{XY} = \frac{cov(X,Y)}{\sigma_X \sigma_Y } \]
Here, \rho_{XY} lies between -1 and 1
\[ -1 \le \rho_{XY} \le 1 \]
** Conditional moments
\[ E(X \mid Y) = \int_{-\infty}^{\infty} x f(x \mid y ) dx \ ,\ which\ is\ a\ function\ of\ y \]
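A minimal sketch in plain Python (made-up data) applying the covariance and correlation definitions above to two discrete data sets:
#+BEGIN_SRC python
# cov(X,Y) = E(XY) - E(X)E(Y) and rho = cov(X,Y) / (sigma_X * sigma_Y).
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
n = len(X)

mean_x = sum(X) / n
mean_y = sum(Y) / n

cov = sum(x * y for x, y in zip(X, Y)) / n - mean_x * mean_y
sigma_x = (sum(x * x for x in X) / n - mean_x**2) ** 0.5
sigma_y = (sum(y * y for y in Y) / n - mean_y**2) ** 0.5

rho = cov / (sigma_x * sigma_y)
print(cov, rho)   # rho lies between -1 and 1
#+END_SRC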
# #+BEGIN_COMMENT
* Useful equation
\[ n! = \int_0^\infty x^n e^{-x} dx \]
# #+END_COMMENT
* Covariance in discrete data
Suppose we have two sets of discrete data,
\[ X : x_1, x_2, x_3... x_n \]
\[ Y : y_1, y_2, y_3... y_n \]
\[ cov(X,Y) = \frac{1}{n} \left( \sum_{i=1}^n x_i y_i \right) - [mean(x) \cdot mean(y)] \]
\[ n \rightarrow number\ of\ items \]
* Regression
Regression is a technique to relate a dependent variable to one or more independent variables.
** Lines of regression
Both lines pass through the point *(mean(x), mean(y))*
*** y on x
Equation of the line,
\[ \frac{y - mean(y)}{x - mean(x)} = b_{yx} \]
Where,
\[ b_{yx} = \frac{cov(X,Y)}{var(X)} \]
*** x on y
Equation of the line,
\[ \frac{x - mean(x)}{y - mean(y)} = b_{xy} \]
Where,
\[ b_{xy} = \frac{cov(X,Y)}{var(Y)} \]
b_{xy} and b_{yx} are called regression coefficients.
+ *Note* : if one of the regression coefficients is greater than 1, then the other must be less than 1.
*** Correlation
\[ \gamma(X,Y) = \rho_{XY} = \pm \sqrt{b_{xy} b_{yx}} \]
The regression coefficients (b_{xy} and b_{yx}) and the correlation coefficient all have the same sign.
** Angle between lines of regression
\[ tan \theta = \left( \frac{ 1- \rho^2 }{ \rho } \right) \left( \frac{ \sigma_X \cdot \sigma_Y }{ var(X) + var(Y) } \right) \]
Here \sigma is the standard deviation.
+ If $\rho = 0$ then $\theta = \frac{\pi}{2}$
+ If $\rho = \pm 1$ then $\theta = 0$
TODO : Maybe an example here (for now, see the sketch at the end of these notes)
* Sampling
Notes not made for this currently, a pdf was provided by the teacher: [[./sampling.pdf][./sampling.pdf]]
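A worked sketch in plain Python for the regression section above (reusing the made-up data from the covariance sketch), showing the regression coefficients and the correlation recovered from them:
#+BEGIN_SRC python
# Regression coefficients b_yx = cov/var(X), b_xy = cov/var(Y),
# and the correlation coefficient as sqrt(b_xy * b_yx).
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
n = len(X)

mean_x, mean_y = sum(X) / n, sum(Y) / n
cov = sum(x * y for x, y in zip(X, Y)) / n - mean_x * mean_y
var_x = sum(x * x for x in X) / n - mean_x**2
var_y = sum(y * y for y in Y) / n - mean_y**2

b_yx = cov / var_x           # slope of the "y on x" line
b_xy = cov / var_y           # slope of the "x on y" line (in x = ... form)

rho = (b_xy * b_yx) ** 0.5   # carries the same sign as the coefficients
print(b_yx, b_xy, rho)

# The "y on x" line through (mean_x, mean_y): y = mean_y + b_yx * (x - mean_x)
#+END_SRC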