commit 8552f74f917912f9a5161145728b72fdfd9a7130 Author: lomna-dev Date: Fri Apr 28 15:11:16 2023 +0530 First Commit Basic Stuff diff --git a/build.sh b/build.sh new file mode 100644 index 0000000..1f9b897 --- /dev/null +++ b/build.sh @@ -0,0 +1,4 @@ +set -xe + +emacs --script export.el +lualatex main.tex diff --git a/export.el b/export.el new file mode 100644 index 0000000..01f7318 --- /dev/null +++ b/export.el @@ -0,0 +1,2 @@ +(find-file "main.org") +(org-latex-export-to-latex) diff --git a/kurtosis.PNG b/kurtosis.PNG new file mode 100644 index 0000000..a5fc1d8 Binary files /dev/null and b/kurtosis.PNG differ diff --git a/kurtosis2.PNG b/kurtosis2.PNG new file mode 100644 index 0000000..fd5b1d6 Binary files /dev/null and b/kurtosis2.PNG differ diff --git a/main.org b/main.org new file mode 100644 index 0000000..25975b7 --- /dev/null +++ b/main.org @@ -0,0 +1,309 @@ +#+TITLE: Probability and Statistics ( BTech CSE ) +#+AUTHOR: Anmol Nawani + +# *Statistics + +* Ungrouped Data + +Ungrouped data is data that has not been arranged in any way, so it is just a list of observations: + +\[ x_1, x_2, x_3, ..., x_n \] + +** Mean +\[ \bar{x} = \frac{x_1 + x_2 + x_3 + ... + x_n}{n} \] + +\[ \bar{x} = \frac{ \sum_{i = 1}^{n} x_i }{n} \] + +** Mode +The mode is the observation that occurs the highest number of times, i.e. the x_i with the highest count in the observation list. + +** Median +The median is the middle-most observation. +First order the n observations in the observation list in either ascending or descending order (any order works).
The median is then: + + n is even + +\[ Median = \frac{ x_{\frac{n}{2}} + x_{(\frac{n}{2}+1)} }{2} \] + ++ n is odd + +\[ Median = x_{\frac{n+1}{2}} \] + +** Variance and Standard Deviation + +\[ Variance = \sigma^2 \] +\[ Standard\ deviation = \sigma \] + +\[ \sigma^2 = \frac{\sum_{i=1}^{n} (x_i - Mean)^2 }{n} \] + +\[ \sigma^2 = \frac{\sum_{i=1}^n x_i^2}{n} - (Mean)^2 \] + +** Moments + +*** About some constant A + +\[ r^{th}\ moment = \frac{1}{n} \Sigma(x_i - A)^r \] + +*** About Mean (Central Moment) + +When A = Mean, the moment is called a central moment. + +\[ \mu_r = \frac{1}{n} \Sigma(x_i - Mean)^r \] + +*** About Zero (Raw Moment) + +When A = 0, the moment is called a raw moment. + +\[ \mu_r^{'} = \frac{1}{n} \Sigma x_i^r \] + +* Grouped Data + +Grouped data is data that has been grouped by the frequency with which each observation occurs. So if 9 appears 5 times in our observations, we group it as x (observation) = 9 and f (frequency) = 5. + +#+attr_latex: :align |c|c| +|------------------+---------------| +| x (observations) | f (frequency) | +|------------------+---------------| +| 2 | 5 | +| 1 | 3 | +| 4 | 5 | +| 8 | 9 | +|------------------+---------------| + +If the data is stored as class intervals, i.e. the observations are of the form 10-20, 20-30, 30-40 ..., then we get $x_i$ as the midpoint of each class: + +\[ x_i = \frac{lower\ limit + upper\ limit}{2} \] + +For example, + +$x_i$ for 20-30 will be $\frac{20 + 30}{2} = 25$ + +So for the data + +#+attr_latex: :align |c|c| +|-------+---------------| +| | f (frequency) | +|-------+---------------| +| 0- 20 | 2 | +| 20-40 | 6 | +| 40-60 | 1 | +| 60-80 | 3 | +|-------+---------------| + +the $x_i$'s will become:
+ +#+attr_latex: :align |c|c|c| +|-------+-----+-----| +| | f_i | x_i | +|-------+-----+-----| +| 0- 20 | 2 | 10 | +| 20-40 | 6 | 30 | +| 40-60 | 1 | 50 | +| 60-80 | 3 | 70 | +|-------+-----+-----| + + +** Mean + +\[ \bar{x} = \frac{ \Sigma f_i x_i}{\Sigma f_i } \] + +** Mode + +The *modal class* is the class (row) with the highest f_i + +\[ Mode = l + (\frac{f_1 - f_0}{2f_1 - f_0 - f_2}) \times h \] + +In the formula : \\ +l \rightarrow lower limit of the modal class \\ +f_1 \rightarrow frequency (f_i) of the modal class \\ +f_0 \rightarrow frequency of the row preceding the modal class \\ +f_2 \rightarrow frequency of the row after the modal class \\ +h \rightarrow size of class interval (upper limit - lower limit) + +** Median +The median for grouped data is calculated with the help of *cumulative frequency*. The cumulative frequency (cf_i) is given by: + +\[ cf_i = f_1 + f_2 + f_3 + ... + f_i \] + +The *median class* is the first class whose cf_i is greater than or equal to $\frac{\Sigma f}{2}$ + +\[ Median = l + (\frac{(n/2) - cf}{f}) \times h \] + +In the formula : \\ +l \rightarrow lower limit of the median class \\ +h \rightarrow size of class interval (upper limit - lower limit) \\ +n \rightarrow total number of observations (\Sigma f_i) \\ +cf \rightarrow cumulative frequency of the class preceding the median class \\ +f \rightarrow frequency of the median class + +** Variance and Standard Deviation + +\[ Variance = \sigma^2 \] +\[ Standard\ deviation = \sigma \] + +\[ \sigma^2 = \frac{\sum_{i=1}^{n} f_i(x_i - Mean)^2 }{\Sigma f_i} \] + +\[ \sigma^2 = \frac{\sum_{i=1}^n f_ix_i^2}{\Sigma f_i} - (Mean)^2 \] + +** Moments + +*** About some constant A + +\[ r^{th}\ moment = \frac{1}{\Sigma f_i} [\Sigma f_i (x_i - A)^r] \] + +*** About Mean (Central Moment) + +When A = Mean, the moment is called a central moment. +\[ \mu_r = \frac{1}{\Sigma f_i} [\Sigma f_i (x_i - Mean)^r] \] + +*** About Zero (Raw Moment) + +When A = 0, the moment is called a raw moment.
+\[ \mu_r^{'} = \frac{1}{\Sigma f_i} [\Sigma f_i x_i^r] \] + +* Empirical relation between Mean, Median and Mode + +\[ 3Median = 2Mean + Mode \] + +* Relation between raw and central moments + +\[ \mu_0 = \mu_0^{'} = 1 \] +\[ \mu_1 = 0 \] +\[ \mu_2 = \mu_2^{'} - \mu_1^{'2} \] +\[ \mu_3 = \mu_3^{'} - 3\mu_1^{'}\mu_2^{'} + 2\mu_1^{'3} \] +\[ \mu_4 = \mu_4^{'} - 4\mu_3^{'}\mu_1^{'} + 6\mu_2^{'}\mu_1^{'2} - 3\mu_1^{'4} \] + +* Skewness and Kurtosis + +** Skewness + ++ If Mean > Mode, then skewness is positive ++ If Mean = Mode, then skewness is zero (graph is symmetric) ++ If Mean < Mode, then skewness is negative + +[[./skewness.PNG]] + +*** Pearson's coefficient of skewness + +Pearson's coefficient of skewness is denoted by S_{KP} + +\[ S_{KP} = \frac{Mean - Mode}{Standard\ Deviation} \] + ++ If S_{KP} is zero then the distribution is symmetrical ++ If S_{KP} is positive then the distribution is positively skewed ++ If S_{KP} is negative then the distribution is negatively skewed + +*** Moment based coefficient of skewness + +The moment based coefficient of skewness is denoted by \beta_1. The \mu here are central moments. + +\[ \beta_1 = \frac{\mu_3^2}{\mu_2^3} \] + +The drawback of using \beta_1 as a coefficient of skewness is that it *can only tell whether the distribution is symmetrical or not* (it is symmetrical when $\beta_1 = 0$). +Because \mu_3 is squared, \beta_1 is never negative, so it can't tell us the direction of skewness, i.e. positive or negative. + ++ If \beta_1 is zero, then the distribution is symmetrical + +*** Karl Pearson's \gamma_1 + +To remove the drawback of \beta_1, we can derive Karl Pearson's \gamma_1, which takes the sign of \mu_3 + +\[ \gamma_1 = \pm \sqrt{\beta_1} \] +\[ \gamma_1 = \frac{\mu_3}{\mu_2^{3/2}} \] + ++ If \mu_3 is positive, the distribution has positive skewness ++ If \mu_3 is negative, the distribution has negative skewness ++ If \mu_3 is zero, the distribution is symmetrical + +** Kurtosis + +Kurtosis is a measure of how peaked the curve is and how "fat" its tails are.
+ +# https://www.analyticsvidhya.com/blog/2021/05/shape-of-data-skewness-and-kurtosis/ +[[./kurtosis.PNG]] + +# https://www.bogleheads.org/wiki/Excess_kurtosis +[[./kurtosis2.PNG]] + +The kurtosis is calculated using \beta_2 + +\[ \beta_2 = \frac{\mu_4}{\mu_2^2} \] + +The value of \beta_2 tells us about the type of curve + ++ Leptokurtic (High Peak) when \beta_2 > 3 ++ Mesokurtic (Normal Peak) when \beta_2 = 3 ++ Platykurtic (Low Peak) when \beta_2 < 3 + +*** Karl Pearson's \gamma_2 + +\gamma_2 is defined as: + +\[ \gamma_2 = \beta_2 - 3 \] + ++ Leptokurtic when \gamma_2 > 0 ++ Mesokurtic when \gamma_2 = 0 ++ Platykurtic when \gamma_2 < 0 + +# *Probability + +* Basic Probability + +** Conditional Probability + +If some event B has already occurred, then the probability of the event A is: + +\[ P(A \mid B) = \frac{P(A \cap B)}{P(B)} \] + +$P(A \mid B)$ is read as A given B. We are given that B has occurred, and this is the probability of A now occurring. + +** Law of Total Probability + +The law of total probability is used to find the probability of some event A that is split across several parts B_1, B_2, ..., B_n, where the B_i are mutually exclusive and together cover the whole sample space. + +\[ P(A) = P(A|B_1)P(B_1) + P(A|B_2)P(B_2) + P(A|B_3)P(B_3) + ... + P(A|B_n)P(B_n) \] +\[ P(A) = \sum_{i=1}^{n} P(A|B_i)P(B_i) \] + +*Example*, Suppose we have 2 bags with marbles + ++ Bag 1 : 7 red marbles and 3 green marbles ++ Bag 2 : 2 red marbles and 8 green marbles + +Now we select one bag at random (i.e., each of the two bags is chosen with probability 0.5). If we draw a marble from the chosen bag, what is the probability that it is a green marble? + +*Sol.* The green marbles are split between bag 1 and bag 2. \\ +Let G be the event of drawing a green marble.
\\ +Let B_1 be the event of choosing bag 1 \\ +Let B_2 be the event of choosing bag 2 \\ + +Then, $P(G|B_1) = \frac{3}{7 + 3} = 0.3$ and $P(G|B_2) = \frac{8}{2 + 8} = 0.8$ +\\ +Now, we can use the law of total probability to get + +\[ P(G) = P(G|B_1)P(B_1) + P(G|B_2)P(B_2) = 0.3 \times 0.5 + 0.8 \times 0.5 = 0.55 \] + +*Example* 2, Suppose there are 3 forests in a park. ++ Forest A occupies 50% of land and 20% plants in it are poisonous ++ Forest B occupies 30% of land and 40% plants in it are poisonous ++ Forest C occupies 20% of land and 70% plants in it are poisonous +What is the probability of a random plant from the park being poisonous? + +*Sol.* Assume plants are spread evenly across the whole area of the park, so the probability of a plant coming from a forest equals that forest's share of the land. Event A is the plant being from Forest A, Event B is the plant being from Forest B and Event C is the plant being from Forest C. If event P is the plant being poisonous, then using the law of total probability, + +\[ P(P) = P(P|A)P(A) + P(P|B)P(B) + P(P|C)P(C) \] + +And we know P(A) = 0.5, P(B) = 0.3 and P(C) = 0.2. Also P(P|A) = 0.20, P(P|B) = 0.40 and P(P|C) = 0.70, so $P(P) = 0.20 \times 0.5 + 0.40 \times 0.3 + 0.70 \times 0.2 = 0.36$ + + +** Some basic identities + ++ Probabilities follow the law of inclusion and exclusion +\[ P(A \cup B) = P(A) + P(B) - P(A \cap B) \] + ++ DeMorgan's Theorem +\[ P(\overline{A \cap B }) = P(\overline{A} \cup \overline{B}) \] +\[ P(\overline{A \cup B }) = P(\overline{A} \cap \overline{B}) \] + ++ Some other identities +\[ P(\overline{A} \cap B) + P(A \cap B) = P(B) \] +\[ P(A \cap \overline{B}) + P(A \cap B) = P(A) \] diff --git a/main.pdf b/main.pdf new file mode 100644 index 0000000..0aa9fec Binary files /dev/null and b/main.pdf differ diff --git a/remove.sh b/remove.sh new file mode 100644 index 0000000..aed926a --- /dev/null +++ b/remove.sh @@ -0,0 +1,13 @@ +set -xe + +remove () { + rm -f -- "$1" # -f: a missing file must not abort the script under set -e +} + +remove main.toc +remove main.aux +remove main.log +remove main.out +remove main.tex +remove main.tex~ +remove main.html diff --git a/skewness.PNG b/skewness.PNG new file mode 100644 index 0000000..e249861 Binary files /dev/null and b/skewness.PNG differ
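As a rough numerical check of the grouped-data formulas that this commit adds to main.org, the following Python sketch applies them to the 0-20 / 20-40 / 40-60 / 60-80 frequency table from the notes. It is not part of the commit. The variable names are illustrative, the sketch takes cf to be the cumulative frequency of the classes before the median class, and it assumes that a missing neighbour of the modal class counts as frequency 0.

```python
# Class intervals and frequencies copied from the grouped-data table in main.org.
classes = [(0, 20), (20, 40), (40, 60), (60, 80)]
freqs = [2, 6, 1, 3]

n = sum(freqs)                                # total number of observations
mids = [(lo + hi) / 2 for lo, hi in classes]  # x_i = class midpoints

# Mean and variance: sum(f_i * x_i) / sum(f_i) and sum(f_i * x_i^2) / sum(f_i) - mean^2
mean = sum(f * x for f, x in zip(freqs, mids)) / n
variance = sum(f * x * x for f, x in zip(freqs, mids)) / n - mean ** 2

# Median: find the first class whose cumulative frequency reaches n/2.
cf_before = 0  # cumulative frequency of the classes before the current one
for (lo, hi), f in zip(classes, freqs):
    if cf_before + f >= n / 2:
        median = lo + (n / 2 - cf_before) / f * (hi - lo)
        break
    cf_before += f

# Mode: the modal class is the one with the highest frequency.
m = max(range(len(freqs)), key=lambda i: freqs[i])
f1 = freqs[m]
f0 = freqs[m - 1] if m > 0 else 0               # assumption: no previous class -> 0
f2 = freqs[m + 1] if m + 1 < len(freqs) else 0  # assumption: no next class -> 0
lo, hi = classes[m]
mode = lo + (f1 - f0) / (2 * f1 - f0 - f2) * (hi - lo)

print(f"mean={mean:.2f} variance={variance:.2f} median={median:.2f} mode={mode:.2f}")
```

For this table the sketch gives a mean of about 38.33, a variance of about 430.56, a median of about 33.33 (median class 20-40) and a mode of about 28.89 (modal class 20-40).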