Probability and Statistics ( BTech CSE )

Ungrouped Data
Grouped Data
Relation between Mean, Median and Mode
Relation between raw and central moments
Skewness and Kurtosis
- Skewness
- Kurtosis
  - Karl Pearson's γ_2
Basic Probability

Ungrouped Data

Ungrouped data is data that has not been arranged in any way, so it is just a list of observations:

\[ x_1, x_2, x_3, ... x_n \]

Mean

\[ \bar{x} = \frac{x_1 + x_2 + x_3 + ... + x_n}{n} \]

\[ \bar{x} = \frac{ \sum_{i = 1}^{n} x_i }{n} \]
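A minimal Python sketch of the mean formula for ungrouped data. The function name `mean` and the sample list are illustrative assumptions, not part of the notes.

```python
def mean(xs):
    """Mean of ungrouped data: (x_1 + x_2 + ... + x_n) / n."""
    if not xs:
        raise ValueError("the mean of an empty list is undefined")
    return sum(xs) / len(xs)


observations = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample data
print(mean(observations))  # 5.0
```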

Mode

The mode is the observation that occurs the highest number of times, i.e. the x_i with the highest count in the observation list.
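The counting step can be sketched in Python with `collections.Counter`. The function name `modes` is hypothetical. Returning every tied value, because a list can have more than one mode, is a choice made for this sketch.

```python
from collections import Counter


def modes(xs):
    """Return every observation that occurs the highest number of times.

    A list is returned because ungrouped data can have more than one mode;
    returning all of them is a choice made for this sketch.
    """
    if not xs:
        raise ValueError("the mode of an empty list is undefined")
    counts = Counter(xs)              # observation -> how many times it occurs
    highest = max(counts.values())
    return sorted(x for x, c in counts.items() if c == highest)


print(modes([2, 4, 4, 4, 5, 5, 7, 9]))  # [4]
print(modes([1, 1, 3, 3, 9]))           # [1, 3]
```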

Median

The median is the middle-most observation. First arrange the n observations in either ascending or descending order (either order works); the median is then:

n is even

\[ Median = \frac{ x_{n/2} + x_{(n/2)+1} }{2} \]

n is odd

\[ Median = x_{(n+1)/2} \]
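Here is a short Python sketch of the even and odd cases. The formulas use 1-based positions x_1 … x_n, while Python lists are 0-based, so position k becomes index k - 1. The function name `median` is illustrative.

```python
def median(xs):
    """Median of ungrouped data using the even/odd rules above."""
    s = sorted(xs)  # ascending order (descending gives the same median)
    n = len(s)
    if n == 0:
        raise ValueError("the median of an empty list is undefined")
    if n % 2 == 1:
        return s[(n + 1) // 2 - 1]            # x_{(n+1)/2}
    return (s[n // 2 - 1] + s[n // 2]) / 2    # (x_{n/2} + x_{n/2+1}) / 2


print(median([7, 2, 5, 4, 4]))      # 4   (n = 5, odd)
print(median([7, 2, 5, 4, 4, 10]))  # 4.5 (n = 6, even)
```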

Variance and Standard Deviation

\[ Variance = \sigma^2 \] \[ Standard\ deviation = \sigma \]

\[ \sigma^2 = \frac{\sum_{i=1}^{n} (x_i - Mean)^2 }{n} \]

\[ \sigma^2 = \frac{\sum_{i=1}^n x_i^2}{n} - (Mean)^2 \]
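A Python sketch can confirm that both variance formulas above give the same result. Like the notes, it divides by n (population variance). The function names and sample data are assumptions made for illustration.

```python
import math


def variance_by_definition(xs):
    """sigma^2 = sum((x_i - Mean)^2) / n"""
    n = len(xs)
    m = sum(xs) / n
    return sum((x - m) ** 2 for x in xs) / n


def variance_by_shortcut(xs):
    """sigma^2 = sum(x_i^2) / n - Mean^2"""
    n = len(xs)
    m = sum(xs) / n
    return sum(x * x for x in xs) / n - m ** 2


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample data
var = variance_by_definition(data)
print(var, variance_by_shortcut(data))  # 4.0 4.0
print(math.sqrt(var))                   # 2.0 (standard deviation)
```

The shortcut form is convenient by hand. In floating point it can lose precision when the values are large relative to their spread, because two nearly equal quantities are subtracted.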

Moments

About some constant A

\[ r^{th}\ moment = \frac{1}{n} \Sigma(x_i - A)^r \]

About Mean (Central Moment)

When A = Mean, then the moment is called central moment.

\[ \mu_r = \frac{1}{n} \Sigma(x_i - Mean)^r \]

About Zero (Raw Moment)

When A = 0, then the moment is called raw moment.

\[ \mu_r^{'} = \frac{1}{n} \Sigma x_i^r \]
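All three kinds of moment share one formula with a different constant A. The Python sketch below makes this explicit. The helper names are hypothetical.

```python
def moment_about(xs, r, a):
    """r-th moment about a constant a: (1/n) * sum((x_i - a)^r)."""
    return sum((x - a) ** r for x in xs) / len(xs)


def central_moment(xs, r):
    """mu_r: moment about the mean (A = Mean)."""
    return moment_about(xs, r, sum(xs) / len(xs))


def raw_moment(xs, r):
    """mu'_r: moment about zero (A = 0)."""
    return moment_about(xs, r, 0)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample data
print(central_moment(data, 1))  # 0.0, the first central moment is always 0
print(central_moment(data, 2))  # 4.0, equals the variance
print(raw_moment(data, 1))      # 5.0, equals the mean
```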

Grouped Data

Grouped data is data arranged by the frequency with which each value occurs. For example, if 9 appears 5 times among the observations, it is recorded once as x (observation) = 9 with f (frequency) = 5.

x (observations)	f (frequency)
2	5
1	3
4	5
8	9

If the data is grouped into class intervals, i.e. the observations are of the form 10-20, 20-30, 30-40 …, then $x_i$ is the mid-point (class mark) of each interval:

\[ x_i = \frac{lower\ limit + upper\ limit}{2} \]

i.e., $x_i$ for 20-30 will be $\frac{20 + 30}{2} = 25$

So for the data

Class interval	f (frequency)
0-20	2
20-40	6
40-60	1
60-80	3

the $x_i$'s become:

Class interval	f_i	x_i
0-20	2	10
20-40	6	30
40-60	1	50
60-80	3	70
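The class-mark step can be written as a one-line Python sketch. The name `class_marks` is hypothetical, and the intervals are the ones from the table above.

```python
def class_marks(intervals):
    """x_i = (lower limit + upper limit) / 2 for each class interval."""
    return [(lower + upper) / 2 for lower, upper in intervals]


intervals = [(0, 20), (20, 40), (40, 60), (60, 80)]  # classes from the table above
print(class_marks(intervals))  # [10.0, 30.0, 50.0, 70.0]
```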

Mean

\[ \bar{x} = \frac{ \Sigma f_i x_i}{\Sigma f_i } \]
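As a sketch, here is the grouped mean in Python applied to the table above: Σ f_i x_i = 460 and Σ f_i = 12. The function name is illustrative.

```python
def grouped_mean(xs, fs):
    """Mean = sum(f_i * x_i) / sum(f_i)."""
    total = sum(fs)
    if total == 0:
        raise ValueError("total frequency must be positive")
    return sum(f * x for x, f in zip(xs, fs)) / total


xs = [10, 30, 50, 70]  # class marks x_i from the table above
fs = [2, 6, 1, 3]      # frequencies f_i
print(grouped_mean(xs, fs))  # 460 / 12, about 38.33
```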

Mode

The modal class is the class (row) with the highest f_i.

\[ Mode = l + (\frac{f_1 - f_0}{2f_1 - f_0 - f_2}) \times h \]

In the formula:
l → lower limit of the modal class
f_1 → frequency (f_i) of the modal class
f_0 → frequency of the row preceding the modal class
f_2 → frequency of the row after the modal class
h → size of class interval (upper limit - lower limit)
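Below is a Python sketch of the grouped-mode formula, applied to the table above. The modal class is 20-40, so l = 20, f_1 = 6, f_0 = 2, f_2 = 1 and h = 20. The notes do not say what to use when the modal class is the first or last class. Treating the missing neighbour's frequency as 0 is an assumption of this sketch. The function name is hypothetical.

```python
def grouped_mode(intervals, fs):
    """Mode = l + (f_1 - f_0) / (2*f_1 - f_0 - f_2) * h for the modal class."""
    k = max(range(len(fs)), key=lambda i: fs[i])  # modal class (first one if tied)
    lower, upper = intervals[k]
    h = upper - lower
    f1 = fs[k]
    # Assumption: a missing neighbouring class (modal class at either end) has frequency 0.
    f0 = fs[k - 1] if k > 0 else 0
    f2 = fs[k + 1] if k + 1 < len(fs) else 0
    denominator = 2 * f1 - f0 - f2
    if denominator == 0:
        raise ValueError("formula undefined: 2*f_1 - f_0 - f_2 is zero")
    return lower + (f1 - f0) / denominator * h


intervals = [(0, 20), (20, 40), (40, 60), (60, 80)]
fs = [2, 6, 1, 3]
print(grouped_mode(intervals, fs))  # 20 + (4 / 9) * 20, about 28.89
```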

Median

The median for grouped data is calculated with the help of cumulative frequency. The cumulative frequency (cf_i) is given by:

\[ cf_i = f_1 + f_2 + f_3 + ... + f_i \]

The median class is the first class whose cf_i is greater than or equal to $\frac{\Sigma f}{2}$.

\[ Median = l + (\frac{(n/2) - cf}{f}) \times h \]

In the formula:
l → lower limit of the median class
h → size of class interval (upper limit - lower limit)
n → total number of observations (n = Σ f_i)
cf → cumulative frequency of the class preceding the median class
f → frequency of the median class
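The next Python sketch walks the classes while keeping a running cumulative frequency. It stops at the first class whose cumulative frequency reaches n/2. In the formula, cf is the cumulative frequency of the classes before the median class. For the table above, n/2 = 6, the median class is 20-40, cf = 2 and f = 6. The function name is illustrative.

```python
def grouped_median(intervals, fs):
    """Median = l + ((n/2 - cf) / f) * h, where cf is the cumulative
    frequency of all classes before the median class."""
    n = sum(fs)
    if n <= 0:
        raise ValueError("total frequency must be positive")
    cf = 0  # running cumulative frequency of the classes already passed
    for (lower, upper), f in zip(intervals, fs):
        if cf + f >= n / 2:  # first class whose cumulative frequency reaches n/2
            return lower + (n / 2 - cf) / f * (upper - lower)
        cf += f
    raise ValueError("median class not found")  # unreachable for positive frequencies


intervals = [(0, 20), (20, 40), (40, 60), (60, 80)]
fs = [2, 6, 1, 3]
print(grouped_median(intervals, fs))  # 20 + ((6 - 2) / 6) * 20, about 33.33
```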

Variance and Standard Deviation

\[ Variance = \sigma^2 \] \[ Standard\ deviation = \sigma \]

\[ \sigma^2 = \frac{\sum_{i=1}^{n} f_i(x_i - Mean)^2 }{\Sigma f_i} \]

\[ \sigma^2 = \frac{\sum_{i=1}^n f_ix_i^2}{\Sigma f_i} - (Mean)^2 \]
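A Python sketch of both grouped-variance formulas on the class-mark table above. Both divide by Σ f_i, as in the notes. The function names are hypothetical.

```python
def grouped_variance_definition(xs, fs):
    """sigma^2 = sum(f_i * (x_i - Mean)^2) / sum(f_i)"""
    total = sum(fs)
    mean = sum(f * x for x, f in zip(xs, fs)) / total
    return sum(f * (x - mean) ** 2 for x, f in zip(xs, fs)) / total


def grouped_variance_shortcut(xs, fs):
    """sigma^2 = sum(f_i * x_i^2) / sum(f_i) - Mean^2"""
    total = sum(fs)
    mean = sum(f * x for x, f in zip(xs, fs)) / total
    return sum(f * x * x for x, f in zip(xs, fs)) / total - mean ** 2


xs = [10, 30, 50, 70]
fs = [2, 6, 1, 3]
print(grouped_variance_definition(xs, fs))  # about 430.56
print(grouped_variance_shortcut(xs, fs))    # same value, up to rounding
```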

Moments

About some constant A

\[ r^{th}\ moment = \frac{1}{\Sigma f_i} [\Sigma f_i (x_i - A)^r] \]

About Mean (Central Moment)

When A = Mean, then the moment is called central moment. \[ \mu_r = \frac{1}{\Sigma f_i} [\Sigma f_i (x_i - Mean)^r] \]

About Zero (Raw Moment)

When A = 0, then the moment is called raw moment. \[ \mu_r^{'} = \frac{1}{\Sigma f_i} [\Sigma f_i x_i^r] \]
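The grouped versions only add the weights f_i, as this Python sketch shows. The helper names are illustrative, and the data is the class-mark table above.

```python
def grouped_moment_about(xs, fs, r, a):
    """r-th moment about a: sum(f_i * (x_i - a)^r) / sum(f_i)."""
    return sum(f * (x - a) ** r for x, f in zip(xs, fs)) / sum(fs)


def grouped_central_moment(xs, fs, r):
    """mu_r: moment about the mean (A = Mean)."""
    mean = sum(f * x for x, f in zip(xs, fs)) / sum(fs)
    return grouped_moment_about(xs, fs, r, mean)


def grouped_raw_moment(xs, fs, r):
    """mu'_r: moment about zero (A = 0)."""
    return grouped_moment_about(xs, fs, r, 0)


xs = [10, 30, 50, 70]
fs = [2, 6, 1, 3]
print(grouped_raw_moment(xs, fs, 1))      # the mean, about 38.33
print(grouped_central_moment(xs, fs, 1))  # 0 up to floating-point rounding
print(grouped_central_moment(xs, fs, 2))  # the variance, about 430.56
```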

Relation between Mean, Median and Mode

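For a moderately skewed distribution, the three measures are approximately related by the empirical formula

\[ 3Median \approx 2Mean + Mode \]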
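As an illustration with hypothetical values (not from the notes), suppose Mean = 40 and Median = 38. Rearranging the empirical relation gives an estimate of the mode:

\[ Mode \approx 3Median - 2Mean = 3(38) - 2(40) = 114 - 80 = 34 \]

Because the relation is empirical, this is only an estimate. For strongly skewed data it can differ noticeably from the mode computed directly.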

Relation between raw and central moments

\[ \mu_0 = \mu_0^{'} = 1 \] \[ \mu_1 = 0 \] \[ \mu_2 = \mu_2^{'} - \mu_1^{'2} \] \[ \mu_3 = \mu_3^{'} - 3\mu_1^{'}\mu_2^{'} + 2\mu_1^{'3} \] \[ \mu_4 = \mu_4^{'} - 4\mu_3^{'}\mu_1^{'} + 6\mu_2^{'}\mu_1^{'2} - 3\mu_1^{'4} \]
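This Python sketch checks the relations numerically. It computes the central moments directly and also rebuilds them from raw moments. The sample data and helper names are assumptions, and a numerical check on one dataset illustrates the relations without proving them.

```python
import math


def raw_moment(xs, r):
    """mu'_r = (1/n) * sum(x_i^r)"""
    return sum(x ** r for x in xs) / len(xs)


def central_moment(xs, r):
    """mu_r = (1/n) * sum((x_i - Mean)^r)"""
    mean = raw_moment(xs, 1)
    return sum((x - mean) ** r for x in xs) / len(xs)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample data
m1, m2, m3, m4 = (raw_moment(data, r) for r in range(1, 5))

# Central moments rebuilt from raw moments using the relations above.
mu2 = m2 - m1 ** 2
mu3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3
mu4 = m4 - 4 * m3 * m1 + 6 * m2 * m1 ** 2 - 3 * m1 ** 4

for r, from_raw in [(2, mu2), (3, mu3), (4, mu4)]:
    direct = central_moment(data, r)
    print(r, from_raw, direct, math.isclose(from_raw, direct))
```

These relations are handy because raw moments can be accumulated in one pass over the data. In floating point, though, the subtractions can lose precision when the mean is large compared with the spread.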

Skewness and Kurtosis

Skewness

If Mean > Mode, then skewness is positive
If Mean = Mode, then skewness is zero (graph is symmetric)
If Mean < Mode, then skewness is negative

[Figure: skewness (skewness.PNG)]

Pearson's coefficient of skewness

Pearson's coefficient of skewness is denoted by S_KP.

\[ S_{KP} = \frac{Mean - Mode}{Standard\ Deviation} \]

If S_KP is zero then distribution is symmetrical
If S_KP is positive then distribution is positively skewed
If S_KP is negative then distribution is negatively skewed
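A minimal Python sketch of S_KP and the sign rule above. The summary values (Mean = 40, Mode = 34, SD = 12) and the function names are hypothetical.

```python
def pearson_skewness(mean, mode, sd):
    """S_KP = (Mean - Mode) / Standard Deviation."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (mean - mode) / sd


def skew_direction(s_kp):
    """Read the direction of skewness from the sign of S_KP."""
    if s_kp > 0:
        return "positively skewed"
    if s_kp < 0:
        return "negatively skewed"
    return "symmetrical"


s = pearson_skewness(mean=40, mode=34, sd=12)  # hypothetical summary values
print(s, skew_direction(s))  # 0.5 positively skewed
```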

Moment based coefficient of skewness

The moment-based coefficient of skewness is denoted by β_1. Here μ_r is the r-th central moment.

\[ \beta_1 = \frac{\mu_3^2}{\mu_2^3} \]

The drawback of β_1 as a coefficient of skewness is that μ_3 is squared, so β_1 is never negative. It can only tell us whether the distribution is symmetrical ($\beta_1 = 0$) or not. It cannot tell us the direction of skewness, i.e. positive or negative.

If β_1 is zero, then distribution is symmetrical

Karl Pearson's γ_1

To remove this drawback of β_1, we use Karl Pearson's γ_1, which keeps the sign of μ_3:

\[ \gamma_1 = \pm\sqrt{\beta_1} \] \[ \gamma_1 = \frac{\mu_3}{\mu_2^{3/2}} \]

If μ_3 is positive, the distribution has positive skewness
If μ_3 is negative, the distribution has negative skewness
If μ_3 is zero, the distribution is symmetrical
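The drawback of β_1 and the fix from γ_1 show up clearly in a Python sketch. Reflecting a dataset (x → -x) flips the direction of its skew. β_1 stays the same, while γ_1 changes sign. The sample data and function names are assumptions, and μ_2 is assumed to be non-zero.

```python
def central_moment(xs, r):
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** r for x in xs) / len(xs)


def beta1(xs):
    """beta_1 = mu_3^2 / mu_2^3 (never negative, so it loses the direction)."""
    return central_moment(xs, 3) ** 2 / central_moment(xs, 2) ** 3


def gamma1(xs):
    """gamma_1 = mu_3 / mu_2^(3/2) (keeps the sign of mu_3)."""
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5


data = [2, 4, 4, 4, 5, 5, 7, 9]        # hypothetical, longer right tail
mirrored = [-x for x in data]          # same shape reflected: longer left tail
print(beta1(data), beta1(mirrored))    # equal: beta_1 cannot tell them apart
print(gamma1(data), gamma1(mirrored))  # opposite signs: positive vs negative skew
```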

Kurtosis

Kurtosis measures how peaked a distribution's curve is and how "fat" it is, in particular how heavy its tails are.

[Figure: kurtosis (kurtosis.PNG)]

[Figure: kurtosis (kurtosis2.PNG)]

The kurtosis is calculated using β_2, where μ_r again denotes the r-th central moment:

\[ \beta_2 = \frac{\mu_4}{\mu_2^2} \]

The value of β_2 tells us the type of curve:

Leptokurtic (High Peak) when β_2 > 3
Mesokurtic (Normal Peak) when β_2 = 3
Platykurtic (Low Peak) when β_2 < 3

Karl Pearson's γ_2

γ_2 is defined as:

\[ \gamma_2 = \beta_2 - 3 \]

Leptokurtic when γ_2 > 0
Mesokurtic when γ_2 = 0
Platykurtic when γ_2 < 0
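A Python sketch of β_2, γ_2 and the three-way classification. Computed γ_2 is rarely exactly 0 in floating point, so the `tol` parameter treats values within rounding error of 0 as mesokurtic. That tolerance, the function names and the sample data are assumptions, not part of the notes.

```python
def central_moment(xs, r):
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** r for x in xs) / len(xs)


def beta2(xs):
    """beta_2 = mu_4 / mu_2^2."""
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2


def gamma2(xs):
    """gamma_2 = beta_2 - 3."""
    return beta2(xs) - 3


def curve_type(xs, tol=1e-9):
    """Classify by gamma_2; tol treats values within rounding error of 0 as 0."""
    g = gamma2(xs)
    if g > tol:
        return "leptokurtic"
    if g < -tol:
        return "platykurtic"
    return "mesokurtic"


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample data
print(beta2(data), gamma2(data), curve_type(data))  # 2.78125 -0.21875 platykurtic
```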

Basic Probability

Conditional Probability

If some event B has already occurred (with P(B) > 0), then the probability of the event A is:

\[ P(A \mid B) = \frac{P(A \cap B)}{P(B)} \]

$P(A \mid B)$ is read as "A given B": we are given that B has occurred, and this is the probability of A now occurring.
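The definition can be checked by counting equally likely outcomes in Python. The two-dice example is hypothetical and not from the notes: A is "the sum is 8" and B is "the first die is even". Exact `Fraction` arithmetic avoids rounding.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely outcomes (first die, second die).
OUTCOMES = list(product(range(1, 7), repeat=2))


def prob(event):
    """P(event) by counting favourable outcomes."""
    return Fraction(sum(1 for o in OUTCOMES if event(o)), len(OUTCOMES))


def cond_prob(a, b):
    """P(A | B) = P(A and B) / P(B), defined only when P(B) > 0."""
    p_b = prob(b)
    if p_b == 0:
        raise ValueError("P(B) must be positive")
    return prob(lambda o: a(o) and b(o)) / p_b


def sum_is_8(o):
    return o[0] + o[1] == 8


def first_is_even(o):
    return o[0] % 2 == 0


print(prob(sum_is_8))                      # 5/36
print(cond_prob(sum_is_8, first_is_even))  # 1/6
```

Knowing that the first die is even changes the probability from 5/36 to 3/18 = 1/6, because only (2, 6), (4, 4) and (6, 2) remain.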

Law of Total Probability

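The law of total probability finds the probability of some event A when the sample space is partitioned into mutually exclusive, exhaustive events B_1, B_2, ..., B_k, so that A is split across those parts:

\[ P(A) = \sum_{i=1}^{k} P(A \mid B_i) P(B_i) \]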

Example 1: Suppose we have 2 bags of marbles:

Bag 1 : 7 red marbles and 3 green marbles
Bag 2 : 2 red marbles and 8 green marbles

Now we select one bag at random (i.e., each bag is chosen with probability 0.5) and draw one marble from it. What is the probability that it is a green marble?

Sol. The green marbles are split between bag 1 and bag 2.
Let G be the event of drawing a green marble.
Let B_1 be the event of choosing bag 1.
Let B_2 be the event of choosing bag 2.

Then $P(G|B_1) = \frac{3}{7 + 3}$ and $P(G|B_2) = \frac{8}{2 + 8}$. Now we can use the law of total probability to get

\[ P(G) = P(G|B_1)P(B_1) + P(G|B_2)P(B_2) \]
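Plugging the numbers in gives the answer. This Python sketch uses exact fractions. The helper `total_probability` is a hypothetical name, and it checks that the P(B_i) sum to 1, as a partition requires.

```python
from fractions import Fraction


def total_probability(priors, likelihoods):
    """P(A) = sum of P(A | B_i) * P(B_i) over a partition B_1, ..., B_k."""
    if sum(priors) != 1:
        raise ValueError("P(B_i) must sum to 1 for a partition")
    return sum(p * lik for p, lik in zip(priors, likelihoods))


p_bag = [Fraction(1, 2), Fraction(1, 2)]            # P(B_1), P(B_2)
p_green = [Fraction(3, 7 + 3), Fraction(8, 2 + 8)]  # P(G|B_1), P(G|B_2)
p = total_probability(p_bag, p_green)
print(p, float(p))  # 11/20 0.55
```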

Example 2: Suppose there are 3 forests in a park.

Forest A occupies 50% of the land and 20% of its plants are poisonous
Forest B occupies 30% of the land and 40% of its plants are poisonous
Forest C occupies 20% of the land and 70% of its plants are poisonous

What is the probability that a randomly chosen plant from the park is poisonous?

Sol. Assume plants are spread evenly across the park, so the probability that a random plant comes from a forest equals that forest's share of the land. Let A, B and C be the events that the plant comes from Forest A, Forest B and Forest C respectively. If P is the event that the plant is poisonous, then using the law of total probability,

\[ P(P) = P(P|A)P(A) + P(P|B)P(B) + P(P|C)P(C) \]

And we know P(A) = 0.5, P(B) = 0.3 and P(C) = 0.2. Also P(P|A) = 0.20, P(P|B) = 0.40 and P(P|C) = 0.70
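Evaluating the sum: 0.5 × 0.20 + 0.3 × 0.40 + 0.2 × 0.70 = 0.10 + 0.12 + 0.14. This Python sketch does the arithmetic with exact fractions built from the decimal strings. The helper name `total_probability` is hypothetical.

```python
from fractions import Fraction


def total_probability(priors, likelihoods):
    """P(A) = sum of P(A | B_i) * P(B_i) over a partition B_1, ..., B_k."""
    if sum(priors) != 1:
        raise ValueError("P(B_i) must sum to 1 for a partition")
    return sum(p * lik for p, lik in zip(priors, likelihoods))


p_forest = [Fraction("0.5"), Fraction("0.3"), Fraction("0.2")]       # P(A), P(B), P(C)
p_poison = [Fraction("0.20"), Fraction("0.40"), Fraction("0.70")]    # P(P|A), P(P|B), P(P|C)
p = total_probability(p_forest, p_poison)
print(p, float(p))  # 9/25 0.36
```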

Some basic identities

Probabilities follow the law of inclusion and exclusion:

\[ P(A \cup B) = P(A) + P(B) - P(A \cap B) \]

De Morgan's Theorem

\[ P(\overline{A \cap B }) = P(\overline{A} \cup \overline{B}) \] \[ P(\overline{A \cup B }) = P(\overline{A} \cap \overline{B}) \]

Some other identities

\[ P(\overline{A} \cap B) + P(A \cap B) = P(B) \] \[ P(A \cap \overline{B}) + P(A \cap B) = P(A) \]
