Statistics and Probability Concepts

In machine learning, statistics and probability play an important role. Whenever we infer population parameters from sample statistics, we attach a probability to that inference. Probability also plays an important role when making predictions. Statistics and probability go hand in hand.
While learning statistics for machine learning, I came across many important terms. In this article, I would like to summarise the important terms I have studied so far.
Table of Contents
- Random Variable
- Probability distribution
- PMF vs PDF vs CDF
- Expected value
- Independent Events vs Mutually Exclusive Events vs Dependent Events
- Joint Probability vs Marginal Probability vs Conditional Probability vs Union Probability
- Bayes Theorem
- Normal distribution vs Uniform distribution
- Descriptive Statistics, Inferential Statistics
- Sampling Distribution, Central Limit Theorem
- Confidence Interval
- Hypothesis Testing
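1. Random Variable
A random variable assigns a numerical value to each outcome of a random experiment. It is usually denoted by X.
Example: Rolling a die. The random variable X can take the values {1, 2, 3, 4, 5, 6}.
A random variable can be discrete (countable values, such as the face of a die) or continuous (any value in a range, such as a person's weight).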
2. Probability distribution
It describes how probability is distributed over the values of the random variable.
The probability function P(X) is used to describe the probability distribution.
Example: Probability of getting 2 while rolling a fair die.
Here X is the random variable and 2 is one of the values it can take.
P(X=2) = 1/6
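To make the die example concrete, here is a minimal Python sketch that stores the probability distribution as a dictionary. The name `distribution` is my own choice for illustration, and the die is assumed to be fair.

```python
from fractions import Fraction

# Values the random variable X can take when rolling a six-sided die.
outcomes = [1, 2, 3, 4, 5, 6]

# Assuming a fair die, every value gets the same probability, 1/6.
distribution = {x: Fraction(1, len(outcomes)) for x in outcomes}

print(distribution[2])             # P(X = 2) -> 1/6
print(sum(distribution.values()))  # all probabilities add up to 1
```

Using `Fraction` keeps the probabilities exact, so the total is exactly 1 rather than a rounded float.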
3. PDF vs PMF vs CDF
PMF — Probability Mass function
The probability distribution of discrete variables is known as the probability mass function.

CDF - Cumulative Distribution Function
The CDF gives the cumulative probability that the random variable X takes a value less than or equal to a given value x, that is, P(X≤x).
Example: What is the probability of getting a value less than or equal to 4 while rolling a die? P(X≤4)
P(X≤4) = 4/6 ≈ 0.67
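The CDF of a discrete variable is just a running sum of the PMF. The sketch below rebuilds P(X≤4) for a fair die; the helper name `cdf` is hypothetical, not something from a library.

```python
from fractions import Fraction

# PMF of a fair six-sided die: P(X = x) = 1/6 for x = 1..6.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def cdf(k):
    """Return P(X <= k) by adding up the PMF for every value up to k."""
    return sum(p for x, p in pmf.items() if x <= k)

print(cdf(4))         # 2/3
print(float(cdf(4)))  # roughly 0.67
```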

PDF — Probability density function
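The probability distribution of a continuous variable is described by the probability density function. For a continuous variable, the probability of any single exact value is 0; probabilities are obtained for ranges of values, as the area under the PDF over that range.
Example: The distribution of the weights of students in a class, where we can ask for the probability that a student's weight lies between two values.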
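As a rough illustration of the weight example, the sketch below uses `statistics.NormalDist` from the Python standard library. The mean of 60 kg and standard deviation of 10 kg are assumptions I made up for the example, as is the choice of a normal distribution for weights.

```python
from statistics import NormalDist

# Assumed for illustration only: weights follow Normal(mean=60 kg, sd=10 kg).
weights = NormalDist(mu=60, sigma=10)

# The PDF value at a point is a density, not a probability.
density_at_60 = weights.pdf(60)

# The probability of a weight between 50 and 70 kg is the area under the PDF,
# computed here as a difference of two CDF values.
p_50_to_70 = weights.cdf(70) - weights.cdf(50)

print(round(density_at_60, 4))
print(round(p_50_to_70, 4))
```

The second number is the probability of falling within one standard deviation of the mean, which is about 0.68 for any normal distribution.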


4. Expected value
The expected value is the mean of the random variable: the sum of each value multiplied by its probability.
E(X) = Σ x * P(X=x)
Example: What is the expected value while rolling a die?
Values of the random variable → X={1,2,3,4,5,6}
Probability distribution → P(X=x)=1/6
[x can be 1 or 2 or 3 or 4 or 5 or 6]
Expected Value → E(X) = 1/6*1 + 1/6*2 + 1/6*3 + 1/6*4 + 1/6*5 + 1/6*6 = 21/6 = 3.5
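The same calculation can be written as a short Python sketch: multiply each value by its probability and add the results. A fair die is assumed.

```python
from fractions import Fraction

# PMF of a fair six-sided die.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

# Expected value: sum of value * probability over all values.
expected_value = sum(x * p for x, p in pmf.items())

print(expected_value)         # 7/2
print(float(expected_value))  # 3.5
```

Note that 3.5 is not a value the die can actually show; the expected value is a long-run average, not a possible outcome.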
5. Independent Events vs Mutually Exclusive Events vs Dependent Events
Mutually Exclusive Events:
Event A and Event B are said to be mutually exclusive if they have no common outcomes and both can’t occur at the same time.
P( A and B )=0
Example: One card is drawn from a deck.
Event A= The card is a King
Event B= The card is a Queen
Mutually Exclusive and Collectively Exhaustive Events:
Events are collectively exhaustive if together they cover every possible outcome. The sum of the probabilities of mutually exclusive and collectively exhaustive events is 1.
Example: Throwing a fair coin.
A → Getting Head
B → Getting Tail
Both events A and B are mutually exclusive and collectively exhaustive events. The sum of probabilities of both the events is 1.
P(A)+P(B)=1
Independent Events
Event A and Event B are said to be independent if the occurrence of one does not change the probability of the other.
P(A and B)=P(A) * P(B)
Example: Probability of getting 2 heads in a row = 1/2 * 1/2 = 1/4
[The probability of getting heads in the second toss is not affected by the outcome of the first toss.]
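One way to convince yourself of the product rule is to list every outcome of two coin tosses and count. This is a small sketch assuming a fair coin; the helper `prob` is a name I introduced for the example.

```python
from fractions import Fraction
from itertools import product

# Sample space for two tosses of a fair coin: HH, HT, TH, TT (equally likely).
sample_space = list(product("HT", repeat=2))

def prob(event):
    """Probability of an event = favourable outcomes / total outcomes."""
    favourable = [outcome for outcome in sample_space if event(outcome)]
    return Fraction(len(favourable), len(sample_space))

p_first_head = prob(lambda o: o[0] == "H")
p_second_head = prob(lambda o: o[1] == "H")
p_both_heads = prob(lambda o: o == ("H", "H"))

# For independent events, P(A and B) equals P(A) * P(B).
print(p_both_heads, p_first_head * p_second_head)  # 1/4 1/4
```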
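Dependent Events
Event A and Event B are said to be dependent if the occurrence of one changes the probability of the other.
P(A and B) = P(A|B) * P(B)
This multiplication rule holds for any two events. Example with a single card drawn from a deck:
P(A) → Probability of drawing a King = 4/52
P(B) → Probability of drawing a red card = 26/52
P(A and B) → Probability of drawing a red King = 2/52
Let’s calculate using the formula:
P(A and B) = P(A|B) * P(B)
P(A|B) = Probability of drawing a King given that the card is red = 2/26
P(B) = Probability of drawing a red card = 26/52
P(A and B) = 2/26 * 26/52 = 2/52
Note that P(A|B) = 2/26 = 4/52 = P(A) here, so these two particular events are actually independent. A truly dependent pair arises when two cards are drawn without replacement: the probability that the second card is a King is 3/51 if the first card was a King, and 4/51 otherwise.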
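The card calculation can be checked by building a deck and counting. The sketch below first verifies the multiplication rule for the King and red card events, then looks at two draws without replacement, a case I added to show events that really are dependent. The deck encoding and helper names are my own.

```python
from fractions import Fraction
from itertools import permutations, product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))  # 52 (rank, suit) pairs

def is_king(card):
    return card[0] == "K"

def is_red(card):
    return card[1] in ("hearts", "diamonds")

# Single draw: A = King, B = red card.
red_cards = [c for c in deck if is_red(c)]
p_b = Fraction(len(red_cards), len(deck))
p_a_given_b = Fraction(len([c for c in red_cards if is_king(c)]), len(red_cards))
p_a_and_b = Fraction(len([c for c in deck if is_king(c) and is_red(c)]), len(deck))
print(p_a_and_b == p_a_given_b * p_b)  # True: the multiplication rule holds

# Two draws without replacement: the first card changes the odds for the second.
draws = list(permutations(deck, 2))  # ordered pairs of distinct cards
first_king = [d for d in draws if is_king(d[0])]
p_second_king_given_first_king = Fraction(
    len([d for d in first_king if is_king(d[1])]), len(first_king)
)
p_second_king = Fraction(len([d for d in draws if is_king(d[1])]), len(draws))
print(p_second_king_given_first_king, p_second_king)  # 1/17 1/13
```

Because 1/17 (that is, 3/51) differs from 1/13 (4/52), knowing the first card was a King changes the probability for the second card, which is exactly what dependence means.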
6. Joint Probability vs Marginal Probability vs Conditional Probability vs Union Probability
Joint Probability — P( A and B)
Probability of A and B occurring.

Union Probability — P(A or B)
The probability of A or B occurring.
P(A∪B) = P(A) + P(B) - P(A∩B)
Marginal Probability - P(A)
Probability of A occurring, irrespective of any other event.
Conditional Probability - P(A|B)
Probability of A occurring given that B has occurred.
P(A|B) = P(A∩B)/P(B)
P(A∩B) → Joint Probability
P(B) → Marginal Probability
If A and B are independent events,
P(A|B) = P(A)
P(B|A)=P(B)
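Here is a small sketch that computes all four kinds of probability for one roll of a fair die. The two events, A = "the roll is even" and B = "the roll is greater than 3", are examples I picked for illustration.

```python
from fractions import Fraction

outcomes = set(range(1, 7))                 # one roll of a fair die
A = {x for x in outcomes if x % 2 == 0}     # even roll: {2, 4, 6}
B = {x for x in outcomes if x > 3}          # roll above 3: {4, 5, 6}

def prob(event):
    """Probability of an event given as a set of equally likely outcomes."""
    return Fraction(len(event), len(outcomes))

marginal_a = prob(A)                # P(A)
marginal_b = prob(B)                # P(B)
joint = prob(A & B)                 # P(A and B)
union = prob(A | B)                 # P(A or B)
conditional = joint / marginal_b    # P(A|B) = P(A and B) / P(B)

print(joint, union, conditional)    # 1/3 2/3 2/3
print(union == marginal_a + marginal_b - joint)  # True
```

Since P(A|B) = 2/3 is not equal to P(A) = 1/2, these two events are not independent.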
7. Bayes Theorem
Using Bayes theorem, we can calculate one conditional probability from the reverse conditional probability:
P(A|B) = P(B|A) * P(A) / P(B)
In many scenarios, one of P(A|B) and P(B|A) is easy to estimate from the data while the other is not. Bayes theorem lets us compute the harder conditional probability from the easier one.
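A classic way to see Bayes theorem, P(A|B) = P(B|A) * P(A) / P(B), in action is a medical test. In the sketch below, A is "has the disease" and B is "tests positive". All the numbers are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from fractions import Fraction

# Hypothetical numbers, for illustration only.
p_disease = Fraction(1, 100)             # P(A): prior probability of the disease
p_pos_given_disease = Fraction(95, 100)  # P(B|A): positive test if diseased
p_pos_given_healthy = Fraction(5, 100)   # P(B|not A): false positive rate

# P(B): total probability of a positive test, over diseased and healthy people.
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Bayes theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos

print(p_disease_given_pos)                   # 19/118
print(round(float(p_disease_given_pos), 3))  # 0.161
```

P(B|A) is the easy direction here (it is a property of the test), while P(A|B) is what a patient actually wants to know. With these assumed numbers it is only about 16%, because the disease is rare.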

8. Normal Distribution vs Uniform Distribution
Uniform Distribution:
The probability is uniformly distributed across all possible outcomes of the random variable
Example: Rolling a die
Probability is uniformly distributed across all possible outcomes {1,2,3,4,5,6}
P(X=1) = P(X=2) = P(X=3) = P(X=4) = P(X=5) = P(X=6) = 1/6
The shape of uniform distribution - Rectangle
Normal Distribution
A normal distribution is also known as a Gaussian distribution. In a normal distribution, data points are concentrated around the mean and become less frequent further away from it. It is symmetric about the mean.
Parameters for normal distribution → Mean and Variance
The shape of normal distribution - Bell shape
Mean = Median = Mode
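The properties of the normal distribution listed above can be checked with `statistics.NormalDist` from the standard library. This sketch uses the standard normal (mean 0, standard deviation 1) purely as a convenient example.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
z = NormalDist(mu=0, sigma=1)

# Symmetry: the density is the same at equal distances either side of the mean.
print(z.pdf(-1.5), z.pdf(1.5))

# Mean, median and mode coincide.
print(z.mean, z.median, z.mode)

# Share of the distribution within 1, 2 and 3 standard deviations of the mean.
within = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}
print({k: round(v, 4) for k, v in within.items()})
```

The last line shows how strongly the values cluster around the mean: roughly 68%, 95% and 99.7% of the distribution lies within 1, 2 and 3 standard deviations.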

9. Descriptive Statistics, Inferential Statistics
Descriptive Statistics:
Descriptive statistics are used to describe and summarize the data.
Measure of Central Tendency:
1. Mean — Average value
2. Median — Middle value
3. Mode — The most common value
Measure of Spread:
1. Variance - The average squared deviation of the data points from the mean value.
2. Standard Deviation - Square root of the variance
3. Range - Difference between the maximum value and minimum value
Measure of skewness:
1. Right skewed - The distribution is skewed towards the positive side. It has a long right tail.
2. Left skewed - The distribution is skewed towards the negative side. It has a long left tail.
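Most of these descriptive measures are available in Python's built-in `statistics` module. The data set below is made up for the example.

```python
import statistics

# Hypothetical data set, chosen only to keep the arithmetic easy to follow.
data = [2, 4, 4, 5, 7, 9, 11]

# Measures of central tendency
mean = statistics.mean(data)      # 6
median = statistics.median(data)  # 5
mode = statistics.mode(data)      # 4

# Measures of spread
variance = statistics.variance(data)  # sample variance (divides by n - 1): 10
std_dev = statistics.stdev(data)      # square root of the sample variance
value_range = max(data) - min(data)   # 9

print(mean, median, mode)
print(variance, round(std_dev, 3), value_range)
```

Note that `statistics.variance` and `statistics.stdev` treat the data as a sample; `statistics.pvariance` and `statistics.pstdev` are the population versions. In this made-up data the mean (6) is larger than the median (5), which is what a longer right tail tends to produce.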
Inferential Statistics:
Inferential statistics are used to infer a population parameter from a sample statistic.
Key concepts: Central Limit Theorem, Hypothesis Testing
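10. Sampling Distribution, Central Limit Theorem
Sampling Distribution
Sampling - Taking representative samples from the population
The sampling distribution of the mean is the distribution of the sample means obtained by repeatedly drawing samples of the same size from the population.
Sampling distribution properties
Mean of the sampling distribution of the mean = Population mean
Standard deviation of the sampling distribution (standard error) = Population standard deviation / sqrt(sample size)
Central Limit Theorem:
If the sample size is large enough (a common rule of thumb is greater than 30), the sampling distribution of the mean is approximately normal, regardless of the shape of the population distribution (as long as the population has a finite variance).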
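A quick simulation makes the sampling distribution properties easier to believe. The sketch below repeatedly rolls a fair die 30 times, records each sample mean, and compares the results with the population mean and the standard error formula. The seed, sample size and number of samples are arbitrary choices of mine.

```python
import math
import random
import statistics

rng = random.Random(42)  # fixed seed so the simulation is repeatable

population_values = [1, 2, 3, 4, 5, 6]  # a fair die: uniform, not normal
sample_size = 30
num_samples = 2000

# Draw many samples and record the mean of each one.
sample_means = [
    statistics.mean([rng.choice(population_values) for _ in range(sample_size)])
    for _ in range(num_samples)
]

mean_of_sample_means = statistics.mean(sample_means)
sd_of_sample_means = statistics.pstdev(sample_means)

population_mean = statistics.mean(population_values)  # 3.5
population_sd = statistics.pstdev(population_values)
standard_error = population_sd / math.sqrt(sample_size)

# The two numbers on each line should be close to each other.
print(round(mean_of_sample_means, 3), population_mean)
print(round(sd_of_sample_means, 3), round(standard_error, 3))
```

Plotting a histogram of `sample_means` would show the bell shape predicted by the Central Limit Theorem, even though a single die roll is uniformly distributed.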
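11. Confidence Interval
A confidence interval is a range of values that is likely to contain the population parameter at a chosen confidence level (for example, 95%). It is an interval estimate, as opposed to a point estimate such as the sample mean. It provides additional information about the uncertainty of the estimate.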
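As a rough sketch, a 95% confidence interval for a population mean can be computed as sample mean ± z * standard error. This uses the normal approximation, which is reasonable for large samples. The sample summary below is hypothetical.

```python
import math
from statistics import NormalDist

# Hypothetical sample summary (illustrative numbers only).
sample_mean = 50.0
sample_sd = 10.0
sample_size = 100
confidence_level = 0.95

# z value that leaves 2.5% in each tail of the standard normal (about 1.96).
z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)

standard_error = sample_sd / math.sqrt(sample_size)
margin_of_error = z * standard_error

lower = sample_mean - margin_of_error
upper = sample_mean + margin_of_error
print(round(lower, 2), round(upper, 2))  # 48.04 51.96
```

For small samples, a t-distribution is normally used instead of the z value; that is outside the scope of this sketch.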
12. Hypothesis Testing
Hypothesis testing is used to decide, based on sample data, whether an assumption about a population parameter should be rejected or not.
Null Hypothesis: Status quo
Alternative Hypothesis: Challenges the status quo.
Status quo means the accepted norm.
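To give a feel for how a test is carried out, here is a minimal sketch of one common approach, a two-sided one-sample z-test. The null hypothesis, the sample summary and the 5% significance level are all assumptions I made up for the example.

```python
import math
from statistics import NormalDist

# Hypothetical numbers, for illustration only.
hypothesised_mean = 50.0  # null hypothesis: the population mean is 50
sample_mean = 52.0
sample_sd = 10.0
sample_size = 100
alpha = 0.05              # assumed significance level

standard_error = sample_sd / math.sqrt(sample_size)
z_score = (sample_mean - hypothesised_mean) / standard_error

# Two-sided p-value: chance of a result at least this extreme if the null is true.
p_value = 2 * (1 - NormalDist().cdf(abs(z_score)))

reject_null = p_value < alpha
print(round(z_score, 2), round(p_value, 4), reject_null)  # 2.0 0.0455 True
```

With these numbers the p-value is just below 0.05, so the status quo (a mean of 50) would be rejected at the 5% level. A different significance level could lead to a different decision.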
Conclusion:
I have covered some important terms in statistics and probability for machine learning. Thanks for reading, and I hope you all like it.
My other blog post on statistics:
https://pub.towardsai.net/inferential-statistics-for-data-science-91cf4e0692b1
If you would like to read more of my tutorials, follow me on Medium, LinkedIn, and Twitter.
Become a Medium member by clicking here: https://indhumathychelliah.medium.com/membership