Important Terms in Statistics - Machine Learning

Statistics and Probability Concepts

In machine learning, statistics and probability play an important role. Whenever we infer population parameters from sample statistics, we attach a probability to that inference. Probability is just as central when making predictions. Statistics and probability go hand in hand.

While learning statistics for machine learning, I came across many important terms. In this article, I would like to summarise the important terms I have studied so far.

Table of Contents

Random Variable
Probability distribution
PMF vs PDF vs CDF
Expected value
Independent Events vs Mutually Exclusive Events vs Dependent Events
Joint Probability vs Marginal Probability vs Conditional Probability vs Union Probability
Bayes Theorem
Normal distribution vs Uniform distribution
Descriptive Statistics, Inferential Statistics
Sampling Distribution, Central Limit Theorem
Confidence Interval
Hypothesis Testing

1. Random Variable

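A random variable represents the values that the outcome of an experiment can take. It is denoted as X.
Example: Rolling a die. The random variable X can take the values [1,2,3,4,5,6].

A random variable can be discrete or continuous. A discrete random variable takes countable values (the face of a die), while a continuous random variable can take any value in a range (the weight of a person).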

2. Probability distribution

It describes how probability is distributed over the values of the random variable.

The probability function P(X) is used to describe the probability distribution.

Example: Probability of getting 2 while rolling a die.
Here X is the random variable and 2 is the value it takes.

P(X=2) = 1/6

3. PMF vs PDF vs CDF

PMF — Probability Mass function

The probability distribution of discrete variables is known as the probability mass function.

CDF - Cumulative Distribution Function

The CDF gives the cumulative probability that the random variable X takes a value less than or equal to a given value x, i.e. P(X≤x).

Example: What is the probability of getting a value less than or equal to 4 while rolling a die? P(X≤4)

P(X≤4) = 4/6 ≈ 0.67

CDF for a discrete variable [Image by Author]
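The die example can be reproduced in a few lines of Python. This is a minimal sketch: the names pmf and cdf are illustrative, and exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction

# PMF of a fair six-sided die: every face has probability 1/6.
faces = [1, 2, 3, 4, 5, 6]
pmf = {x: Fraction(1, 6) for x in faces}


def cdf(x):
    """Return P(X <= x) by summing the PMF over all faces up to x."""
    return sum(p for face, p in pmf.items() if face <= x)


print(pmf[2])          # P(X = 2)  -> 1/6
print(cdf(4))          # P(X <= 4) -> 2/3
print(float(cdf(4)))   # about 0.67
```

The CDF of a discrete variable is simply a running total of the PMF, which is why it reaches 1 at the largest possible value.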

PDF — Probability density function

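The probability distribution of continuous variables is described by the probability density function. For a continuous variable, the probability of any single exact value is 0, so probabilities are obtained as areas under the density curve over a range of values.

Example: The distribution of the weights of students in a class.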

The probability density function for continuous variable [Image by Author]

CDF for continuous random variable [Image by Author]
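As a rough illustration of a PDF and CDF for a continuous variable, the sketch below models student weights with a normal distribution. The mean of 60 kg and standard deviation of 10 kg are hypothetical values chosen only for the example; they are not taken from any data.

```python
from statistics import NormalDist

# Hypothetical model: student weights ~ Normal(mean = 60 kg, sd = 10 kg).
weights = NormalDist(mu=60, sigma=10)

# For a continuous variable, pdf(x) is a density, not a probability.
density_at_60 = weights.pdf(60)

# Probabilities are areas under the curve, i.e. differences of the CDF.
p_below_70 = weights.cdf(70)                          # P(X <= 70)
p_between_50_70 = weights.cdf(70) - weights.cdf(50)   # P(50 <= X <= 70)

print(round(density_at_60, 4))
print(round(p_below_70, 4))
print(round(p_between_50_70, 4))
```

Note that the density at a point can be read off the curve, but only an interval such as 50 to 70 kg has a non-zero probability.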

4. Expected value

The expected value is the mean of the random variable. For a discrete random variable, it is the sum of each value multiplied by its probability.

E(X) = Σ x * P(X=x)

Example: What is the expected value while rolling a die?

Random variable → X={1,2,3,4,5,6}

Probability distribution → P(X=x)=1/6

[x can be 1 or 2 or 3 or 4 or 5 or 6]

Expected Value → E(X) = 1/6*1 + 1/6*2 + 1/6*3 + 1/6*4 + 1/6*5 + 1/6*6 = 21/6 = 3.5
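The expected value of the die roll can be checked by computing the probability-weighted sum directly. A minimal sketch, with illustrative variable names:

```python
from fractions import Fraction

# PMF of a fair six-sided die.
faces = [1, 2, 3, 4, 5, 6]
pmf = {x: Fraction(1, 6) for x in faces}

# Expected value: each value multiplied by its probability, then summed.
expected_value = sum(x * p for x, p in pmf.items())

print(expected_value)          # 7/2
print(float(expected_value))   # 3.5
```

The result 3.5 is not a value the die can show; the expected value is a long-run average, not a possible outcome.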

5. Independent Events vs Mutually Exclusive Events vs Dependent Events

Mutually Exclusive Events:

Event A and Event B are said to be mutually exclusive if they have no common outcomes and both can’t occur at the same time.

P( A and B )=0

Example: Drawing a single card from a deck.

Event A = Drawing a King
Event B = Drawing a Queen

A single card cannot be both a King and a Queen, so P(A and B) = 0.

Mutually Exclusive and Collectively Exhaustive Events:

The sum of probabilities of mutually exclusive and collectively exhaustive events is 1.

Example: Throwing a fair coin.

A → Getting Head
B → Getting Tail

Both events A and B are mutually exclusive and collectively exhaustive events. The sum of probabilities of both the events is 1.

P(A)+P(B)=1

Independent Events

Event A and Event B are said to be independent if the occurrence of event A does not affect the probability of event B.

P(A and B)=P(A) * P(B)
Probability of getting 2 heads in a row = 1/2 * 1/2 = 1/4

[The probability of getting heads in the second trial is not affected by the outcome of the first trial.]
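The product rule for independent events can be verified by listing every outcome of two coin tosses and counting. This is only a sketch; the helper prob is a hypothetical name introduced for the example.

```python
from fractions import Fraction
from itertools import product

# Sample space for two tosses of a fair coin: HH, HT, TH, TT (equally likely).
outcomes = list(product("HT", repeat=2))


def prob(event):
    """Probability of an event (a predicate on outcomes) by counting."""
    favourable = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(favourable, len(outcomes))


p_a = prob(lambda o: o[0] == "H")             # heads on the first toss
p_b = prob(lambda o: o[1] == "H")             # heads on the second toss
p_a_and_b = prob(lambda o: o == ("H", "H"))   # heads on both tosses

print(p_a, p_b, p_a_and_b)      # 1/2 1/2 1/4
print(p_a_and_b == p_a * p_b)   # True
```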

Dependent Events

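Event A and Event B are said to be dependent if the occurrence of one event changes the probability of the other.

P(A and B) = P(A|B) * P(B)

Example: Drawing a single card from a deck.

P(A) → Probability of drawing a King = 4/52
P(B) → Probability of drawing a face card (Jack, Queen or King) = 12/52

P(A and B) → Probability of drawing a King and a face card = 4/52

Let’s calculate using the formula:

P(A and B) = P(A|B) * P(B)

P(A|B) = Probability of drawing a King given a face card = 4/12
P(B) = Probability of getting a face card = 12/52

P(A and B) = 4/12 * 12/52 = 4/52

Here P(A|B) = 4/12 is different from P(A) = 4/52, so A and B are dependent. By contrast, drawing a King and drawing a red card are independent events: P(King|Red) = 2/26 = 4/52 = P(King), so knowing the colour tells us nothing about the rank.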
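The multiplication rule P(A and B) = P(A|B) * P(B) can be checked by enumerating a 52-card deck. The sketch below uses "King" and "face card" as the two events, which is an illustrative choice; the helper names prob, is_king and is_face are hypothetical.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))   # 52 equally likely cards


def prob(event, space):
    """P(event) within a given space of equally likely cards."""
    return Fraction(sum(1 for card in space if event(card)), len(space))


def is_king(card):
    return card[0] == "K"


def is_face(card):
    return card[0] in {"J", "Q", "K"}


p_a = prob(is_king, deck)                                       # 4/52
p_b = prob(is_face, deck)                                       # 12/52
p_a_and_b = prob(lambda c: is_king(c) and is_face(c), deck)     # 4/52

# P(A|B): restrict the sample space to the cards where B has occurred.
face_cards = [card for card in deck if is_face(card)]
p_a_given_b = prob(is_king, face_cards)                         # 4/12

print(p_a_and_b == p_a_given_b * p_b)   # True: the multiplication rule holds
print(p_a_given_b == p_a)               # False: A and B are dependent
```

Comparing P(A|B) with P(A) is a quick way to test dependence: if the two differ, knowing B changes the probability of A.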

6. Joint Probability vs Marginal Probability vs Conditional Probability vs Union Probability

Joint Probability — P( A and B)

Probability of A and B occurring.

Joint Probability for different events [Image by Author]

Union Probability — P(A or B)

The probability of A or B occurring.

P(A∪B) = P(A) + P(B) - P(A∩B)

Marginal Probability - P(A)

The probability of A occurring, irrespective of any other event.

Conditional Probability - P(A|B)

The probability of A occurring given that B has occurred.

P(A|B) = P(A∩B)/P(B)

P(A∩B) → Joint Probability
P(B) → Marginal Probability

If A and B are independent events,

P(A|B) = P(A)
P(B|A)=P(B)
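Joint, union, marginal and conditional probabilities can all be computed from the same sample space using set operations. The sketch below assumes a standard 52-card deck with A = "King" and B = "red card"; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = set(product(ranks, suits))

kings = {card for card in deck if card[0] == "K"}                      # event A
reds = {card for card in deck if card[1] in {"hearts", "diamonds"}}    # event B


def prob(event):
    """Probability of an event given as a set of equally likely cards."""
    return Fraction(len(event), len(deck))


p_a = prob(kings)              # marginal P(A)
p_b = prob(reds)               # marginal P(B)
p_joint = prob(kings & reds)   # joint P(A and B): set intersection
p_union = prob(kings | reds)   # union P(A or B): set union
p_a_given_b = p_joint / p_b    # conditional P(A|B)

print(p_union)                            # 7/13
print(p_union == p_a + p_b - p_joint)     # True
print(p_a_given_b)                        # 1/13
```

For this particular pair of events, P(A|B) comes out equal to P(A), which is exactly the condition for independence stated above.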

7. Bayes Theorem

Bayes' theorem lets us calculate one conditional probability from the reverse conditional probability.

P(A|B) = P(B|A) * P(A) / P(B)

In many scenarios, one of P(A|B) and P(B|A) is easy to compute from the data while the other is hard. Compute the easier one from the data, then use Bayes' theorem to obtain the harder one.
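A small worked example may help. The sketch below applies Bayes' theorem to a diagnostic test; the prevalence, detection rate and false-positive rate are hypothetical numbers chosen for illustration, and the function name bayes is likewise an assumption.

```python
from fractions import Fraction


def bayes(p_b_given_a, p_a, p_b):
    """Return P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b


# Hypothetical inputs (not from real data):
p_disease = Fraction(1, 100)              # P(A): 1% of people have the disease
p_pos_given_disease = Fraction(95, 100)   # P(B|A): test detects 95% of cases
p_pos_given_healthy = Fraction(5, 100)    # 5% false-positive rate

# P(B): overall chance of a positive test (law of total probability).
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# P(A|B): chance of having the disease given a positive test.
p_disease_given_pos = bayes(p_pos_given_disease, p_disease, p_pos)

print(p_disease_given_pos)                     # 19/118
print(round(float(p_disease_given_pos), 3))    # 0.161
```

Here P(positive | disease) is the easy direction to measure, and the theorem converts it into P(disease | positive), which is the quantity a patient actually cares about.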

8. Normal Distribution vs Uniform Distribution

Uniform Distribution:

The probability is uniformly distributed across all possible outcomes of the random variable

Example: Rolling a die

Probability is uniformly distributed across all possible outcomes {1,2,3,4,5,6}

P(X=1) = P(X=2) = P(X=3) = P(X=4) = P(X=5) = P(X=6) = 1/6

The shape of the uniform distribution - Rectangle
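A quick simulation shows the flat shape in practice. This is a sketch under stated assumptions: the seed and the number of rolls (60,000) are arbitrary choices, and the observed frequencies will only be close to 1/6, not exactly equal to it.

```python
import random
from collections import Counter

random.seed(0)      # fixed seed so the run is repeatable
n_rolls = 60_000    # arbitrary number of simulated rolls

# Simulate rolling a fair die many times.
rolls = [random.randint(1, 6) for _ in range(n_rolls)]

# Relative frequency of each face; all should be close to 1/6 (about 0.167).
counts = Counter(rolls)
frequencies = {face: counts[face] / n_rolls for face in range(1, 7)}

for face, freq in frequencies.items():
    print(face, round(freq, 3))
```

Plotting these frequencies as bars would give bars of nearly equal height, which is the rectangle shape of a uniform distribution.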

Normal Distribution

A normal distribution is also known as a Gaussian distribution. In a normal distribution, data points are concentrated around the mean, and the density falls off symmetrically on both sides.

Parameters of the normal distribution → Mean and Variance

The shape of the normal distribution - Bell shape

Mean = Median = Mode
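The concentration of a normal distribution around its mean can be quantified with the CDF. The sketch below uses the standard normal distribution (mean 0, standard deviation 1) from Python's statistics module; the helper name within is illustrative.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
z = NormalDist(mu=0, sigma=1)


def within(k):
    """P(mean - k*sd <= X <= mean + k*sd) for a normal distribution."""
    return z.cdf(k) - z.cdf(-k)


# Share of values within 1, 2 and 3 standard deviations of the mean:
# roughly 68%, 95% and 99.7%.
for k in (1, 2, 3):
    print(k, round(within(k), 4))

# Symmetry: half of the probability lies below the mean.
print(z.cdf(0))   # 0.5
```

Because the distribution is symmetric, the same proportions hold for any normal distribution once distances are measured in standard deviations.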

9. Descriptive Statistics, Inferential Statistics

Descriptive Statistics:

Descriptive statistics are used to describe and summarize the data.

Measures of Central Tendency:
1. Mean - Average value
2. Median - Middle value
3. Mode - The most common value

Measures of Spread:
1. Variance - The average squared distance of the data points from the mean value.
2. Standard Deviation - Square root of the variance
3. Range - Difference between the maximum value and minimum value

Measures of Skewness:
1. Right skewed - The distribution is skewed towards the positive side. It has a long right tail.
2. Left skewed - The distribution is skewed towards the negative side. It has a long left tail.
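Most of these measures are available in Python's built-in statistics module. The sketch below uses a small hypothetical data set and treats it as a whole population, so it uses pvariance and pstdev (which divide by n) rather than variance and stdev (which divide by n - 1 and are meant for samples).

```python
import statistics

# Hypothetical data set, treated as a complete population.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Measures of central tendency
mean = statistics.mean(data)       # 5
median = statistics.median(data)   # 4.5
mode = statistics.mode(data)       # 4

# Measures of spread
variance = statistics.pvariance(data)   # 4
std_dev = statistics.pstdev(data)       # 2.0
value_range = max(data) - min(data)     # 7

print(mean, median, mode)
print(variance, std_dev, value_range)

# A mean above the median is a rough hint (not a proof) of a right skew.
print(mean > median)   # True
```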

Inferential Statistics:

Inferential statistics are used to infer population parameters from sample statistics.

Key concepts: Central Limit Theorem, Confidence Interval, Hypothesis Testing

10. Sampling Distribution, Central Limit Theorem

Sampling Distribution

Sampling - Taking representative samples from the population.
The sampling distribution of the mean is the distribution of the sample means obtained from repeated samples of the same size.

Sampling distribution properties

Mean of the sampling distribution = Population mean
Standard deviation of the sampling distribution (standard error) = Population standard deviation / sqrt(sample size)

Central Limit Theorem:

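If the sample size is large enough (a common rule of thumb is greater than 30), the sampling distribution of the mean is approximately normal, whatever the shape of the population distribution, provided the population has a finite variance.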
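The sampling distribution properties can be observed in a simulation. The sketch below draws repeated samples from an exponential population, which is strongly skewed and has mean 1 and standard deviation 1. The sample size of 40, the 5,000 repetitions and the seed are arbitrary assumptions made for the example.

```python
import random
import statistics

random.seed(42)     # fixed seed so the run is repeatable

sample_size = 40    # n, above the usual rule-of-thumb threshold of 30
n_samples = 5_000   # number of repeated samples

# Population: exponential with rate 1, so mean = 1 and sd = 1 (right skewed).
pop_mean, pop_sd = 1.0, 1.0

# Draw many samples and record the mean of each one.
sample_means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(sample_size)])
    for _ in range(n_samples)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
expected_se = pop_sd / sample_size ** 0.5   # population sd / sqrt(n)

print(round(mean_of_means, 3), pop_mean)          # close to the population mean
print(round(sd_of_means, 3), round(expected_se, 3))   # close to the standard error
```

A histogram of sample_means would look roughly bell shaped even though the population itself is far from normal.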

11. Confidence Interval

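A confidence interval is a range of values, computed from a sample, that is expected to contain the population parameter with a stated confidence level (for example, 95%). It is an interval estimate. Compared with a single point estimate, it provides additional information about the uncertainty of the estimate.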
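One common way to build a confidence interval for a population mean is the normal approximation: sample mean ± z * (standard deviation / sqrt(n)). The sketch below assumes that approach; the function name and the summary figures (mean 50, standard deviation 6, n = 36) are hypothetical. For small samples, a t-distribution is normally used instead.

```python
from math import sqrt
from statistics import NormalDist


def mean_confidence_interval(sample_mean, sample_sd, n, confidence=0.95):
    """Normal-approximation confidence interval for a population mean."""
    # z value that leaves (1 - confidence) / 2 in each tail, about 1.96 for 95%.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * sample_sd / sqrt(n)
    return sample_mean - margin, sample_mean + margin


# Hypothetical sample summary: mean 50, standard deviation 6, 36 observations.
low, high = mean_confidence_interval(sample_mean=50, sample_sd=6, n=36)

print(round(low, 2), round(high, 2))   # 48.04 51.96
```

A higher confidence level or a smaller sample widens the interval; a larger sample narrows it.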

12. Hypothesis Testing

Hypothesis testing is used to decide, based on sample data, whether an assumption about a population parameter should be rejected or not.

Null Hypothesis: The status quo.
Alternative Hypothesis: Challenges the status quo.

Status quo means the accepted norm.
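As one possible illustration, the sketch below runs a two-sided one-sample z-test, which is only one of many hypothesis tests and is not named in this article. It assumes the population standard deviation is known. All numbers (null mean 50, sample mean 52, standard deviation 6, n = 36) and the significance level of 0.05 are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_test_two_sided(sample_mean, mu0, pop_sd, n):
    """Return the z statistic and two-sided p-value for H0: mean == mu0."""
    z = (sample_mean - mu0) / (pop_sd / sqrt(n))
    # Probability of a result at least this extreme if H0 is true.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Null hypothesis (status quo): the population mean is 50.
# Alternative hypothesis: the population mean is not 50.
z, p_value = z_test_two_sided(sample_mean=52, mu0=50, pop_sd=6, n=36)

alpha = 0.05                    # assumed significance level
reject_null = p_value < alpha

print(round(z, 2), round(p_value, 4))
print(reject_null)
```

If the p-value is below the chosen significance level, the null hypothesis is rejected; otherwise we fail to reject it, which is not the same as proving it true.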

Conclusion:

I have covered some important terms in statistics and probability for machine learning. Thanks for reading, and I hope you like it.

My other blog post on statistics:

https://pub.towardsai.net/inferential-statistics-for-data-science-91cf4e0692b1
