# Important Terms in Statistics for Machine Learning

#### Statistics and Probability Concepts

In machine learning, statistics and probability play an important role. Whenever we infer population parameters from sample statistics, we attach a probability to that inference, and predictions themselves are expressed in terms of probabilities. Statistics and probability go hand in hand.

While learning statistics for machine learning, I came across many important terms. In this article, I would like to summarize the important terms I have studied so far.

### Table of Contents

1. Random Variable
2. Probability Distribution
3. PMF vs PDF vs CDF
4. Expected Value
5. Independent Events vs Mutually Exclusive Events vs Dependent Events
6. Joint Probability vs Marginal Probability vs Conditional Probability vs Union Probability
7. Bayes Theorem
8. Normal Distribution vs Uniform Distribution
9. Descriptive Statistics, Inferential Statistics
10. Sampling Distribution, Central Limit Theorem
11. Confidence Interval
12. Hypothesis Testing

### 1. Random Variable

A random variable describes the set of values the outcome of an experiment can take. It is usually denoted by X.
Example: Rolling a die. The random variable X can take the values {1, 2, 3, 4, 5, 6}.

A random variable can be discrete or continuous.
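As a quick sketch in Python (the weight range below is an illustrative assumption, not real data), one draw of a discrete and a continuous random variable might look like this:

```python
import random

# Discrete random variable: the outcome of rolling a fair die
x_discrete = random.randint(1, 6)        # one of {1, 2, 3, 4, 5, 6}

# Continuous random variable: e.g. a student's weight in kg,
# sketched here as a uniform draw (an assumption for illustration)
x_continuous = random.uniform(45.0, 90.0)

print(x_discrete, round(x_continuous, 1))
```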

### 2. Probability distribution

It describes how probability is distributed over the values of the random variable.

A probability function P(X) is used to describe the probability distribution.

Example: Probability of getting 2 while rolling a die.
Here X is the random variable and 2 is one of the values it can take.

P(X=2) = 1/6
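The full distribution for the die can be written down as a simple mapping; exact fractions (Python's `fractions` module) make it easy to check that the probabilities sum to 1:

```python
from fractions import Fraction

# Probability distribution of a fair die: P(X = x) = 1/6 for every x
die_pmf = {x: Fraction(1, 6) for x in range(1, 7)}

print(die_pmf[2])             # P(X = 2) = 1/6
print(sum(die_pmf.values()))  # probabilities sum to 1
```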

### 3. PDF vs PMF vs CDF

#### PMF — Probability Mass function

The probability distribution of a discrete random variable is known as the probability mass function.

#### CDF — Cumulative Distribution Function

The CDF gives the cumulative probability P(X ≤ x) that a random variable X takes a value less than or equal to x.

Example: What is the probability of getting a value less than or equal to 4 while rolling a die? P(X≤4)

P(X≤4) = 4/6 ≈ 0.67

#### PDF — Probability density function

The probability distribution of a continuous random variable is known as the probability density function. For a continuous variable, the density at a point is not itself a probability; probabilities come from integrating the density over an interval.

Example: The distribution of the weights of students in a class.
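The three functions can be sketched side by side: the die's PMF and CDF with exact fractions, and a PDF using `statistics.NormalDist` (the weight parameters, mean 60 kg and sd 10 kg, are illustrative assumptions):

```python
from fractions import Fraction
from statistics import NormalDist

# PMF of a fair die (discrete)
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

# CDF: P(X <= 4) is the running sum of the PMF up to 4
cdf_4 = sum(p for x, p in pmf.items() if x <= 4)
print(cdf_4)               # 2/3, about 0.67

# PDF of a continuous variable: student weight modelled as Normal(60, 10)
weight = NormalDist(mu=60, sigma=10)
print(weight.pdf(60))      # a density, not a probability
print(weight.cdf(60))      # P(weight <= 60) = 0.5
```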

### 4. Expected Value

The expected value is the mean of the random variable.

E(X) = Σ x · P(X=x)

Example: What is the expected value while rolling a dice?

Random variables → X={1,2,3,4,5,6}

Probability distribution → P(X=x)=1/6

[x can be 1 or 2 or 3 or 4 or 5 or 6]

Expected Value → E(X) = 1/6·1 + 1/6·2 + 1/6·3 + 1/6·4 + 1/6·5 + 1/6·6 = 3.5
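The same sum can be checked with exact fractions:

```python
from fractions import Fraction

# E(X) = sum of x * P(X = x) over all faces of a fair die
expected = sum(x * Fraction(1, 6) for x in range(1, 7))
print(expected)            # 7/2 = 3.5
```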

### 5. Independent Events vs Mutually Exclusive Events vs Dependent Events

#### Mutually Exclusive Events:

Event A and Event B are said to be mutually exclusive if they have no common outcomes and both can’t occur at the same time.

P( A and B )=0

Example:

Event A= Drawing a King from a deck of cards
Event B= Drawing a Queen from a deck of cards

#### Mutually Exclusive and Collectively Exhaustive Events:

The sum of probabilities of mutually exclusive and collectively exhaustive events is 1.

Example: Tossing a fair coin.

A → Getting Heads

B → Getting Tails

Events A and B are mutually exclusive and collectively exhaustive. The sum of their probabilities is 1.

P(A)+P(B)=1

#### Independent Events

Event A and Event B are said to be independent if the occurrence of one event does not affect the probability of the other.

P(A and B) = P(A) * P(B)

Example: Probability of getting 2 heads in a row = 1/2 * 1/2 = 1/4

[The probability of getting heads on the second toss is not affected by the outcome of the first toss.]
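The independence multiplication rule for the two-heads example, sketched with exact fractions:

```python
from fractions import Fraction

# For a fair coin the tosses are independent, so
# P(two heads in a row) = P(head) * P(head)
p_head = Fraction(1, 2)
p_two_heads = p_head * p_head
print(p_two_heads)         # 1/4
```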

#### Dependent Events

Event A and Event B are said to be dependent if the occurrence of one event affects the probability of the other.

P(A and B) = P(A|B) * P(B)   [equivalently, P(A and B) = P(B|A) * P(A)]

Example: Draw two cards from a deck without replacement.

Event A = The first card is a King
Event B = The second card is a King

P(A) → Probability that the first card is a King = 4/52

P(B|A) → Probability that the second card is a King, given that the first was a King = 3/51

[Only 3 Kings remain among 51 cards, so B depends on A.]

Let's calculate using the formula:

P(A and B) = P(B|A) * P(A) = 3/51 * 4/52 = 1/221
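For a clearly dependent pair of events, consider drawing two cards without replacement; the multiplication rule can then be checked with exact fractions:

```python
from fractions import Fraction

# Draw two cards without replacement:
# A = first card is a King, B = second card is a King.
# B depends on A: once a King is gone, only 3 of 51 remain.
p_A = Fraction(4, 52)
p_B_given_A = Fraction(3, 51)

p_A_and_B = p_B_given_A * p_A      # P(A and B) = P(B|A) * P(A)
print(p_A_and_B)                   # 1/221
```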

### 6. Joint Probability vs Marginal Probability vs Conditional Probability vs Union Probability

#### Joint Probability — P( A and B)

Probability of A and B occurring.

#### Union Probability — P(A or B)

The probability of A or B occurring.

P(A∪B) = P(A) + P(B) − P(A∩B)

#### Marginal Probability — P(A)

Probability of A occurring

#### Conditional Probability — P(A|B)

Probability of A occurring given that B has occurred.

P(A|B) = P(A∩B)/P(B)

P(A∩B) → Joint probability
P(B) → Marginal probability

If A and B are independent events,

P(A|B) = P(A)
P(B|A)=P(B)
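The conditional-probability formula can be verified with exact fractions, using the classic King/red-card deck example:

```python
from fractions import Fraction

# A = card is a King, B = card is red
p_A_and_B = Fraction(2, 52)        # 2 red Kings (hearts, diamonds)
p_B = Fraction(26, 52)             # 26 red cards

p_A_given_B = p_A_and_B / p_B      # P(A|B) = P(A and B) / P(B)
print(p_A_given_B)                 # 2/26 = 1/13
```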

### 7. Bayes Theorem

Bayes theorem lets us calculate one conditional probability from the other:

P(A|B) = P(B|A) * P(A) / P(B)

In some scenarios, one of P(A|B) or P(B|A) is easy to estimate from data while the other is hard. Compute the one that is easy, then use Bayes theorem to obtain the one that is difficult to compute directly.
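A minimal sketch of Bayes theorem in action; all the numbers (a condition affecting 1% of people, a 95% true-positive rate, a 5% false-positive rate) are made-up assumptions for illustration:

```python
# Bayes theorem: P(A|B) = P(B|A) * P(A) / P(B)
# A = person has the condition, B = test is positive
p_cond = 0.01                 # prior: 1% of people have the condition
p_pos_given_cond = 0.95       # true-positive rate
p_pos_given_no_cond = 0.05    # false-positive rate

# P(B) via the law of total probability
p_pos = p_pos_given_cond * p_cond + p_pos_given_no_cond * (1 - p_cond)

p_cond_given_pos = p_pos_given_cond * p_cond / p_pos
print(round(p_cond_given_pos, 3))   # about 0.161
```

Even with a fairly accurate test, the posterior probability stays low because the condition itself is rare; this is exactly the kind of result Bayes theorem makes visible.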

### 8. Normal Distribution vs Uniform Distribution

#### Uniform Distribution:

The probability is uniformly distributed across all possible outcomes of the random variable

Example: Rolling a die

Probability is uniformly distributed across all possible outcomes {1,2,3,4,5,6}

P(X=1),P(X=2),P(X=3),P(X=4),P(X=5),P(X=6) →1/6

The shape of the uniform distribution — Rectangle

#### Normal Distribution

A normal distribution is also known as a Gaussian distribution. In a normal distribution, most data points cluster around the mean, and the distribution is symmetric in shape.

Parameters of the normal distribution → Mean and Variance

The shape of the normal distribution — Bell curve

Mean = Median = Mode
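A small simulation (standard library only, with a fixed seed for reproducibility) contrasts the two distributions through their sample means:

```python
import random
from statistics import mean

random.seed(0)

# Uniform: every die face is equally likely
uniform_draws = [random.randint(1, 6) for _ in range(10_000)]

# Normal: values cluster symmetrically around the mean
normal_draws = [random.gauss(mu=0.0, sigma=1.0) for _ in range(10_000)]

print(round(mean(uniform_draws), 2))   # close to 3.5
print(round(mean(normal_draws), 2))    # close to 0
```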

### 9. Descriptive Statistics, Inferential Statistics

#### Descriptive Statistics:

Descriptive statistics are used to describe and summarize the data.

Measure of Central Tendency:
1. Mean — Average value
2. Median — Middle value
3. Mode — The most common value

Measure of Dispersion:
1. Variance — How far the data points vary from the mean value
2. Standard Deviation — Square root of the variance
3. Range — Difference between the maximum value and minimum value
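Python's standard `statistics` module covers all of these measures; the data list is an arbitrary illustrative sample:

```python
from statistics import mean, median, mode, variance, stdev

data = [2, 4, 4, 4, 5, 5, 7, 9]   # illustrative sample

print(mean(data))              # average value: 5
print(median(data))            # middle value: 4.5
print(mode(data))              # most common value: 4
print(variance(data))          # how far values vary from the mean
print(stdev(data))             # square root of the variance
print(max(data) - min(data))   # range: 7
```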

Measure of Skewness:
1. Right skewed — The distribution is skewed towards the positive side. It has a long right tail.
2. Left skewed — The distribution is skewed towards the negative side. It has a long left tail.

#### Inferential Statistics:

Inferential statistics infer population parameters from sample statistics.

Examples: Central Limit Theorem, Hypothesis Testing

### 10. Sampling Distribution, Central Limit Theorem

#### Sampling Distribution

Sampling — Taking representative samples from the population
The sampling distribution of the mean is the distribution of the means of all possible samples of a given size.

#### Sampling distribution properties

Mean of the sampling distribution of the mean = Population mean
Standard deviation of the sampling distribution of the mean (the standard error) = Population standard deviation / sqrt(sample size)

#### Central Limit Theorem:

Regardless of the shape of the population distribution, the sampling distribution of the mean approaches a normal distribution as the sample size grows. A common rule of thumb is that a sample size greater than 30 is sufficient.
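A quick standard-library simulation illustrates both properties: sample means from a (non-normal) uniform die population center on 3.5 with spread close to sd/sqrt(n). The sample sizes and seed are arbitrary choices:

```python
import random
from statistics import mean, stdev

random.seed(42)

# Population: a uniform (decidedly non-normal) die distribution
def sample_mean(n):
    return mean(random.randint(1, 6) for _ in range(n))

# Sampling distribution of the mean for sample size n = 36
sample_means = [sample_mean(36) for _ in range(2_000)]

print(round(mean(sample_means), 2))   # close to the population mean, 3.5
print(round(stdev(sample_means), 2))  # close to sd/sqrt(n) = 1.71/6, about 0.28
```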

### 11. Confidence Interval

A confidence interval is the range of values that is likely to contain the population parameter. It is an interval estimate, and it conveys the uncertainty (variability) in our estimate of the population parameter.
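A sketch of a 95% z-based confidence interval for a mean using `statistics.NormalDist` (the data are illustrative; for such a small sample a t-interval would be more appropriate):

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

sample = [4.8, 5.1, 5.0, 4.9, 5.3, 5.2, 4.7, 5.0]   # illustrative data

n = len(sample)
x_bar = mean(sample)
se = stdev(sample) / sqrt(n)       # standard error of the mean

z = NormalDist().inv_cdf(0.975)    # about 1.96 for a 95% interval
ci = (x_bar - z * se, x_bar + z * se)
print(tuple(round(v, 2) for v in ci))
```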

### 12. Hypothesis Testing

Hypothesis testing is used to decide whether an assumption about a population parameter should be rejected or not.

Null Hypothesis: The status quo
Alternate Hypothesis: Challenges the status quo

Status quo means the accepted norm.
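A minimal one-sample z-test sketch (all numbers are illustrative assumptions, and the population standard deviation is assumed known):

```python
from math import sqrt
from statistics import NormalDist

# H0: population mean = 100   (status quo)
# H1: population mean != 100  (challenges the status quo)
mu_0, sigma = 100, 15          # hypothesized mean, known population sd
x_bar, n = 106, 36             # observed sample mean and sample size

z = (x_bar - mu_0) / (sigma / sqrt(n))        # test statistic
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-tailed p-value

print(round(z, 2), round(p_value, 4))
# Reject H0 at the 5% significance level when p_value < 0.05
```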

### Conclusion:

In this article, I have summarized some important terms in statistics and probability for machine learning. Thanks for reading, and I hope you liked it.

### My other blog on statistics.

https://pub.towardsai.net/inferential-statistics-for-data-science-91cf4e0692b1
