#### Data Science

#### Learn about NumPy from scratch

### NumPy

NumPy stands for **Num**erical **Py**thon. NumPy is one of the most powerful Python libraries. It supports large multi-dimensional arrays and matrices, along with a collection of high-level mathematical functions that operate on these arrays.

In this article, let’s learn the NumPy basics needed for data science.

### Table of Contents

- Different Ways to Create NumPy Arrays
- Python List vs NumPy Arrays
- Attributes of NumPy Array
- Indexing and Slicing NumPy Array
- NumPy Axis in Detail
- Operations on NumPy Arrays

### Different Ways to Create NumPy Arrays

#### 1. Creating NumPy arrays from a Python list

**How do we create a vector from a Python list?**

A vector is a one-dimensional array.

https://gist.github.com/IndhumathyChelliah/75062b88f6418131eee4b8494ea67b9d

**How do we create a matrix from a list of lists?**

To create an n-dimensional array from a Python list, the list must be nested n levels deep.

https://gist.github.com/IndhumathyChelliah/4f683c9f1b7bebc49e093ff83138d3f1

Similarly, we can create NumPy arrays from Python tuples using `np.array()`.
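A minimal sketch of the three cases above (a flat list, a list of lists, and a tuple). The variable names are only illustrative, and the linked gists may use different values.

```python
import numpy as np

# A vector: a one-dimensional array built from a flat Python list
vector = np.array([1, 2, 3, 4])

# A matrix: a list of lists (nested two levels deep) gives a 2-D array
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])

# Tuples work the same way as lists
from_tuple = np.array((7, 8, 9))

print(vector.ndim, matrix.ndim, from_tuple.ndim)  # 1 2 1
print(matrix.shape)                               # (2, 3)
```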

#### 2. Creating NumPy arrays using the arange() function

`np.arange(start,stop,step)`

→ Returns an array of values from `start` (inclusive) up to `stop` (exclusive), spaced by `step`. It is similar to Python’s built-in `range()` function.

https://gist.github.com/IndhumathyChelliah/0ba2c0fad3f22b30ebb03d8d89c9a217

`np.arange()` returns a one-dimensional array.

To convert it to a multi-dimensional array, use the `reshape()` function.

https://gist.github.com/IndhumathyChelliah/b458cb06772d7ea188a28a9ed239a451

**Difference between range() and arange()**

- `range()` → The step can’t be a float. It returns a range object, which can be indexed like a Python list.
- `arange()` → The step can be a float. It returns a NumPy array.
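A short sketch of `np.arange()`, `reshape()`, and the float-step difference described above. The specific values are illustrative and may not match the linked gists.

```python
import numpy as np

# Values from start (inclusive) to stop (exclusive) in steps of 2
evens = np.arange(0, 10, 2)
print(evens)                 # [0 2 4 6 8]

# reshape() turns the 1-D result into a 2-D array (3 rows x 4 columns)
grid = np.arange(12).reshape(3, 4)
print(grid.shape)            # (3, 4)

# Unlike range(), arange() accepts a float step
halves = np.arange(0, 2, 0.5)
print(halves.tolist())       # [0.0, 0.5, 1.0, 1.5]

# range(0, 2, 0.5) would raise TypeError, because range() needs an integer step
```

Note that `reshape()` only works when the new shape holds exactly as many elements as the original array (here 3 × 4 = 12).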

#### 3. Creating a NumPy array using np.linspace()

`np.linspace(start,stop,n)`

→ Returns an array of **n** evenly spaced numbers over the interval from `start` to `stop`. By default, `stop` is included.

https://gist.github.com/IndhumathyChelliah/305dd74d8ac0abe6a93b2ce4690be680

Like `arange()`, `linspace()` also returns a one-dimensional array.

To convert it to a multi-dimensional array, use the `reshape()` function.

https://gist.github.com/IndhumathyChelliah/2697e386e097df8fa2f920c2c8651848

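**Difference between arange() and linspace()**

- `arange()` → We specify the step size. The `stop` value is excluded.
- `linspace()` → We specify the number of elements. The `stop` value is included by default.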
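A minimal sketch contrasting the two functions over the same interval. The values are chosen only for illustration.

```python
import numpy as np

# linspace: we choose how many points; stop is included by default
points = np.linspace(0, 1, 5)
print(points.tolist())        # [0.0, 0.25, 0.5, 0.75, 1.0]

# arange: we choose the step; stop is excluded
stepped = np.arange(0, 1, 0.25)
print(stepped.tolist())       # [0.0, 0.25, 0.5, 0.75]

# linspace also returns a 1-D array, so reshape() works the same way
print(np.linspace(0, 10, 6).reshape(2, 3).shape)  # (2, 3)
```

Because `linspace()` fixes the number of elements, it is usually the safer choice for float ranges, where the length of an `arange()` result can be affected by floating-point rounding.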

#### 4. Creating an array of 0s and an array of 1s

`np.zeros(shape,dtype)`

→ Returns an array of the given shape and type, filled with zeros.

By default, it creates a float array (`float64`).

https://gist.github.com/IndhumathyChelliah/9e52b69e95f192a9cbb73403acfec414

`np.ones(shape,dtype)`

→ Returns an array of the given shape and type, filled with ones.

https://gist.github.com/IndhumathyChelliah/8a8816bc61036199ebd09c42628bc0ca
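A minimal sketch of `np.zeros()` and `np.ones()` with and without an explicit `dtype`. The shapes are illustrative.

```python
import numpy as np

z = np.zeros((2, 3))               # shape passed as a tuple; float64 by default
print(z.dtype)                     # float64

zi = np.zeros((2, 3), dtype=int)   # request an integer array instead

o = np.ones(4, dtype=np.int32)     # a 1-D shape can be a plain int
print(o)                           # [1 1 1 1]
```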

#### 5. Creating an array of random numbers

`np.random.random(shape)`

→ Returns an array of the given shape, filled with random floats in the half-open interval [0.0, 1.0).

https://gist.github.com/IndhumathyChelliah/e686fe8965498d1d7711bc9cd863d554
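A short sketch of `np.random.random()`. It also shows the newer `np.random.default_rng()` generator, which is not in the original article; its `random()` method draws from the same [0.0, 1.0) interval, and passing a seed makes the results reproducible.

```python
import numpy as np

r = np.random.random((2, 3))        # values drawn from [0.0, 1.0)
print(r.shape)                      # (2, 3)
print(((r >= 0) & (r < 1)).all())   # True

# Newer Generator API: seeding makes the numbers reproducible between runs
rng = np.random.default_rng(seed=0)
r2 = rng.random((2, 3))
print(r2.shape)                     # (2, 3)
```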

#### 6. Creating an array filled with the number ‘n’

`np.full(shape,n)`

→ Returns an array of the given shape, filled with the number ‘n’.

https://gist.github.com/IndhumathyChelliah/2e2caf0c3b34b87e617e9ec58b25b0b8
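A minimal sketch of `np.full()`. Unless `dtype` is given, the array takes its data type from the fill value.

```python
import numpy as np

sevens = np.full((2, 2), 7)
print(sevens.tolist())          # [[7, 7], [7, 7]]

# A float fill value produces a float array
print(np.full(3, 2.5).dtype)    # float64
```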

### Python List vs NumPy Array

1. A Python list is **heterogeneous**: it can contain elements of different data types. A NumPy array is **homogeneous**: all of its elements have the same data type. The most commonly used arrays are int arrays and float arrays.
2. NumPy arrays support element-wise operations.

https://gist.github.com/IndhumathyChelliah/82cc222a767cb1c19889c9ed9e68c043

3. NumPy arrays are faster than Python lists for numerical work. NumPy arrays are fixed in size, whereas Python lists can grow and shrink.
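A small sketch of the three differences above. The values are illustrative. Note that `np.append()` does not grow the array in place; it returns a new array.

```python
import numpy as np

# A Python list can mix types
mixed = [1, "two", 3.0]

# A NumPy array holds one dtype: mixing ints and floats upcasts everything to float
arr = np.array([1, 2, 3.5])
print(arr.dtype)                             # float64

# Element-wise: * on an array multiplies each element...
print((np.array([1, 2, 3]) * 2).tolist())    # [2, 4, 6]
# ...while * on a list repeats the list
print([1, 2, 3] * 2)                         # [1, 2, 3, 1, 2, 3]

# Fixed size: np.append() returns a new, larger array instead of resizing nums
nums = np.array([1, 2, 3])
bigger = np.append(nums, 4)
print(nums.size, bigger.size)                # 3 4
```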

### Attributes of NumPy Array

When we load a NumPy array from a CSV file or work with large arrays, we can inspect the array using its attributes.

`ndarray.shape`

→ Returns the shape of the array

`ndarray.dtype`

→ Returns the data type of the array’s elements

`ndarray.ndim`

→ Returns the number of dimensions of the array

https://gist.github.com/IndhumathyChelliah/3bfb0375609eb9458b16ca857086d720
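A minimal sketch of the three attributes on a small 2-D float array. Note that they are accessed on the array itself, without parentheses.

```python
import numpy as np

a = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

print(a.shape)   # (2, 3)
print(a.dtype)   # float64
print(a.ndim)    # 2
```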

### Indexing and Slicing NumPy Arrays

Indexing and slicing NumPy arrays work much like indexing and slicing Python lists.

We can access elements of an array by specifying a single index, a list of indices, a slice, or a slice with a step.

**1. Accessing elements from a one-dimensional array**

https://gist.github.com/IndhumathyChelliah/77292af6640396d8a1ecb196e791c9cf
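A sketch of each access style listed above on a 1-D array. The linked gist may use different data.

```python
import numpy as np

a = np.arange(10, 20)              # 10, 11, ..., 19

print(a[0])                        # 10  single index
print(a[-1])                       # 19  negative index counts from the end
print(a[[1, 3, 5]].tolist())       # [11, 13, 15]  list of indices
print(a[2:5].tolist())             # [12, 13, 14]  slice (stop is excluded)
print(a[::3].tolist())             # [10, 13, 16, 19]  slice with a step
```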

**2. Accessing elements from a 2-dimensional array**

All arrays generated by basic slicing are always **“views”** of the original array.

Views → An array that does not own its data, but refers to another array’s data instead. So modifying a view also modifies the original array.

https://gist.github.com/IndhumathyChelliah/8c1539970d27dbcba3c7212ea603ade5
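A minimal sketch of 2-D indexing, plus a check of the view behaviour described above. `.copy()` is used to get an independent array.

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

print(m[1, 2])                 # 6  (row 1, column 2)
print(m[0].tolist())           # [1, 2, 3]  whole first row
print(m[:, 1].tolist())        # [2, 5, 8]  whole second column
print(m[:2, 1:].tolist())      # [[2, 3], [5, 6]]

# A basic slice is a view: writing to it changes the original array
view = m[:2, :2]
view[0, 0] = 100
print(m[0, 0])                 # 100

# .copy() gives an independent array, so the original is untouched
c = m[:2, :2].copy()
c[0, 0] = -1
print(m[0, 0])                 # 100
```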

**3. Accessing elements from a 3-Dimensional array**

https://gist.github.com/IndhumathyChelliah/5597a14e3e0057480892d665081242e0
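A short sketch of 3-D indexing, where the first index selects the matrix, the second the row, and the third the column. The array is illustrative.

```python
import numpy as np

t = np.arange(12).reshape(2, 2, 3)   # 2 matrices, each 2 rows x 3 columns

print(t[1].tolist())        # [[6, 7, 8], [9, 10, 11]]  second matrix
print(t[0, 1].tolist())     # [3, 4, 5]  matrix 0, row 1
print(t[1, 0, 2])           # 8  matrix 1, row 0, column 2
print(t[:, :, 0].tolist())  # [[0, 3], [6, 9]]  column 0 of every matrix
```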

### Boolean Indexing

Boolean Indexing → We can select and modify the elements that satisfy a condition.

https://gist.github.com/IndhumathyChelliah/31aade5af9d90ec10950269910ac8365
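A minimal sketch of boolean indexing: a comparison produces a boolean mask, and the mask selects (or assigns to) the matching elements. Conditions are combined with `&` and `|`, each wrapped in parentheses.

```python
import numpy as np

a = np.array([5, 12, 7, 20, 3])

mask = a > 6
print(mask.tolist())                    # [False, True, True, True, False]
print(a[mask].tolist())                 # [12, 7, 20]

# Combine conditions with & (and) / | (or)
print(a[(a > 6) & (a < 15)].tolist())   # [12, 7]

# Modify in place: set every element below 6 to 0
a[a < 6] = 0
print(a.tolist())                       # [0, 12, 7, 20, 0]
```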

### Operations on NumPy Arrays

- Arithmetic Operations on NumPy Arrays
- Statistical Methods on NumPy
- Sorting NumPy Arrays
- Set Operations on NumPy Arrays
- Transposing Arrays
- Stacking NumPy Arrays
- Broadcasting NumPy Arrays

### Arithmetic Operations on NumPy Arrays

**1. Addition**

Using `+` or `np.add()`

NumPy performs element-wise addition, so the two arrays must have the same shape, or shapes that are compatible under the broadcasting rules covered later in this article.

https://gist.github.com/IndhumathyChelliah/5d01e4886917e0dea3593f530cd4410c

**2. Subtraction**

Using `-` or `np.subtract()`

https://gist.github.com/IndhumathyChelliah/3fdd3c56eedcc946ba751f99c1d009f2

**3. Multiplication**

Using `*` or `np.multiply()`

This is element-wise multiplication, not matrix multiplication.

https://gist.github.com/IndhumathyChelliah/62b0551e119bfac8c8912d77eaaf590a

**4. Division**

Using `/` or `np.divide()`

https://gist.github.com/IndhumathyChelliah/d96d1af8775e4e91fc1990a404f72d9a
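A minimal sketch of the four operations on two same-shaped arrays. Each operator and its `np.*` function give the same element-wise result. Matrix multiplication, which this article does not cover, uses `@` or `np.matmul()` instead of `*`.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[10, 20], [30, 40]])

print((a + b).tolist())            # [[11, 22], [33, 44]]
print(np.subtract(b, a).tolist())  # [[9, 18], [27, 36]]
print((a * b).tolist())            # [[10, 40], [90, 160]]  element-wise
print(np.divide(b, a).tolist())    # [[10.0, 10.0], [10.0, 10.0]]  true division gives floats
```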

**5. Square root**

`np.sqrt()`

→ Computes the square root of every element in the array.

https://gist.github.com/IndhumathyChelliah/80eb8a8bbe51185de937eaa29f33fcde

**6. Exponential**

`np.exp()`

→ Computes the exponential (e raised to the power of each element) for every element in the array.

https://gist.github.com/IndhumathyChelliah/cdb292e7a51ca7d4d8416230e768f218
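A short sketch of both element-wise functions. `np.isclose()` is used for the exponential because e is irrational and can only be approximated in floating point.

```python
import numpy as np

a = np.array([1, 4, 9, 16])
print(np.sqrt(a).tolist())      # [1.0, 2.0, 3.0, 4.0]

e = np.exp(np.array([0.0, 1.0]))
print(e[0])                     # 1.0  (e**0)
print(np.isclose(e[1], np.e))   # True (e**1)
```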

### NumPy Axis in Detail

Before moving on to statistical methods and sorting, let’s look at the NumPy axis. It’s a little tricky.

#### 3-Dimensional arrays

Let’s take a 3-D array of shape **(2,2,3)**, which indicates **2** → matrices, **2** → rows, **3** → columns.

Summing along an axis collapses that axis: the axis is lost, and it disappears from the shape of the result.

If we sum along **axis=0**, it collapses the **matrices** axis. This is done by adding the matrices element-wise, so the result has shape (2,3).

If we sum along **axis=1**, it collapses the **row** axis. This is done by adding the rows of each matrix, which gives column-wise sums. The result has shape (2,3).

For **axis=2**, it collapses the **column** axis. This is done by adding the elements of each row, which gives row-wise sums. The result has shape (2,2).

So, a good way to remember the **axis** parameter in NumPy is: **“It collapses the specified axis.”**
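A minimal sketch that checks the three cases above on a (2,2,3) array. The array contents are illustrative; only the shape matches the example in the text.

```python
import numpy as np

a = np.arange(12).reshape(2, 2, 3)   # 2 matrices, 2 rows, 3 columns
# a[0] = [[0, 1, 2], [3, 4, 5]],  a[1] = [[6, 7, 8], [9, 10, 11]]

s0 = a.sum(axis=0)   # collapse the matrices: element-wise sum of the two matrices
s1 = a.sum(axis=1)   # collapse the rows: column sums inside each matrix
s2 = a.sum(axis=2)   # collapse the columns: row sums inside each matrix

print(s0.shape, s1.shape, s2.shape)  # (2, 3) (2, 3) (2, 2)
print(s0.tolist())   # [[6, 8, 10], [12, 14, 16]]
print(s1.tolist())   # [[3, 5, 7], [15, 17, 19]]
print(s2.tolist())   # [[3, 12], [21, 30]]
```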

#### 2-D array
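The same rule applies to a 2-D array of shape (rows, columns): `axis=0` collapses the rows and `axis=1` collapses the columns. A small illustrative example:

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])       # shape (2, 3)

print(m.sum(axis=0).tolist())   # [5, 7, 9]  rows collapsed -> one sum per column
print(m.sum(axis=1).tolist())   # [6, 15]    columns collapsed -> one sum per row
print(m.sum())                  # 21         no axis: sum of every element
```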

To learn more about the NumPy axis, see the Medium article “Understanding NumPy sum” by **Kshitij Bajracharya**.

### Statistical Methods on NumPy

We can compute statistics such as the mean, median, and sum of NumPy arrays.

`np.mean(a)`

→ Computes the mean of array `a`

`np.median(a)`

→ Computes the median of array `a`

`np.sum(a)`

→ Computes the sum of array `a`

`np.mean(a,axis=0)`

→ For a 2-D array, `axis=0` gives a column-wise computation (the rows are collapsed).

`np.mean(a,axis=1)`

→ For a 2-D array, `axis=1` gives a row-wise computation (the columns are collapsed).

https://gist.github.com/IndhumathyChelliah/cce0ef3fa859073294d7822a27b112d9
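A minimal sketch of these functions on a small 2-D array. Without `axis`, each function works over all elements.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 9]])

print(np.mean(a))                    # 4.0
print(np.median(a))                  # 3.5  (middle of 1, 2, 3, 4, 5, 9)
print(np.sum(a))                     # 24
print(np.mean(a, axis=0).tolist())   # [2.5, 3.5, 6.0]  column-wise
print(np.mean(a, axis=1).tolist())   # [2.0, 6.0]       row-wise
```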

### Sorting NumPy Arrays

`ndarray.sort()`

→ Sorts the original array in place

`np.sort(ndarray)`

→ Returns a sorted copy of the array

Both sort along the last axis by default. To sort along a particular axis, pass the **axis** parameter.

https://gist.github.com/IndhumathyChelliah/b268bc0087afe02b297f86a4a83a8781
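A short sketch contrasting the in-place method with the copying function, and showing the `axis` parameter on a 2-D array. Values are illustrative.

```python
import numpy as np

a = np.array([3, 1, 2])
b = np.sort(a)                       # sorted copy; a is unchanged
print(a.tolist(), b.tolist())        # [3, 1, 2] [1, 2, 3]

a.sort()                             # sorts a in place (returns None)
print(a.tolist())                    # [1, 2, 3]

m = np.array([[3, 1],
              [2, 4]])
print(np.sort(m).tolist())           # [[1, 3], [2, 4]]  default axis=-1: each row
print(np.sort(m, axis=0).tolist())   # [[2, 1], [3, 4]]  each column
```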

### Set Operations on NumPy Arrays

Set operations such as union, intersection, and difference can be performed on 1-D NumPy arrays.

https://gist.github.com/IndhumathyChelliah/e63642f1bd6ae4de089c89a461cc108b
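A minimal sketch using NumPy’s standard 1-D set routines. The linked gist may use a different selection of functions. Each one returns a sorted array of unique values.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 4])
b = np.array([3, 4, 5, 6])

print(np.unique(a).tolist())          # [1, 2, 3, 4]        duplicates removed
print(np.union1d(a, b).tolist())      # [1, 2, 3, 4, 5, 6]
print(np.intersect1d(a, b).tolist())  # [3, 4]
print(np.setdiff1d(a, b).tolist())    # [1, 2]              in a but not in b
print(np.setxor1d(a, b).tolist())     # [1, 2, 5, 6]        in exactly one of them
```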

### Transposing NumPy Array

https://gist.github.com/IndhumathyChelliah/7497f058647721b18c4007504dbeafd2
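Transposing swaps rows and columns, so a (2,3) array becomes (3,2). A minimal sketch using the `.T` attribute, which gives the same result as `np.transpose()`:

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])      # shape (2, 3)

t = m.T                         # same result as np.transpose(m)
print(t.shape)                  # (3, 2)
print(t.tolist())               # [[1, 4], [2, 5], [3, 6]]
```

Like basic slicing, the transpose is a view, so it shares data with the original array.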

### Stacking NumPy Arrays

**np.hstack((a1,a2))** → Horizontal stacking. The arrays are passed as a single tuple, and (for 2-D arrays) they must have the same number of rows.

**np.vstack((a1,a2))** → Vertical stacking. The arrays must have the same number of columns.

https://gist.github.com/IndhumathyChelliah/c1865d08030ad9746f774417a2b9032b
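A minimal sketch of both functions. Note the extra parentheses: the arrays are passed together as one tuple. The arrays are illustrative.

```python
import numpy as np

a1 = np.array([[1, 2],
               [3, 4]])
a2 = np.array([[5],
               [6]])                 # same number of rows as a1

h = np.hstack((a1, a2))
print(h.tolist())                    # [[1, 2, 5], [3, 4, 6]]

a3 = np.array([[7, 8]])              # same number of columns as a1
v = np.vstack((a1, a3))
print(v.tolist())                    # [[1, 2], [3, 4], [7, 8]]
```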

### Broadcasting NumPy Arrays

In NumPy, arithmetic operations are done element-wise. To achieve this, arrays should be of the same size/shape.

To perform arithmetic operations on arrays of different sizes/shapes, broadcasting is used.

You can think of broadcasting as stretching an array to the required shape so that the element-wise operation can be performed. (This stretching is conceptual: NumPy does not actually copy the data.)

**Broadcasting Rules**: compare the shapes of the two arrays axis by axis, starting from the last (trailing) axis. For each axis, one of these must hold:

- The sizes are the same.
- One of the sizes is 1.

#### Scenario 1: Dimensions of both the arrays are the same

**Example 1: Two-dimensional arrays**

Let’s add two 2-D arrays (a1,a2) of different shapes.

The shape of a1 (2,1)

The shape of a2 (2,2)

a1+a2 → We can perform the addition because the shapes satisfy the broadcasting rules:

- The first axis is the same.
- The second axis is 1 in one of the arrays (a1).
*[If the sizes along an axis differ, one of them must be 1.]*

https://gist.github.com/IndhumathyChelliah/e4b76cfdb637160c147530d4170838a4

**Example 2: Three-dimensional arrays**

Let’s add two 3-D arrays (a1,a2) of different shapes.

The shape of a1 (2,3,1)

The shape of a2 (2,3,2)

a1+a2 → We can perform the addition because the shapes satisfy the broadcasting rules:

- The first axis is the same.
- The second axis is the same.
- The third axis is 1 in one of the arrays (a1).
*[If the sizes along an axis differ, one of them must be 1.]*

https://gist.github.com/IndhumathyChelliah/6e9bf91bffcb63af561f873c7e779abb
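A sketch of both Scenario 1 examples, using the shapes from the text with illustrative values. It also uses `np.broadcast_shapes()` (available since NumPy 1.20, not mentioned in the original) to check compatibility without creating any arrays.

```python
import numpy as np

# Example 1: (2, 1) + (2, 2) -> (2, 2); a1's single column is stretched across
a1 = np.array([[1],
               [2]])
a2 = np.array([[10, 20],
               [30, 40]])
print((a1 + a2).tolist())     # [[11, 21], [32, 42]]

# Example 2: (2, 3, 1) + (2, 3, 2) -> (2, 3, 2)
b1 = np.ones((2, 3, 1))
b2 = np.arange(12).reshape(2, 3, 2)
print((b1 + b2).shape)        # (2, 3, 2)

# Check the resulting shape directly from the two input shapes
print(np.broadcast_shapes((2, 3, 1), (2, 3, 2)))  # (2, 3, 2)
```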

#### Scenario 2: Dimensions of both the arrays are different

If the two arrays have different numbers of dimensions, the shape of the one with fewer dimensions is padded with ones on its leading (left) side.

**Example: Let’s add two arrays of different dimensions.**

The shape of a1 → (3,2)

The shape of a2 →(2,3,2)

Since a1 has fewer dimensions than a2, the shape of a1 is padded with ones on its leading (left) side.

The shape of a1 becomes → (1,3,2)

Now check the broadcasting rules for a1 and a2:

- The first axis is 1 in one of the arrays (a1).
*[If the sizes along an axis differ, one of them must be 1.]*
- The second axis is the same.
- The third axis is the same.

So we can perform a1+a2, because the shapes satisfy the broadcasting rules.

https://gist.github.com/IndhumathyChelliah/42e820eab093519802fcf5e1ca889a9d

To perform arithmetic operations on two arrays of different shapes, the shapes must satisfy the broadcasting rules. Otherwise, NumPy raises a `ValueError` and the operation can’t be performed.

**Example:**

The shape of a1 (2,3)

The shape of a2 (2,4)

We can’t perform a1+a2, because the shapes don’t satisfy the broadcasting rules.

[The second axis differs (3 vs 4), and neither size is 1.]
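A minimal sketch of Scenario 2 and of the incompatible case, using the shapes from the text and illustrative values. The exact wording of the error message depends on the NumPy version, so it is only printed here, not reproduced.

```python
import numpy as np

# Scenario 2: (3, 2) + (2, 3, 2). a1 is treated as (1, 3, 2), then stretched to (2, 3, 2)
a1 = np.arange(6).reshape(3, 2)
a2 = np.zeros((2, 3, 2))
result = a1 + a2
print(result.shape)                     # (2, 3, 2)
print((result[0] == result[1]).all())   # True: a1 was repeated for both matrices

# Incompatible shapes: (2, 3) + (2, 4) raises ValueError
try:
    np.ones((2, 3)) + np.ones((2, 4))
    broadcast_ok = True
except ValueError as err:
    broadcast_ok = False
    print("ValueError:", err)
```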

### Conclusion

In this article, I have covered most of the standard NumPy operations required to kickstart your journey in data science.


*Watch this space for more articles on Python and data science. If you’d like to read more of my tutorials, follow me on Medium, LinkedIn, and Twitter.*
