# Beginner’s Guide to NumPy for Data Science

### NumPy

NumPy stands for Numerical Python. It is a powerful Python library that supports large multi-dimensional arrays and matrices, along with a collection of high-level mathematical functions to operate on these arrays.

### Table of Contents

1. Different Ways to Create NumPy Arrays
2. Python List vs NumPy Arrays
3. Attributes of NumPy Array
4. Indexing and Slicing NumPy Array
5. NumPy Axis in Detail
6. Operations on NumPy Arrays

### Different Ways to Create NumPy Arrays

#### 1. Creating NumPy arrays from a Python list

• How to create a vector from a Python list?
A vector is a one-dimensional array.

https://gist.github.com/IndhumathyChelliah/75062b88f6418131eee4b8494ea67b9d
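As a minimal sketch of this step (the values are made up for illustration and are not taken from the gist above), a flat Python list passed to `np.array()` gives a one-dimensional array:

```python
import numpy as np

# Illustrative values: a flat list becomes a 1-D array (a vector).
vector = np.array([1, 2, 3, 4])

print(vector)
print(vector.ndim)   # number of dimensions: 1
print(vector.shape)  # (4,)
```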

• How to create matrices from a list of lists?

To create an n-dimensional array from a Python list, the list should be nested n levels deep.

https://gist.github.com/IndhumathyChelliah/4f683c9f1b7bebc49e093ff83138d3f1

Similarly, we can create NumPy arrays from Python tuples using `np.array()`.
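The nesting rule can be sketched as follows. The variable names and values here are illustrative assumptions, not the contents of the gist:

```python
import numpy as np

# A list nested 2 levels deep gives a 2-D array (a matrix).
matrix = np.array([[1, 2, 3], [4, 5, 6]])

# A list nested 3 levels deep gives a 3-D array.
cube = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

# Tuples work the same way as lists.
from_tuple = np.array((1, 2, 3))

print(matrix.shape)      # (2, 3)
print(cube.shape)        # (2, 2, 2)
print(from_tuple.shape)  # (3,)
```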

#### 2. Creating NumPy arrays using the arange function

`np.arange(start, stop, step)` → Returns an array of evenly spaced values in the given range; `stop` itself is excluded. It is similar to the Python `range()` function.

`np.arange()` returns a one-dimensional array.
To convert it to a multi-dimensional array, use the `reshape()` function.

https://gist.github.com/IndhumathyChelliah/b458cb06772d7ea188a28a9ed239a451
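A short sketch of `np.arange()` followed by `reshape()`, with illustrative start, stop, and step values:

```python
import numpy as np

# Values from 0 up to (but not including) 12, in steps of 2.
a = np.arange(0, 12, 2)      # [0, 2, 4, 6, 8, 10]

# Reshape the 6 elements into 2 rows and 3 columns.
grid = a.reshape(2, 3)       # [[0, 2, 4], [6, 8, 10]]

print(a)
print(grid)
```

The total number of elements must match the new shape, so 6 elements can become (2, 3) or (3, 2) but not (4, 2).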

Difference between `range()` and `arange()`

`range()` → Step can’t be a float. It returns a range object, which can be indexed like a Python list.
`arange()` → Step can be a float. It returns a NumPy array.
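The difference can be checked with a small sketch (the step of 0.25 is chosen only for illustration):

```python
import numpy as np

# range() returns a range object that supports indexing.
r = range(0, 5)
print(r[2])  # 2

# arange() accepts a float step and returns a NumPy array.
float_steps = np.arange(0, 1, 0.25)  # [0.0, 0.25, 0.5, 0.75]
print(float_steps)

# range() rejects a float step with a TypeError.
try:
    range(0, 1, 0.25)
    range_accepts_float = True
except TypeError:
    range_accepts_float = False
print(range_accepts_float)  # False
```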

#### 3. Creating a NumPy array using np.linspace()

`np.linspace(start, stop, n)` → Returns an array of `n` evenly spaced elements in the given range; `stop` is included by default.

https://gist.github.com/IndhumathyChelliah/305dd74d8ac0abe6a93b2ce4690be680

Like `arange()`, `linspace()` also returns a one-dimensional array.
To convert it to a multi-dimensional array, use the `reshape()` function.

https://gist.github.com/IndhumathyChelliah/2697e386e097df8fa2f920c2c8651848
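A minimal sketch of `np.linspace()` and of reshaping its result, with illustrative ranges:

```python
import numpy as np

# 5 evenly spaced values from 0 to 1, both ends included.
points = np.linspace(0, 1, 5)               # [0.0, 0.25, 0.5, 0.75, 1.0]

# 6 values from 0 to 10, reshaped into 2 rows and 3 columns.
grid = np.linspace(0, 10, 6).reshape(2, 3)  # [[0, 2, 4], [6, 8, 10]]

print(points)
print(grid)
```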

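Difference between `arange()` and `linspace()`

`arange()` → Takes a step size. The stop value is excluded.
`linspace()` → Takes the number of elements. The stop value is included by default.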

#### 4. Creating an array of 0’s, an array of 1’s

`np.zeros(shape, dtype)` → Returns an array of the given shape and type filled with zeros.

By default, it creates a float array.

https://gist.github.com/IndhumathyChelliah/9e52b69e95f192a9cbb73403acfec414

`np.ones(shape, dtype)` → Returns an array of the given shape and type filled with ones.

https://gist.github.com/IndhumathyChelliah/8a8816bc61036199ebd09c42628bc0ca
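Both functions can be sketched together. The shapes below are arbitrary examples:

```python
import numpy as np

# Default dtype is float64.
zeros = np.zeros((2, 3))

# Pass dtype to get an integer array instead.
int_zeros = np.zeros((2, 3), dtype=int)

# np.ones works the same way.
ones = np.ones((3, 2), dtype=int)

print(zeros.dtype)  # float64
print(int_zeros)
print(ones)
```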

#### 5. Creating an array of random numbers

`np.random.random(shape)` → Returns an array of the given shape filled with random numbers between 0 and 1.

https://gist.github.com/IndhumathyChelliah/e686fe8965498d1d7711bc9cd863d554
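A small sketch, assuming a (2, 3) shape for illustration. The values differ on every run, so only the shape and value range are checked here:

```python
import numpy as np

# Random floats drawn from the half-open interval [0.0, 1.0).
samples = np.random.random((2, 3))

print(samples.shape)  # (2, 3)
print(samples.min() >= 0.0 and samples.max() < 1.0)  # True
```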

#### 6. Creating an array filled with the number ‘n’

`np.full(shape, n)` → Returns an array of the given shape filled with the number ‘n’.

https://gist.github.com/IndhumathyChelliah/2e2caf0c3b34b87e617e9ec58b25b0b8
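For example, with an illustrative shape and fill value:

```python
import numpy as np

# A 2x3 array where every element is 7.
sevens = np.full((2, 3), 7)

print(sevens)  # [[7, 7, 7], [7, 7, 7]]
```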

### Python List vs NumPy Array

1. A Python list is heterogeneous: it can contain different data types.
A NumPy array is homogeneous: all of its elements share the same data type. The most commonly used arrays are int arrays and float arrays.
2. NumPy arrays support element-wise operations.

https://gist.github.com/IndhumathyChelliah/82cc222a767cb1c19889c9ed9e68c043

3. Operations on NumPy arrays are generally faster than the equivalent loops over Python lists. NumPy arrays are fixed in size, whereas a Python list can change in size.
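The first two points can be illustrated with a small sketch (values chosen only for illustration):

```python
import numpy as np

py_list = [1, 2, 3]
arr = np.array([1, 2, 3])

# Multiplying a list repeats it; multiplying an array is element-wise.
doubled_list = py_list * 2   # [1, 2, 3, 1, 2, 3]
doubled_arr = arr * 2        # [2, 4, 6]

# Mixing ints and floats gives one common dtype for the whole array.
mixed = np.array([1, 2.5, 3])

print(doubled_list)
print(doubled_arr)
print(mixed.dtype)  # float64
```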

### Attributes of NumPy Array

If we create a NumPy array from a CSV file, or if we work with large NumPy arrays, we can get information about the array from its attributes.

`ndarray.shape` → Returns the shape of the array

`ndarray.dtype` → Returns the data type

`ndarray.ndim` → Returns the number of dimensions of the array

https://gist.github.com/IndhumathyChelliah/3bfb0375609eb9458b16ca857086d720
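A minimal sketch of the three attributes on an illustrative 2-D array:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float64)

print(a.shape)  # (2, 3): 2 rows, 3 columns
print(a.dtype)  # float64
print(a.ndim)   # 2
```

These are attributes, not methods, so they are read without parentheses.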

### Indexing and Slicing NumPy Arrays

Indexing and slicing NumPy arrays is similar to indexing and slicing Python lists.

We can access elements of an array by a single index, a list of indices, a slice, or a slice with a step.

#### 1. Accessing elements from a one-dimensional array

https://gist.github.com/IndhumathyChelliah/77292af6640396d8a1ecb196e791c9cf
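The four access styles can be sketched on an illustrative 1-D array:

```python
import numpy as np

a = np.array([10, 20, 30, 40, 50])

first = a[0]          # single index: 10
last = a[-1]          # negative index: 50
picked = a[[0, 2, 4]] # list of indices: [10, 30, 50]
middle = a[1:4]       # slice: [20, 30, 40]
every_other = a[::2]  # slice with step: [10, 30, 50]

print(first, last, picked, middle, every_other)
```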

#### 2. Accessing elements from a 2-dimensional array

Basic slicing always returns a “view” of the original array.
View → An array that does not own its data, but refers to another array’s data instead.
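A sketch of 2-D indexing and of the view behaviour, using an illustrative 3x3 array. Modifying the slice also changes the original array:

```python
import numpy as np

m = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

element = m[1, 2]        # row 1, column 2: 6
row = m[0].copy()        # first row, copied: [1, 2, 3]
column = m[:, 1].copy()  # second column, copied: [2, 5, 8]

# Basic slicing returns a view that shares memory with m.
sub = m[:2, :2]
shares = np.shares_memory(sub, m)  # True

# Writing through the view changes the original array too.
sub[0, 0] = 100
print(m[0, 0])  # 100
```

When an independent array is needed, call `.copy()` on the slice, as done for `row` and `column` above.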

#### 3. Accessing elements from a 3-dimensional array

https://gist.github.com/IndhumathyChelliah/5597a14e3e0057480892d665081242e0
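As an illustrative sketch (the array is made up, not taken from the gist), a 3-D array is indexed as `[matrix, row, column]`:

```python
import numpy as np

# Shape (2, 2, 3): 2 matrices, each with 2 rows and 3 columns.
t = np.arange(12).reshape(2, 2, 3)

element = t[1, 0, 2]      # matrix 1, row 0, column 2: 8
first_matrix = t[0]       # [[0, 1, 2], [3, 4, 5]]
first_rows = t[:, 0, :]   # row 0 of every matrix: [[0, 1, 2], [6, 7, 8]]

print(element)
print(first_matrix)
print(first_rows)
```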

### Boolean Indexing

Boolean indexing → We can access and modify the elements that satisfy a condition.
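A minimal sketch of both uses, with illustrative values: selecting the elements that match a condition, and overwriting them in place.

```python
import numpy as np

a = np.array([3, -1, 7, -5, 2])

# A comparison produces a boolean mask of the same shape.
mask = a > 0          # [True, False, True, False, True]

# Indexing with the mask keeps only the True positions (as a copy).
positives = a[mask]   # [3, 7, 2]

# Assigning through a condition modifies the matching elements.
a[a < 0] = 0          # a is now [3, 0, 7, 0, 2]

print(positives)
print(a)
```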

### Operations on NumPy Arrays

1. Arithmetic Operations on NumPy Arrays
2. Statistical Methods on NumPy
3. Sorting NumPy Arrays
4. Set Operations on NumPy Arrays
5. Transposing Arrays
6. Stacking NumPy Arrays

### Arithmetic Operations on NumPy Arrays

#### 1. Addition

Using `+` or `np.add()`

NumPy performs element-wise addition, so the two arrays must have the same shape or shapes that are compatible under broadcasting.

https://gist.github.com/IndhumathyChelliah/5d01e4886917e0dea3593f530cd4410c

#### 2. Subtraction

Using `-` or `np.subtract()`

https://gist.github.com/IndhumathyChelliah/3fdd3c56eedcc946ba751f99c1d009f2

#### 3. Multiplication

Using `*` or `np.multiply()`

https://gist.github.com/IndhumathyChelliah/62b0551e119bfac8c8912d77eaaf590a

#### 4. Division

Using `/` or `np.divide()`

https://gist.github.com/IndhumathyChelliah/d96d1af8775e4e91fc1990a404f72d9a

#### 5. Square root

`np.sqrt()` → Calculates the square root of every element in the array.

https://gist.github.com/IndhumathyChelliah/80eb8a8bbe51185de937eaa29f33fcde

#### 6. Exponential

`np.exp()` → Calculates the exponential of every element in the array.

https://gist.github.com/IndhumathyChelliah/cdb292e7a51ca7d4d8416230e768f218
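The six operations in this section can be sketched together on two small arrays. The values are chosen for illustration so that the results are easy to check by hand:

```python
import numpy as np

a = np.array([1.0, 4.0, 9.0])
b = np.array([1.0, 2.0, 3.0])

total = a + b           # same as np.add(a, b):      [2, 6, 12]
diff = a - b            # same as np.subtract(a, b): [0, 2, 6]
product = a * b         # same as np.multiply(a, b): [1, 8, 27]
quotient = a / b        # same as np.divide(a, b):   [1, 2, 3]
roots = np.sqrt(a)      # [1, 2, 3]
exps = np.exp(np.array([0.0, 1.0]))  # [1, e]

print(total, diff, product, quotient, roots, exps)
```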

### NumPy Axis in Detail

Before going into statistical methods and sorting, we will learn about the NumPy axis. It’s a little bit tricky.

#### 3-Dimensional arrays

Let’s take a 3-D array of shape (2,2,3), which indicates 2 → matrices/arrays, 2 → rows, 3 → columns.

Summing across an axis collapses that axis, and the collapsed axis is lost in the result.

If we sum across axis=0, it will collapse the matrices/arrays axis. The two matrices are added element-wise, so the shape of the sum matrix will be (2,3).

If we sum across axis=1, it will collapse the row axis. The rows of each matrix are added together, giving one sum per column, so the shape of the result will also be (2,3).

For axis=2, it will collapse the column axis. The columns of each matrix are added together, giving one sum per row, so the shape of the result will be (2,2).

So, in NumPy, remember the axis parameter as “the specified axis is collapsed”.
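The three cases can be checked with a short sketch. The array contents are an illustrative assumption; only the shape (2,2,3) comes from the text above:

```python
import numpy as np

# Two 2x3 matrices:
# t[0] = [[0, 1, 2], [3, 4, 5]]
# t[1] = [[6, 7, 8], [9, 10, 11]]
t = np.arange(12).reshape(2, 2, 3)

sum0 = t.sum(axis=0)  # matrices collapsed: [[6, 8, 10], [12, 14, 16]]
sum1 = t.sum(axis=1)  # rows collapsed:     [[3, 5, 7], [15, 17, 19]]
sum2 = t.sum(axis=2)  # columns collapsed:  [[3, 12], [21, 30]]

print(sum0.shape, sum1.shape, sum2.shape)  # (2, 3) (2, 3) (2, 2)
```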

#### 2-D array
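The same collapsing rule applies to a 2-D array, where axis=0 is the row axis and axis=1 is the column axis. A minimal sketch with illustrative values:

```python
import numpy as np

m = np.array([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)

# axis=0 collapses the rows: one sum per column.
col_sums = m.sum(axis=0)  # [5, 7, 9]

# axis=1 collapses the columns: one sum per row.
row_sums = m.sum(axis=1)  # [6, 15]

print(col_sums, row_sums)
```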

To learn more about the NumPy axis, refer to the Medium article “Understanding NumPy sum” by Kshitij Bajracharya.

### Statistical Methods on NumPy

We can compute statistics such as the mean, median, and sum of NumPy arrays.

`np.mean(a)` → Computes the mean of an array `a`
`np.median(a)` → Computes the median of an array `a`
`np.sum(a)` → Computes the sum of an array `a`

`np.mean(a, axis=0)` → With axis=0 on a 2-D array, the mean is computed column-wise (one value per column).
`np.mean(a, axis=1)` → With axis=1 on a 2-D array, the mean is computed row-wise (one value per row).

https://gist.github.com/IndhumathyChelliah/cce0ef3fa859073294d7822a27b112d9
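A small sketch of these functions on an illustrative 2-D array:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

overall_mean = np.mean(a)      # 3.5
overall_median = np.median(a)  # 3.5
overall_sum = np.sum(a)        # 21

col_means = np.mean(a, axis=0)  # [2.5, 3.5, 4.5]
row_means = np.mean(a, axis=1)  # [2.0, 5.0]

print(overall_mean, overall_median, overall_sum)
print(col_means, row_means)
```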

### Sorting NumPy Arrays

`ndarray.sort()` → Sorts the original array in place
`np.sort(ndarray)` → Returns a sorted copy of the array

To sort along a particular axis, pass the `axis` parameter.

https://gist.github.com/IndhumathyChelliah/b268bc0087afe02b297f86a4a83a8781
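The difference between the two calls, and the effect of `axis`, can be sketched as follows (illustrative values):

```python
import numpy as np

a = np.array([[3, 1, 2], [0, 7, -1]])

# np.sort returns a sorted copy; by default it sorts along the last axis.
by_row = np.sort(a)             # [[1, 2, 3], [-1, 0, 7]]
by_column = np.sort(a, axis=0)  # [[0, 1, -1], [3, 7, 2]]
unchanged = a.tolist()          # a itself is still [[3, 1, 2], [0, 7, -1]]

# ndarray.sort sorts in place and returns None.
a.sort()
print(a)  # [[1, 2, 3], [-1, 0, 7]]
```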

### Set Operations on NumPy Arrays

Set operations can be performed on 1-D NumPy arrays.

https://gist.github.com/IndhumathyChelliah/e63642f1bd6ae4de089c89a461cc108b
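A sketch of some commonly used set functions. Which functions the gist covers is not visible here, so this selection is an assumption:

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([3, 4, 5, 6])

unique_vals = np.unique(np.array([1, 1, 2, 3, 3]))  # [1, 2, 3]
union = np.union1d(a, b)                            # [1, 2, 3, 4, 5, 6]
common = np.intersect1d(a, b)                       # [3, 4]
only_in_a = np.setdiff1d(a, b)                      # [1, 2]
membership = np.isin(a, b)                          # [False, False, True, True]

print(unique_vals, union, common, only_in_a, membership)
```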

### Transposing NumPy Array

https://gist.github.com/IndhumathyChelliah/7497f058647721b18c4007504dbeafd2
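Transposing swaps the rows and columns of a 2-D array. A minimal sketch with illustrative values:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)

# .T and np.transpose() give the same result for a 2-D array.
t1 = a.T               # [[1, 4], [2, 5], [3, 6]]
t2 = np.transpose(a)

print(t1.shape)  # (3, 2)
```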

### Stacking NumPy Arrays

`np.hstack((a1, a2))` → Horizontal stacking. The two arrays should have the same number of rows. Note that the arrays are passed together as a tuple.

`np.vstack((a1, a2))` → Vertical stacking. The two arrays should have the same number of columns.
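Both functions take a single sequence of arrays, as in this sketch with two illustrative 2x2 arrays:

```python
import numpy as np

a1 = np.array([[1, 2], [3, 4]])
a2 = np.array([[5, 6], [7, 8]])

# Side by side: rows are extended.
side_by_side = np.hstack((a1, a2))  # [[1, 2, 5, 6], [3, 4, 7, 8]]

# On top of each other: rows are appended.
stacked = np.vstack((a1, a2))       # [[1, 2], [3, 4], [5, 6], [7, 8]]

print(side_by_side.shape, stacked.shape)  # (2, 4) (4, 2)
```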

### Broadcasting

In NumPy, arithmetic operations are done element-wise. Normally, this requires the arrays to have the same shape.

To perform arithmetic operations on arrays of different shapes, broadcasting is used.

Think of broadcasting as stretching the smaller array to the required shape so that the arithmetic operation can be performed.

Two arrays can be broadcast together if, for each dimension, one of the following holds:

1. The size of the dimension is the same in both arrays.
2. The size of the dimension is one in one of the arrays.

#### Scenario 1: Both arrays have the same number of dimensions

Example 1: Two-dimensional arrays
Let’s add two 2-D arrays (a1,a2) of different shapes.
The shape of a1 (2,1)
The shape of a2 (2,2)

• The first axis is the same
• The second axis is 1 in one of the arrays (a1)
[If the sizes along an axis are not the same, one of them should be 1]

https://gist.github.com/IndhumathyChelliah/e4b76cfdb637160c147530d4170838a4
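Example 1 can be sketched as follows. Only the shapes (2,1) and (2,2) come from the text; the values are illustrative:

```python
import numpy as np

a1 = np.array([[1], [2]])            # shape (2, 1)
a2 = np.array([[10, 20], [30, 40]])  # shape (2, 2)

# a1's single column is stretched across both columns of a2.
result = a1 + a2  # [[11, 21], [32, 42]]

print(result.shape)  # (2, 2)
```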

Example 2: Three-dimensional arrays

Let’s add two 3-D arrays (a1,a2) of different shapes.
The shape of a1 (2,3,1)
The shape of a2 (2,3,2)

• The first axis is the same
• The second axis is the same
• The third axis is 1 in one of the arrays (a1)
[If the sizes along an axis are not the same, one of them should be 1]

https://gist.github.com/IndhumathyChelliah/6e9bf91bffcb63af561f873c7e779abb
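A sketch of Example 2 with the shapes (2,3,1) and (2,3,2) from the text and illustrative values:

```python
import numpy as np

a1 = np.arange(6).reshape(2, 3, 1)  # shape (2, 3, 1), values 0..5
a2 = np.ones((2, 3, 2))             # shape (2, 3, 2), all ones

# a1 is stretched along the third axis to match a2.
result = a1 + a2

print(result.shape)  # (2, 3, 2)
```

Because a1 is repeated along the last axis and a2 is all ones, `result[..., 0]` and `result[..., 1]` hold the same values.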

#### Scenario 2: The arrays have different numbers of dimensions

If the two arrays have different numbers of dimensions, the shape of the one with fewer dimensions is padded with ones on its leading (left) side.

Example: Let’s add two arrays of different dimensions.

The shape of a1 → (3,2)
The shape of a2 → (2,3,2)

Since a1 has fewer dimensions than a2, the shape of a1 is padded with ones on its leading (left) side.

The shape of a1 becomes → (1,3,2)

Now check the broadcast rules for a1 and a2:

• The first axis is 1 in one of the arrays (a1)
[If the sizes along an axis are not the same, one of them should be 1]
• The second axis is the same
• The third axis is the same

Now we can perform a1+a2, because the shapes satisfy the broadcast rules.

https://gist.github.com/IndhumathyChelliah/42e820eab093519802fcf5e1ca889a9d
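This scenario can be sketched with the shapes from the example and illustrative values. NumPy does the padding automatically; no reshape is needed:

```python
import numpy as np

a1 = np.arange(6).reshape(3, 2)  # shape (3, 2)
a2 = np.zeros((2, 3, 2))         # shape (2, 3, 2)

# a1 is treated as shape (1, 3, 2) and stretched along the first axis.
result = a1 + a2

print(result.shape)  # (2, 3, 2)
```

Since a2 is all zeros here, each of the two matrices in `result` equals a1.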

If we need to perform arithmetic operations on two arrays of different shapes, the shapes must satisfy the broadcast rules. Otherwise, NumPy raises an error and the operation cannot be performed.

Example:
The shape of a1 (2,3)
The shape of a2 (2,4)

We can’t perform a1+a2, because the shapes don’t satisfy the broadcast rules.

[The second axis is different and it is not 1 in either of the arrays]
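A short sketch of the failing case. NumPy raises a `ValueError` when the shapes cannot be broadcast together:

```python
import numpy as np

a1 = np.ones((2, 3))
a2 = np.ones((2, 4))

try:
    a1 + a2
    broadcast_ok = True
except ValueError as exc:
    # The message reports that the operands could not be broadcast together.
    broadcast_ok = False
    print(exc)

print(broadcast_ok)  # False
```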

### Conclusion

In this article, I have covered most of the standard NumPy operations required to kickstart your journey in data science.


Watch this space for more articles on Python and data science. If you would like to read more of my tutorials, follow me on Medium, LinkedIn, and Twitter.
