Beginner’s Guide to NumPy for Data Science

Data Science

Learn about NumPy from scratch

Photo by Black ice from Pexels

NumPy

NumPy stands for Numerical Python. It is a powerful Python library that supports large multi-dimensional arrays and matrices, along with a collection of high-level mathematical functions to operate on these arrays.

In this article, let’s learn the basics of NumPy needed for data science.

Table of Contents

  1. Different Ways to Create NumPy Arrays
  2. Python List vs NumPy Arrays
  3. Attributes of NumPy Array
  4. Indexing and Slicing NumPy Array
  5. NumPy Axis in Detail
  6. Operations on NumPy Arrays.

Different Ways to Create NumPy Arrays

1. Creating NumPy arrays from a python list

  • How to create a vector from python lists?
    A vector is a one-dimensional array.

https://gist.github.com/IndhumathyChelliah/75062b88f6418131eee4b8494ea67b9d

  • How to create matrices from a list of lists?

To create an n-dimensional array from a Python list, the list should be nested n levels deep.

https://gist.github.com/IndhumathyChelliah/4f683c9f1b7bebc49e093ff83138d3f1

 

Creating NumPy arrays from Python Lists

Similarly, we can create NumPy arrays from Python tuples using np.array().
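As a small illustrative sketch (not the contents of the linked gists; the variable names are made up for this example), creating arrays from lists and tuples might look like this:

```python
import numpy as np

# A flat list gives a one-dimensional array (a vector).
vector = np.array([1, 2, 3])

# A list of lists (2 levels deep) gives a two-dimensional array (a matrix).
matrix = np.array([[1, 2, 3], [4, 5, 6]])

# A tuple works the same way as a list.
from_tuple = np.array((7, 8, 9))

print(vector.ndim, vector.shape)   # 1 (3,)
print(matrix.ndim, matrix.shape)   # 2 (2, 3)
print(from_tuple.shape)            # (3,)
```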

2. Creating NumPy arrays using the arange function

np.arange(start, stop, step) → Returns an array of evenly spaced values in the range from start up to, but not including, stop. It is similar to the Python range() function.

https://gist.github.com/IndhumathyChelliah/0ba2c0fad3f22b30ebb03d8d89c9a217

np.arange() returns a one-dimensional array.
To convert it to a multi-dimensional array, use the reshape() function.

https://gist.github.com/IndhumathyChelliah/b458cb06772d7ea188a28a9ed239a451
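A minimal sketch of arange() followed by reshape(), with values chosen only for illustration:

```python
import numpy as np

# start=0, stop=10 (excluded), step=2
a = np.arange(0, 10, 2)
print(a)          # [0 2 4 6 8]

# 6 elements rearranged into 2 rows and 3 columns.
# The new shape must hold exactly the same number of elements.
b = np.arange(6).reshape(2, 3)
print(b.shape)    # (2, 3)
```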

Difference between range() and arange() function

range() → Step must be an integer. It returns a range object, which can be indexed and iterated like a Python list.
arange() → Step can be a float. It returns a NumPy array.
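The difference can be checked with a short sketch like the following (the values are arbitrary):

```python
import numpy as np

r = range(0, 5)
print(type(r).__name__, list(r))   # range [0, 1, 2, 3, 4]

# range() only accepts integers, so a float step raises a TypeError.
range_rejected_float = False
try:
    range(0, 1, 0.25)
except TypeError:
    range_rejected_float = True
print(range_rejected_float)        # True

# arange() accepts a float step and returns a NumPy array.
f = np.arange(0, 1, 0.25)
print(f.tolist())                  # [0.0, 0.25, 0.5, 0.75]
```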

 

Using arange()

3. Creating a NumPy array using np.linspace()

np.linspace(start, stop, num) → Returns an array of num evenly spaced elements over the given range. By default, stop is included.

https://gist.github.com/IndhumathyChelliah/305dd74d8ac0abe6a93b2ce4690be680

Like arange(), linspace() also returns a one-dimensional array.
To convert it to a multi-dimensional array, use the reshape() function.

https://gist.github.com/IndhumathyChelliah/2697e386e097df8fa2f920c2c8651848
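A short sketch of linspace() and reshape(), using made-up values:

```python
import numpy as np

# 5 evenly spaced values from 0 to 1, with both endpoints included.
x = np.linspace(0, 1, 5)
print(x.tolist())    # [0.0, 0.25, 0.5, 0.75, 1.0]

# 6 values reshaped into a 2 x 3 array.
grid = np.linspace(1, 6, 6).reshape(2, 3)
print(grid.shape)    # (2, 3)
```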

 

Using linspace()

Difference between arange() and linspace()

 

arange() vs linspace() [Image by Author]
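In short, arange() is driven by a step size and excludes the stop value, while linspace() is driven by the number of elements and includes the stop value by default. A small sketch of that contrast, with illustrative numbers:

```python
import numpy as np

# arange: "give me values 2.5 apart"; the stop value 10 is excluded.
by_step = np.arange(0, 10, 2.5)
print(by_step.tolist())    # [0.0, 2.5, 5.0, 7.5]

# linspace: "give me 5 values"; the stop value 10 is included.
by_count = np.linspace(0, 10, 5)
print(by_count.tolist())   # [0.0, 2.5, 5.0, 7.5, 10.0]
```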

4. Creating an array of 0’s, an array of 1’s

np.zeros(shape, dtype) → Returns an array of the given shape and type filled with zeros.

By default, it creates a float array.

https://gist.github.com/IndhumathyChelliah/9e52b69e95f192a9cbb73403acfec414

np.ones(shape, dtype) → Returns an array of the given shape and type filled with ones.

https://gist.github.com/IndhumathyChelliah/8a8816bc61036199ebd09c42628bc0ca
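A minimal sketch of both functions, where the shapes are arbitrary:

```python
import numpy as np

# No dtype given, so the result is a float array.
z = np.zeros((2, 3))
print(z.dtype)       # float64
print(z.tolist())    # [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]

# Passing dtype=int gives an integer array instead.
o = np.ones((2, 3), dtype=int)
print(o.tolist())    # [[1, 1, 1], [1, 1, 1]]
```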

 

Creating an array of 0’s and 1’s

5. Creating an array of random numbers

np.random.random(shape) → Returns an array of the given shape filled with random floats from 0 (included) up to 1 (excluded).

https://gist.github.com/IndhumathyChelliah/e686fe8965498d1d7711bc9cd863d554
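A quick sketch, assuming a 2 x 3 shape for illustration. The values differ on every run, so only the shape and range are shown:

```python
import numpy as np

r = np.random.random((2, 3))
print(r.shape)                       # (2, 3)

# Every value lies in the interval [0, 1).
print(bool((r >= 0).all() and (r < 1).all()))   # True
```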

6. Creating an array filled with the number ‘n’

np.full(shape, n) → Returns an array of the given shape filled with the number ‘n’.

https://gist.github.com/IndhumathyChelliah/2e2caf0c3b34b87e617e9ec58b25b0b8
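For example, a sketch that fills a 2 x 2 array with the number 7 (both the shape and the value are arbitrary):

```python
import numpy as np

filled = np.full((2, 2), 7)
print(filled.tolist())   # [[7, 7], [7, 7]]
```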


Python List vs NumPy Array

  1. A Python list is heterogeneous. It can contain different data types.
    A NumPy array is homogeneous. All of its elements have the same data type. The most commonly used arrays are int arrays and float arrays.
  2. On a NumPy array, we can perform element-wise operations.

 

Operations on NumPy Arrays vs Python Lists

https://gist.github.com/IndhumathyChelliah/82cc222a767cb1c19889c9ed9e68c043

3. NumPy arrays are faster than Python lists for numerical operations. NumPy arrays are fixed in size, whereas a Python list can change in size.
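The first two points can be seen in a small sketch like this one (not the linked gist; names are illustrative):

```python
import numpy as np

py_list = [1, 2, 3]
arr = np.array([1, 2, 3])

# On a list, * repeats the list. On an array, * multiplies each element.
print(py_list * 2)    # [1, 2, 3, 1, 2, 3]
print(arr * 2)        # [2 4 6]

# A list can hold mixed types. An array converts everything to one type:
# here the integer 1 is stored as a float next to 2.5.
mixed_list = [1, 2.5, "three"]
mixed_arr = np.array([1, 2.5])
print(mixed_arr.dtype)   # float64
```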


Attributes of NumPy Array

If we create a NumPy array from a CSV file, or if we work with large NumPy arrays, we can get information about the array from its attributes.

ndarray.shape → Returns the shape of the array

ndarray.dtype → Returns the data type

ndarray.ndim → Returns the number of dimensions of the array

https://gist.github.com/IndhumathyChelliah/3bfb0375609eb9458b16ca857086d720
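A minimal sketch of reading these attributes on a small example array:

```python
import numpy as np

a = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])

print(a.shape)   # (2, 3)  -> 2 rows, 3 columns
print(a.dtype)   # float64
print(a.ndim)    # 2
```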


Indexing and Slicing NumPy Arrays

Indexing and slicing NumPy arrays is similar to indexing and slicing Python lists.

We can access the elements of an array by giving a particular index, a list of indices, a slice, or a slice with a step.

 

Indexing NumPy Array
  1. Accessing elements from a one-dimensional array

 

Accessing elements from a one-dimensional NumPy Array [Image by Author]

https://gist.github.com/IndhumathyChelliah/77292af6640396d8a1ecb196e791c9cf
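A short sketch of the access patterns mentioned above (single index, list of indices, slice, and slice with step), on an illustrative array:

```python
import numpy as np

a = np.array([10, 20, 30, 40, 50])

print(a[0])        # 10          (single index)
print(a[-1])       # 50          (negative index counts from the end)
print(a[[0, 2]])   # [10 30]     (list of indices)
print(a[1:4])      # [20 30 40]  (slice, stop index excluded)
print(a[::2])      # [10 30 50]  (slice with step 2)
```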

2. Accessing elements from a 2-dimensional array

 

Accessing elements from a 2-D array [Image by Author]

Arrays generated by basic slicing are always “views” of the original array.
View → An array that does not own its data, but refers to another array’s data instead. Modifying a view therefore also modifies the original array.

https://gist.github.com/IndhumathyChelliah/8c1539970d27dbcba3c7212ea603ade5
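The following sketch, with an arbitrary 3 x 3 array, shows 2-D indexing and what it means for a slice to be a view:

```python
import numpy as np

m = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print(m[1, 2])     # 6        (row 1, column 2)
print(m[0])        # [1 2 3]  (first row)
print(m[:, 1])     # [2 5 8]  (second column)

# A basic slice is a view: it shares memory with the original array.
sub = m[:2, :2]
print(np.shares_memory(sub, m))   # True

# Writing through the view changes the original.
sub[0, 0] = 100
print(m[0, 0])     # 100

# Use .copy() when an independent array is needed.
independent = m[:2, :2].copy()
independent[0, 0] = -1
print(m[0, 0])     # 100 (unchanged)
```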

3. Accessing elements from a 3-Dimensional array

 

Accessing elements from a 3-D array

https://gist.github.com/IndhumathyChelliah/5597a14e3e0057480892d665081242e0
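For a 3-D array, the index order is (matrix, row, column). A minimal sketch with an illustrative array of shape (2, 2, 3):

```python
import numpy as np

t = np.arange(12).reshape(2, 2, 3)
# t[0] is [[0, 1, 2], [3, 4, 5]]
# t[1] is [[6, 7, 8], [9, 10, 11]]

print(t[1, 0, 2])            # 8  (matrix 1, row 0, column 2)
print(t[0].shape)            # (2, 3)  (the whole first matrix)
print(t[:, 1, :].tolist())   # [[3, 4, 5], [9, 10, 11]]  (row 1 of every matrix)
```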

Boolean Indexing

Boolean Indexing → We can access and modify the elements that satisfy a condition.

 

Boolean Indexing [Image by Author]

https://gist.github.com/IndhumathyChelliah/31aade5af9d90ec10950269910ac8365
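A small sketch of boolean indexing, where the condition (greater than 4) is chosen only for illustration:

```python
import numpy as np

a = np.array([1, 5, 8, 12, 3])

# The condition produces a boolean array of the same shape.
mask = a > 4
print(mask.tolist())   # [False, True, True, True, False]

# Access: keep only the elements where the mask is True.
selected = a[mask]
print(selected)        # [ 5  8 12]

# Modify: assign to the elements where the condition holds.
a[a > 4] = 0
print(a)               # [1 0 0 0 3]
```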


Operations on NumPy Arrays

  1. Arithmetic Operations on NumPy Arrays
  2. Statistical Methods on NumPy
  3. Sorting NumPy Arrays
  4. Set Operations on NumPy Arrays
  5. Transposing Arrays
  6. Stacking NumPy Arrays
  7. Broadcasting NumPy Arrays

Arithmetic Operations on NumPy Arrays

  1. Addition

Using + or np.add()

NumPy performs element-wise addition. So, we can add two arrays only if they have the same shape, or shapes that are compatible under the rules described in the Broadcasting NumPy Arrays section.

https://gist.github.com/IndhumathyChelliah/5d01e4886917e0dea3593f530cd4410c

2. Subtraction

Using - or np.subtract()

https://gist.github.com/IndhumathyChelliah/3fdd3c56eedcc946ba751f99c1d009f2

3. Multiplication

Using * or np.multiply()

https://gist.github.com/IndhumathyChelliah/62b0551e119bfac8c8912d77eaaf590a

4. Division

Using / or np.divide()

https://gist.github.com/IndhumathyChelliah/d96d1af8775e4e91fc1990a404f72d9a

5. Square root

np.sqrt() → Calculates the square root of each element in the array.

https://gist.github.com/IndhumathyChelliah/80eb8a8bbe51185de937eaa29f33fcde

6. Exponential

np.exp() → Calculates the exponential (e raised to the power x) of each element x in the array.

https://gist.github.com/IndhumathyChelliah/cdb292e7a51ca7d4d8416230e768f218
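The six operations above can be tried together in one sketch (the input values are arbitrary). Each operator gives the same result as its function form:

```python
import numpy as np

a = np.array([1.0, 4.0, 9.0])
b = np.array([1.0, 2.0, 3.0])

added = a + b            # same as np.add(a, b)       -> 2, 6, 12
subtracted = a - b       # same as np.subtract(a, b)  -> 0, 2, 6
multiplied = a * b       # same as np.multiply(a, b)  -> 1, 8, 27
divided = a / b          # same as np.divide(a, b)    -> 1, 2, 3
roots = np.sqrt(a)       # square root of each element -> 1, 2, 3
exps = np.exp(np.array([0.0, 1.0]))   # e**0 = 1, e**1 = e

print(added.tolist())    # [2.0, 6.0, 12.0]
print(roots.tolist())    # [1.0, 2.0, 3.0]
```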

NumPy Axis in Detail

Before going into statistical methods and sorting, we will learn about the NumPy axis. It’s a little bit tricky.

3-Dimensional arrays

Let’s take a 3-D array of shape (2,2,3), which indicates 2 → matrices, 2 → rows, 3 → columns.

 

3-D array [Image by Author]

If we sum along axis=0, it will collapse the specified axis. Here, in a 3-D array, it will collapse the matrices axis, so the shape of the sum matrix will be (2,3). The collapsed axis is lost in the sum matrix.

 

The shape of the sum matrix after collapsing the particular axis [Image by Author]

If we sum across axis=0, it will collapse the matrices/array. It’s done by doing element-wise addition.

 

Understanding axis in 3-D array[Image by Author]

If we sum across axis=1, it will collapse the row axis. It’s done by doing column-wise addition.

For axis=2, it will collapse the column axis. It’s done by doing row-wise addition.

 

Image by Author

So, in NumPy, we have to remember the axis parameter as “the operation collapses the specified axis”.
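The collapsing behaviour can be verified with a short sketch on an array of shape (2,2,3). The values come from np.arange and are only illustrative:

```python
import numpy as np

a = np.arange(12).reshape(2, 2, 3)
# a[0] is [[0, 1, 2], [3, 4, 5]]
# a[1] is [[6, 7, 8], [9, 10, 11]]

# axis=0 collapses the matrices axis: the two matrices are added element-wise.
s0 = a.sum(axis=0)
print(s0.shape, s0.tolist())   # (2, 3) [[6, 8, 10], [12, 14, 16]]

# axis=1 collapses the row axis: the rows inside each matrix are added.
s1 = a.sum(axis=1)
print(s1.shape, s1.tolist())   # (2, 3) [[3, 5, 7], [15, 17, 19]]

# axis=2 collapses the column axis: the values in each row are added.
s2 = a.sum(axis=2)
print(s2.shape, s2.tolist())   # (2, 2) [[3, 12], [21, 30]]
```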

2-D array

 

Understanding axis on 2-D Arrays [Image by Author]

To know more about the NumPy axis, refer to the Medium article “Understanding NumPy sum” written by Kshitij Bajracharya.


Statistical Methods on NumPy

We can compute statistics like mean, median, and sum on NumPy arrays.

np.mean(a) → Computes the mean of an array a
np.median(a) → Computes the median of an array a
np.sum(a) → Computes the sum of array a

np.mean(a, axis=0) → With axis=0, the mean is computed column-wise (for a 2-D array).
np.mean(a, axis=1) → With axis=1, the mean is computed row-wise.

https://gist.github.com/IndhumathyChelliah/cce0ef3fa859073294d7822a27b112d9
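A minimal sketch of these functions on a small 2-D array (values chosen for easy checking):

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

print(np.mean(a))      # 3.5  (over all elements)
print(np.median(a))    # 3.5
print(np.sum(a))       # 21

# axis=0 collapses the rows, leaving one mean per column.
col_means = np.mean(a, axis=0)
print(col_means.tolist())   # [2.5, 3.5, 4.5]

# axis=1 collapses the columns, leaving one mean per row.
row_means = np.mean(a, axis=1)
print(row_means.tolist())   # [2.0, 5.0]
```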


Sorting NumPy Arrays

ndarray.sort() → Sorts the original array in place
np.sort(ndarray) → Returns a sorted copy of the array

To sort along a particular axis, pass the axis parameter.

https://gist.github.com/IndhumathyChelliah/b268bc0087afe02b297f86a4a83a8781
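The difference between the copy and the in-place sort, and the effect of the axis parameter, can be sketched as follows (the arrays are illustrative):

```python
import numpy as np

a = np.array([[3, 1, 2], [9, 7, 8]])

# np.sort returns a sorted copy. By default it sorts along the last axis,
# so each row is sorted on its own.
sorted_copy = np.sort(a)
print(sorted_copy.tolist())   # [[1, 2, 3], [7, 8, 9]]
print(a.tolist())             # [[3, 1, 2], [9, 7, 8]]  (original unchanged)

# axis=0 sorts down each column instead.
b = np.array([[3, 9], [1, 7]])
by_column = np.sort(b, axis=0)
print(by_column.tolist())     # [[1, 7], [3, 9]]

# ndarray.sort() sorts the array itself and returns None.
a.sort()
print(a.tolist())             # [[1, 2, 3], [7, 8, 9]]
```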


Set Operations on NumPy Arrays

Set operations can be done on 1-D NumPy Arrays

https://gist.github.com/IndhumathyChelliah/e63642f1bd6ae4de089c89a461cc108b
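As a sketch of what such set operations might look like, here are four NumPy set functions applied to two small 1-D arrays. Which functions the linked gist uses is not shown here, so this selection is an assumption:

```python
import numpy as np

a = np.array([1, 2, 2, 3, 4])
b = np.array([3, 4, 5])

unique_a = np.unique(a)          # sorted unique values of a
common = np.intersect1d(a, b)    # values present in both arrays
combined = np.union1d(a, b)      # values present in either array
only_in_a = np.setdiff1d(a, b)   # values in a that are not in b

print(unique_a)    # [1 2 3 4]
print(common)      # [3 4]
print(combined)    # [1 2 3 4 5]
print(only_in_a)   # [1 2]
```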


Transposing NumPy Array

https://gist.github.com/IndhumathyChelliah/7497f058647721b18c4007504dbeafd2
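Transposing swaps the rows and columns of a 2-D array. A minimal sketch, using an illustrative 2 x 3 array:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

# .T and np.transpose() give the same result.
t = a.T
print(a.shape, t.shape)   # (2, 3) (3, 2)
print(t.tolist())         # [[1, 4], [2, 5], [3, 6]]
print(np.array_equal(t, np.transpose(a)))   # True
```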

Stacking NumPy Arrays

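np.hstack((a1, a2)) → Horizontal stacking. The arrays are passed together as one tuple. For 2-D arrays, they should have the same number of rows.

np.vstack((a1, a2)) → Vertical stacking. The arrays are passed together as one tuple. They should have the same number of columns.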

 

Stacking Arrays [Image by Author]

https://gist.github.com/IndhumathyChelliah/c1865d08030ad9746f774417a2b9032b
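A short sketch of both functions on two illustrative 2 x 2 arrays. Note that the arrays are passed as a single tuple:

```python
import numpy as np

a1 = np.array([[1, 2], [3, 4]])
a2 = np.array([[5, 6], [7, 8]])

# Horizontal stacking places a2 to the right of a1.
h = np.hstack((a1, a2))
print(h.shape, h.tolist())   # (2, 4) [[1, 2, 5, 6], [3, 4, 7, 8]]

# Vertical stacking places a2 below a1.
v = np.vstack((a1, a2))
print(v.shape, v.tolist())   # (4, 2) [[1, 2], [3, 4], [5, 6], [7, 8]]
```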


Broadcasting NumPy Arrays

In NumPy, arithmetic operations are done element-wise. To achieve this, arrays should be of the same size/shape.

To perform arithmetic operations on arrays of different sizes/shapes, broadcasting is used.

Imagine that broadcasting means stretching the array to the required shape/size to perform arithmetic operations on it.

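Broadcasting Rules:

The shapes of the two arrays are compared axis by axis, starting from the last (rightmost) axis. Two axes are compatible when either of these holds:

  1. The sizes of the two axes are the same.
  2. The size of one of the axes is one.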

Scenario 1: Both arrays have the same number of dimensions

Example 1: Two-dimensional arrays
Let’s add two 2-D arrays (a1,a2) of different shapes. 
The shape of a1 (2,1)
The shape of a2 (2,2)

a1+a2 → We can perform addition because the broadcast rule matches.

  • The first axis is the same
  • The second axis is 1 in one of the arrays (a1)
    [If the axes are not the same, one of them should be 1]

 

How to check broadcast rules [Image by Author]

https://gist.github.com/IndhumathyChelliah/e4b76cfdb637160c147530d4170838a4
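A minimal sketch of this example, with made-up values for a1 of shape (2,1) and a2 of shape (2,2):

```python
import numpy as np

a1 = np.array([[1], [2]])              # shape (2, 1)
a2 = np.array([[10, 20], [30, 40]])    # shape (2, 2)

# a1 is stretched along its second axis, as if it were [[1, 1], [2, 2]].
result = a1 + a2
print(result.shape)      # (2, 2)
print(result.tolist())   # [[11, 21], [32, 42]]
```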

Example 2: Three-dimensional arrays

Let’s add two 3-D arrays (a1,a2) of different shapes. 
The shape of a1 (2,3,1)
The shape of a2 (2,3,2)

a1+a2 → We can perform addition because the broadcast rule matches.

  • The first axis is the same
  • The second axis is the same
  • The third axis is 1 in one of the arrays (a1)
    [If the axes are not the same, one of them should be 1]

 

Checking broadcast rules on three-dimensional arrays [Image by Author]

https://gist.github.com/IndhumathyChelliah/6e9bf91bffcb63af561f873c7e779abb
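The same check can be sketched for the 3-D case. The array contents below are arbitrary; only the shapes (2,3,1) and (2,3,2) come from the example:

```python
import numpy as np

a1 = np.arange(6).reshape(2, 3, 1)   # shape (2, 3, 1)
a2 = np.ones((2, 3, 2))              # shape (2, 3, 2)

# a1 is stretched along its third axis to match a2.
result = a1 + a2
print(result.shape)          # (2, 3, 2)
print(result[0].tolist())    # [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
```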

Scenario 2: The arrays have different numbers of dimensions

If the two arrays have different numbers of dimensions, the shape of the one with fewer dimensions is padded with ones on its leading (left) side.

Example: Let’s add two arrays of different dimensions.

The shape of a1 → (3,2)
The shape of a2 →(2,3,2)

Since a1 has fewer dimensions than a2, the shape of a1 is padded with ones on its leading (left) side.

The shape of a1 becomes → (1,3,2)

Now check the broadcast rules in a1,a2

  • The first axis is 1 on one of the arrays (a1)
    [If the axes are not the same, one of them should be 1]
  • The second axis is the same
  • The third axis is the same

Now we can perform a1+a2, because the shapes match the broadcast rules.

 

Broadcasting on arrays having different dimensions [Image by Author]

https://gist.github.com/IndhumathyChelliah/42e820eab093519802fcf5e1ca889a9d
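A short sketch of this scenario, using arbitrary contents with the shapes (3,2) and (2,3,2) from the example:

```python
import numpy as np

a1 = np.arange(6).reshape(3, 2)   # shape (3, 2), treated as (1, 3, 2)
a2 = np.zeros((2, 3, 2))          # shape (2, 3, 2)

result = a1 + a2
print(result.shape)   # (2, 3, 2)

# a1 was repeated along the new leading axis, so both "matrices"
# of the result equal a1 (since a2 is all zeros).
print(np.array_equal(result[0], a1))   # True
print(np.array_equal(result[1], a1))   # True
```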

If we need to perform arithmetic operations on two arrays of different shapes, the shapes must match the broadcast rules. Otherwise, NumPy raises a ValueError and the operation is not performed.

Example:
The shape of a1 (2,3)
The shape of a2 (2,4)

We can’t perform a1+a2, because it doesn’t match broadcast rules.

[Second axis is different and it is not 1 in one of the arrays]
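The failing case can be reproduced with a sketch like this one (the array contents are arbitrary; the flag variable is only there to make the outcome visible):

```python
import numpy as np

a1 = np.ones((2, 3))
a2 = np.ones((2, 4))

# The last axes are 3 and 4: they differ and neither is 1,
# so the shapes cannot be broadcast together.
broadcast_failed = False
try:
    a1 + a2
except ValueError as err:
    broadcast_failed = True
    print("Cannot add:", err)

print(broadcast_failed)   # True
```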


Conclusion

In this article, I have covered most of the standard NumPy operations required to kickstart your journey in data science.




Watch this space for more articles on Python and data science. If you would like to read more of my tutorials, follow me on Medium, LinkedIn, and Twitter.
