Data Science
Learn about NumPy from scratch

NumPy
NumPy stands for Numerical Python. It is a powerful Python library that supports large multi-dimensional arrays and matrices, along with a collection of high-level mathematical functions that operate on these arrays.
In this article, let’s learn the basics of NumPy needed for data science.
Table of Contents
- Different Ways to Create NumPy Arrays
- Python List vs NumPy Arrays
- Attributes of NumPy Array
- Indexing and Slicing NumPy Array
- NumPy Axis in Detail
- Operations on NumPy Arrays
Different Ways to Create NumPy Arrays
1. Creating NumPy arrays from a Python list
- How to create a vector from a Python list?
A vector is a one-dimensional array.
https://gist.github.com/IndhumathyChelliah/75062b88f6418131eee4b8494ea67b9d
- How to create matrices from a list of lists?
To create an n-dimensional array from a Python list, the list should be nested n levels deep.
https://gist.github.com/IndhumathyChelliah/4f683c9f1b7bebc49e093ff83138d3f1

Similarly, we can create NumPy arrays from Python tuples using np.array().
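As a minimal sketch of the list and tuple cases above (the variable names are illustrative and are not taken from the linked gists):

```python
import numpy as np

# A flat list gives a one-dimensional array (a vector).
vector = np.array([1, 2, 3])

# A list of lists (2 levels deep) gives a two-dimensional array (a matrix).
matrix = np.array([[1, 2, 3], [4, 5, 6]])

# Tuples are accepted in the same way as lists.
from_tuple = np.array((7, 8, 9))

print(vector.ndim, matrix.ndim, from_tuple.ndim)  # 1 2 1
print(matrix.shape)                               # (2, 3)
```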
2. Creating NumPy arrays using the arange() function
np.arange(start,stop,step)
→ Returns an array of evenly spaced values from start (included) up to stop (excluded). It is similar to the Python range() function.
https://gist.github.com/IndhumathyChelliah/0ba2c0fad3f22b30ebb03d8d89c9a217
np.arange() returns a one-dimensional array.
To convert it to a multi-dimensional array, use the reshape() function.
https://gist.github.com/IndhumathyChelliah/b458cb06772d7ea188a28a9ed239a451
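A short sketch of arange() followed by reshape(), assuming small illustrative values rather than the ones in the gists:

```python
import numpy as np

a = np.arange(0, 10, 2)   # start=0, stop=10 (excluded), step=2
print(a)                  # [0 2 4 6 8]

b = np.arange(6)          # only stop is given, so the values are 0..5
m = b.reshape(2, 3)       # 6 elements arranged as 2 rows x 3 columns
print(m.shape)            # (2, 3)
```

Note that reshape() only works when the total number of elements stays the same (here 6 = 2 x 3).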
Difference between range() and arange()
range()
→ The step can’t be a float. It returns a range object, which can be indexed like a Python list.
arange()
→ The step can be a float. It returns a NumPy array.
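The difference can be checked with a small experiment. This is only a sketch; the values are chosen for illustration:

```python
import numpy as np

r = range(0, 6, 2)            # a lazy range object, integer steps only
print(list(r))                # [0, 2, 4]

try:
    range(0, 2, 0.5)          # a float step is rejected by range()
except TypeError as exc:
    print("range() error:", exc)

f = np.arange(0, 2, 0.5)      # a float step is accepted by arange()
print(f)                      # [0.  0.5 1.  1.5]
print(type(f).__name__)       # ndarray
```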

3. Creating a NumPy array using np.linspace()
np.linspace(start,stop,n)
→ Returns an array of n evenly spaced elements over the given range. By default, the stop value is included.
https://gist.github.com/IndhumathyChelliah/305dd74d8ac0abe6a93b2ce4690be680
Like arange(), linspace() returns a one-dimensional array.
To convert it to a multi-dimensional array, use the reshape() function.
https://gist.github.com/IndhumathyChelliah/2697e386e097df8fa2f920c2c8651848
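A minimal sketch of linspace() and reshape() together (illustrative values, not the ones from the gists):

```python
import numpy as np

x = np.linspace(0, 1, 5)                    # 5 evenly spaced values, stop included
print(x)                                    # [0.   0.25 0.5  0.75 1.  ]

grid = np.linspace(1, 6, 6).reshape(2, 3)   # 6 values arranged as 2 x 3
print(grid.shape)                           # (2, 3)
```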

Difference between arange() and linspace()
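The two functions differ mainly in what the third argument means: arange() takes a step size and excludes the stop value, while linspace() takes the number of elements and includes the stop value by default. A small sketch with the same start and stop:

```python
import numpy as np

by_step = np.arange(0, 10, 5)      # step of 5, stop (10) excluded
by_count = np.linspace(0, 10, 5)   # 5 elements, stop (10) included

print(by_step)    # [0 5]
print(by_count)   # [ 0.   2.5  5.   7.5 10. ]
```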

4. Creating an array of 0’s, an array of 1’s
np.zeros(shape,dtype)
→ Returns an array of the given shape and type filled with zeros.
By default, it creates a float array.
https://gist.github.com/IndhumathyChelliah/9e52b69e95f192a9cbb73403acfec414
np.ones(shape,dtype)
→ Returns an array of the given shape and type filled with ones.
https://gist.github.com/IndhumathyChelliah/8a8816bc61036199ebd09c42628bc0ca
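A brief sketch of both functions, assuming illustrative shapes:

```python
import numpy as np

z = np.zeros((2, 3))              # the shape is passed as a tuple; float64 by default
o = np.ones((2, 2), dtype=int)    # the dtype can be set explicitly

print(z.dtype)                    # float64
print(o.tolist())                 # [[1, 1], [1, 1]]
```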

5. Creating an array of random numbers
np.random.random(shape)
→ Returns an array of the given shape filled with random numbers in the half-open interval [0, 1), so 0 can occur but 1 cannot.
https://gist.github.com/IndhumathyChelliah/e686fe8965498d1d7711bc9cd863d554
6. Creating an array filled with the number ‘n’
np.full(shape,n)
→ Returns an array of the given shape filled with the number ‘n’
https://gist.github.com/IndhumathyChelliah/2e2caf0c3b34b87e617e9ec58b25b0b8
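A short sketch of the random and constant-filled constructors (the shapes and the fill value 7 are illustrative):

```python
import numpy as np

r = np.random.random((2, 3))   # values are drawn from [0.0, 1.0)
print(r.shape)                 # (2, 3)

f = np.full((2, 2), 7)         # every element is 7
print(f.tolist())              # [[7, 7], [7, 7]]
```

The random values change on every run, so only the shape is printed here.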

Python List vs NumPy Array
1. A Python list is heterogeneous: it can contain different data types. A NumPy array is homogeneous: all of its elements share the same data type. The most commonly used are int arrays and float arrays.
2. NumPy arrays support element-wise operations.

https://gist.github.com/IndhumathyChelliah/82cc222a767cb1c19889c9ed9e68c043
3. NumPy arrays are generally faster than Python lists for numerical work. NumPy arrays are fixed in size, whereas a Python list can change in size.
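The first two points can be seen in a few lines. This is a sketch with made-up values:

```python
import numpy as np

py_list = [1, 2, 3]
arr = np.array([1, 2, 3])

print(py_list * 2)              # [1, 2, 3, 1, 2, 3]  (the list is repeated)
print(arr * 2)                  # [2 4 6]             (element-wise multiplication)

mixed = np.array([1, 2.5, 3])   # ints and floats are upcast to one common type
print(mixed.dtype)              # float64
```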
Attributes of NumPy Array
If we create a NumPy array from a CSV file or work with large NumPy arrays, we can get information about the array from its attributes.
ndarray.shape
→ Returns the shape of the array
ndarray.dtype
→ Returns the data type
ndarray.ndim
→ Returns the number of dimensions of the array
https://gist.github.com/IndhumathyChelliah/3bfb0375609eb9458b16ca857086d720
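A minimal sketch of reading these attributes from an illustrative 2-D float array:

```python
import numpy as np

a = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])

print(a.shape)   # (2, 3)
print(a.dtype)   # float64
print(a.ndim)    # 2
```

These are attributes, not methods, so they are written without parentheses.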
Indexing and Slicing NumPy Arrays
Indexing and slicing NumPy arrays is similar to indexing and slicing Python lists.
We can access elements of an array by a single index, a list of indices, a slice, or a slice with a step.

1. Accessing elements from a one-dimensional array

https://gist.github.com/IndhumathyChelliah/77292af6640396d8a1ecb196e791c9cf
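A sketch of the four access styles on a one-dimensional array (illustrative values):

```python
import numpy as np

a = np.array([10, 20, 30, 40, 50])

print(a[0])            # 10            (single index)
print(a[-1])           # 50            (negative index counts from the end)
print(a[[0, 2, 4]])    # [10 30 50]    (list of indices)
print(a[1:4])          # [20 30 40]    (slice, stop excluded)
print(a[::2])          # [10 30 50]    (slice with a step)
```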
2. Accessing elements from a 2-dimensional array

Arrays produced by basic slicing are always “views” of the original array.
View → An array that does not own its data, but refers to another array’s data instead. Modifying a view therefore also modifies the original array.
https://gist.github.com/IndhumathyChelliah/8c1539970d27dbcba3c7212ea603ade5
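A sketch of 2-D indexing and of the view behaviour, assuming a small illustrative matrix:

```python
import numpy as np

m = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print(m[1, 2])     # 6          (row 1, column 2)
print(m[0])        # [1 2 3]    (first row)
print(m[:, 1])     # [2 5 8]    (second column)

sub = m[:2, :2]    # basic slicing returns a view, not a copy
sub[0, 0] = 100    # writing through the view...
print(m[0, 0])     # 100        (...changes the original array)

independent = m[:2, :2].copy()   # use copy() when an independent array is needed
```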
3. Accessing elements from a 3-Dimensional array


https://gist.github.com/IndhumathyChelliah/5597a14e3e0057480892d665081242e0
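For a 3-D array, one index (or slice) is given per axis. A sketch using an illustrative array of shape (2,2,3):

```python
import numpy as np

t = np.arange(12).reshape(2, 2, 3)   # 2 matrices, each with 2 rows and 3 columns

print(t[1, 0, 2])                    # 8   (matrix 1, row 0, column 2)
print(t[0].tolist())                 # [[0, 1, 2], [3, 4, 5]]   (the first matrix)
print(t[:, :, 0].tolist())           # [[0, 3], [6, 9]]         (first column of every matrix)
```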
Boolean Indexing
Boolean Indexing → We can select and modify the elements that satisfy a condition.

https://gist.github.com/IndhumathyChelliah/31aade5af9d90ec10950269910ac8365
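A minimal sketch of selecting and then modifying elements with a condition (illustrative values):

```python
import numpy as np

a = np.array([5, 12, 7, 20, 3])

mask = a > 6       # a boolean array with one True/False per element
print(a[mask])     # [12  7 20]   (only the elements where the mask is True)

a[a > 10] = 0      # assign to every element that satisfies the condition
print(a)           # [5 0 7 0 3]
```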
Operations on NumPy Arrays
- Arithmetic Operations on NumPy Arrays
- Statistical Methods on Numpy
- Sorting NumPy Arrays
- Set Operations on NumPy Arrays
- Transposing Arrays
- Stacking NumPy Arrays
- Broadcasting NumPy Arrays
Arithmetic Operations on NumPy Arrays
1. Addition
Using + or np.add()
NumPy performs element-wise addition, so the two arrays must have the same shape (or shapes that are compatible under the broadcasting rules described later in this article).
https://gist.github.com/IndhumathyChelliah/5d01e4886917e0dea3593f530cd4410c
2. Subtraction
Using - or np.subtract()
https://gist.github.com/IndhumathyChelliah/3fdd3c56eedcc946ba751f99c1d009f2
3. Multiplication
Using * or np.multiply()
https://gist.github.com/IndhumathyChelliah/62b0551e119bfac8c8912d77eaaf590a
4. Division
Using / or np.divide()
https://gist.github.com/IndhumathyChelliah/d96d1af8775e4e91fc1990a404f72d9a
5. Square root
np.sqrt()
→ Calculates the square root of every element in the array.
https://gist.github.com/IndhumathyChelliah/80eb8a8bbe51185de937eaa29f33fcde
6. Exponential
np.exp()
→ Calculates the exponential (e raised to the power x) of every element in the array.
https://gist.github.com/IndhumathyChelliah/cdb292e7a51ca7d4d8416230e768f218
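The six operations above can be sketched together on two small arrays of the same shape (the values are illustrative):

```python
import numpy as np

a = np.array([4, 9, 16])
b = np.array([1, 3, 4])

print(a + b)                   # [ 5 12 20]   same as np.add(a, b)
print(a - b)                   # [ 3  6 12]   same as np.subtract(a, b)
print(a * b)                   # [ 4 27 64]   same as np.multiply(a, b)
print(a / b)                   # [4. 3. 4.]   same as np.divide(a, b)
print(np.sqrt(a))              # [2. 3. 4.]

e = np.exp(np.array([0, 1]))   # e**0 = 1 and e**1 is about 2.718
print(e)
```

Note that / always returns floats, even when both inputs are integer arrays.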
NumPy Axis in Detail
Before going into statistical methods and sorting, we will learn about NumPy axes. It’s a little bit tricky.
3-Dimensional arrays
Let’s take a 3-D array of shape (2,2,3), which indicates 2 → matrices/arrays, 2 → rows, 3 → columns.

Summing along an axis collapses that axis, and the collapsed axis is lost in the result. In this 3-D array, axis=0 is the matrices/arrays axis, so after summing along axis=0 the shape of the sum is (2,3).

If we sum along axis=0, it collapses the matrices/arrays. It’s done by element-wise addition of the two matrices.

If we sum along axis=1, it collapses the row axis. It’s done by adding the rows of each matrix together (column-wise addition), so the result has shape (2,3).
For axis=2, it collapses the column axis. It’s done by adding the elements within each row (row-wise addition), so the result has shape (2,2).

So, in NumPy, the way to remember the axis parameter is: “the operation collapses the specified axis.”
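The collapsing behaviour can be verified on a small array of shape (2,2,3). This is a sketch with illustrative values:

```python
import numpy as np

t = np.arange(12).reshape(2, 2, 3)
# t[0] = [[0, 1, 2], [3, 4, 5]]    t[1] = [[6, 7, 8], [9, 10, 11]]

s0 = t.sum(axis=0)   # collapses the matrices axis -> shape (2, 3)
s1 = t.sum(axis=1)   # collapses the row axis      -> shape (2, 3)
s2 = t.sum(axis=2)   # collapses the column axis   -> shape (2, 2)

print(s0.tolist())   # [[6, 8, 10], [12, 14, 16]]
print(s1.tolist())   # [[3, 5, 7], [15, 17, 19]]
print(s2.tolist())   # [[3, 12], [21, 30]]
```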
2-D array
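For a 2-D array the same rule applies with one axis fewer: axis=0 collapses the rows (one result per column) and axis=1 collapses the columns (one result per row). A sketch with illustrative values:

```python
import numpy as np

m = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)

print(m.sum(axis=0))   # [5 7 9]    rows collapsed, shape (3,)
print(m.sum(axis=1))   # [ 6 15]    columns collapsed, shape (2,)
```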

To know more about the Numpy axis, refer to this medium article “Understanding NumPy sum” written by Kshitij Bajracharya.
Statistical Methods on NumPy
We can compute statistics such as the mean, median, and sum of NumPy arrays.
np.mean(a)
→ Computes the mean of an array a
np.median(a)
→ Computes the median of an array a
np.sum(a)
→ Computes the sum of array a
np.mean(a,axis=0)
→ With axis=0, the mean is computed column-wise.
np.mean(a,axis=1)
→ With axis=1, the mean is computed row-wise.
https://gist.github.com/IndhumathyChelliah/cce0ef3fa859073294d7822a27b112d9
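A sketch of these functions on a small 2-D array (illustrative values):

```python
import numpy as np

a = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 9.0]])

print(np.mean(a))            # 4.0   (24 / 6, over all elements)
print(np.median(a))          # 3.5   (middle of 1, 2, 3, 4, 5, 9)
print(np.sum(a))             # 24.0

col_means = np.mean(a, axis=0)   # one mean per column
row_means = np.mean(a, axis=1)   # one mean per row
print(col_means.tolist())    # [2.5, 3.5, 6.0]
print(row_means.tolist())    # [2.0, 6.0]
```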
Sorting NumPy Arrays
ndarray.sort()
→ Sorts the original array in place.
np.sort(ndarray)
→ Returns a sorted copy of the array.
To sort along a particular axis, pass the axis parameter.

https://gist.github.com/IndhumathyChelliah/b268bc0087afe02b297f86a4a83a8781
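A sketch contrasting the copy and in-place variants, plus sorting along an axis (illustrative values):

```python
import numpy as np

a = np.array([3, 1, 2])
s = np.sort(a)          # returns a sorted copy
print(s, a)             # [1 2 3] [3 1 2]   (a is unchanged)

a.sort()                # sorts a itself and returns None
print(a)                # [1 2 3]

m = np.array([[9, 1, 5], [3, 7, 2]])
by_col = np.sort(m, axis=0)   # sorts down each column
by_row = np.sort(m, axis=1)   # sorts along each row
print(by_col.tolist())  # [[3, 1, 2], [9, 7, 5]]
print(by_row.tolist())  # [[1, 5, 9], [2, 3, 7]]
```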
Set Operations on NumPy Arrays
Set operations can be performed on 1-D NumPy arrays.
https://gist.github.com/IndhumathyChelliah/e63642f1bd6ae4de089c89a461cc108b
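The linked gist is not reproduced here, so the selection below is an assumption about which operations are meant. It is a sketch using four standard NumPy set routines:

```python
import numpy as np

a = np.array([1, 2, 3, 2])
b = np.array([3, 4, 5])

print(np.unique(a))           # [1 2 3]        sorted unique elements
print(np.union1d(a, b))       # [1 2 3 4 5]    elements in a or b
print(np.intersect1d(a, b))   # [3]            elements in both a and b
print(np.setdiff1d(a, b))     # [1 2]          elements in a but not in b
```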
Transposing NumPy Array
https://gist.github.com/IndhumathyChelliah/7497f058647721b18c4007504dbeafd2
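Transposing swaps the rows and columns of a 2-D array. A minimal sketch using the .T attribute and np.transpose() (illustrative values):

```python
import numpy as np

m = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)

t = m.T                                # shape (3, 2)
print(t.tolist())                      # [[1, 4], [2, 5], [3, 6]]
print(np.array_equal(t, np.transpose(m)))   # True
```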
Stacking NumPy Arrays
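np.hstack((a1,a2)) → Horizontal stacking. The arrays are passed as a single tuple, and the two arrays should have the same number of rows.
np.vstack((a1,a2)) → Vertical stacking. The arrays are passed as a single tuple, and the two arrays should have the same number of columns.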

https://gist.github.com/IndhumathyChelliah/c1865d08030ad9746f774417a2b9032b
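A sketch of both stacking directions. Note that the arrays are passed together as one tuple, not as separate arguments (the values are illustrative):

```python
import numpy as np

a1 = np.array([[1, 2], [3, 4]])
a2 = np.array([[5, 6], [7, 8]])

h = np.hstack((a1, a2))   # side by side -> shape (2, 4)
v = np.vstack((a1, a2))   # one on top of the other -> shape (4, 2)

print(h.tolist())         # [[1, 2, 5, 6], [3, 4, 7, 8]]
print(v.tolist())         # [[1, 2], [3, 4], [5, 6], [7, 8]]
```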
Broadcasting NumPy Arrays
In NumPy, arithmetic operations are done element-wise. To achieve this, arrays should be of the same size/shape.
To perform arithmetic operations on arrays of different sizes/shapes, broadcasting is used.
Broadcasting can be pictured as stretching the smaller array to the required shape/size so that the arithmetic operation can be performed on it.
Broadcasting Rules:
For each dimension, one of the following should hold:
- The size of the dimension is the same in both arrays.
- The size of the dimension is one in one of the arrays.
Scenario 1: Both arrays have the same number of dimensions
Example 1: Two-dimensional arrays
Let’s add two 2-D arrays (a1,a2) of different shapes.
The shape of a1 (2,1)
The shape of a2 (2,2)
a1+a2 → We can perform addition because the broadcast rule matches.
- The first axis is the same
- The second axis is 1 in one of the arrays (a1)
[If the sizes along an axis are not the same, one of them should be 1]

https://gist.github.com/IndhumathyChelliah/e4b76cfdb637160c147530d4170838a4
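A sketch of this example with shapes (2,1) and (2,2); the values are illustrative:

```python
import numpy as np

a1 = np.array([[1], [2]])             # shape (2, 1)
a2 = np.array([[10, 20], [30, 40]])   # shape (2, 2)

# a1 is stretched along its second axis, as if it were [[1, 1], [2, 2]].
result = a1 + a2
print(result.shape)      # (2, 2)
print(result.tolist())   # [[11, 21], [32, 42]]
```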
Example 2: Three-dimensional arrays
Let’s add two 3-D arrays (a1,a2) of different shapes.
The shape of a1 (2,3,1)
The shape of a2 (2,3,2)
a1+a2 → We can perform addition because the broadcast rule matches.
- The first axis is the same
- The second axis is the same
- The third axis is 1 in one of the arrays (a1)
[If the sizes along an axis are not the same, one of them should be 1]

https://gist.github.com/IndhumathyChelliah/6e9bf91bffcb63af561f873c7e779abb
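The 3-D case works the same way. A sketch with shapes (2,3,1) and (2,3,2), using illustrative values:

```python
import numpy as np

a1 = np.arange(6).reshape(2, 3, 1)   # shape (2, 3, 1), values 0..5
a2 = np.ones((2, 3, 2))              # shape (2, 3, 2), all ones

# a1 is stretched along its third axis to match a2.
result = a1 + a2
print(result.shape)                  # (2, 3, 2)
print(result[1, 2].tolist())         # [6.0, 6.0]   (5 + 1 in both positions)
```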
Scenario 2: The arrays have different numbers of dimensions
If the two arrays have different numbers of dimensions, then the shape of the one with fewer dimensions is padded with ones on its leading side (left side).
Example: Let’s add two arrays of different dimensions.
The shape of a1 → (3,2)
The shape of a2 →(2,3,2)
Since a1 has fewer dimensions than a2, the shape of a1 is padded with ones on its leading side (left side).
The shape of a1 becomes → (1,3,2)
[Only a1 is padded, since only a1 has fewer dimensions than a2]
Now check the broadcast rules for a1 and a2
- The first axis is 1 in one of the arrays (a1)
[If the sizes along an axis are not the same, one of them should be 1]
- The second axis is the same
- The third axis is the same
Now we can perform a1+a2 → because it matches broadcast rules.

https://gist.github.com/IndhumathyChelliah/42e820eab093519802fcf5e1ca889a9d
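A sketch of this scenario with shapes (3,2) and (2,3,2); the values are illustrative:

```python
import numpy as np

a1 = np.arange(6).reshape(3, 2)   # shape (3, 2)
a2 = np.zeros((2, 3, 2))          # shape (2, 3, 2)

# a1 is treated as shape (1, 3, 2) and then stretched along the first axis.
result = a1 + a2
print(result.shape)                       # (2, 3, 2)
print(np.array_equal(result[0], a1))      # True
print(np.array_equal(result[1], a1))      # True
```

Because a2 is all zeros here, each of the two matrices in the result is simply a copy of a1, which makes the stretching visible.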
If we need to perform arithmetic operations on two arrays of different shapes, the shapes must match the broadcast rules. Otherwise, NumPy raises an error and the operation is not performed.
Example:
The shape of a1 (2,3)
The shape of a2 (2,4)
We can’t perform a1+a2, because it doesn’t match broadcast rules.
[The second axis is different, and it is not 1 in either of the arrays]
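A sketch of the failing case with shapes (2,3) and (2,4). NumPy raises a ValueError for incompatible shapes:

```python
import numpy as np

a1 = np.ones((2, 3))
a2 = np.ones((2, 4))

failed = False
try:
    a1 + a2                  # 3 and 4 differ and neither is 1
except ValueError as exc:
    failed = True
    print("Cannot broadcast:", exc)
```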
Conclusion
In this article, I have covered most of the standard NumPy operations required to kickstart your journey in data science.
My Blog on Pandas
Watch this space for more articles on Python and data science. If you would like to read more of my tutorials, follow me on Medium, LinkedIn, and Twitter.