An Introduction to Python Pandas

Let’s learn how to create DataFrames and how to access their rows and columns.


How to create a DataFrame and access its rows and columns.

Pandas is a fast, powerful, flexible, and easy-to-use open-source data analysis and manipulation tool, built on top of the Python programming language.

Installing the pandas library:

pip install pandas

Installing Jupyter notebook.

Working in a Jupyter notebook makes it easier to explore and visualize data.

pip install jupyterlab

  1. Creating DataFrames
  2. Attributes and Methods
  3. Accessing
  4. Slicing

Creating DataFrames.

A DataFrame is a two-dimensional labeled data structure with columns of potentially different types.

With pandas, we can read data from different sources and file formats such as CSV, JSON, SQL, and Excel.

How to read data from a CSV file into a pandas DataFrame.

First, we import pandas.

import pandas as pd

Then we pass the path of the CSV file to pd.read_csv.

df = pd.read_csv('C:datadeveloper.csv')

Similarly, to read data from a JSON file, we pass the path of the JSON file to pd.read_json.

df1 = pd.read_json('dataDeveloper.json')

https://gist.github.com/IndhumathyChelliah/bc74559c6d886bcba4a13e3b31b67a12
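To try read_csv without having a file on disk, a minimal sketch like the one below may help. It assumes pandas is installed. The CSV text and the names csv_text and df_csv are made up for this illustration and are not the contents of the developer.csv file mentioned above. Since read_csv accepts a file-like object as well as a path, io.StringIO stands in for a real file here.

```python
import io

import pandas as pd

# Hypothetical CSV content, used only for this illustration.
csv_text = """firstname,lastname,EmpId,Pay
Indhu,mathy,12,5000
Karthi,Palani,15,10000
"""

# read_csv accepts a path or a file-like object; StringIO wraps the text.
df_csv = pd.read_csv(io.StringIO(csv_text))

print(df_csv.shape)          # (rows, columns)
print(list(df_csv.columns))  # column names taken from the header line
```

With a real file, the only change would be passing its path instead of the StringIO object.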

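Creating a DataFrame from a dictionary.

We can also create a DataFrame from a dictionary whose keys are the column names and whose values are lists of column values.

import pandas as pd

# Each key becomes a column name; each list holds that column's values.
developer = {'firstname': ['Indhu', 'Karthi', 'Sarvesh'],
             'lastname': ['mathy', 'Palani', 'Palani'],
             'EmpId': [12, 15, 21],
             'Pay': [5000, 10000, 15000],
             'Skill': ['Python,SQL', 'Java,Hadoop', 'C,Java']}
df = pd.DataFrame(developer)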

https://gist.github.com/IndhumathyChelliah/a538048aac3effdbd9d82e974f327dae

Attributes and methods:

df.shape - Returns a tuple with the number of rows and columns, in that order. For the DataFrame above it is (3, 5).

df.info() - Prints a summary of the DataFrame: the index range, each column's name, non-null count and data type, and the memory usage.

df.head() - Returns the first 5 rows by default. We can also specify the number of rows needed.

df.head(1) - Returns the first row.

df.tail() - Returns the last 5 rows by default. We can also specify the number of rows needed.

df.tail(1) - Returns the last row.

df.columns - Returns an Index object holding all column names.

Index(['firstname', 'lastname', 'EmpId', 'Pay', 'Skill'], dtype='object')
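The attributes and methods above can be tried together on the developer DataFrame. This is a small sketch that rebuilds the same dictionary so it runs on its own; the variable names n_rows, n_cols, first_row, last_row and column_names are only illustrative.

```python
import pandas as pd

developer = {'firstname': ['Indhu', 'Karthi', 'Sarvesh'],
             'lastname': ['mathy', 'Palani', 'Palani'],
             'EmpId': [12, 15, 21],
             'Pay': [5000, 10000, 15000],
             'Skill': ['Python,SQL', 'Java,Hadoop', 'C,Java']}
df = pd.DataFrame(developer)

n_rows, n_cols = df.shape          # shape is a (rows, columns) tuple
first_row = df.head(1)             # DataFrame holding only the first row
last_row = df.tail(1)              # DataFrame holding only the last row
column_names = list(df.columns)    # convert the Index object to a plain list

print(n_rows, n_cols)
print(column_names)
df.info()  # prints the summary itself and returns None
```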

Accessing data from a DataFrame:

We can access both the rows and the columns of a DataFrame.

Accessing columns from a DataFrame:

  1. We can access a single column in two ways.
df.firstname
df['firstname']

If we specify a single column name, a Series object is returned. Attribute access (df.firstname) only works when the column name is a valid Python identifier that does not clash with an existing DataFrame attribute or method; bracket notation always works.

Series:

A pandas Series is a one-dimensional labeled array capable of holding data of any type (integers, strings, floats, Python objects, etc.). A Series holds the rows of a single column.

If we access the 'firstname' column alone, we get a Series object containing the rows of that column. Both df['firstname'] and df.firstname return the same result.

df['firstname']
Output: Series object
0      Indhu
1     Karthi
2    Sarvesh
Name: firstname, dtype: object
df.firstname
Output:
0      Indhu
1     Karthi
2    Sarvesh
Name: firstname, dtype: object

2. We can access two or more columns by passing a list of column names. Selecting with a list returns a DataFrame, a filtered-down version of the original that contains only those columns. A DataFrame is a two-dimensional data structure.

df[['firstname', 'lastname']]
Output:
  firstname lastname
0     Indhu    mathy
1    Karthi   Palani
2   Sarvesh   Palani
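To see the difference between the two return types, we can check them directly. The sketch below rebuilds the developer DataFrame so it is self-contained; the variable names are only illustrative.

```python
import pandas as pd

developer = {'firstname': ['Indhu', 'Karthi', 'Sarvesh'],
             'lastname': ['mathy', 'Palani', 'Palani'],
             'EmpId': [12, 15, 21],
             'Pay': [5000, 10000, 15000],
             'Skill': ['Python,SQL', 'Java,Hadoop', 'C,Java']}
df = pd.DataFrame(developer)

one_column = df['firstname']            # a single name returns a Series
same_column = df.firstname              # attribute access returns the same data
one_column_frame = df[['firstname']]    # a one-item list returns a DataFrame
two_columns = df[['firstname', 'lastname']]

print(type(one_column).__name__)        # Series
print(type(one_column_frame).__name__)  # DataFrame
print(two_columns.shape)                # (3, 2)
```

Note that the double brackets matter: the outer pair is the selection operator and the inner pair is the Python list of column names.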

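Accessing rows from a DataFrame:

We can access rows from a DataFrame in two ways.

  • loc
  • iloc

loc - Accesses single or multiple rows (and columns) by label, that is, by row index label or column name.

iloc - Accesses single or multiple rows (and columns) by integer position, counting from 0.

Our DataFrame uses the default index 0, 1, 2, so its row labels happen to be the same as its row positions. That is why df.loc[0] and df.iloc[0] return the same row in the examples that follow.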
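The difference between labels and positions becomes visible once the index is not 0, 1, 2. As a sketch, assume we use the EmpId column as the index (this step is not part of the original example, and the names indexed, by_label, by_position and label_missing are only illustrative):

```python
import pandas as pd

developer = {'firstname': ['Indhu', 'Karthi', 'Sarvesh'],
             'lastname': ['mathy', 'Palani', 'Palani'],
             'EmpId': [12, 15, 21],
             'Pay': [5000, 10000, 15000],
             'Skill': ['Python,SQL', 'Java,Hadoop', 'C,Java']}
df = pd.DataFrame(developer)

# Assumption for illustration: make EmpId the row labels (12, 15, 21).
indexed = df.set_index('EmpId')

by_label = indexed.loc[15]      # row whose label is 15
by_position = indexed.iloc[1]   # row at position 1 (the second row)

# There is no row labelled 1, so a label lookup for 1 fails.
label_missing = False
try:
    indexed.loc[1]
except KeyError:
    label_missing = True

print(by_label['firstname'], by_position['firstname'], label_missing)
```

Here loc[15] and iloc[1] point at the same row, while loc[1] raises a KeyError because 1 is a position, not a label.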

iloc

df.iloc[0] - Returns the row at position 0 as a Series object. The column names become the index of that Series, and its values are the data of the first row.

df.iloc[0]
Output:
firstname         Indhu
lastname          mathy
EmpId                12
Pay                5000
Skill        Python,SQL
Name: 0, dtype: object

df.iloc[[0,1]] - Returns row 0 and row 1 as a DataFrame.

df.iloc[[0,1]]
Output:
  firstname lastname  EmpId    Pay        Skill
0     Indhu    mathy     12   5000   Python,SQL
1    Karthi   Palani     15  10000  Java,Hadoop

df.iloc[[0,1],2] - We can also specify both rows and columns.

[0,1] - selects row 0 and row 1.

2 - selects the column at position 2, which is the third column because positions start at 0.

It returns the third column, "EmpId", for row 0 and row 1.

df.iloc[[0,1],2]
Output:
0    12
1    15
Name: EmpId, dtype: int64
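The same row-and-column pattern works with a single position or with a list of positions on either side. The following sketch rebuilds the developer DataFrame and shows three variations; the variable names are only illustrative.

```python
import pandas as pd

developer = {'firstname': ['Indhu', 'Karthi', 'Sarvesh'],
             'lastname': ['mathy', 'Palani', 'Palani'],
             'EmpId': [12, 15, 21],
             'Pay': [5000, 10000, 15000],
             'Skill': ['Python,SQL', 'Java,Hadoop', 'C,Java']}
df = pd.DataFrame(developer)

emp_ids = df.iloc[[0, 1], 2]          # Series: column at position 2 for rows 0 and 1
single_value = df.iloc[0, 2]          # one cell: row 0, column at position 2
sub_frame = df.iloc[[0, 1], [0, 3]]   # DataFrame: columns at positions 0 and 3

print(emp_ids.tolist())
print(single_value)
print(list(sub_frame.columns))
```

A list of column positions returns a DataFrame, a single column position returns a Series, and a single row with a single column returns one value.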

loc

df.loc[0] - Returns the row whose index label is 0 as a Series object. The column names become the index of that Series, and its values are the data of row 0.

df.loc[0]
Output:
firstname         Indhu
lastname          mathy
EmpId                12
Pay                5000
Skill        Python,SQL
Name: 0, dtype: object

df.loc[[0,1]] - Returns the rows labelled 0 and 1 as a DataFrame.

df.loc[[0,1]]
Output:
  firstname lastname  EmpId    Pay        Skill
0     Indhu    mathy     12   5000   Python,SQL
1    Karthi   Palani     15  10000  Java,Hadoop

We can also select specific columns from specific rows by passing row labels and column names together.

df.loc[[0,1],['firstname','lastname']]
Output:
  firstname lastname
0     Indhu    mathy
1    Karthi   Palani
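With loc, the column part is given by name rather than by position. A short sketch on the same developer DataFrame (the variable names are only illustrative):

```python
import pandas as pd

developer = {'firstname': ['Indhu', 'Karthi', 'Sarvesh'],
             'lastname': ['mathy', 'Palani', 'Palani'],
             'EmpId': [12, 15, 21],
             'Pay': [5000, 10000, 15000],
             'Skill': ['Python,SQL', 'Java,Hadoop', 'C,Java']}
df = pd.DataFrame(developer)

first_name = df.loc[0, 'firstname']             # one cell: row label 0, column 'firstname'
pay_values = df.loc[[0, 1], 'Pay']              # Series: 'Pay' for row labels 0 and 1
names = df.loc[[0, 1], ['firstname', 'lastname']]  # DataFrame with two columns

print(first_name)
print(pay_values.tolist())
print(names.shape)
```

Selecting by name keeps working even if the column order changes, which is one reason to prefer loc when the column names are known.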

Slicing:

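Slicing with start:stop selects a contiguous set of rows and/or columns from a DataFrame. To slice out a set of rows, use the syntax data[start:stop]. With positional slicing, the row at start is included and the row at stop is excluded, so df.iloc[0:2] returns row 0 and row 1.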

df.iloc[0:2]
Output:
  firstname lastname  EmpId    Pay        Skill
0     Indhu    mathy     12   5000   Python,SQL
1    Karthi   Palani     15  10000  Java,Hadoop

df.iloc[:] - Returns all rows.

df.iloc[:1] - Returns row 0.

df.iloc[1:] - Returns row 1 and row 2.
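One detail worth checking when slicing is how the stop value is treated. In the sketch below, which rebuilds the developer DataFrame, a positional slice with iloc excludes the stop position, while a label slice with loc includes the stop label. The variable names are only illustrative.

```python
import pandas as pd

developer = {'firstname': ['Indhu', 'Karthi', 'Sarvesh'],
             'lastname': ['mathy', 'Palani', 'Palani'],
             'EmpId': [12, 15, 21],
             'Pay': [5000, 10000, 15000],
             'Skill': ['Python,SQL', 'Java,Hadoop', 'C,Java']}
df = pd.DataFrame(developer)

by_position = df.iloc[0:2]      # positions 0 and 1; position 2 is excluded
by_label = df.loc[0:2]          # labels 0, 1 and 2; the stop label is included
first_two_cols = df.iloc[:, 0:2]  # all rows, columns at positions 0 and 1

print(len(by_position))
print(len(by_label))
print(list(first_two_cols.columns))
```

The same start:stop syntax can be used on the column side of iloc, as the last line shows.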

Resources:

https://pandas.pydata.org/
