Sorting a Python Pandas DataFrames by Index and Value

Better sorting

Photo by Sharon McCutcheon from Pexels

Sorting Pandas Dataframes

Pandas dataframes can be sorted by index and values:

Image by Author

We can sort pandas dataframes by row values/column values. Likewise, we can also sort by row index/column index.

Image by Author

Pandas DataFrame Sorted by Values

DataFrame.sort_values(by, axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last', ignore_index=False, key=None)
Image Source: Author

Return Type is DataFrame or None.

If sorted inplace return type is None, otherwise DataFrame.

1. Sorting dataframe by one column

Creating DataFrame by reading from the CSV file.

import pandas as pd
df=pd.read_csv("C:pandas_experimentpandas_sortingdata1.csv")
df
dataframe

Now, Sorting the dataframe by values in one column(EmpId).

df.sort_values('EmpID')
Sorting dataframe by values in “EmpID”

By default, it will sort in ascending order. The index also will be maintained.
The return type is a dataframe. It won’t modify the original dataframe.

2. Sorting dataframe by one column in descending order

Sorting the dataframe by column EmpID in descending order. Have to mention ascending=False.

The return type is a dataframe. It won’t modify the original dataframe.

df.sort_values(‘EmpID’,ascending=False)

Sorting df by values in “EmpID” in desc order

3. Sorting dataframe by two columns

Sorting the dataframe by column Skill and EmpID. If values in column Skill is the same then it will sort based on values in the second column EmpID

df.sort_values(by=['Skill','EmpID'])
Sorting df based on values in “Skill” and “EmpID”

First, it will sort based on values in “Skill” column. Since the values JavaScript in column Skill is the same then it will sort again by values in EmpID in ascending order.

The return type is a dataframe. It won’t modify the original dataframe.

4. Sorting dataframe by two columns in a different order

Sorting the dataframe by column Skill and EmpID. One column in ascending and the other in descending order.

df.sort_values(by=['Skill','EmpID'],ascending=[True,False])
Sorting df by values in two columns in two diff order

First, dataframe will be sorted based on values in the“Skill” column in ascending order. Since the values JavaScript in column Skill is the same then it will sort by values in EmpID in descending order.

The return type is a dataframe. It won’t modify the original dataframe.

5. Sorting dataframe by putting NaN first

Sorting the dataframe by putting missing values NaN first.Have to mention the parameter na_position=”first”

df.sort_values("Salary",na_position='first')
Sorting df by putting missing values (NaN) first

The return type is a dataframe. It won’t modify the original dataframe.

6. Sorting dataframe inplace

If the parameter inplace is set to True(inplace=True), it will perform the operation inplace. Modify the original dataframe itself. The return type is None.

df.sort_values("Salary",inplace=True)
df
Sorting inplace

7. Sorting dataframe by ignoring index.

Sorting the dataframe will maintain the same index. If we want to ignore the index and then have to mention ignore_index=True
The resulting axis will be labeled 0, 1, …, n-1.

df.sort_values("Salary",ignore_index=True)
Sorting dataframe by ignoring index.

8. Sorting dataframe by using the key function

Apply the key function to the values before sorting. This is similar to the key argument in the built-in sorted() function, with the notable difference that this key function should be vectorized. It should expect a Series and return a Series with the same shape as the input. It will be applied to each column in by independently. –Python docs

df.sort_values(by=["Skill"],key=lambda x:x.str.lower())
Sorting with key function

The key function (str.lower()) is applied to all values in “Skill” column and then sorted. If containing the same values, sorted based on the row index.

If the key function is not called on “Skill” column, upper case letters will be sorted first.

df.sort_values(by=["Skill"])
Sorting df by values in “Skill” column without key function

9. Sorting dataframe by string length

Sorting a dataframe by string length. The key function is given as
 key=lambda x:x.str.len() which will calculate the length of the values in the given column (“SKill”). Based on the length of the values in “Skill” column, it will sort the dataframe.

df.sort_values(by=["Skill"],key=lambda x:x.str.len())
Sorting dataframe based on string length in “Skill” column

10. Sorting dataframe by row values

We can sort dataframes by specific row values also.

Creating a dataframe containing numbers.

df = pd.DataFrame(data={'x':[10,30,20], 'y':[1,2,3],'z':[5,15,10]})
df
Dataframe

We can sort the dataframe based on the values in a specific row. Let’s sort row 0 By default, it will sort the values in the row in ascending order.

row to be sorted
df.sort_values(by=0,axis=1)

by=0 indicates row 0.

axis=1 indicates the column to be sorted.

Sorting dataframe based on row0 values

If the row to be sorted contains different data types, it will raise TypeError.

df1 = pd.DataFrame(data={'x':[10,10,"d"], 'y':['a',5,'c'],'z':[5,15,'a']})
df1
Dataframe df1

If we sort the dataframe by row0, it will raise TypeError. row 0 contains string and int.

sorting by row 0
df1.sort_values(by=0,axis=1)

Output:
 TypeError: ‘<’ not supported between instances of ‘str’ and ‘int’

But we can sort the dataframe df1 based on row1 and row2. Row1 contains int only and row 2 contains str only.

row 1 and row2
df1.sort_values(by=2,axis=1)
Sorting Dataframe by row2

Pandas.DataFrame.sort_index

DataFrame.sort_index(axis=0, level=None, ascending=True, inplace=False, kind='quicksort', na_position='last', sort_remaining=True, ignore_index=False, key=None)

Sort objects by labels (along an axis).

Returns a new DataFrame sorted by the label if inplace argument is False, otherwise updates the original DataFrame and returns None.

1. Sorting dataframe by row index

Creating a dataframe.

df=pd.read_csv("C:pandas_experimentpandas_sortingdata2.csv") 
df

To set EmpIDcolumn as row index.

df.set_index('EmpID',inplace=True)
EmpId → row-index

We can sort the data by row index and also by column index.

row index
df.sort_index()
or
df.sort_index(axis=0)

Both are the same. axis=0 means sort by row index.

Sorted by row_index

By default, it will sort in ascending order.

2. Sorting dataframe by column index

Sorting a dataframe by column index by mentioning axis=1.

column index
df.sort_index(axis=1)
sorted by column_index

By default, it will be sorted in ascending order.

3. Sorting dataframe having multi-index

Creating a dataframe and setting two-column as row_index.

df=pd.read_csv("C:pandas_experimentpandas_sortingdata2.csv") 
df.set_index(["EmpID","Age"],inplace=True)
dataframe having multi_index

Now we can sort the dataframe by row_index EmpID and Age.
If level=0 is given, it will sort based on row_index “EmpID”

df.sort_index(level=0)
Sorted only based on row_index “EmpID”

If level=1 means it will sort the dataframe based on row_index “Age”

By default,sort_remaining=True, that means if sorting by level and index is multilevel, sort by other levels too (in order) after sorting by specified level.

In this example, it will sort by level 1 (“Age”), if values are the same means then it will sort by another index (“EmpID”)

df.sort_index(level=1)
Sorting by level,multi-index

4. Sorting dataframe having multi-index by one level only

By mentioning one level and sort_remaining=False, we can sort the dataframe having multi-index by one level only.

df.sort_index(level="Age",sort_remaining=False)
Sorting by one level only,multi_index df

In this example,EmpID index is not sorted. Dataframe is sorted only based on “Age” index.

5. Sorting dataframe by index in descending order

If we mention ascending=False, it will sort the dataframe in descending order based on the index mentioned.

df.sort_index(axis=0,ascending=False)
Sorting dataframe by row-index in desc order

Conclusion:

  • Sorting pandas dataframes will return a dataframe with sorted values if inplace=False.Otherwise if inplace=True, it will return None and it will modify the original dataframe itself.
  • By default, all sorting done in ascending order only. If we mention, ascending=Falseit will sort in descending order.

GitHub Link:

My jupyter notebook and CSV file used in the code can be downloaded from my GitHub link.

My other blogs on Pandas:

Indexing and Slicing Python Pandas Dataframe

Resources(Python documentation):

pandas.DataFrame.sort_values

pandas.DataFrame.sort_index

multi_column sorting

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s