Better sorting

Sorting Pandas Dataframes
Pandas dataframes can be sorted by index and values:

We can sort pandas dataframes by row values/column values. Likewise, we can also sort by row index/column index.

Pandas DataFrame Sorted by Values
DataFrame.sort_values(by, axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last', ignore_index=False, key=None)

Return Type is DataFrame or None.
If sorted inplace
return type is None
, otherwise DataFrame.
1. Sorting dataframe by one column
Creating DataFrame by reading from the CSV file.
import pandas as pd df=pd.read_csv("C:pandas_experimentpandas_sortingdata1.csv") df

Now, Sorting the dataframe by values in one column(EmpId
).
df.sort_values('EmpID')

By default, it will sort in ascending order. The index also will be maintained.
The return type is a dataframe. It won’t modify the original dataframe.
2. Sorting dataframe by one column in descending order
Sorting the dataframe by column EmpID in descending order. Have to mention ascending=False
.
The return type is a dataframe. It won’t modify the original dataframe.
df.sort_values(‘EmpID’,ascending=False)

3. Sorting dataframe by two columns
Sorting the dataframe by column Skill
and EmpID
. If values in column Skill
is the same then it will sort based on values in the second column EmpID
df.sort_values(by=['Skill','EmpID'])

First, it will sort based on values in “Skill”
column. Since the values JavaScript
in column Skill
is the same then it will sort again by values in EmpID
in ascending order.
The return type is a dataframe. It won’t modify the original dataframe.
4. Sorting dataframe by two columns in a different order
Sorting the dataframe by column Skill
and EmpID
. One column in ascending and the other in descending order.
df.sort_values(by=['Skill','EmpID'],ascending=[True,False])

First, dataframe will be sorted based on values in the“Skill”
column in ascending order. Since the values JavaScript
in column Skill
is the same then it will sort by values in EmpID
in descending order.
The return type is a dataframe. It won’t modify the original dataframe.
5. Sorting dataframe by putting NaN first
Sorting the dataframe by putting missing values NaN
first.Have to mention the parameter na_position=”first”
df.sort_values("Salary",na_position='first')

The return type is a dataframe. It won’t modify the original dataframe.
6. Sorting dataframe inplace
If the parameter inplace is set to True(inplace=True
), it will perform the operation inplace. Modify the original dataframe itself. The return type is None.
df.sort_values("Salary",inplace=True) df

7. Sorting dataframe by ignoring index.
Sorting the dataframe will maintain the same index. If we want to ignore the index and then have to mention ignore_index=True
The resulting axis will be labeled 0, 1, …, n-1.
df.sort_values("Salary",ignore_index=True)

8. Sorting dataframe by using the key function
Apply the key function to the values before sorting. This is similar to the key argument in the built-in sorted()
function, with the notable difference that this key function should be vectorized. It should expect a Series
and return a Series with the same shape as the input. It will be applied to each column in by
independently. –Python docs
df.sort_values(by=["Skill"],key=lambda x:x.str.lower())

The key function (str.lower()
) is applied to all values in “Skill”
column and then sorted. If containing the same values, sorted based on the row index.
If the key function is not called on “Skill”
column, upper case letters will be sorted first.
df.sort_values(by=["Skill"])

9. Sorting dataframe by string length
Sorting a dataframe by string length. The key function is given as
key=lambda x:x.str.len()
which will calculate the length of the values in the given column (“SKill”
). Based on the length of the values in “Skill”
column, it will sort the dataframe.
df.sort_values(by=["Skill"],key=lambda x:x.str.len())

10. Sorting dataframe by row values
We can sort dataframes by specific row values also.
Creating a dataframe containing numbers.
df = pd.DataFrame(data={'x':[10,30,20], 'y':[1,2,3],'z':[5,15,10]}) df

We can sort the dataframe based on the values in a specific row. Let’s sort row 0
By default, it will sort the values in the row in ascending order.

df.sort_values(by=0,axis=1)
by=0
indicates row 0.
axis=1
indicates the column to be sorted.

If the row to be sorted contains different data types, it will raise TypeError
.
df1 = pd.DataFrame(data={'x':[10,10,"d"], 'y':['a',5,'c'],'z':[5,15,'a']}) df1

If we sort the dataframe by row0, it will raise TypeError
. row 0
contains string and int.

df1.sort_values(by=0,axis=1)
Output:
TypeError: ‘<’ not supported between instances of ‘str’ and ‘int’
But we can sort the dataframe df1 based on row1 and row2. Row1 contains int
only and row 2 contains str
only.

df1.sort_values(by=2,axis=1)

Pandas.DataFrame.sort_index
DataFrame.sort_index(axis=0, level=None, ascending=True, inplace=False, kind='quicksort', na_position='last', sort_remaining=True, ignore_index=False, key=None)
Sort objects by labels (along an axis).
Returns a new DataFrame sorted by the label if inplace
argument is False
, otherwise updates the original DataFrame and returns None
.
1. Sorting dataframe by row index
Creating a dataframe.
df=pd.read_csv("C:pandas_experimentpandas_sortingdata2.csv") df

To set EmpID
column as row index.
df.set_index('EmpID',inplace=True)

We can sort the data by row index and also by column index.

df.sort_index()
or
df.sort_index(axis=0)
Both are the same. axis=0
means sort by row index.

By default, it will sort in ascending order.
2. Sorting dataframe by column index
Sorting a dataframe by column index by mentioning axis=1
.

df.sort_index(axis=1)

By default, it will be sorted in ascending order.
3. Sorting dataframe having multi-index
Creating a dataframe and setting two-column as row_index.
df=pd.read_csv("C:pandas_experimentpandas_sortingdata2.csv") df.set_index(["EmpID","Age"],inplace=True)

Now we can sort the dataframe by row_index EmpID and Age.
If level=0
is given, it will sort based on row_index “EmpID”
df.sort_index(level=0)

If level=1
means it will sort the dataframe based on row_index “Age”
By default,sort_remaining=True
, that means if sorting by level and index is multilevel, sort by other levels too (in order) after sorting by specified level.
In this example, it will sort by level 1 (“Age”
), if values are the same means then it will sort by another index (“EmpID”
)
df.sort_index(level=1)

4. Sorting dataframe having multi-index by one level only
By mentioning one level and sort_remaining=False
, we can sort the dataframe having multi-index by one level only.
df.sort_index(level="Age",sort_remaining=False)

In this example,EmpID
index is not sorted. Dataframe is sorted only based on “Age”
index.
5. Sorting dataframe by index in descending order
If we mention ascending=False
, it will sort the dataframe in descending order based on the index mentioned.
df.sort_index(axis=0,ascending=False)

Conclusion:
- Sorting pandas dataframes will return a dataframe with sorted values if
inplace=False
.Otherwise ifinplace=True
, it will returnNone
and it will modify the original dataframe itself. - By default, all sorting done in ascending order only. If we mention,
ascending=False
it will sort in descending order.
GitHub Link:
My jupyter notebook and CSV file used in the code can be downloaded from my GitHub link.
My other blogs on Pandas:
Indexing and Slicing Python Pandas Dataframe
Resources(Python documentation):
Make a one-time donation
Make a monthly donation
Make a yearly donation
Choose an amount
Or enter a custom amount
Your contribution is appreciated.
Your contribution is appreciated.
Your contribution is appreciated.
Buy Me a CoffeeBuy Me a CoffeeBuy Me a Coffee