Introduction to Index in Pandas

Before explaining how to reset the index, it helps to take a quick look at what the index in pandas is.

In pandas, an index is the set of labels used to reference the individual rows of a DataFrame or Series. By default, pandas numbers the rows with an integer index starting from 0. You can also define your own index, built from a single column or from several columns. The index is used in many operations, such as filtering, selecting data, and aligning data.

For example, consider the following simple DataFrame:

import pandas as pd 
data = {'Name': ['John', 'Alice', 'Bob', 'Charlie'], 'Age': [23, 30, 25, 35]} 
df = pd.DataFrame(data) 
print(df)

Output

      Name  Age
0     John   23
1    Alice   30
2      Bob   25
3  Charlie   35

As the output shows, pandas uses integers starting from 0 as the default index. In some cases it helps to replace that default with something more meaningful, such as a column in the data. At other times you may want to reset or alter the index for data analysis, display, or computation.
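As a minimal sketch of what a more meaningful index buys you, the snippet below promotes the Name column to the index and then selects a row by label. The variable names by_name and alice_age are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Alice', 'Bob', 'Charlie'],
                   'Age': [23, 30, 25, 35]})

# Default index: integers counting up from 0
print(df.index)

# Use the 'Name' column as the index (set_index returns a new DataFrame)
by_name = df.set_index('Name')  # by_name is an illustrative name

# Rows can now be selected by label instead of by position
alice_age = by_name.loc['Alice', 'Age']
print(alice_age)
```

The original df is left untouched here, because set_index() returns a new DataFrame by default.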


How to Reset the Index in Pandas

The reset_index() method resets the index of a DataFrame back to the default integer index. With its most commonly used parameters, the call looks like this:

#pandas reset index
DataFrame.reset_index(level=None, drop=False, inplace=False)

Here’s a breakdown of the parameters:

  • level: Specifies which level or levels of the index to reset (useful for a DataFrame with a MultiIndex). By default, all levels are reset.
  • drop: If set to True, the current index is discarded instead of being added as a column. With the default value of False, the current index is inserted into the DataFrame as a new column.
  • inplace: If set to True, the original DataFrame is modified directly and no new DataFrame is returned.

To make this concrete, let us look at some examples.
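As a first quick sketch, the difference between the default drop=False and drop=True is easiest to see on a DataFrame whose index is not the default one. The small DataFrame below is made up for illustration.

```python
import pandas as pd

# Illustrative data: names are used as the (unnamed) index
df = pd.DataFrame({'Age': [23, 30, 25]}, index=['John', 'Alice', 'Bob'])

# Default (drop=False): the old index is kept as a new column named 'index'
kept = df.reset_index()
print(kept.columns.tolist())     # ['index', 'Age']

# drop=True: the old index is discarded entirely
dropped = df.reset_index(drop=True)
print(dropped.columns.tolist())  # ['Age']
```

Because the old index has no name here, pandas calls the new column index; a named index would keep its own name as the column name.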


Options for DataFrame reset_index()

1. Level Parameter

The level parameter applies to DataFrames that have a MultiIndex. It resets only the specified level or levels of the index instead of all of them.

Example:

# Create a MultiIndex DataFrame 
arrays = [['A', 'A', 'B', 'B'], ['foo', 'bar', 'foo', 'bar']] 
index = pd.MultiIndex.from_arrays(arrays, names=('letter', 'word')) 

# Build the DataFrame on top of the MultiIndex
data = {'Value': [1, 2, 3, 4]} 
df = pd.DataFrame(data, index=index) 
print(df)

Output:

             Value
letter word
A      foo       1
       bar       2
B      foo       3
       bar       4

To reset the index in the dataframe only on the letter level, we can use the level parameter:

#pandas reset index
df_reset = df.reset_index(level='letter') 
print(df_reset)

Output:

     letter  Value
word
foo       A      1
bar       A      2
foo       B      3
bar       B      4

The letter level is now a regular column, while word remains as the index.
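The level argument is not limited to a single name. The sketch below, which rebuilds the same MultiIndex DataFrame so that it runs on its own, shows a level given by position and a list of levels. The names by_position and both are illustrative.

```python
import pandas as pd

index = pd.MultiIndex.from_arrays(
    [['A', 'A', 'B', 'B'], ['foo', 'bar', 'foo', 'bar']],
    names=('letter', 'word'))
df = pd.DataFrame({'Value': [1, 2, 3, 4]}, index=index)

# A level can be given by position: 1 is the second level, 'word'
by_position = df.reset_index(level=1)
print(by_position.columns.tolist())  # ['word', 'Value']

# A list resets several levels in one call
both = df.reset_index(level=['letter', 'word'])
print(both.columns.tolist())         # ['letter', 'word', 'Value']
```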

2. Drop Parameter

When the index is reset, the DataFrame gets the default integer index. By default the old index is kept as a new column; the drop=True option discards it instead.

Example:

import pandas as pd 

# Create DataFrame 
data = {'Name': ['John', 'Alice', 'Bob', 'Charlie'], 'Age': [23, 30, 25, 35]} 
df = pd.DataFrame(data) 

# dataframe reset index and drop the old index column 
df_reset = df.reset_index(drop=True) 
print(df_reset)

Output:

      Name  Age
0     John   23
1    Alice   30
2      Bob   25
3  Charlie   35

Note that reset_index() returns a new DataFrame by default and leaves the original unchanged. If you want to modify the original DataFrame itself, set the inplace parameter to True.

3. Inplace Parameter

By default, reset_index() returns a new DataFrame with the reset index. To apply the change to the original DataFrame instead, use the inplace=True option.

Example:

# Reset the index in place
df.reset_index(drop=True, inplace=True) 
print(df)

In this case, df itself is modified and no new DataFrame is created; the method returns None.
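A common slip with inplace=True is assigning the result back to a variable. The short sketch below, using made-up data, shows that the call returns None while the DataFrame itself changes.

```python
import pandas as pd

# Illustrative data with a non-default index
df = pd.DataFrame({'Age': [23, 30]}, index=['John', 'Alice'])

# With inplace=True the method modifies df and returns None
result = df.reset_index(drop=True, inplace=True)

print(result)             # None
print(df.index.tolist())  # [0, 1]
```

Writing df = df.reset_index(drop=True, inplace=True) would therefore replace df with None, so use either the assignment form or inplace=True, not both.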


When to Use reset_index()

You might need to reset the index in the following scenarios:

After Sorting Data

When you sort a DataFrame by one or more columns, the index labels travel with their rows, so the index is no longer sequential. It is common practice to reset the index after sorting so that the rows are numbered consecutively again.

Example:

df_sorted = df.sort_values(by='Age')

#dataframe reset index
df_sorted.reset_index(drop=True, inplace=True) 
print(df_sorted)
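To see what sorting does to the index, the following sketch prints the labels before and after the reset. It also shows the ignore_index=True option of sort_values(), which, assuming pandas 1.0 or newer, renumbers the rows in the same call.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Alice', 'Bob', 'Charlie'],
                   'Age': [23, 30, 25, 35]})

# After sorting, each row keeps its original label
df_sorted = df.sort_values(by='Age')
print(df_sorted.index.tolist())   # [0, 2, 1, 3]

# Resetting renumbers the rows 0, 1, 2, 3 in their new order
renumbered = df_sorted.reset_index(drop=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3]

# Sorting and renumbering in one step (assumes pandas >= 1.0)
one_step = df.sort_values(by='Age', ignore_index=True)
```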

After Dropping Rows or Filtering

When you drop rows from a DataFrame or select rows with a condition, the index is left with gaps. Resetting the index removes those gaps and restores a consecutive index.

Example:

df_filtered = df[df['Age'] > 25] 

#dataframe reset index
df_filtered.reset_index(drop=True, inplace=True) 
print(df_filtered)
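The gaps are easy to overlook, so here is a small sketch that makes them visible. It uses the same sample data; the names gap_labels and first are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Alice', 'Bob', 'Charlie'],
                   'Age': [23, 30, 25, 35]})

# Only rows 1 and 3 survive the filter, so labels 0 and 2 are missing
filtered = df[df['Age'] > 25]
gap_labels = filtered.index.tolist()
print(gap_labels)                 # [1, 3]

# After the reset, label 0 refers to the first remaining row again
filtered = filtered.reset_index(drop=True)
first = filtered.loc[0, 'Name']
print(first)                      # Alice
```

Before the reset, filtered.loc[0] would raise a KeyError because no row carries the label 0 any more.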

After GroupBy or Aggregation

When you perform a group-by operation, the group keys become the index of the result (a MultiIndex when you group by several columns). Resetting the index turns the keys back into ordinary columns, which is useful for further analysis or presentation.

Example:

grouped = df.groupby('Age').sum() 

#dataframe reset index
grouped.reset_index(inplace=True) 
print(grouped)
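The MultiIndex case appears when you group by more than one column. The sketch below uses a small made-up DataFrame with hypothetical Class and Term columns to show the two-level index and how reset_index() flattens it.

```python
import pandas as pd

# Hypothetical data, not taken from the examples above
df = pd.DataFrame({
    'Class': ['Math', 'Math', 'Science', 'Science'],
    'Term': ['Fall', 'Spring', 'Fall', 'Fall'],
    'Score': [85, 92, 88, 95],
})

# Grouping by two columns yields a result with a two-level MultiIndex
grouped = df.groupby(['Class', 'Term'])['Score'].mean()
print(grouped.index.nlevels)  # 2

# reset_index() turns both levels back into ordinary columns
flat = grouped.reset_index()
print(flat.columns.tolist())  # ['Class', 'Term', 'Score']
```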

Practical Examples of Resetting the Index

Basic Reset

Here’s a basic example of resetting the index after performing some data manipulation:

import pandas as pd 

# Sample DataFrame 
data = {'Name': ['John', 'Alice', 'Bob', 'Charlie'], 'Age': [23, 30, 25, 35]} 
df = pd.DataFrame(data)

# dataframe reset index after sorting
df_sorted = df.sort_values(by='Age', ascending=False)
df_sorted.reset_index(drop=True, inplace=True)
print(df_sorted)

Output:

      Name  Age
0  Charlie   35
1    Alice   30
2      Bob   25
3     John   23

Reset After Grouping

You can also reset the index after a groupby operation. For example, suppose you want to group by the Age column and calculate the sum of the Score values:

data = {'Name': ['John', 'Alice', 'Bob', 'Charlie', 'David'], 'Age': [23, 30, 25, 30, 25], 'Score': [88, 92, 85, 90, 84]} 
df = pd.DataFrame(data)

# Group by 'Age' and calculate the sum of 'Score'
grouped = df.groupby('Age')['Score'].sum() 
print(grouped)

# dataframe reset index
grouped_reset = grouped.reset_index()
print(grouped_reset)

Output:

Age
23     88
25    169
30    182
Name: Score, dtype: int64

   Age  Score
0   23     88
1   25    169
2   30    182

Handling Multi-Index DataFrames

A MultiIndex (or hierarchical index) is a pandas feature that lets a DataFrame use more than one index level. This is helpful when analyzing data with many attributes or when you need to represent higher-dimensional data in a two-dimensional structure.

Example: MultiIndex with More Than One Level

The following example builds a DataFrame indexed by two levels, Country and City.

arrays = [['USA', 'USA', 'Canada', 'Canada'], ['New York', 'Los Angeles', 'Toronto', 'Vancouver']] 

index = pd.MultiIndex.from_arrays(arrays, names=('Country', 'City'))

data = {'Population': [8_336_817, 3_979_576, 2_731_571, 631_000]}


# Build the DataFrame on top of the MultiIndex
df_multi = pd.DataFrame(data, index=index) 
print(df_multi)

Output:

                     Population
Country City
USA     New York        8336817
        Los Angeles     3979576
Canada  Toronto         2731571
        Vancouver        631000

Now, let’s suppose that you wish to clear all levels of that index and make them normal columns instead. You can do this as follows:

#dataframe reset index
df_reset = df_multi.reset_index() 
print(df_reset)

Output:

  Country         City  Population
0     USA     New York     8336817
1     USA  Los Angeles     3979576
2  Canada      Toronto     2731571
3  Canada    Vancouver      631000
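The level and drop parameters can also be combined on a MultiIndex. As a hedged sketch, the snippet below rebuilds the same DataFrame and discards the Country level altogether, leaving City as the only index. The name cities_only is illustrative.

```python
import pandas as pd

index = pd.MultiIndex.from_arrays(
    [['USA', 'USA', 'Canada', 'Canada'],
     ['New York', 'Los Angeles', 'Toronto', 'Vancouver']],
    names=('Country', 'City'))
df_multi = pd.DataFrame(
    {'Population': [8_336_817, 3_979_576, 2_731_571, 631_000]}, index=index)

# Remove only the 'Country' level and do not keep it as a column
cities_only = df_multi.reset_index(level='Country', drop=True)

print(cities_only.index.name)        # City
print(cities_only.columns.tolist())  # ['Population']
```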

Resetting Index in GroupBy Operations

A groupby operation makes the grouping keys the index of its result, which can be less convenient to work with. In such cases it often helps to reset the index.

For example, consider the following DataFrame where we calculate the mean score of students grouped by their class:

data = {'Name': ['John', 'Alice', 'Bob', 'Charlie', 'David'], 'Class': ['Math', 'Math', 'Science', 'Science', 'Math'], 'Score': [85, 92, 88, 95, 78]}

df = pd.DataFrame(data)

# Group by 'Class' and calculate the mean score
grouped = df.groupby('Class')['Score'].mean() 
print(grouped)

Output:

Class
Math       85.0
Science    91.5
Name: Score, dtype: float64

To convert this Series back into a DataFrame and reset the index:

#dataframe reset index
grouped_reset = grouped.reset_index() 
print(grouped_reset)

Output:

     Class  Score
0     Math   85.0
1  Science   91.5
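If you already know you want the keys as columns, groupby() also accepts as_index=False, which skips the separate reset step. The following is a sketch on the same data; the name direct is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Alice', 'Bob', 'Charlie', 'David'],
    'Class': ['Math', 'Math', 'Science', 'Science', 'Math'],
    'Score': [85, 92, 88, 95, 78],
})

# as_index=False keeps 'Class' as a regular column in the result
direct = df.groupby('Class', as_index=False)['Score'].mean()
print(direct.columns.tolist())  # ['Class', 'Score']

# Same values as grouping first and resetting the index afterwards
via_reset = df.groupby('Class')['Score'].mean().reset_index()
```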

Alternatives to Resetting the Index

Although reset_index() is a very useful operation in pandas, there are other ways to work with the index, depending on the requirements of a given task.

1. Setting a New Index

If you need to replace the existing index with a new one, consider the set_index() method. It lets you set one or more columns of the DataFrame as the new index.

Example:

# Use the 'Name' column as the new index
df_new_index = df.set_index('Name') 
print(df_new_index)
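set_index() and reset_index() are opposites, which a round trip makes clear. The sketch below, on a small made-up DataFrame, sets two columns as the index and then resets them back into columns.

```python
import pandas as pd

# Illustrative data
df = pd.DataFrame({'Name': ['John', 'Alice', 'Bob'],
                   'Class': ['Math', 'Math', 'Science'],
                   'Score': [85, 92, 88]})

# Passing a list of columns builds a MultiIndex
indexed = df.set_index(['Class', 'Name'])
print(list(indexed.index.names))  # ['Class', 'Name']

# reset_index() undoes it; the former index levels come first
restored = indexed.reset_index()
print(restored.columns.tolist())  # ['Class', 'Name', 'Score']
```

The data survives the round trip unchanged; only the column order differs, because the restored index levels are placed at the front.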

2. Using sort_index()

If the problem is that the index is out of order, the sort_index() method sorts the DataFrame by its index.

Example:

# Sort the rows by their index labels
df_sorted_index = df.sort_index() 
print(df_sorted_index)
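The difference from reset_index() is that sort_index() reorders the rows and keeps their labels, whereas reset_index(drop=True) keeps the row order and replaces the labels. A minimal sketch with a deliberately shuffled, made-up index:

```python
import pandas as pd

# Illustrative data with an out-of-order index
df = pd.DataFrame({'Age': [30, 23, 25]}, index=[2, 0, 1])

# sort_index: rows move so that the labels read 0, 1, 2
ordered = df.sort_index()
print(ordered['Age'].tolist())     # [23, 25, 30]

# reset_index(drop=True): rows stay put, labels are replaced by 0, 1, 2
renumbered = df.reset_index(drop=True)
print(renumbered['Age'].tolist())  # [30, 23, 25]
```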

Conclusion

In this blog, we have learned about the reset_index() method in pandas, a highly useful tool for managing the index of a DataFrame. Whether you need to renumber rows after sorting or dropping rows, convert a MultiIndex into columns, or flatten the result of a groupby, this method handles it.