Mastering Pandas: Practical Data Handling in Python
iloc() function: Learn to extract rows and columns
What is the iloc() function in Python?
The Pandas library in Python offers various ways to manipulate and preprocess data. One of its most useful tools is the DataFrame, which arranges a dataset in tabular form. A DataFrame has three main components: Data, Index, and Columns. The Index and Columns are used to access the data contained in the DataFrame.
A DataFrame can hold a very large dataset, and it is rarely necessary to retrieve every row and column at once. Many data operations need only specific rows or columns. That is where loc() and iloc() come into the picture. Although they are commonly written as loc() and iloc(), both are indexers used with square brackets, as in df.loc[...] and df.iloc[...]. loc() accesses a group of rows and columns by their labels, while iloc() accesses them by their integer positions.
Is iloc() different from loc() function?
Both loc() and iloc() are used to retrieve rows and columns from a DataFrame, but they work differently. loc() is label-based and can also be used with a Boolean array. The Pandas iloc() function, in contrast, is purely integer-position-based: it selects rows, columns, or a single cell of a DataFrame by position.
loc() takes a single label or a list of labels as its argument. Slices are written with labels too. For example, we have created a DataFrame with five columns and ten rows and filled it with random numbers using NumPy's random module.
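A minimal sketch of such a DataFrame follows. The column labels 'a' to 'e', the row labels 'r0' to 'r9', and the seed are assumptions made for illustration; the labels used in the original listing are not shown here.

```python
import numpy as np
import pandas as pd

# Seeded generator so the random numbers are repeatable (the seed is an assumption).
rng = np.random.default_rng(0)

# Ten rows and five columns; row and column labels are illustrative.
df = pd.DataFrame(
    rng.standard_normal((10, 5)),
    index=[f"r{i}" for i in range(10)],
    columns=list("abcde"),
)

print(df.shape)  # (10, 5)
print(df.head())
```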
Now, a particular column or row can be accessed in the following way:
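As an illustration, a row lookup by label might look like the sketch below. It assumes hypothetical row labels 'r0' to 'r9' and fills the frame with sequential integers rather than random numbers so the result is easy to check.

```python
import numpy as np
import pandas as pd

# Sequential values instead of random ones, so results are easy to verify.
df = pd.DataFrame(
    np.arange(50).reshape(10, 5),
    index=[f"r{i}" for i in range(10)],  # hypothetical row labels
    columns=list("abcde"),               # assumed column labels
)

row = df.loc["r2"]           # a single label returns one row as a Series
print(row.tolist())          # [10, 11, 12, 13, 14]

rows = df.loc[["r2", "r5"]]  # a list of labels returns a DataFrame
print(rows.shape)            # (2, 5)
```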
If you want to access a particular column, pass its label to loc(). Here, we have fetched the column labeled 'a'.
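A short sketch of fetching column 'a' with loc(), again on an illustrative frame with assumed labels and sequential values:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(50).reshape(10, 5),
    index=[f"r{i}" for i in range(10)],  # hypothetical row labels
    columns=list("abcde"),               # assumed column labels
)

# ":" keeps every row; "a" picks the column by its label.
col_a = df.loc[:, "a"]
print(col_a.tolist())           # [0, 5, 10, 15, 20, 25, 30, 35, 40, 45]

# Plain bracket indexing gives the same column.
print(col_a.equals(df["a"]))    # True
```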
You can also retrieve particular rows and columns together using the slicing concept.
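The sketch below slices rows and columns together by label, using the same assumed labels as before. Note that, unlike ordinary Python slices, label slices in loc() include both endpoints.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(50).reshape(10, 5),
    index=[f"r{i}" for i in range(10)],  # hypothetical row labels
    columns=list("abcde"),               # assumed column labels
)

# Rows r2 through r4 and columns b through d, endpoints included.
subset = df.loc["r2":"r4", "b":"d"]
print(subset.shape)            # (3, 3)
print(subset.values.tolist())  # [[11, 12, 13], [16, 17, 18], [21, 22, 23]]
```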
You might have noticed that with loc() we always used the labels of the rows and columns of the DataFrame. The iloc() function in Python, by contrast, takes purely integer positions as arguments, irrespective of the labels. The DataFrame.iloc() method is used in the following way:
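To make the contrast concrete, this sketch reaches the same row and the same column once by position and once by label. The labels are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(50).reshape(10, 5),
    index=[f"r{i}" for i in range(10)],  # hypothetical row labels
    columns=list("abcde"),               # assumed column labels
)

# Same row, reached by position (iloc) and by label (loc).
by_position = df.iloc[0]
by_label = df.loc["r0"]
print(by_position.equals(by_label))           # True

# Same column, reached by position and by label.
print(df.iloc[:, 0].equals(df.loc[:, "a"]))   # True
```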
How to use the iloc() function in Pandas?
There are several ways to use the DataFrame.iloc() function. Positions run from 0 to (length of axis - 1), and an IndexError is raised if a requested position is out of bounds. Slices are the exception: as with ordinary Python slices, out-of-range ends are tolerated. It can be invoked in the following ways:
- Using a Scalar Integer
- Using a List of Integers
- With a Slice Object
- Using Boolean Array
Let’s see the implementation of the above ways.
With a scalar integer, the iloc() method returns the row at the given position as a Series. For example, to access the fourth row, the statement would be df.iloc[3].
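A small sketch of scalar access, including what happens when the position is out of bounds. The frame and its labels are illustrative assumptions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(50).reshape(10, 5),
    index=[f"r{i}" for i in range(10)],  # hypothetical row labels
    columns=list("abcde"),               # assumed column labels
)

fourth_row = df.iloc[3]             # position 3 is the fourth row
print(type(fourth_row).__name__)    # Series
print(fourth_row.name)              # r3
print(fourth_row.tolist())          # [15, 16, 17, 18, 19]

# Valid positions are 0..9 here, so position 10 is out of bounds.
try:
    df.iloc[10]
except IndexError as exc:
    print("IndexError:", exc)
```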
If you want to access a single element instead, pass the column position along with the row position.
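For instance, the element in the fourth row and second column could be read as sketched below, on an illustrative frame with assumed labels.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(50).reshape(10, 5),
    index=[f"r{i}" for i in range(10)],  # hypothetical row labels
    columns=list("abcde"),               # assumed column labels
)

# Row position 3, column position 1.
value = df.iloc[3, 1]
print(value)                          # 16

# The same cell addressed by label.
print(value == df.loc["r3", "b"])     # True
```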
Now, if a list of integers is passed, then a DataFrame is returned.
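A sketch of list-based selection follows. Note that even a one-element list returns a DataFrame, whereas a bare integer returns a Series. The frame and labels are assumptions for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(50).reshape(10, 5),
    index=[f"r{i}" for i in range(10)],  # hypothetical row labels
    columns=list("abcde"),               # assumed column labels
)

subset = df.iloc[[0, 2, 4]]           # first, third, and fifth rows
print(subset.index.tolist())          # ['r0', 'r2', 'r4']

one_row = df.iloc[[3]]                # a one-element list still gives a DataFrame
print(one_row.shape)                  # (1, 5)

cols = df.iloc[:, [0, 4]]             # lists work on the column axis too
print(cols.columns.tolist())          # ['a', 'e']
```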
With a slice object also, a DataFrame is returned.
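The following sketch shows positional slicing on an illustrative frame. As with ordinary Python slices, the end position is excluded, and a slice that runs past the end is simply cut short rather than raising an error.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(50).reshape(10, 5),
    index=[f"r{i}" for i in range(10)],  # hypothetical row labels
    columns=list("abcde"),               # assumed column labels
)

rows = df.iloc[1:4]                   # positions 1, 2, 3 (4 is excluded)
print(rows.index.tolist())            # ['r1', 'r2', 'r3']

block = df.iloc[:3, :2]               # first three rows, first two columns
print(block.shape)                    # (3, 2)

tail = df.iloc[8:20]                  # out-of-range end is tolerated
print(tail.index.tolist())            # ['r8', 'r9']
```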
You can also pass a Boolean array to iloc(), which returns all the rows whose corresponding value is True. The Boolean array must have the same length as the number of rows in the DataFrame; otherwise, an error is raised.
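A sketch of Boolean selection on an illustrative frame follows. The mask is a plain NumPy array; the wrong-length case is shown as it behaves in current pandas, where the error is an IndexError.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(50).reshape(10, 5),
    index=[f"r{i}" for i in range(10)],  # hypothetical row labels
    columns=list("abcde"),               # assumed column labels
)

# One True/False per row: keep every other row, starting with the first.
mask = np.array([True, False] * 5)
selected = df.iloc[mask]
print(selected.index.tolist())        # ['r0', 'r2', 'r4', 'r6', 'r8']

# A mask whose length does not match the number of rows is rejected.
try:
    df.iloc[np.array([True, False])]
except IndexError as exc:
    print("IndexError:", exc)
```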
These were the different ways to invoke the iloc() method while working with DataFrame objects. One more option is to pass iloc() a callable that takes one argument and returns a valid indexer. For example, we can use the iloc() function in Python with a lambda function in the following way.
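A minimal sketch of the callable form, assuming a small three-row frame with the default integer index (the frame and its values are hypothetical, and the original listing may have differed):

```python
import pandas as pd

# Hypothetical three-row frame with the default index 0, 1, 2.
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# The lambda receives the DataFrame and returns a Boolean array
# that is True for even index values.
result = df.iloc[lambda x: x.index % 2 == 0]

print(result.index.tolist())   # [0, 2]
print(result["a"].tolist())    # [1, 3]
```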
In the above code, x is the DataFrame passed to the lambda function to be sliced. This selects the first and third rows. In this way, the DataFrame.iloc() method can be used to select and manipulate the data in a DataFrame.