Mastering Pandas: Practical Data Handling in Python
Read xlsx File using Pandas
Introduction
Data Analysis has become one of the fastest-growing Industry trends these days. Therefore, fetching huge datasets and analyzing them has been a crucial skill for aspiring developers. However, these datasets do not come from a single source, they are retrieved from different file formats and storage.
One of the most commonly used file formats for storing data in Microsoft Excel File which has .xlsx extension. According to Acuity Training a UK Excel training provider, the average worker spends 38% of their time working in Excel.
In this article, we will learn how you can read xlsx files in Python so that you can retrieve the required data and analyze it to gain some useful insights.
Pandas Read Excel File functionality allows you to load the data from an Excel Spreadsheet and manipulate it by reading, writing, or modifying it to perform the Data Analysis Operations. Let’s see how we can read xlsx files in Python using Pandas Library.
Pandas Read Excel Method
Pandas provide its method to read the Excel Spreadsheet called ‘read_excel()’ which reads the excel file into a Pandas DataFrame. This method can read the file from the local filesystem or URL. It supports file extensions of xls, xlsx, xlsm, xlsb, odf, ods and odt files. You can also pass a path object into the read_excel() function to read the xlsx file in Python. For example,
We have taken a sample Excel Sheet to demonstrate how to read xlsx files in Python. We first need to import the Pandas Package and then call the read_excel() function to read the file. You may get an error like this.
This error occurs because the module ‘xlrd’ has not been installed. It is a library used for reading the input Excel files in Python. If you get this error, simply open Command Prompt and type the following command:
pip install xlrd
After this you need, to copy the file name and pass it into the Pandas read Excel function to read the file. Since the sheet has many rows, we have used the .head() method, which returns the first five rows of the Excel Sheet. You will get the following output.
An important point to note is that our Excel file exists in the same folder in which our Python program is saved. But, what if we store our files in some different location? Would it always be required to keep the file in the same directory?
The answer is no. Because in such cases we can pass the path object to Pandas Read Excel function. You need to copy the path of the file which is stored under a different directory and pass the path into the read excel method in Python. You will get the following output.
You might be thinking why we have put ‘r’ before the path. The reason behind this is that a single backslash(\) refers to the escape character.
This is the reason ‘\n’ means a new line, not a backslash. So, if you use a normal string as a path, it won’t work because a character after a single backslash will act like an escape sequence. So, we have put ‘r’ before the path to make it an unescaped string.
Now, you have learned Pandas Read Excel Function to read the xlsx file in Python. There are some other methods also to read the Excel file in Python.
Using openpyxl library
Openpyxl library provides a load_workbook() function to read the file. After this, the object ‘data.active’ is created to read the properties such as max_row and max_columns of the sheets.
Then, a nested for loop is used to iterate over the Excel sheet to print the values.
Using xlwings Library
The xlwings library in Python is capable of both reading as well as writing into the Excel file in Python, which is different from the Pandas Read Excel Function. It is a very useful package that provides some Data Analysis techniques in the Excel Instance itself. Below is the Python implementation of xlwings read Excel file in Python.
These were the ways to read the data from the Excel file and modify it as per the requirements.