Python for Data Science

Count Rows and Columns with Pandas

What is a Pandas DataFrame?

Let's first understand what a DataFrame is before looking at how to count rows and columns in a Pandas DataFrame.

A Pandas DataFrame is a two-dimensional, size-mutable, potentially heterogeneous, table-like data structure with labeled rows and columns. It resembles a database table or, better yet, a spreadsheet with rows and columns for storing data. Each column can hold a different data type (integer, string, float, etc.), and each row holds one data record.
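As a small illustration of the heterogeneous columns described above, the sketch below builds a DataFrame whose columns hold strings, integers, and floats, then inspects the dtypes attribute. The variable name people and the sample values are made up for this example.

```python
import pandas as pd

# Hypothetical sample data: each column holds a different kind of value
people = pd.DataFrame({
    "Name": ["Alice", "Bob"],     # strings
    "Age": [24, 27],              # integers
    "Height": [1.65, 1.80],       # floats
})

# Every column carries its own data type
print(people.dtypes)

# Every row is one record that spans all of the columns
print(people.loc[0].to_dict())
```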

Counting Rows and Columns: Basic Concepts

Rows are the records or observations, and columns are the fields or characteristics of those records. Knowing the number of rows and columns in a DataFrame is essential for several tasks, such as:

  1. Checking the shape of the data after joining or otherwise relating datasets.
  2. Knowing the size of the data before running operations that alter it.
  3. Debugging or profiling large datasets to confirm that they have the expected size.
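One way such a size check might look in practice: the sketch below records the row count before a join and confirms it afterwards. The orders and customers tables are hypothetical and exist only for this illustration.

```python
import pandas as pd

# Hypothetical tables used only for illustration
orders = pd.DataFrame({"customer_id": [1, 2, 2, 3],
                       "amount": [50, 20, 35, 10]})
customers = pd.DataFrame({"customer_id": [1, 2, 3],
                          "name": ["Alice", "Bob", "Charlie"]})

rows_before = len(orders)
merged = orders.merge(customers, on="customer_id", how="left")

# A left join on a unique key should neither add nor drop rows
assert len(merged) == rows_before, "join changed the row count"
print(f"Before: {orders.shape}, after: {merged.shape}")
```

If customer_id were duplicated in customers, the join would multiply rows and the assertion would fail, which is exactly the kind of problem a quick count can catch early.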

Now, let us discuss how Pandas allows us to count the rows and the columns.

Method 1: Using the .shape Attribute

The most straightforward and most widely used way to count the rows and columns of a DataFrame is the .shape attribute. It returns a tuple that represents the dimensions of the DataFrame: the first element is the number of rows, and the second element is the number of columns.

Example:

import pandas as pd

# Creating a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [24, 27, 22, 32],
        'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']}

df = pd.DataFrame(data)

# Read the number of rows and columns from the .shape tuple
shape = df.shape
print(f"Rows: {shape[0]}, Columns: {shape[1]}")
Output:
Rows: 4, Columns: 3

In the above example, df.shape returns (4, 3), which means that our dataset has four rows and three columns.
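A common slip is to write .shape with parentheses. The short sketch below, which uses a trimmed version of the sample data, shows that .shape is a plain tuple stored as an attribute and that a single column has a one-element shape.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie", "David"],
                   "Age": [24, 27, 22, 32]})

# .shape is an attribute holding a tuple, so it takes no parentheses
print(type(df.shape))
print(df.shape)

# A single column (a Series) has a one-element shape
print(df["Age"].shape)

# Calling it like a method fails, because a tuple is not callable
try:
    df.shape()
except TypeError as err:
    print(f"TypeError: {err}")
```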

Method 2: Using the len() Function

The second way to count rows uses Python's built-in len() function, which returns the number of items in an object. Applied to a Pandas DataFrame, len(df) returns the total number of rows.

To get the number of columns as an integer, use len(df.columns), which counts the labels in the DataFrame's columns attribute.

# Count rows using len()
rows = len(df)
print(f"Number of rows: {rows}")

# Count columns using len(df.columns)
columns = len(df.columns)
print(f"Number of columns: {columns}")
Output:
Number of rows: 4
Number of columns: 3
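The same idea extends to a few closely related calls. The sketch below is an aside rather than part of the method above: it shows len(df.index) as an explicit way to count rows and the .size attribute, which reports the total number of cells.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie", "David"],
                   "Age": [24, 27, 22, 32],
                   "City": ["New York", "Los Angeles", "Chicago", "Houston"]})

n_rows = len(df.index)     # same value as len(df)
n_cols = len(df.columns)
n_cells = df.size          # rows multiplied by columns

print(f"{n_rows} rows x {n_cols} columns = {n_cells} cells")
```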

Method 3: Using the .count() Method

The .count() method returns the number of non-null (non-NaN) entries in each column of a DataFrame. By default it counts down the rows and produces one count per column; passing axis=1 counts across the columns instead and produces one count per row.

# Count non-null entries in each column
column_counts = df.count()
print(column_counts)
Output:
Name    4
Age     4
City    4
dtype: int64

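This method is useful when your data contains missing values (NaN), because it counts only the valid entries in each column. Note that .count() alone does not tell you how many rows are complete: summing its result adds up the non-null cells across all columns. To count the rows that have no missing values, drop the incomplete rows first and take the length of what remains.

Counting Rows Without Missing Values:

# df.count().sum() would total the non-null cells (12 here), not the rows.
# Drop rows that contain any missing value, then count what is left.
complete_rows = len(df.dropna())
print(f"Rows without missing values: {complete_rows}")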
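Because the sample DataFrame has no gaps, the difference between these counts is easier to see on data that does. The sketch below uses a hypothetical df_missing with a few missing entries and compares per-column counts, per-row counts, and the number of fully populated rows.

```python
import numpy as np
import pandas as pd

# Hypothetical data with some missing entries
df_missing = pd.DataFrame({
    "Name": ["Alice", "Bob", None, "David"],
    "Age": [24, np.nan, 22, 32],
    "City": ["New York", "Los Angeles", "Chicago", None],
})

per_column = df_missing.count()            # non-null values in each column
per_row = df_missing.count(axis=1)         # non-null values in each row
complete_rows = len(df_missing.dropna())   # rows with no missing value
total_rows = len(df_missing)               # all rows, missing or not

print(per_column)
print(per_row)
print(f"{complete_rows} of {total_rows} rows are complete")
```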

Method 4: Using .info()

The .info() method prints a summary of the DataFrame: the number of entries (rows), the total number of columns, the number of non-null values in each column, and each column's data type.

df.info()
Output:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Name    4 non-null      object
 1   Age     4 non-null      int64 
 2   City    4 non-null      object
dtypes: int64(1), object(2)
memory usage: 128.0+ bytes

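The output shows that the DataFrame has 4 entries (rows) and 3 columns, and that every column holds 4 non-null values, so there is no absence of data in this specific DataFrame.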
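Note that .info() prints its summary and returns None, so the counts cannot be assigned to a variable directly. If the text is needed in code, one option is to pass a buffer through the buf parameter, as in this minimal sketch.

```python
import io

import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie", "David"],
                   "Age": [24, 27, 22, 32],
                   "City": ["New York", "Los Angeles", "Chicago", "Houston"]})

# Write the summary into an in-memory text buffer instead of the console
buffer = io.StringIO()
result = df.info(buf=buffer)
summary = buffer.getvalue()

print(result)     # info() itself returns None
print(summary)    # the captured summary text
```

For the counts themselves, .shape and len() remain the simpler choice; capturing the summary is mainly useful for logging.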

Method 5: Counting Rows and Columns with .shape and len() Together

Tuple unpacking turns .shape into two named variables in a single step, which gives you the row count and the column count at once. The values are the same ones that len(df) and len(df.columns) return, so the two approaches can be used together as a cross-check. Here's how to do it:

# Get the shape of the DataFrame
num_rows, num_columns = df.shape

# Output the result
print(f"Number of rows: {num_rows}")
print(f"Number of columns: {num_columns}")
Output:
Number of rows: 4
Number of columns: 3
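To make the cross-check explicit, the two approaches can be wrapped in a small helper. The function name table_size below is hypothetical, not a Pandas API; it is only a sketch of how .shape and len() might be combined.

```python
import pandas as pd


def table_size(frame: pd.DataFrame) -> dict:
    """Return row and column counts, cross-checked between .shape and len()."""
    num_rows, num_columns = frame.shape
    # Both routes report the same lengths, so they should always agree
    assert num_rows == len(frame)
    assert num_columns == len(frame.columns)
    return {"rows": num_rows, "columns": num_columns}


df = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie", "David"],
                   "Age": [24, 27, 22, 32],
                   "City": ["New York", "Los Angeles", "Chicago", "Houston"]})

print(table_size(df))
```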

Additional Considerations

Counting Rows and Columns in Specific Subsets

At times, you need the number of rows and columns in only part of the data. To do this, subset the DataFrame first, by filtering rows, selecting columns, or both, and then count the result with the same methods.

Counting Rows after Filtering:

# Filter the DataFrame for rows where Age is greater than 25
filtered_df = df[df['Age'] > 25]

# Count rows and columns after filtering
filtered_rows, filtered_columns = filtered_df.shape
print(f"Rows after filtering: {filtered_rows}")
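When only the number of matching rows is needed, the filtered DataFrame does not have to be built at all. As a sketch of an alternative, summing the boolean mask counts the True values directly.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie", "David"],
                   "Age": [24, 27, 22, 32],
                   "City": ["New York", "Los Angeles", "Chicago", "Houston"]})

# The comparison yields one True or False per row
mask = df["Age"] > 25

# True counts as 1 and False as 0, so the sum is the number of matches
matching_rows = int(mask.sum())
print(f"Rows where Age > 25: {matching_rows}")

# Same answer as filtering first and then counting
assert matching_rows == len(df[mask])
```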

Counting Columns with Specific Data Types:

# Count columns of a specific data type (e.g., numerical columns)
numeric_columns = df.select_dtypes(include=['int64', 'float64'])
print(f"Number of numeric columns: {len(numeric_columns.columns)}")
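Listing 'int64' and 'float64' explicitly misses other numeric types such as int32 or float32. A slightly more general sketch, assuming a hypothetical float32 Score column added for illustration, uses the "number" selector and tallies the columns per data type.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie", "David"],
                   "Age": [24, 27, 22, 32],
                   # Hypothetical extra column stored as 32-bit floats
                   "Score": np.array([88.5, 92.0, 79.5, 85.0], dtype="float32")})

# "number" matches every numeric dtype, whatever its width
numeric = df.select_dtypes(include="number")
print(f"Number of numeric columns: {len(numeric.columns)}")

# Restricting to int64 and float64 skips the float32 column
narrow = df.select_dtypes(include=["int64", "float64"])
print(f"int64/float64 columns only: {len(narrow.columns)}")

# Tally how many columns there are of each dtype
print(df.dtypes.value_counts())
```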

Counting Rows and Columns in a Large DataFrame

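When dealing with large datasets, it helps to know the cheapest way to count rows and columns. Both .shape and len() read the lengths already stored in the DataFrame's row index and column labels, so they return almost instantly and need no extra memory regardless of the data size, which makes them well suited to big datasets. A method such as .count() has to inspect every value to find missing entries, so it takes longer as the data grows.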
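The difference can be observed with a rough timing sketch like the one below. The frame size and column names are arbitrary choices for illustration, and the measured times will vary from machine to machine.

```python
import time

import numpy as np
import pandas as pd

# Hypothetical large frame: 500,000 rows and 5 numeric columns
big = pd.DataFrame(np.zeros((500_000, 5)), columns=list("abcde"))

start = time.perf_counter()
n_rows, n_cols = big.shape      # reads stored lengths, scans no data
shape_seconds = time.perf_counter() - start

start = time.perf_counter()
non_null = big.count()          # checks every value for nulls
count_seconds = time.perf_counter() - start

print(f".shape:   {n_rows} rows, {n_cols} columns in {shape_seconds:.6f} s")
print(f".count(): finished in {count_seconds:.6f} s")
```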

Counting Rows and Columns in MultiIndex DataFrames

DataFrames can also have a hierarchical index, called a MultiIndex, on their rows, their columns, or both. The same methods (.shape, len(), .count()) work on these DataFrames without any changes.

# Create a MultiIndex DataFrame
arrays = [['A', 'A', 'B', 'B'], [1, 2, 1, 2]]
index = pd.MultiIndex.from_arrays(arrays, names=('Letter', 'Number'))

df_multi = pd.DataFrame({'Value': [10, 20, 30, 40]}, index=index)

# Count rows and columns
print(df_multi.shape)
Output:
(4, 1)

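Here, .shape returns 4 rows and 1 column. The two index levels (Letter and Number) are part of the row index, so they are not counted as columns.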
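To see how the index levels relate to these counts, the sketch below rebuilds the same DataFrame, inspects the number of index levels, moves the levels into ordinary columns with reset_index(), and counts the rows under each value of the Letter level.

```python
import pandas as pd

arrays = [["A", "A", "B", "B"], [1, 2, 1, 2]]
index = pd.MultiIndex.from_arrays(arrays, names=("Letter", "Number"))
df_multi = pd.DataFrame({"Value": [10, 20, 30, 40]}, index=index)

# Index levels are tracked separately from the columns
n_levels = df_multi.index.nlevels
print(f"Index levels: {n_levels}")

# After reset_index(), the two levels become regular columns
flat = df_multi.reset_index()
print(flat.shape)

# Count the rows that fall under each value of the outer level
rows_per_letter = df_multi.groupby(level="Letter").size()
print(rows_per_letter)
```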