Mastering Pandas: Practical Data Handling in Python
How to Get Column Names in Pandas?
What are Column Names in Pandas?
In Pandas, a column name is the label assigned to a particular column of a DataFrame. A DataFrame is a two-dimensional, table-like structure in which data is arranged in rows and columns. The column names (sometimes called "pandas headers") appear at the top of the DataFrame and tell you what the data in each column represents.
Column names can be of any hashable type, but they are most often strings. They can be defined when the DataFrame is created, or they are taken from the source (for example, the header row of a CSV or Excel file) when data is loaded. If no names are available, pandas assigns default integer labels.
For instance, in a DataFrame that holds data about people, the column labels might be 'Name', 'Age', 'City', and so on.
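As a minimal sketch of the example above, assuming a small made-up people dataset (the values and the variable names people and unnamed are illustrative only), column names can be set when the DataFrame is created:

```python
import pandas as pd

# Hypothetical sample data; the values are made up for illustration.
people = pd.DataFrame(
    {
        "Name": ["Asha", "Ben", "Chen"],
        "Age": [31, 45, 27],
        "City": ["Pune", "Leeds", "Taipei"],
    }
)

# The column names are stored in an Index object.
print(people.columns)

# When no names are given, pandas falls back to integer labels.
unnamed = pd.DataFrame([[1, 2, 3], [4, 5, 6]])
print(list(unnamed.columns))
```

The dictionary keys become the column names, in the order they were written.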
Methods to Access and Print Pandas DataFrame Column Names
1. Using dataframe.columns
This is the most straightforward way to get the column names of a DataFrame. The columns attribute returns an Index object, which you can convert to a list:
column_names = dataframe.columns.tolist()
2. Using list(dataframe)
Another easy way to get the column names is to pass the DataFrame directly to list(), since iterating over a DataFrame yields its column labels:
column_names = list(dataframe)
3. Using dataframe.columns.values
This returns the column names as a one-dimensional array (a NumPy array for ordinary object-dtype labels). An array is convenient when you want to pass the names to NumPy functions, for example vectorized string operations such as lowercasing or character replacement from numpy.char. The pandas documentation recommends dataframe.columns.to_numpy() when you specifically need a NumPy array.
column_names = dataframe.columns.values
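The sketch below shows one way to use the array form, assuming a small made-up DataFrame. It uses to_numpy() for the NumPy step so that the result is a NumPy array regardless of how the labels are stored, and numpy.char.lower as one example of a vectorized string operation.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data for illustration.
dataframe = pd.DataFrame({"Name": ["Asha"], "Age": [31], "City": ["Pune"]})

names_array = dataframe.columns.values     # array of column labels
numpy_names = dataframe.columns.to_numpy() # always a NumPy array

print(type(numpy_names), numpy_names.shape)

# Cast to a NumPy string dtype, then lowercase every label at once.
lowered = np.char.lower(numpy_names.astype(str))
print(lowered)
```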
4. Using list(dataframe.columns)
This is a more explicit version of method 2, list(dataframe):
column_names = list(dataframe.columns)
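Methods 1, 2, and 4 produce the same plain Python list. A quick check, assuming a small made-up DataFrame:

```python
import pandas as pd

# Hypothetical sample data for illustration.
dataframe = pd.DataFrame({"Name": ["Asha"], "Age": [31], "City": ["Pune"]})

via_tolist = dataframe.columns.tolist()     # method 1
via_list = list(dataframe)                  # method 2
via_list_columns = list(dataframe.columns)  # method 4

print(via_tolist)
print(via_tolist == via_list == via_list_columns)
```

Which one to use is mostly a matter of readability; list(dataframe.columns) states the intent most explicitly.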
5. Using sorted(dataframe)
There are many scenarios where you need the column names in sorted order. For that, you can use the built-in sorted() function, which returns a new sorted list and leaves the DataFrame itself unchanged:
column_names = sorted(dataframe)
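Because sorted() is the ordinary Python built-in, its reverse and key arguments work here too. A short sketch, assuming a small made-up DataFrame whose column names are all strings:

```python
import pandas as pd

# Hypothetical sample data for illustration.
dataframe = pd.DataFrame({"Name": ["Asha"], "Age": [31], "City": ["Pune"]})

sorted_names = sorted(dataframe)                     # ascending order
reverse_names = sorted(dataframe, reverse=True)      # descending order
case_insensitive = sorted(dataframe, key=str.lower)  # ignore letter case

# The DataFrame keeps its original column order.
original_order = list(dataframe.columns)
print(sorted_names, original_order)
```

If the column labels mix types (for example strings and integers), sorted() raises a TypeError because the labels cannot be compared.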
6. Using for loop over dataframe.columns
This approach iterates over the column names and handles them one at a time:
for column_name in dataframe.columns:
    print(column_name)
7. Using dataframe.keys()
The keys() method is an alias for dataframe.columns: it returns the same Index of column names, which you can convert to a list with tolist(). The list can then be modified or iterated over freely.
column_names = dataframe.keys().tolist()
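A small check, assuming a made-up DataFrame, that keys() and columns hold the same labels and that the list from tolist() is an independent copy:

```python
import pandas as pd

# Hypothetical sample data for illustration.
dataframe = pd.DataFrame({"Name": ["Asha"], "Age": [31], "City": ["Pune"]})

# keys() and columns contain the same labels.
same_labels = dataframe.keys().equals(dataframe.columns)

# Changing the list does not change the DataFrame.
column_names = dataframe.keys().tolist()
column_names.append("Country")

print(same_labels, column_names, list(dataframe.columns))
```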
8. Using dataframe.dtypes and select_dtypes() to get column names by data type
You can filter column names by data type in two ways. select_dtypes() keeps only the columns of the requested types, while dataframe.dtypes returns a Series of dtypes indexed by column name, which you can filter with a boolean mask. In each pair below, the two lines are alternatives. Here's how you can do it:
# Get the columns whose data type is integer
int_columns = dataframe.select_dtypes(include='int').columns
int_columns = dataframe.dtypes[dataframe.dtypes == "int64"].index
# Get the columns whose data type is float
float_columns = dataframe.select_dtypes(include='float').columns
float_columns = dataframe.dtypes[dataframe.dtypes == "float64"].index
# Get the columns whose data type is object (commonly used for strings)
object_columns = dataframe.select_dtypes(include='object').columns
object_columns = dataframe.dtypes[dataframe.dtypes == "object"].index
# Get the columns whose data type is boolean
bool_columns = dataframe.select_dtypes(include='bool').columns
bool_columns = dataframe.dtypes[dataframe.dtypes == "bool"].index
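To see the dtype-based selection on concrete data, here is a minimal sketch. The DataFrame and its column names (user_id, score, is_active) are hypothetical, and the dtypes are set explicitly so the result does not depend on platform defaults. The last part groups names by dtype, which is an extra illustration rather than something the methods above require.

```python
import pandas as pd

# Hypothetical sample data with explicit dtypes.
dataframe = pd.DataFrame(
    {
        "user_id": pd.Series([1, 2, 3], dtype="int64"),
        "score": pd.Series([9.5, 7.0, 8.25], dtype="float64"),
        "is_active": pd.Series([True, False, True], dtype="bool"),
    }
)

int_columns = dataframe.select_dtypes(include="int64").columns.tolist()
float_columns = dataframe.select_dtypes(include="float64").columns.tolist()
bool_columns = dataframe.select_dtypes(include="bool").columns.tolist()

# Group every column name under the string form of its dtype.
names_by_dtype = {}
for name, dtype in dataframe.dtypes.items():
    names_by_dtype.setdefault(str(dtype), []).append(name)

print(names_by_dtype)
```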
9. Using _get_numeric_data() to extract numeric columns
Calling dataframe._get_numeric_data() returns a DataFrame containing only the numeric columns, such as float and int columns. The leading underscore marks this as a private method, so its behavior may change between pandas versions; dataframe.select_dtypes(include='number') is the public alternative.
To get only the names of the numeric columns, append .columns:
numeric_columns = dataframe._get_numeric_data().columns
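As a sketch of the public route, the example below uses select_dtypes(include="number") in place of the private method. The sample DataFrame and its column names are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data mixing text, numeric, and boolean columns.
dataframe = pd.DataFrame(
    {
        "name": ["Asha", "Ben"],
        "age": [31, 45],
        "height_m": [1.62, 1.80],
        "is_member": [True, False],
    }
)

# "number" matches integer and float columns; bool and text are excluded.
numeric_columns = dataframe.select_dtypes(include="number").columns.tolist()
print(numeric_columns)
```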
10. Using list comprehension to filter pandas column names based on some condition
If you only want the column names that contain a specific substring, you can use a list comprehension. This assumes that all column names are strings:
column_names = [col for col in dataframe.columns if 'substring' in col]
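When some labels are not strings, the in test raises a TypeError. One possible guard, sketched here with made-up column names, is to check the type first:

```python
import pandas as pd

# Hypothetical DataFrame whose last column label is an integer.
dataframe = pd.DataFrame(
    [[1, "asha", 19.99, 3]],
    columns=["user_id", "user_name", "order_total", 2024],
)

# Skip non-string labels before testing for the substring.
matching = [
    col for col in dataframe.columns
    if isinstance(col, str) and "user" in col
]
print(matching)
```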
11. Using dataframe.columns.str
The .str accessor on dataframe.columns lets you select column names that start or end with a specific string:
column_names = dataframe.columns[dataframe.columns.str.startswith('prefix')]
column_names = dataframe.columns[dataframe.columns.str.endswith('suffix')]
12. Using dataframe.filter()
The dataframe.filter() method selects one or more columns (or rows) of a DataFrame based on their labels, not on their contents. It is most useful for keeping columns by exact name or by a name pattern. The example below keeps the columns whose names start with 'user':
column_names = dataframe.filter(regex='^user', axis=1).columns.tolist()
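Besides regex, filter() accepts like (substring match) and items (exact names). A sketch of all three, assuming a made-up DataFrame; the label "missing" is deliberately absent to show that items skips names that do not exist.

```python
import pandas as pd

# Hypothetical sample data for illustration.
dataframe = pd.DataFrame(
    {
        "user_id": [1],
        "user_name": ["asha"],
        "order_id": [10],
        "order_total": [19.99],
    }
)

by_regex = dataframe.filter(regex="^user").columns.tolist()  # pattern match
by_like = dataframe.filter(like="_id").columns.tolist()      # substring match
by_items = dataframe.filter(items=["order_total", "missing"]).columns.tolist()

print(by_regex, by_like, by_items)
```

Only one of items, like, and regex can be passed in a single call.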
13. Using dataframe.iloc[0]
dataframe.iloc[0] returns the first row as a Series whose index holds the column names, so taking .index recovers them. This is not a standard way to get column names: it is indirect and can be confusing, and it fails on a DataFrame with no rows. It does work on non-empty data, so we include it here for completeness, but dataframe.columns is the better choice in real code.
column_names = dataframe.iloc[0].index.tolist()
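The following sketch, using made-up data, shows the case where this method breaks: an empty DataFrame still has column names, but it has no first row to read them from.

```python
import pandas as pd

# Non-empty DataFrame: the first row's index holds the column names.
dataframe = pd.DataFrame({"Name": ["Asha"], "Age": [31]})
from_first_row = dataframe.iloc[0].index.tolist()

# Empty DataFrame: the columns exist, but there is no row 0.
empty = pd.DataFrame(columns=["Name", "Age"])
iloc_failed = False
try:
    empty.iloc[0]
except IndexError:
    iloc_failed = True

print(from_first_row, iloc_failed, empty.columns.tolist())
```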
Column Operations in Pandas: String Manipulation Techniques
When working with column names in a pandas DataFrame, you often need string manipulation to clean up, reformat, or modify them. Pandas exposes its string methods directly on column names through dataframe.columns.str. Here are some common operations:
1. Replacing Substrings
For analysis or operational purposes, you may need to replace part of a column name with something else.
For example, if a column name is 'First Name', you can change it to 'First_Name' by replacing ' ' with '_'.
dataframe.columns = dataframe.columns.str.replace(' ', '_')
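str.replace() can match either a literal substring or a regular expression. The sketch below contrasts the two on hypothetical labels; the pattern shown is just one possible choice for collapsing non-alphanumeric characters.

```python
import pandas as pd

# Hypothetical column labels for illustration.
cols = pd.Index(["First Name", "Unit Price ($)"])

# Literal replacement: only spaces are changed.
literal = cols.str.replace(" ", "_", regex=False)

# Regex replacement: any run of non-alphanumeric characters becomes "_".
pattern = cols.str.replace(r"[^0-9A-Za-z]+", "_", regex=True)

print(literal.tolist())
print(pattern.tolist())
```

Passing regex explicitly avoids relying on the default, which has differed between pandas versions.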
2. Changing letters to Uppercase
Using str.upper(), you can change column names to uppercase for consistency.
dataframe.columns = dataframe.columns.str.upper()
3. Changing letters to Lowercase
Using str.lower() you can change column names to lowercase for consistency.
dataframe.columns = dataframe.columns.str.lower()
4. Stripping Whitespace in Column Names
Removing leading and trailing whitespace from column names is a key data-cleaning step in ETL work. A stray space makes 'Name ' and 'Name' two different labels, which leads to lookup errors and mismatches when selecting, merging, or filtering data.
Stripped names are also easier to read, more predictable, and more compatible with other applications and databases.
dataframe.columns = dataframe.columns.str.strip()
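The operations in this section can be chained into a single cleaning step. A minimal sketch, assuming made-up column names with inconsistent spacing and case:

```python
import pandas as pd

# Hypothetical DataFrame with messy column names.
dataframe = pd.DataFrame(columns=[" First Name ", "Last Name", " AGE"])

# Strip outer whitespace, lowercase, then replace inner spaces.
dataframe.columns = (
    dataframe.columns
    .str.strip()
    .str.lower()
    .str.replace(" ", "_", regex=False)
)

print(dataframe.columns.tolist())
```

The order matters: stripping first prevents leading or trailing spaces from turning into underscores.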
Conclusion
Knowing how to read column names in Pandas is essential for effective data analysis. A column name is the label you use to select or remove data in a DataFrame. Whether you are exploring a new dataset, shortening column names, or filtering data dynamically, the methods described in this blog give you quick access to the names of the columns.
Renaming columns to forms that are easy to understand also makes your code easier to read and more manageable. In practice, simple and consistent column naming makes data flows easier to manage, even in large and complicated pipelines. Mastering this skill goes a long way in data science.
Wishing you all the best. Happy coding! 🚀