List Files in a Directory in Python
Listing the files in a directory is a common task in file manipulation, data processing, and automation. Python offers several ways to do it, and each has its own strengths depending on what exactly you need to achieve.
The following standard-library modules are commonly used to list the files in a directory:
- os: Makes it easy to work with the file system.
- glob: Useful for pattern matching in filenames.
- pathlib: A modern, object-oriented way to work with file paths.
The following sections show how each of these modules is used to list the files in a directory.
Python List Files in Directory Using the os Library
The os module is one of the most widely used Python modules because it provides the interface between Python code and the operating system. It includes functions for many file system operations, such as listing the contents of a directory. The os.listdir() method returns a list of the names of all entries in a given path. Because that list includes both files and subdirectories, you need to filter out the directories if you want only the files.
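The filtering step described above might look like the following minimal sketch. The helper name `list_files` is hypothetical, and the sketch assumes you want only regular files directly inside the directory, not files in its subdirectories. Because os.listdir() returns bare names in arbitrary order, each name is joined to the directory path before the os.path.isfile() check, and the result is sorted for predictable output.

```python
import os


def list_files(path='.'):
    """Return the names of regular files (not directories) directly inside `path`."""
    files = []
    for name in os.listdir(path):  # bare names only, in arbitrary order
        full_path = os.path.join(path, name)  # listdir() does not return full paths
        if os.path.isfile(full_path):
            files.append(name)
    return sorted(files)  # sort for stable, predictable output


if __name__ == '__main__':
    print(list_files('.'))
```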
os.scandir() is often more efficient than os.listdir(), especially for large directories. It returns an iterator of os.DirEntry objects, which carry more than just the entry name: methods such as entry.is_file() can often use information gathered during the directory scan instead of making an extra system call for every entry. entry.stat() also gives access to file metadata such as size and modification time, which makes os.scandir() a powerful tool when working with directories.
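As a rough illustration, the sketch below uses os.scandir() to map each regular file to its size in bytes. The function name `file_sizes` is hypothetical, and using os.scandir() in a with statement assumes Python 3.6 or later.

```python
import os


def file_sizes(path='.'):
    """Map each regular file directly inside `path` to its size in bytes."""
    sizes = {}
    # Using scandir() as a context manager closes the directory handle promptly.
    with os.scandir(path) as entries:
        for entry in entries:
            # DirEntry.is_file() can often answer from data gathered during the scan,
            # avoiding a separate system call per entry.
            if entry.is_file():
                info = entry.stat()  # os.stat_result with st_size, st_mtime, ...
                sizes[entry.name] = info.st_size
    return sizes


if __name__ == '__main__':
    for name, size in sorted(file_sizes('.').items()):
        print(f'{name}: {size} bytes')
```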
Python List Files in Directory Using the glob Module
The glob module is one of the most flexible and popular tools for listing files in a Python script. It finds pathnames that match a Unix shell-style wildcard pattern, and glob.glob() returns the matches as a list, which makes it easy to list files that follow a certain naming scheme or have a particular extension.
Example:
import glob
# List all .txt files in the current directory
txt_files = glob.glob('*.txt')
print(txt_files)
Key Points:
- The glob.glob() function takes a pattern as an argument, which may contain wildcards such as *, ?, and [...] character ranges.
- Relative or absolute paths are allowed.
- Wildcards can be combined with file extensions, prefixes, and other fixed parts of a name.
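A small sketch of the wildcards listed above, assuming a directory that contains names such as report1.csv and report10.csv (the function `demo_patterns` is hypothetical). The last pattern uses ** with recursive=True, which makes glob search subdirectories as well; this requires Python 3.5 or later.

```python
import glob
import os


def demo_patterns(base):
    """Show a few glob patterns relative to the directory `base`."""
    # * matches any run of characters within a single path component
    txt = glob.glob(os.path.join(base, '*.txt'))
    # ? matches exactly one character: report1.csv but not report10.csv
    one_digit = glob.glob(os.path.join(base, 'report?.csv'))
    # [...] matches one character from a set or range
    ranged = glob.glob(os.path.join(base, 'report[1-3].csv'))
    # ** with recursive=True also descends into subdirectories
    nested = glob.glob(os.path.join(base, '**', '*.txt'), recursive=True)
    # glob() returns results in arbitrary order, so sort for readable output
    return {
        'txt': sorted(txt),
        'one_digit': sorted(one_digit),
        'ranged': sorted(ranged),
        'nested': sorted(nested),
    }


if __name__ == '__main__':
    for name, matches in demo_patterns('.').items():
        print(name, matches)
```

Note that by default glob does not match names starting with a dot, so hidden files are only found by patterns that also start with a dot.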
Pros:
- Simple, concise syntax for pattern matching.
- Effective and handy for basic file listing tasks.
Cons:
- It is quite basic: it returns only path strings, so metadata such as file size or modification time requires separate calls, and it does not distinguish files from directories that happen to match the pattern.
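One way to work around these limitations is to check each match and fetch its metadata separately with os.path functions. This is only a sketch; `txt_files_with_mtime` is a hypothetical helper name.

```python
import glob
import os


def txt_files_with_mtime(directory='.'):
    """Pair each .txt file (not directory) in `directory` with its modification time."""
    results = []
    for path in glob.glob(os.path.join(directory, '*.txt')):
        # glob() also matches directories (e.g. a folder named 'old.txt'), so check
        if os.path.isfile(path):
            # glob() returns only path strings; metadata needs a separate call
            results.append((path, os.path.getmtime(path)))
    return sorted(results)


if __name__ == '__main__':
    for path, mtime in txt_files_with_mtime('.'):
        print(path, mtime)
```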
Python List Files in Directory Using the pathlib Module
The pathlib module, introduced in Python 3.4, provides a modern, object-oriented way to manage files and directories. Paths are represented as Path objects, whose methods make it easy to list, filter, and manipulate the contents of a directory.
Example:
from pathlib import Path
# Show all the files in the folder
path = Path('.')
files = [file for file in path.iterdir() if file.is_file()]
print(files)
Key Points:
- Path.iterdir() yields a Path object for each entry in the directory (in arbitrary order), including both files and subdirectories.
- Methods such as .is_file() and .is_dir() let you filter entries by type.
- Attributes such as .name, .stem, and .suffix make it easy to work with parts of a path, for example the file extension.
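For example, the following sketch combines iterdir(), is_file(), and the .suffix attribute to pick out files of one type. The helper `files_by_extension` is hypothetical, and the .suffix comparison is case-sensitive, so '.PY' would not match '.py'.

```python
from pathlib import Path


def files_by_extension(directory='.', extension='.py'):
    """Return the names of regular files in `directory` whose suffix equals `extension`."""
    folder = Path(directory)
    matches = [
        entry.name
        for entry in folder.iterdir()  # yields Path objects, files and directories alike
        if entry.is_file() and entry.suffix == extension  # .suffix is an attribute, e.g. '.py'
    ]
    return sorted(matches)


if __name__ == '__main__':
    for name in files_by_extension('.', '.py'):
        # .stem is the file name without its final suffix
        print(name, '->', Path(name).stem)
```

Keep in mind that .suffix holds only the last extension: for archive.tar.gz it is '.gz'.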
Pros:
- Readable, object-oriented API that is easier to use than string-based path handling.
- Listing, filtering, and path manipulation live in one module, which keeps more complex file and directory handling concise.
Cons:
- Requires Python 3.4 or higher
How to List Files in a Directory in Python Recursively
Sometimes, however, you also need to list the files in subdirectories. Python supports recursive directory listing as well, and here we look at how to do it with both the os and pathlib modules.
Example with os:
import os
# Recursively list all files in the current directory and subdirectories
for root, dirs, files in os.walk('.'):
    for file in files:
        print(os.path.join(root, file))
Here, os.walk() traverses the tree from the starting directory downward; for each directory it produces root (that directory's path), dirs (its subdirectories), and files (its file names). The inner loop joins root with each file name to build a usable path.
Example with pathlib:
from pathlib import Path
# Recursively list all files in the current directory and subdirectories
path = Path('.')
files = [file for file in path.rglob('*') if file.is_file()]
print(files)
Key Points:
- os.walk() generates a (dirpath, dirnames, filenames) tuple for every directory it visits.
- pathlib.Path.rglob() recursively yields paths matching a pattern; '*' matches every entry, including directories, which is why the example filters with is_file().
- Both give you the path of each file, which you can then filter by extension or other properties.
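To illustrate filtering by extension during recursion, the sketch below performs the same search with os.walk() and with Path.rglob(). Both function names are hypothetical.

```python
import os
from pathlib import Path


def walk_by_extension(top, extension):
    """Recursively collect file paths ending in `extension` using os.walk()."""
    found = []
    for root, dirs, files in os.walk(top):
        for name in files:
            if name.endswith(extension):
                found.append(os.path.join(root, name))
    return sorted(found)


def rglob_by_extension(top, extension):
    """Same search with pathlib; rglob() applies the pattern in every subdirectory."""
    # rglob() can also match directories (e.g. one named 'x.txt'), hence is_file()
    return sorted(str(p) for p in Path(top).rglob('*' + extension) if p.is_file())


if __name__ == '__main__':
    print(walk_by_extension('.', '.txt'))
    print(rglob_by_extension('.', '.txt'))
```

Given the same absolute starting directory, both functions return the same paths. With a relative start such as '.', os.walk() yields paths like './a.txt' while pathlib drops the leading './', so the strings differ even though they refer to the same files.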
Pros:
- Handles nested directories of any depth.
- Available in both os and pathlib.
Cons:
- Recursive traversal can be slow when there are many subdirectories. For very large directory trees, consider limiting the recursion depth or skipping subdirectories you don't need.
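One common way to limit depth with os.walk() is to prune the dirs list in place, which os.walk() honours in its default top-down mode. A minimal sketch, with `walk_limited` as a hypothetical name:

```python
import os


def walk_limited(top, max_depth):
    """Yield file paths under `top`, descending at most `max_depth` levels below it.

    max_depth=0 lists only the files directly inside `top`.
    """
    for root, dirs, files in os.walk(top):
        for name in files:
            yield os.path.join(root, name)
        # Depth of `root` relative to `top`: '.' means `top` itself (depth 0)
        rel = os.path.relpath(root, top)
        depth = 0 if rel == os.curdir else rel.count(os.sep) + 1
        if depth >= max_depth:
            # Emptying dirs in place stops os.walk() from descending any further
            dirs.clear()


if __name__ == '__main__':
    for path in walk_limited('.', max_depth=1):
        print(path)
```

The same in-place pruning can skip directories by name, for example dirs[:] = [d for d in dirs if d != '.git'].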
Python List Files in Folder with Filtering Options
Often you do not need every file in the directory. You might want to filter files by type, extension, or file attributes such as modification time. Here's how to do that:
Example:
import time
from pathlib import Path
# List all .txt files that were modified in the last 7 days
path = Path('.')
cutoff = time.time() - 7 * 86400  # 7 days ago, in seconds since the epoch
files = [file for file in path.glob('*.txt') if file.stat().st_mtime > cutoff]
print(files)
Filtering Options:
- Extension filtering: use path.glob('*.txt') to list specific file types in one directory, or path.rglob('*.txt') to include subdirectories.
- Modification time: filter or sort files by file.stat().st_mtime, the time of their last modification in seconds since the epoch.
- Size filtering: to filter files by size, use file.stat().st_size (the size in bytes).
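The size and modification-time options can be combined. For instance, this sketch keeps files of at least min_bytes bytes and sorts them newest first; the function name and the 1024-byte default are illustrative assumptions.

```python
from pathlib import Path


def large_files_newest_first(directory='.', min_bytes=1024):
    """Return files in `directory` of at least `min_bytes` bytes, newest first."""
    candidates = [p for p in Path(directory).iterdir() if p.is_file()]
    # st_size is the file size in bytes
    large = [p for p in candidates if p.stat().st_size >= min_bytes]
    # st_mtime is the last-modification time; reverse=True puts the newest first
    return sorted(large, key=lambda p: p.stat().st_mtime, reverse=True)


if __name__ == '__main__':
    for p in large_files_newest_first('.'):
        print(p.name, p.stat().st_size)
```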
Key Points:
- The stat() method returns file metadata such as the modification time (st_mtime) and the size in bytes (st_size).
- Files can be filtered by any attribute, for example by name, size, or time of modification.
Pros:
- Lets you filter by many different file characteristics.
- Gives more control over which files are listed.
Cons:
- Filters that combine several attributes, for instance time and extension, make the code more complex.
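One way to keep multi-attribute filters manageable is to gather them in a single function where each criterion is optional. The sketch below is hypothetical (find_files and its parameters are not part of any library) and calls stat() once per file so that every check reuses the same metadata.

```python
import time
from pathlib import Path


def find_files(directory='.', pattern='*', recursive=False,
               max_age_days=None, min_bytes=None, max_bytes=None):
    """Hypothetical helper: list files matching `pattern`, filtered by age and size.

    Any criterion left as None is simply not applied.
    """
    root = Path(directory)
    paths = root.rglob(pattern) if recursive else root.glob(pattern)
    cutoff = None if max_age_days is None else time.time() - max_age_days * 86400
    results = []
    for path in paths:
        if not path.is_file():
            continue
        info = path.stat()  # one stat() call per file, reused for every check
        if cutoff is not None and info.st_mtime < cutoff:
            continue
        if min_bytes is not None and info.st_size < min_bytes:
            continue
        if max_bytes is not None and info.st_size > max_bytes:
            continue
        results.append(path)
    return sorted(results)
```

For example, find_files('.', '*.txt', recursive=True, max_age_days=7, min_bytes=1) would list the non-empty .txt files modified in the last week anywhere under the current directory.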
Conclusion
Listing files in a directory is one of the basic operations in programming, and Python offers several ways to do it depending on the situation. From simple pattern matching with the glob module, through the object-oriented pathlib module, to recursive traversal with os.walk() or Path.rglob(), Python provides powerful tools for working with directories. Once you know how to filter results, handle recursion, and keep performance in mind, you can easily list and work with the files your Python programs need.
Whichever approach you choose, aim for code that is readable, reasonably fast, and flexible to use.
Wishing you all the best — Happy coding! 🚀