Finding your path in Python

See what I did there?

Anup Joseph
Level Up Coding


Literally the first result on Unsplash of the word path — Photo by Stephen Leonardi on Unsplash

Python’s standard library has so much cool functionality in it that some of it inevitably gets missed. I think that’s what happened to pathlib. Now don’t get me wrong, I don’t mean to say that pathlib is some esoteric, unknown library, but neither is it the first thing that most Python programmers would think of when handling paths (that would definitely be string manipulation or os.path). So in this post let’s explore this beautiful module a bit.

The pathlib module gathers all the tools you need to work with paths in one convenient place and provides a flexible object-oriented API for them. The Path class abstracts away the differences between operating system paths (the whole Windows using \ and Mac and Linux using / thing), while the PosixPath and WindowsPath classes are still there when you need OS-specific behaviour.

Let’s make a Path

Making a path in pathlib is pretty straightforward. Pass a string to Path; forward slashes work as separators on every platform, so the same code runs on both Windows and Linux (backslashes, on the other hand, are only treated as separators on Windows):

from pathlib import Path
path = Path("files/file.txt")

From here on, I am going to assume that Path is imported.
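To see the OS-specific classes in action without switching machines, here is a small sketch using the "pure" variants, PurePosixPath and PureWindowsPath, which only manipulate path strings and never touch the filesystem. The file names are just the ones from the example above; everything else is standard pathlib.

```python
from pathlib import Path, PurePosixPath, PureWindowsPath

# Path() picks the concrete class for the OS you are running on.
local = Path("files/file.txt")
print(type(local).__name__)  # PosixPath on Linux/macOS, WindowsPath on Windows

# The "pure" classes only manipulate path strings, so they work on any OS.
win = PureWindowsPath(r"files\file.txt")
posix = PurePosixPath("files/file.txt")

print(win.parts)    # ('files', 'file.txt')
print(posix.parts)  # ('files', 'file.txt')

# Windows paths accept forward slashes as well...
print(PureWindowsPath("files/file.txt") == win)  # True

# ...but a POSIX path treats a backslash as an ordinary character.
print(PurePosixPath(r"files\file.txt").parts)  # ('files\\file.txt',)
```

So if you stick to forward slashes in your strings, the same code behaves the same way everywhere.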

To construct a new path, we use the / operator. Say the files folder in the example above sat inside another folder. To construct that path we can simply use:

new_path = "all_files"/path

So long as at least one of the operands to the / operator is a Path object this works.
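Here is a quick sketch of that rule. I am using PurePosixPath so the printed output looks the same on any OS, and backup.txt is a made-up file name for illustration; with a regular Path the operator works exactly the same way.

```python
from pathlib import PurePosixPath

path = PurePosixPath("files/file.txt")

# A string on the left works because the right operand is a path object.
new_path = "all_files" / path
print(new_path)  # all_files/files/file.txt

# A string on the right works too, and the operator chains.
backup = PurePosixPath("all_files") / "files" / "backup.txt"
print(backup)  # all_files/files/backup.txt

# joinpath is the method form of the same operation.
same = PurePosixPath("all_files").joinpath("files", "backup.txt")
print(same == backup)  # True

# Two plain strings never become a path: str / str is not defined.
try:
    "all_files" / "files"
except TypeError as err:
    print("TypeError:", err)
```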

Some Basic Uses

A very common use case for paths is to pick out the name and extension of the file a path points to. Some of the workarounds I have seen and used to get this done include:

simple_path = "all_files/files/file.txt"
# To get the full file name
simple_path.rsplit("/")[-1]
# To get the extension
simple_path.split(".")[1]
# This of course works under the assumption that no one will be monstrous enough
# to put a . in the folder name (of course they won't, says .git).
# To make it more, umm... robust, you could do
simple_path.rsplit("/")[-1].split(".")[1]
# To get just the file name
simple_path.rsplit("/")[-1].split(".")[0]

Obviously these aren’t the only ways to do these operations in Python (keeping in the spirit of the Zen of Python). To be cross-platform, we should really have used os.path.split. This is exactly the sort of case where pathlib shines. Replicating this in pathlib:

To get the full file name:

path.name

To get the extension:

path.suffix # Note this returns the extension with dot included

And finally, to get just the file name without the extension:

path.stem
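To put the approaches side by side, here is a rough sketch comparing os.path with the three pathlib attributes. The archive.tar.gz name is a made-up example I added to show what happens with more than one dot; PurePosixPath is used only so the output is identical on every OS.

```python
import os.path
from pathlib import PurePosixPath

simple_path = "all_files/files/file.txt"
path = PurePosixPath(simple_path)

# os.path: two calls, still working on plain strings.
name = os.path.basename(simple_path)
stem, ext = os.path.splitext(name)
print(name, stem, ext)  # file.txt file .txt

# pathlib: one attribute each.
print(path.name, path.stem, path.suffix)  # file.txt file .txt

# A name with several dots shows where naive splitting goes wrong.
archive = PurePosixPath("all_files/files/archive.tar.gz")
print(archive.suffix)    # .gz
print(archive.suffixes)  # ['.tar', '.gz']
print(archive.stem)      # archive.tar
print(str(archive).split(".")[1])  # tar
```

Note that suffix only gives you the last extension; if you need all of them, suffixes returns the full list.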

Pathlib has none of the verbosity or complications involved in the earlier examples. Your code remains clean, pretty and, I’d say, more readable. And if you just want the string representation of the path, you can simply use

str(path)

Similarly, if you want to find the directory that contains a given file, that’s pretty simple too:

path.parent
# PosixPath('files')

Now this is arguably trickier to pull off with os.path. If you want the full absolute path to the file, you can use this:

path.resolve()
# PosixPath('/content/files/file.txt')

And if you want all the parent directories up to the root:

list(path.resolve().parents)
# [PosixPath('/content/files'), PosixPath('/content'), PosixPath('/')]
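A short sketch of how parent and parents relate, and why the example above resolves the path first. I am reusing the /content/files/file.txt path from the output above with PurePosixPath, so nothing here needs to exist on disk; only the last step touches the real filesystem.

```python
from pathlib import Path, PurePosixPath

full = PurePosixPath("/content/files/file.txt")

# parent is one step up; parents is a sequence of every ancestor.
print(full.parent)         # /content/files
print(full.parent.parent)  # /content
print(full.parents[0])     # /content/files
print([str(p) for p in full.parents])  # ['/content/files', '/content', '/']

# On a relative path the chain stops at '.', not at the filesystem root.
relative = PurePosixPath("files/file.txt")
print([str(p) for p in relative.parents])  # ['files', '.']

# resolve() needs the real filesystem, so it lives on Path, not the pure classes.
resolved = Path(".").resolve()
print(resolved.is_absolute())  # True
```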

Iterating, Matching and More

Another very common operation is to iterate through directories. Now pathlib does have a method called iterdir to handle that, but I prefer to use the Path().glob technique. To iterate through everything in a directory (files and subdirectories alike):

dir_path = Path("/content")
for p in dir_path.glob("*"):
    print(p.name)
# .config
# sample_data

If you want to iterate through the directory recursively instead, i.e. go into each subdirectory and print all the paths in it, you can easily use glob to do that:

for p in dir_path.glob("**/*"):
    print(p.name)
# .config
# sample_data
# .last_update_check.json
# gce
# config_sentinel
# configurations
# .last_opt_in_prompt.yaml
# .last_survey_prompt.yaml
# active_config
# logs
# config_default
# 2021.11.01
# 13.34.28.082269.log
# 13.34.35.080342.log
# 13.34.08.637862.log
# 13.34.55.836922.log
# 13.34.55.017895.log
# 13.33.47.856572.log
# anscombe.json
# README.md
# mnist_test.csv
# california_housing_test.csv
# mnist_train_small.csv
# california_housing_train.csv

Now glob here uses the same wildcard pattern language as Python’s glob module, so the patterns you already know work here as well. So if we want only the csv files in the directory, we can simply do this:

for p in dir_path.glob("**/*.csv"):
    print(p.name)
# mnist_test.csv
# california_housing_test.csv
# mnist_train_small.csv
# california_housing_train.csv
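Path also has an rglob method, which is a shorthand for putting "**/" in front of the pattern. A minimal sketch, again on a throwaway directory; mnist_test.csv and README.md echo the listing above, while top_level.csv is a hypothetical file added to show that matches in the starting directory are included too.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    dir_path = Path(tmp)
    # Hypothetical files, created only for this demo.
    (dir_path / "sample_data").mkdir()
    (dir_path / "sample_data" / "mnist_test.csv").write_text("a,b\n")
    (dir_path / "sample_data" / "README.md").write_text("# readme\n")
    (dir_path / "top_level.csv").write_text("a,b\n")

    via_glob = sorted(p.name for p in dir_path.glob("**/*.csv"))
    via_rglob = sorted(p.name for p in dir_path.rglob("*.csv"))

    print(via_glob)  # ['mnist_test.csv', 'top_level.csv']
    print(via_glob == via_rglob)  # True

    # relative_to shows where each match sits below the starting directory.
    for p in sorted(dir_path.rglob("*.csv")):
        print(p.relative_to(dir_path).as_posix())
```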

This is an extremely small sampling of all the functionality present in pathlib. If you want a more in-depth guide to pathlib, this RealPython guide might be of help:

And if you want a set of recipes to plug and play in your own projects, this giant list is probably what you are looking for:

That’s it for today! Hope you have a nice day.

If you have any questions, feel free to drop me a line on LinkedIn or Twitter.
