File and Directory Backup, Part 1

Using Python and PyQt5

ZennDogg
Level Up Coding


Photo by Jan Antonin Kolar from Unsplash

We all know backing up our files is important, so I wanted to write my own backup program in Python. The project can be downloaded from GitHub at:

https://github.com/zazen000/File-and-Directory-Backup

My requirements were that it be unobtrusive and display a small message while it was running. Easy, right? After all, this is Python, with its plethora of modules, both standard-library and third-party.

Well, I had a heck of a time with that last part; it ended up requiring three separate files, as we'll see. But first, the imports for the main backup program.

import os
import sys
import time
import filecmp
import subprocess

We need to set up the source and destination directories we want to back up. I have 10 source/destination pairs in a tuple:

dirs = (
    (r"C:\Users\mount\source\repos", r"M:\_BACKUP\REPOS"),
    (r"D:\mount\Downloads", r"M:\_BACKUP\DOWNLOADS"),
    (r"D:\mount\Documents", r"M:\_BACKUP\DOCUMENTS"),
    (r"D:\Internet-Marketing", r"M:\_BACKUP\IM"),
    (r"D:\_HOLOSYNC", r"M:\_BACKUP\HOLOSYNC"),
    (r"D:\mount\Music", r"M:\_BACKUP\MUSIC"),
    (r"D:\_BIZ", r"M:\_BACKUP\BIZ"),
    (r"D:\_PWA", r"M:\_BACKUP\PWA"),
    (r"C:\data\db", r"M:\_BACKUP\DB"),
    (r"C:\ProgramData\MongoDB", r"M:\_BACKUP\MONGODB"),
)

In each pair, the first element is the source directory and the second is the destination directory. As you can see, this covers pretty much every data directory on my PC. Now to write some functions.

I'm using robocopy.exe as the copy engine. Robocopy copies source directories to destination directories, but what if a destination directory doesn't exist? We need a function that checks for the destination directory and creates it if it's missing.

def ensure_directory( destination ):
    # Create the destination folder itself (plus any missing parents) if needed
    if not os.path.exists( destination ):
        os.makedirs( destination )

The function ensure_directory checks whether the destination folder exists. If it doesn't, os.makedirs creates it, along with any missing parent folders.
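A quick way to check how these Windows paths behave is the sketch below. It uses ntpath, the Windows implementation behind os.path, so it runs on any OS. Note that dirname returns the parent folder, which is why the destination itself is what should be passed to os.makedirs; exist_ok=True makes repeated calls harmless. The helper name ensure_dir and the temporary folders are illustrative only.

```python
import ntpath
import os
import tempfile

# dirname strips the last component, so it yields the PARENT folder
parent = ntpath.dirname(r"M:\_BACKUP\REPOS")
print(parent)  # M:\_BACKUP


def ensure_dir(path):
    """Hypothetical helper: create path and any missing parents, then confirm it exists."""
    os.makedirs(path, exist_ok=True)  # no error if the folder is already there
    return os.path.isdir(path)


with tempfile.TemporaryDirectory() as root:
    target = os.path.join(root, "_BACKUP", "REPOS")
    first = ensure_dir(target)   # creates both _BACKUP and REPOS
    second = ensure_dir(target)  # folder already exists; still fine
    print(first, second)  # True True
```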

Next, we need to compare the contents of the source and destination directories. If they don't match (size, date, etc.), a copy routine is initiated. So, of course, we need a function to compare directories. I call it compare_directories.

def compare_directories( source, destination ):
    try:
        comp = filecmp.dircmp( source, destination )
        common = sorted( comp.common )
    except OSError:
        # A missing or unreadable directory counts as "not in sync"
        return False

    # An entry present on only one side means the trees differ
    left = sorted( comp.left_list )
    right = sorted( comp.right_list )
    if left != common or right != common:
        return False

    # Files present on both sides that don't match
    if comp.diff_files:
        return False

    # Check every common sub-directory, not just the first one
    for subdir in comp.common_dirs:
        left_subdir = os.path.join( source, subdir )
        right_subdir = os.path.join( destination, subdir )
        if not compare_directories( left_subdir, right_subdir ):
            return False

    return True

Using filecmp.dircmp, the source directory is compared with the destination directory, and the resulting lists of entries are sorted so they can be compared. There are three opportunities for False to be returned, each of which triggers the backup sequence.

If building the comparison fails (for example, because the destination directory doesn't exist yet), False is returned.

If the sorted comp.left_list or the sorted comp.right_list differs from the sorted comp.common list, meaning one side has a file or folder the other lacks, False is returned.

And last, if any file present in both directories doesn't match (comp.diff_files is not empty), False is returned.

When none of these checks fails, the same tests are applied to the common sub-directories, and if those pass too, True is returned; no backup is needed.
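To see what compare_directories reacts to, here is a small experiment with filecmp.dircmp on two throwaway folders (the file names are made up). By default, dircmp compares common files shallowly: files whose type, size and modification time match are treated as equal without being read; otherwise sizes and then contents are compared.

```python
import filecmp
import os
import tempfile


def write(path, text):
    with open(path, "w") as f:
        f.write(text)


def demo_dircmp():
    """Build two small folders and report how filecmp.dircmp classifies their files."""
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
        write(os.path.join(src, "same.txt"), "hello")
        write(os.path.join(dst, "same.txt"), "hello")
        write(os.path.join(src, "changed.txt"), "version 2")  # 9 bytes
        write(os.path.join(dst, "changed.txt"), "v1")         # 2 bytes
        write(os.path.join(src, "new.txt"), "only in the source")

        comp = filecmp.dircmp(src, dst)
        # dircmp computes these lazily, so read them before the folders are deleted
        return comp.left_only, comp.right_only, comp.common_files, comp.diff_files


left_only, right_only, common_files, diff_files = demo_dircmp()
print(left_only)     # ['new.txt']
print(right_only)    # []
print(common_files)  # ['changed.txt', 'same.txt']
print(diff_files)    # ['changed.txt']
```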

I decided to add logging capability to my project. While researching robocopy, I found it had an extensive logging toolkit. That means we need another function, log_directory.

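def log_directory():
    # Timestamp such as 2021-01-18___19-53 names this run's log folder
    now = time.strftime( "%Y-%m-%d___%H-%M" )
    direct = r"C:\Users\mount\source\repos\MyDashboard\LOG"
    # Joining with "" keeps a trailing backslash, so "_BACKUP.log" can be appended directly
    directory = os.path.join( direct, now, "" )
    if not os.path.exists( directory ):
        os.makedirs( directory )
    return directory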

The first line builds the date/time string. The second line sets the parent folder where the logs are stored. The third line builds the path of this run's log folder, and the if statement creates that folder if it doesn't already exist. The function then returns the folder's path.

Robocopy writes its log to the file named in its /LOG+ switch (_BACKUP.log in my case), and a log section is written for every source/destination pair copy sequence, or in our case, 10 times. Then I found that the individual logs could be appended to one log file for each backup run. Problem solved!

Not quite.

I ran a backup in the morning and another in the afternoon. When I went to check the log file, there was only one, because the older one had been overwritten.

How does one change the filename for every backup? At the time, I didn’t know.

The solution I came up with is to create a folder and name it for the date and time of the backup. The _BACKUP.log for that run is placed in that folder.

Alternatively, you can name the log file itself for the date and time and forgo the multiple directories.
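The single-folder alternative might look like the following minimal sketch. The function name timestamped_log_file and the log_root argument are hypothetical; the timestamp reuses the "%Y-%m-%d___%H-%M" format from log_directory.

```python
import os
import tempfile
from datetime import datetime


def timestamped_log_file(log_root, when=None):
    """Return a path like <log_root>/2021-01-18___19-53_BACKUP.log (hypothetical helper)."""
    if when is None:
        when = datetime.now()
    stamp = when.strftime("%Y-%m-%d___%H-%M")
    os.makedirs(log_root, exist_ok=True)  # one shared log folder instead of one per run
    return os.path.join(log_root, f"{stamp}_BACKUP.log")


# Example: a fixed time gives a predictable file name
with tempfile.TemporaryDirectory() as root:
    example = timestamped_log_file(root, datetime(2021, 1, 18, 19, 53))
    print(os.path.basename(example))  # 2021-01-18___19-53_BACKUP.log
```

Because the name only has minute resolution, two runs started in the same minute would share a file; with /LOG+ robocopy appends to an existing log rather than overwriting it, so nothing would be lost.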

This is what the log file looks like for one source/destination pair.

------------------------------------------------------------------------------
ROBOCOPY :: Robust File Copy for Windows
------------------------------------------------------------------------------

Started : Monday, January 18, 2021 7:53:59 PM
 Source : C:\Users\mount\source\repos\
   Dest : M:\_BACKUP\REPOS\

  Files : *.*

Options : *.* /S /DCOPY:DA /COPY:DAT /XX /XO /MT:128 /R:2 /W:5

------------------------------------------------------------------------------

Newer    353    C:\Users\mount\source\repos\trading\get_symbols.py
100%
Newer  26307    C:\Users\mount\source\repos\trading\ubStock_Research.py
100%
------------------------------------------------------------------------------
          Total    Copied   Skipped  Mismatch    FAILED    Extras
 Dirs :     618       618       612         0         0        89
Files :   34096         9     34087         0         0       119
Bytes :  2.964g     76.6k    2.964g         0         0  108.21 m
Times : 0:01:09   0:00:16                       0:00:00   0:00:15

One of these logs is appended to the log file for each of the ten source/destination pairs in the sequence. Now to write some logic.

count = 0
logg = log_directory()
lenn = len(dirs)

First, we initialize some variables: logg holds the log folder path returned by log_directory, count starts at 0, and lenn is the length of dirs, our tuple of source/destination pairs.
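The same count/lenn bookkeeping can also be written with enumerate and tuple unpacking, which hands you each pair's position alongside its paths. This is just an alternative sketch, shown on a shortened copy of dirs.

```python
dirs = (
    (r"C:\Users\mount\source\repos", r"M:\_BACKUP\REPOS"),
    (r"D:\mount\Downloads", r"M:\_BACKUP\DOWNLOADS"),
    (r"D:\mount\Music", r"M:\_BACKUP\MUSIC"),
)

plan = []
for count, (src, dst) in enumerate(dirs, start=1):
    is_last = count == len(dirs)  # same test as count == lenn
    plan.append((count, src, dst, is_last))
    print(count, src, "->", dst, "(last)" if is_last else "")
```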

Before continuing, I wanted to show the syntax for robocopy:

>robocopy source_directory destination_directory switches

We assign the variables for the above parameters.

src = source directory
dst = destination directory
swt = switches for robocopy

These are the switches I used and what they are for:

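/xo = exclude older files (source files older than the destination copy are skipped), so only newer versions are copied
/s = copy all non-empty sub-directories
/MT:nn = number of threads (maximum = 128, default = 8)
/xx = exclude "extra" files and directories that exist only in the destination; they are left untouched
/LOG+:file = append robocopy's output to the named log file, so the logs for every pair in dirs end up in one file
/r:n = number of retries on failed copies (default = 1,000,000)
/w:nn = number of seconds to wait between retries (default = 30)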

Notice that these switches (apart from /LOG+) appear in the backup log above under Options.
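One caveat with building the robocopy command as a single f-string: a source or destination path containing a space would be split into separate arguments. A sketch that avoids this passes robocopy's arguments to subprocess as a list, which subprocess quotes as needed on Windows. The helper build_robocopy_args and the example paths (including the space in "My Music") are hypothetical; the switches mirror swt.

```python
import subprocess


def build_robocopy_args(src, dst, log_file):
    """Hypothetical helper: robocopy's command line as a list, one switch per element."""
    return [
        "robocopy", src, dst,
        "/XX", "/r:2", "/xo", "/s", "/w:5", "/MT:128",
        f"/LOG+:{log_file}",
    ]


args = build_robocopy_args(
    r"D:\mount\My Music", r"M:\_BACKUP\MUSIC", r"C:\LOG\2021-01-18___19-53\_BACKUP.log"
)
# Show the command line subprocess would build on Windows (paths with spaces get quoted)
print(subprocess.list2cmdline(args))
# On Windows, this would run the copy and return robocopy's exit code:
# code = subprocess.run(args).returncode
```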

Now, we can iterate through dirs.

for pair in dirs:
    count += 1
    src, dst = pair
    swt = f"/XX /r:2 /xo /s /w:5 /MT:128 /LOG+:{logg}_BACKUP.log"
    status = compare_directories( src, dst )

And the rest of the code.

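    if status == False:
        ensure_directory( dst )
        compare_directories( src, dst )  # second check; its result isn't used
        cmnd = f"robocopy {src} {dst} {swt}"
        copi = subprocess.Popen( cmnd, shell=False )
        code = copi.wait()  # blocks until robocopy finishes; returns its exit code

        # Robocopy exit codes 0-7 mean no copy failures; 8 and above signal errors
        codes = range(0, 8)

        # After the last pair, signal the GUI (write_txt_file is not shown in this part)
        if count == lenn and code in codes:
            write_txt_file( "oo.txt", "this file has changed!" )

sys.exit()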

When compare_directories returns False, the if status == False block runs. First, ensure_directory creates the destination directory if it doesn't exist. Then compare_directories is called again.

Next, the robocopy command string is assembled in cmnd and launched with subprocess.Popen; copi.wait() blocks until robocopy finishes and returns its exit code. The rest of the code is a solution to a problem I encountered when trying to integrate this program with a GUI that displays the “Working…” label.
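Robocopy's exit code is a set of bit flags rather than a plain success/failure value, which is why a success check should accept only codes 0 through 7: per Microsoft's robocopy documentation, a value of 8 or higher means at least one file or directory failed to copy. A small decoder might look like this; the names ROBOCOPY_FLAGS and describe_robocopy_exit are hypothetical, and the messages paraphrase the documented meanings.

```python
# Bit flags that make up robocopy's exit code (paraphrased from the documentation)
ROBOCOPY_FLAGS = {
    1: "one or more files were copied",
    2: "extra files or directories were found in the destination",
    4: "mismatched files or directories were detected",
    8: "some files or directories could not be copied",
    16: "serious error: robocopy did not copy any files",
}


def describe_robocopy_exit(code):
    """Return (succeeded, messages) for a robocopy exit code."""
    if code == 0:
        return True, ["no files were copied and no failures occurred"]
    # Collect a message for every flag bit set in the code
    messages = [text for bit, text in ROBOCOPY_FLAGS.items() if code & bit]
    return code < 8, messages


print(describe_robocopy_exit(3)[0])  # True: files copied, extras present
print(describe_robocopy_exit(9)[0])  # False: bit 8 means a copy failure
```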

If we were to remove the exit-code check and the write_txt_file call above sys.exit(), the program would run quite well on its own (actually, it runs fine as it is). So how do we tell when the backups are finished? How do we even know it's running?

It took me a while, but I figured out a way.

But that is for Part 2.


Retired military, Retired US Postal Service, Defender of the US Constitution from all enemies, foreign and domestic, Self-taught in python