core

Utility for automating backups of a specific file or directory

Automated Backups

We want a script that backs up a specific file or folder at different intervals. Specifically, it should:

  • Copy the source to some destination dir every hour (e.g. on a different drive)
  • Keep the last 5 backups, plus one per day, week and month (for example)

We can then rsync the destination dir to keep a remote backup.
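
As a rough illustration of that last step, here is a small helper that only builds the rsync command line. The helper name and the remote `user@backup-host:backups/demo_dst/` are made up for this example; `-a` (archive mode) and `--delete` are standard rsync options.

```python
import shlex

def rsync_command(local_dir, remote, delete=False):
    """Build (but do not run) an rsync call that mirrors local_dir to remote.

    Hypothetical helper for illustration; it is not part of this package.
    """
    cmd = ["rsync", "-a"]
    if delete:
        # Also remove remote backups that the cleanup step deleted locally
        cmd.append("--delete")
    # A trailing slash on the source copies its contents, not the folder itself
    cmd += [local_dir.rstrip("/") + "/", remote]
    return cmd

cmd = rsync_command("demo_dst", "user@backup-host:backups/demo_dst/")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the remote is real.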

!mkdir -p demo_src
!mkdir -p demo_dst
!rm -rf demo_dst/*
!rm -rf demo_src/*
!echo "content" > "demo_src/test_text.txt"
!echo "## content" > "demo_src/test_two.md"

The core functionality

The plan has two main steps:

  • Create a new backup
  • Clean up any old backups that are no longer needed.

For step 1 we want to go file by file, so that an error on one file doesn’t abort the whole backup, and to support matching patterns for what to include. globtastic handles finding the files:

globtastic("demo_src", file_glob="*.md") # Finding files
(#1) ['demo_src/test_two.md']
globtastic("demo_src")
(#2) ['demo_src/test_text.txt','demo_src/test_two.md']

create_backup

 create_backup (src, dest_dir, dry_run=False, recursive:bool=True,
                symlinks:bool=True, file_glob:str=None, file_re:str=None,
                folder_re:str=None, skip_file_glob:str=None,
                skip_file_re:str=None, skip_folder_re:str=None)
create_backup('demo_src', 'demo_dst')
!ls demo_dst
20241127_151709
!ls demo_dst/20241127_151709
test_text.txt  test_two.md
# Test single file
create_backup('demo_src/test_text.txt', 'demo_dst', dry_run=True)
Copy from demo_src/test_text.txt to demo_dst/20241127_151712
# Test pattern
create_backup('demo_src', 'demo_dst', file_glob='*.md', dry_run=True)
Copy from demo_src/test_two.md to demo_dst/20241127_151721/test_two.md
# Test skip_pattern
create_backup('demo_src', 'demo_dst', skip_file_glob='*.md', dry_run=True)
Copy from demo_src/test_text.txt to demo_dst/20241127_151737/test_text.txt
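
To make the file-by-file idea concrete, here is a minimal standard-library sketch of the same approach. It is not the actual create_backup implementation: the name `sketch_create_backup` is made up, it only supports a single `file_glob`, and its dry-run messages may differ from the real ones.

```python
import fnmatch
import shutil
import tempfile
from datetime import datetime
from pathlib import Path

def sketch_create_backup(src, dest_dir, file_glob="*", dry_run=False):
    """Copy src (a file or directory) into a new timestamped folder under dest_dir.

    Returns (copied, failed): source paths copied, and (path, error) pairs.
    """
    src = Path(src)
    target = Path(dest_dir) / datetime.now().strftime("%Y%m%d_%H%M%S")
    if src.is_file():
        pairs = [(src, target / src.name)]
    else:
        files = sorted(p for p in src.rglob("*") if p.is_file())
        pairs = [(p, target / p.relative_to(src)) for p in files
                 if fnmatch.fnmatch(p.name, file_glob)]
    copied, failed = [], []
    for source, dest in pairs:
        if dry_run:
            print(f"Copy from {source} to {dest}")
            continue
        try:
            # Handle each file separately so one failure doesn't stop the rest
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(source, dest)
            copied.append(source)
        except OSError as err:
            failed.append((source, err))
    return copied, failed

# Small demo in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    demo_src = Path(tmp, "src")
    demo_src.mkdir()
    (demo_src / "test_text.txt").write_text("content\n")
    (demo_src / "test_two.md").write_text("## content\n")
    copied, failed = sketch_create_backup(demo_src, Path(tmp, "dst"), file_glob="*.md")
    backed_up = sorted(p.name for p in Path(tmp, "dst").rglob("*") if p.is_file())
print(backed_up)
```

Returning the failures instead of raising lets the caller log them and carry on, which is the reason for going file by file.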

The harder part is the cleanup. Let’s start by generating some dates to test with.

from datetime import datetime, timedelta

def generate_test_dates(num_dates, base_date):
    return [(base_date + timedelta(hours=i)).strftime("%Y%m%d_%H%M%S") for i in range(num_dates)]
test_dates = generate_test_dates(2400, datetime.now() - timedelta(days=100))
print(test_dates[:5], test_dates[-5:])
['20240819_151740', '20240819_161740', '20240819_171740', '20240819_181740', '20240819_191740'] ['20241127_101740', '20241127_111740', '20241127_121740', '20241127_131740', '20241127_141740']
# Can I get all dates < 2 months old?
[d for d in test_dates if (datetime.now() - datetime.strptime(d, '%Y%m%d_%H%M%S')).days < 60][:3]
['20240928_161740', '20240928_171740', '20240928_181740']

Now we want to grab the most recent 5, and then, for each age threshold (2, 14 and 60 days by default), the oldest date that is still younger than that threshold.


clean_dates

 clean_dates (dates, now=None, max_ages=(2, 14, 60))
clean_dates(test_dates)
['20240928_161740',
 '20241113_161740',
 '20241125_161740',
 '20241127_101740',
 '20241127_111740',
 '20241127_121740',
 '20241127_131740',
 '20241127_141740']
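
The selection rule can be sketched in a few lines. This is a guess at the logic based on the output above, not the actual clean_dates source; the name `sketch_clean_dates` and the `keep_recent` parameter are assumptions added for illustration.

```python
from datetime import datetime, timedelta

FMT = "%Y%m%d_%H%M%S"

def sketch_clean_dates(dates, now=None, max_ages=(2, 14, 60), keep_recent=5):
    """Return the date strings to keep, sorted oldest first."""
    now = now or datetime.now()
    dates = sorted(dates)  # this format sorts chronologically as plain strings
    keep = set(dates[-keep_recent:]) if keep_recent else set()
    for max_age in max_ages:
        # Dates younger than max_age days, oldest first
        younger = [d for d in dates
                   if (now - datetime.strptime(d, FMT)).days < max_age]
        if younger:
            keep.add(younger[0])
    return sorted(keep)

# Same setup as above (2400 hourly dates), but with a fixed 'now'
now = datetime(2024, 11, 27, 15, 17, 40)
base = now - timedelta(days=100)
dates = [(base + timedelta(hours=i)).strftime(FMT) for i in range(2400)]
kept = sketch_clean_dates(dates, now=now)
print(kept)
```

With this fixed 'now' the sketch keeps the same eight dates as the clean_dates output shown above.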

Now we want code that starts with the same test dates as above and then simulates time passing: each step adds an hour to ‘now’, appends a new date to the test dates, and runs clean_dates. Printing a (prettified) version of the result once per simulated day lets us check it’s doing what I expect over a simulated month.

# # Initialize
# now = datetime.now()
# test_dates = generate_test_dates(2400, now - timedelta(days=100))

# # Simulate time passing
# for _ in range(30 * 24):  # Simulate a month (30 days * 24 hours)
#     now += timedelta(hours=1)
#     test_dates.append(now.strftime("%Y%m%d_%H%M%S"))
#     test_dates = clean_dates(test_dates, now)  # Clean up old dates
#     if _ % 24 == 0:  # Print once a day
#         print(f"\nDay {_ // 24 + 1}:")
#         pprint.pprint(test_dates)

NB: Yay, it looks to be doing mostly what I want! I can collapse the output here; if you’re viewing this in a notebook, my apologies :)

Turning it into a script

Now that those two pieces of functionality seem to be working, we can wrap them up as a script using fastcore’s call_parse. The script runs the backup, cleans up old backups, and logs any errors or messages to backup.log.
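
The pieces the wrapper needs beyond the two functions above are small: turning the max_ages string into numbers, logging to a file, and deleting the backup folders that were not kept. Here is a hedged standard-library sketch of those pieces; `parse_max_ages`, `make_logger` and `prune_backups` are made-up names, not functions from this package.

```python
import logging
import shutil
import tempfile
from pathlib import Path

def parse_max_ages(max_ages):
    """Turn a command-line string such as '2,14,60' into (2, 14, 60)."""
    return tuple(int(part) for part in max_ages.split(","))

def make_logger(log_file):
    """Return a logger writing to log_file, plus its handler so it can be closed."""
    logger = logging.getLogger("backup_sketch")
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(log_file)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger, handler

def prune_backups(dest, keep, log, dry_run=False):
    """Remove every sub-directory of dest whose name is not in keep."""
    removed = []
    for folder in sorted(Path(dest).iterdir()):
        if not folder.is_dir() or folder.name in keep:
            continue
        if dry_run:
            log.info("Would remove %s", folder)
        else:
            try:
                shutil.rmtree(folder)
                log.info("Removed %s", folder)
            except OSError as err:
                # Log the problem and move on to the next folder
                log.error("Could not remove %s: %s", folder, err)
                continue
        removed.append(folder.name)
    return removed

# Small demo in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    dest = Path(tmp, "dst")
    for name in ("20241125_161740", "20241126_101740", "20241127_141740"):
        (dest / name).mkdir(parents=True)
    log, handler = make_logger(Path(tmp, "backup.log"))
    removed = prune_backups(dest, keep={"20241125_161740", "20241127_141740"}, log=log)
    remaining = sorted(p.name for p in dest.iterdir())
    log.removeHandler(handler)
    handler.close()
    log_text = Path(tmp, "backup.log").read_text()
print(parse_max_ages("2,14,60"), removed, remaining)
```

In this sketch the `keep` set would come from running the date-cleaning step on the folder names found in the destination directory.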


run_backup

 run_backup (src:str, dest:str, max_ages:str='2,14,60',
             log_file:str='backup.log', dry_run:<function bool_arg>=False,
             recursive:<function bool_arg>=True, symlinks:<function
             bool_arg>=True, file_glob:str=None, file_re:str=None,
             folder_re:str=None, skip_file_glob:str=None,
             skip_file_re:str=None, skip_folder_re:str=None)

Run backup and cleanup old files. Takes globtastic args.

Parameter       Type      Default     Details
src             str                   The source to be backed up
dest            str                   The destination directory
max_ages        str       2,14,60     The max age(s) in days for the different backups
log_file        str       backup.log
dry_run         bool_arg  False       Dry run?
recursive       bool_arg  True
symlinks        bool_arg  True
file_glob       str       None
file_re         str       None
folder_re       str       None
skip_file_glob  str       None
skip_file_re    str       None
skip_folder_re  str       None

!ls demo_src
test_text.txt  test_two.md

Testing a directory:

!rm -r demo_dst/*
run_backup('demo_src', 'demo_dst')
!ls demo_dst
20241127_151747
!ls demo_dst/20241127_151747
test_text.txt  test_two.md

Testing a skip pattern (dry run):

!rm -r demo_dst/*
run_backup('demo_src', 'demo_dst', skip_file_glob="*.md", dry_run=True)
Copy from demo_src/test_text.txt to demo_dst/20241127_151801/test_text.txt