!mkdir -p demo_src
!mkdir -p demo_dst
!rm -rf demo_dst/*
!rm -rf demo_src/*
core
Automated Backups
We want a script that backs up a specific file/folder at different intervals. Specifically, it should:
- Copy to some destination dir every hour (e.g. a different drive)
- Keep the last 5 backups, plus one per day, week and month (for example)
We can then rsync the destination dir to keep a remote backup.
!echo "content" > "demo_src/test_text.txt"
!echo "## content" > "demo_src/test_two.md"
The core functionality
The plan has two main steps:
- Create a new backup
- Clean up any old backups that are no longer needed.
For step 1 we want to go file by file in case of errors, and support a matching pattern for what to include. So, take 2:
globtastic("demo_src", file_glob="*.md") # Finding files
(#1) ['demo_src/test_two.md']
globtastic("demo_src")
(#2) ['demo_src/test_text.txt','demo_src/test_two.md']
create_backup
create_backup (src, dest_dir, dry_run=False, recursive:bool=True, symlinks:bool=True, file_glob:str=None, file_re:str=None, folder_re:str=None, skip_file_glob:str=None, skip_file_re:str=None, skip_folder_re:str=None)
create_backup('demo_src', 'demo_dst')
!ls demo_dst
20241127_151709
!ls demo_dst/20241127_151709
test_text.txt test_two.md
# Test single file
create_backup('demo_src/test_text.txt', 'demo_dst', dry_run=True)
Copy from demo_src/test_text.txt to demo_dst/20241127_151712
# Test pattern
create_backup('demo_src', 'demo_dst', file_glob='*.md', dry_run=True)
Copy from demo_src/test_two.md to demo_dst/20241127_151721/test_two.md
# Test skip_pattern
create_backup('demo_src', 'demo_dst', skip_file_glob='*.md', dry_run=True)
Copy from demo_src/test_text.txt to demo_dst/20241127_151737/test_text.txt
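The body of create_backup isn't shown here, so as a rough illustration of the "file by file, with a pattern" idea, here is a minimal sketch. It is an assumption, not the actual implementation: it uses only the standard library (pathlib and fnmatch) instead of globtastic, supports only file_glob and skip_file_glob, and the name create_backup_sketch and its return value are made up for this example.

```python
import shutil
from datetime import datetime
from fnmatch import fnmatch
from pathlib import Path

def create_backup_sketch(src, dest_dir, dry_run=False, file_glob=None, skip_file_glob=None):
    """Hypothetical helper: copy `src` (file or folder) into a new timestamped folder."""
    src = Path(src)
    # Each backup gets its own folder named after the current time
    dest = Path(dest_dir) / datetime.now().strftime("%Y%m%d_%H%M%S")
    # A single file is copied directly; a folder is walked recursively
    files = [src] if src.is_file() else sorted(p for p in src.rglob("*") if p.is_file())
    base = src.parent if src.is_file() else src
    copied, errors = [], []
    for f in files:
        if file_glob and not fnmatch(f.name, file_glob): continue
        if skip_file_glob and fnmatch(f.name, skip_file_glob): continue
        target = dest / f.relative_to(base)
        if dry_run:
            print(f"Copy from {f} to {target}")
            continue
        try:
            # Going file by file means one failed copy doesn't abort the rest
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(f, target)
            copied.append(target)
        except OSError as e:
            errors.append((f, e))
    return dest, copied, errors
```

Collecting errors in a list rather than raising lets the caller decide whether to log them or fail the whole run.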
The harder part is the cleanup. Let’s start by generating some dates to test with.
def generate_test_dates(num_dates, base_date):
    return [(base_date + timedelta(hours=i)).strftime("%Y%m%d_%H%M%S") for i in range(num_dates)]
test_dates = generate_test_dates(2400, datetime.now() - timedelta(days=100))
print(test_dates[:5], test_dates[-5:])
['20240819_151740', '20240819_161740', '20240819_171740', '20240819_181740', '20240819_191740'] ['20241127_101740', '20241127_111740', '20241127_121740', '20241127_131740', '20241127_141740']
# Can I get all dates < 2 months old?
[d for d in test_dates if (datetime.now() - datetime.strptime(d, '%Y%m%d_%H%M%S')).days < 60][:3]
['20240928_161740', '20240928_171740', '20240928_181740']
Now we want to keep the most recent 5, plus the oldest date below each age threshold (in days).
clean_dates
clean_dates (dates, now=None, max_ages=(2, 14, 60))
clean_dates(test_dates)
['20240928_161740',
'20241113_161740',
'20241125_161740',
'20241127_101740',
'20241127_111740',
'20241127_121740',
'20241127_131740',
'20241127_141740']
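Only the signature of clean_dates is shown above, so here is a minimal sketch of one way to get that behaviour, assuming the rule is "keep the newest 5, plus the oldest date that is younger than each max age". The name clean_dates_sketch and the keep_latest parameter are hypothetical, not part of the original.

```python
from datetime import datetime

FMT = "%Y%m%d_%H%M%S"

def clean_dates_sketch(dates, now=None, max_ages=(2, 14, 60), keep_latest=5):
    """Hypothetical helper: return the subset of timestamp strings to keep."""
    now = now or datetime.now()
    # This timestamp format sorts chronologically as plain strings
    dates = sorted(dates)
    keep = set(dates[-keep_latest:])
    for age in max_ages:
        # All dates younger than `age` days; keep only the oldest of them
        younger = [d for d in dates if (now - datetime.strptime(d, FMT)).days < age]
        if younger: keep.add(younger[0])
    return sorted(keep)
```

With the default max_ages this keeps at most 8 entries: 5 recent ones and one each for the 2, 14 and 60 day thresholds, which matches the shape of the output above.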
Next, we want code that starts with the same test dates as above and then simulates time passing: at every step it adds an hour to ‘now’, appends a new date to the test dates, and prints a (prettified) version of the clean_dates output, so we can check that it behaves as expected over a simulated month.
# # Initialize
# now = datetime.now()
# test_dates = generate_test_dates(2400, now - timedelta(days=100))
# # Simulate time passing
# for _ in range(30 * 24): # Simulate a month (30 days * 24 hours)
#     now += timedelta(hours=1)
#     test_dates.append(now.strftime("%Y%m%d_%H%M%S"))
#     test_dates = clean_dates(test_dates, now) # Clean up old dates
#     if _ % 24 == 0: # Print once a day
#         print(f"\nDay {_ // 24 + 1}:")
#         pprint.pprint(test_dates)
NB: Yay, it looks to be doing mostly what I want! I can collapse the output; if you’re viewing this in a notebook, my apologies :)
Turning it into a script
Now that those two pieces of functionality seem to be working, we can wrap them up as a script using fastcore’s call_parse. The script runs the backup, cleans up old files, and logs any errors or messages to backup.log.
run_backup
run_backup (src:str, dest:str, max_ages:str='2,14,60', log_file:str='backup.log', dry_run:<function bool_arg>=False, recursive:<function bool_arg>=True, symlinks:<function bool_arg>=True, file_glob:str=None, file_re:str=None, folder_re:str=None, skip_file_glob:str=None, skip_file_re:str=None, skip_folder_re:str=None)
Run backup and cleanup old files. Takes globtastic args.
| Parameter | Type | Default | Details |
|---|---|---|---|
| src | str | | The source to be backed up |
| dest | str | | The destination directory |
| max_ages | str | 2,14,60 | The max age(s) in days for the different backups |
| log_file | str | backup.log | |
| dry_run | bool_arg | False | Dry run? |
| recursive | bool_arg | True | |
| symlinks | bool_arg | True | |
| file_glob | str | None | |
| file_re | str | None | |
| folder_re | str | None | |
| skip_file_glob | str | None | |
| skip_file_re | str | None | |
| skip_folder_re | str | None | |
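The cleanup half of run_backup isn't shown, so here is a rough sketch of how it might work, under two assumptions: max_ages arrives as a comma-separated string (as in the signature above), and old backups are the timestamp-named folders in the destination. The names parse_max_ages and prune_backups_sketch are hypothetical, and the keep argument stands in for whatever selection rule (such as clean_dates) is used.

```python
import logging
import re
import shutil
from pathlib import Path

log = logging.getLogger("backup")
STAMP_RE = re.compile(r"^\d{8}_\d{6}$")  # e.g. 20241127_151747

def parse_max_ages(max_ages):
    """Turn a CLI string like '2,14,60' into a tuple of ints."""
    return tuple(int(a) for a in max_ages.split(",") if a.strip())

def prune_backups_sketch(dest, keep, dry_run=False):
    """Hypothetical helper: remove timestamped folders that `keep` does not select."""
    dest = Path(dest)
    # Only touch folders that look like backups; anything else is left alone
    stamps = sorted(p.name for p in dest.iterdir() if p.is_dir() and STAMP_RE.match(p.name))
    keep_set = set(keep(stamps))
    removed = []
    for name in stamps:
        if name in keep_set: continue
        log.info("Removing old backup %s", dest / name)
        if not dry_run: shutil.rmtree(dest / name)
        removed.append(name)
    return removed
```

Matching folder names against the timestamp pattern before deleting is a safety choice: a stray folder in the destination is never removed by the cleanup.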
!ls demo_src
test_text.txt test_two.md
Testing a directory:
!rm -r demo_dst/*
run_backup('demo_src', 'demo_dst')
!ls demo_dst
20241127_151747
!ls demo_dst/20241127_151747
test_text.txt test_two.md
Testing a pattern:
!rm -r demo_dst/*
run_backup('demo_src', 'demo_dst', skip_file_glob="*.md", dry_run=True)
Copy from demo_src/test_text.txt to demo_dst/20241127_151801/test_text.txt