TL;DR: After years of searching around, I finally found a Linux backup setup that fits my needs: Borg + Vorta + BorgBase.
For as long as I can remember, backing up my Linux system has been a problem for me. I tried various solutions, from GNOME’s Déjà Dup through various more or less maintained open source tools to manual file system backups with rsync.
Requirements
Each of the aforementioned solutions fell short on at least one of the following requirements.
Automatic in regular intervals
Some people go with the quote “No backup, no mercy”, but I like to go even further: “No automatic backup, no mercy”. You might claim that manually scheduled backups are a good solution for you… for me they are not. It is like with New Year’s resolutions: after a short period I forget to do them, or the time between two backups gets too long to really call it an up-to-date backup. And to be honest, I know no one in my circle for whom this manual strategy works. They all lost important data at some point because their last backup was too old.
I figured that as long as the backup system doesn’t do its work on its own in the background, it is of no use to me.
Off-site (cloud storage)
Because my ideal backup solution is automatic, I want cloud storage as the backup endpoint. There are two reasons for this:
- Convenience: I don’t want to carry around and plug in an external hard drive whenever I do a backup.
- Fire & theft scenarios: In the rare case that both my laptop and my hard drive get stolen (someone stealing my backpack in the underground), the whole backup solution would be of no use. The same goes for both devices being destroyed in a fire.
Client-side encrypted
Storing all my backup data in cloud storage has a negative privacy implication. Why would I encrypt my laptop’s hard drive, only to store my backup in clear text in the cloud? (Yep, looking at you, iCloud Backups.)
Only if my backups are encrypted on my machine before being transferred to the cloud storage can I assume (as long as the Five Eyes have no encryption-breaking quantum computers yet) that no one apart from me, not even the cloud provider, can access my data. The requirement for a client-side backup key comes with a new problem, BackupInception: how do I back up this backup key? More on this at the end of this article.
Open-source client
To facilitate the client-side encryption and to give me confidence that the backup tool is not cheating on me, at least the client of the solution should be open source. An open-source server side is a nice-to-have for me, as it means that if my chosen provider vanishes, another one could jump into that gap. With a closed-source server I depend on the vendor not going away.
Differential backups for Time Machine-like file version access
I don’t want to use the backup only for disaster recovery, like my laptop breaking or going down in flames, but also for the everyday “simon-fucked-up-his-files” scenario. Sometimes I delete files from my work git directories or forget to commit. My backup should help me in situations where I fucked up my files and give me the peace of mind to experiment faster.
To enable this, I want my backup system to give me easy access to different states of my file system.
Borg backup appears on the scene
While searching for a viable backup solution I stumbled upon Borg about 2 years ago and thought it would finally solve all my problems. Even though I use the terminal quite a lot in my daily work, enabling Borg backup on all machines in my household and setting up the cronjob for it was a tedious process. After finally managing to do so, I thought I was all settled. But apparently something went wrong with the cronjob (not a real initd but a systemd service cronjob thingy), and my parents’ PC did not get backed up for months without anyone noticing. This is not Borg’s fault per se, but it showed me again that this setup was still not in a stable, don’t-think-about-it state.
I continued my journey through different solutions (Duplicacy, Duplicity), none of which was stable or easy enough to use. Then I asked the Hacker News crowd what they would recommend, and someone pointed me towards Borg + Vorta.
Vorta is a GUI for Borg that runs on Linux and macOS, and in combination with this GUI Borg is the fulfillment of all my backup dreams. It ticks all the boxes I listed in the Requirements section and has been running smoothly for about a month now. It has saved my ass three times, and the setup was easy and without problems.
Why Borg backup is so great
Borg is written in Python, with its performance-critical parts in C/Cython, which gives it good performance when creating and compressing backups. It uses differential backups and de-duplication to keep the required network and cloud storage resources small (e.g. my current backup holds 14 TB of backup information but only needs 110 GB). Only changed data is sent to the cloud storage, which saves a lot of time and bandwidth.
On top of solid encryption and compression, Borg offers a backup mounting feature (via FUSE), which is really great for Time Machine-like access. I can simply mount my backup from a certain timestamp and browse it in my normal file browser.
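As a rough sketch, this is what mounting looks like on the Borg 1.x command line. The two helper functions and all paths are placeholders I made up for illustration; only borg mount and borg umount are actual Borg commands.

```shell
#!/bin/sh
# Sketch: browse one archive of a Borg repository via FUSE.
# Helper names and all paths are hypothetical placeholders.

# Mount a single archive at the given mount point.
mount_archive() {
    repo=$1        # repository location, e.g. an ssh:// URL
    archive=$2     # archive name as shown by `borg list`
    mountpoint=$3  # local directory used as mount point
    mkdir -p "$mountpoint"
    borg mount "${repo}::${archive}" "$mountpoint"
}

# Detach the archive again once you are done browsing.
unmount_archive() {
    borg umount "$1"
}
```

Calling mount_archive with a repository, an archive name and an empty directory makes that archive show up as a normal folder; unmount_archive with the same directory removes it again. This assumes FUSE support is installed on the machine.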
With Vorta as a GUI, configuring a Borg backup can be considered user friendly.
Because Borg is 100% open source, its source code can be inspected by anyone, and the bigger the project gets, the better its code can become. The main developer of Borg, Thomas Waldmann, offers commercial services around Borg. This gives me hope that he has an ongoing interest in pushing Borg and continuing to work on it. On the other hand, the funding sources for the Borg project add up to about 150€/month at the time of writing. This is not enough funding to enable any of the developers to work full time on Borg in the near future.
How to use Borg backup
Native CLI
Borg itself offers no GUI and runs entirely on the command line. As laid out in the previous section, I am definitely not a Borg CLI expert and don’t want to dig deeper into that topic. Here are three nice sources for getting started with the Borg CLI:
If you want client-side encryption, you have to configure Borg to use it.
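To give an idea of what that means in practice, here is a minimal sketch assuming Borg 1.x. The function names are hypothetical; the repokey mode, the zstd compression and the archive naming scheme are just choices I picked for the example, not the only options.

```shell
#!/bin/sh
# Sketch: create an encrypted repository and a first archive (Borg 1.x).
# Function names are hypothetical; adapt modes and paths to your setup.

# Initialise a repository with client-side encryption.
# "repokey" stores the key inside the repository, protected by a
# passphrase that Borg asks for interactively.
init_encrypted_repo() {
    borg init --encryption=repokey "$1"
}

# Create a new archive of the given paths.
# {hostname} and {now} are placeholders expanded by Borg itself.
create_archive() {
    repo=$1
    shift
    borg create --stats --compression zstd "${repo}::{hostname}-{now}" "$@"
}
```

The first function is called once per repository, the second one on every backup run with the repository followed by the directories to back up.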
If you are planning to back up a wide array of machines with Borg you should give borgmatic a look. With borgmatic you can configure Borg from a YAML file.
Vorta: A GUI for Borg backup
As pointed out, Vorta changed Borg for me and finally made it a viable day-to-day solution. The GUI offers an easy interface to most of the Borg command line parameters. Setting up a backup was straightforward and took me about 3 minutes.
The Vorta documentation describes how the setup works.
If you want client-side encryption, you have to configure Vorta to use it.
Borg backup backend hosting
One downside of Borg: to make it work with cloud storage, the storage provider needs to run the Borg backup server. This means that not every cloud storage can be used, and the storage itself might be a bit more pricey than e.g. S3 Glacier solutions.
There are two ways to get a backup server:
Host it yourself
If you are a tech-savvy user and willing to manage and update a Linux server, you can host a Borg backend yourself, as described in the Borg documentation.
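One part of such a self-hosted setup is restricting the client’s SSH key on the server so that it can only talk to borg serve. The sketch below only prints a matching authorized_keys line; the function name is hypothetical, and the repository path and public key you pass in are whatever your own server uses.

```shell
#!/bin/sh
# Sketch: build an authorized_keys entry that limits an SSH key to
# `borg serve` for one repository path. Nothing is written to disk here.
# Function name and arguments are hypothetical placeholders.

authorized_keys_line() {
    repo_path=$1  # directory on the server holding this client's repository
    pubkey=$2     # the client's public SSH key, as one line
    printf 'command="borg serve --restrict-to-path %s",restrict %s\n' \
        "$repo_path" "$pubkey"
}
```

The printed line would go into ~/.ssh/authorized_keys of the backup user on the server. This assumes borg is installed on the server and the OpenSSH version there understands the restrict option.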
Use a hosted solution
If you are like me and enjoy systems that “just work”™, a hosted solution might be the better way for you. As my Borg backup is encrypted anyway, I do not care who holds that data, as they have no access to it.
I know of three hosting providers which offer a Borg backend. I have no affiliation with any of these providers:
- Rsync.net: $2.5/100GB/month (minimum 400GB)
- Hetzner Storage Box: $3.5/100GB/month (minimum 100GB)
- BorgBase: $2.1/100GB/month (minimum 100GB)
I chose BorgBase for two reasons:
- They support the development of Vorta (I hope that Vorta stays around and gets updated as long as BorgBase exists).
- They have a notification feature which emails me after X days without a backup, preventing the aforementioned silent backup failures.
My setup
My backup runs on the following setup and configuration:
- Vorta on Arch Linux (dark mode enabled :D)
- BorgBase as backend with a notification if there wasn’t a backup for 5 days
- Keeping 3 hourly, 3 daily, 4 weekly, 6 monthly and 2 annual archives, plus all archives of the last 72h. (If I fucked something up on a Friday, I can scroll through my hourly backups on Monday and find a working state.)
- Validating the repository weekly
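I configure all of this in Vorta, but for the curious, the retention and validation settings above roughly translate to the following Borg 1.x calls. This is my own mapping, not something taken from Vorta; the function names are hypothetical and I wrote the 72h window as 3d.

```shell
#!/bin/sh
# Sketch: the retention policy from the list above as Borg CLI calls.
# Function names are hypothetical; the repository is passed as argument.

# Show which archives the policy would keep or delete.
# Remove --dry-run to actually prune.
apply_retention() {
    borg prune --list --dry-run \
        --keep-within 3d \
        --keep-hourly 3 --keep-daily 3 --keep-weekly 4 \
        --keep-monthly 6 --keep-yearly 2 \
        "$1"
}

# Verify the consistency of the repository (the weekly validation).
verify_repo() {
    borg check "$1"
}
```

Keeping --dry-run in there until the output looks right seems like the safer default to me, as pruning deletes archives for good.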
A nice trick I discovered with Vorta: you can tell it not to back up directories which contain a certain file (e.g. .nobackup). So if I have a directory which should not be included in the backup, running cd [directory]; touch .nobackup excludes it. This feels a lot easier than entering all the different file paths into the “exclude config”. If you want to find out which directories you excluded with this method, you can use find / -name .nobackup 2>&1 | grep -v "Permission" for that.
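Here is a small self-contained demo of the marker file trick. The directory tree is made up and lives in a temporary folder, so it is safe to run anywhere. The last comment shows the Borg command line option that I assume corresponds to the Vorta setting.

```shell
#!/bin/sh
# Sketch: tag directories with a .nobackup marker file and list them.
# The demo tree below is hypothetical and created in a temp directory.

demo_root=$(mktemp -d)
mkdir -p "$demo_root/projects/keep-me" \
         "$demo_root/projects/node_modules" \
         "$demo_root/cache"

# Mark two directories as "do not back up".
touch "$demo_root/projects/node_modules/.nobackup" "$demo_root/cache/.nobackup"

# List every directory carrying the marker. Error messages such as
# "Permission denied" are discarded instead of being filtered with grep.
excluded=$(find "$demo_root" -type f -name .nobackup 2>/dev/null \
    | sed 's|/\.nobackup$||' | sort)
echo "$excluded"

# Clean up the demo tree.
rm -rf "$demo_root"

# On the plain Borg CLI the matching option is --exclude-if-present:
#   borg create --exclude-if-present .nobackup REPO::ARCHIVE PATH
```

The keep-me directory has no marker, so it does not appear in the list.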
How to store the backup encryption key?
The joy of having a client-side encrypted backup comes with the downside of having to store that key somewhere accessible to you. If you lose the key, your backup is lost as well!
Sure, the key is on the machine which does the backup, and for the daily routine of restoring single deleted files this might be enough. But for the quite probable case of you losing access to your machine (theft, fire, disk failure, laptop falling to the ground and breaking, …), the key needs to be stored somewhere else.
For this scenario I have the following setup: the backup encryption key is saved in my password storage (KeePassXC). This password storage itself has a long, randomly generated master password which I consider secure. In order not to lose access to that password storage, I manually save it to an online storage a few times a year and also automatically back it up to another device.
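If you work with the Borg CLI directly, getting the key out of Borg could look like the sketch below (Borg 1.x). The function name and the file location are hypothetical; what you do with the exported file afterwards, e.g. attaching it to a password manager entry, is up to you.

```shell
#!/bin/sh
# Sketch: export the repository key to a file for safekeeping.
# Function name and paths are hypothetical placeholders.

export_repo_key() {
    repo=$1
    keyfile=$2
    # Run in a subshell with a strict umask so that a newly created
    # key file is only readable by the current user.
    ( umask 077 && borg key export "$repo" "$keyfile" )
}
```

Keep in mind that the exported key is still protected by the passphrase, so the passphrase has to be stored as well. Borg also has a --paper option for borg key export which produces a printable version.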
When my machine breaks, a four-step process gets me access to my files again:
1) Download password storage file
2) Get backup key from password storage
3) Install Borg
4) Restore data
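Step 4 is something I would do through Vorta, but on the plain Borg 1.x command line it could look roughly like this. The function name is hypothetical and the sketch simply restores the most recent archive into a target directory.

```shell
#!/bin/sh
# Sketch: restore the most recent archive of a repository.
# Function name and arguments are hypothetical placeholders.

restore_latest() {
    repo=$1
    target=$2
    # Ask Borg for the name of the newest archive only.
    latest=$(borg list --last 1 --short "$repo")
    if [ -z "$latest" ]; then
        echo "no archives found in $repo" >&2
        return 1
    fi
    mkdir -p "$target"
    # borg extract writes into the current working directory.
    ( cd "$target" && borg extract "${repo}::${latest}" )
}
```

Borg will ask for the passphrase from the password storage during both calls, unless it is provided via the BORG_PASSPHRASE environment variable.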
There is one last case you might want to consider: you being unable to access the encryption key or to retrieve it from the password storage (coma, death, Alzheimer’s, …). If you fear this case, you should give a relative you trust the key along with instructions on how to get access to the data. If there is not a single relative you can trust, schemes like Shamir’s Secret Sharing might help you.