Why backups
Having backups of your data is extremely important, and it is worth taking a few minutes to think about how you want to do it (unless you have no important data on your computer and you only use it for online things like checking your email, or your data is saved in Google Drive or a similar place where you can recover it from any machine... unless you forget your passwords!)
Unfortunately, sometimes disaster strikes (like a hard disk that suddenly dies), and that is when we realize that we should have taken care of it...
Types of backups and tools
Over the years I have used many backup techniques and ideas, so in this howto I want to share my preferred ones, why I use them, and what each one is for (with example commands after the list):
- Simple backup: you copy all the contents from one place to another
- suggested tool: rsync, because it only copies the contents that changed instead of everything, since copying it all over and over again is a waste of time and resources
- Incremental backups: they are "snapshots", so you can recover the backup from a specific state / date
- suggested tool: rdiff-backup, it's like rsync but featuring snapshot states
- Selective backup: you back up only some things, leaving out others that may be bigger in size and useless
- suggested tool: your own script / tool
- Secure backups: they are protected with encryption, so that nobody else can have access to the data
- suggested tool: duplicity, it uses the rsync protocol, can transfer over SSH, and encrypts the data with your GPG key
- Synchronized / updated computers: this is the approach I use the most, since I share the same work / tools between computers and I need them to always be up to date
- suggested tool: unison, it transfers files between computers, networks, or disks, keeping a record of what has been deleted; it can also merge contents that have been updated on both ends at the same time, etc...
- suggested tool: git, which is much more manual and meant for advanced users, but still a powerful option if well thought out
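To give an idea of how these tools are typically invoked, here is a minimal sketch; the paths, hostnames, and user names are only placeholders, so adapt them to your setup:
# simple copy, only transferring what has changed
rsync -av --delete ~/Data/ /media/external/Data/
# incremental snapshots, restorable from a specific date (here: 3 days ago)
rdiff-backup ~/Data/ /media/external/Data-snapshots/
rdiff-backup --restore-as-of 3D /media/external/Data-snapshots/ ~/Data-restored/
# encrypted backup to a remote server (asks for a GPG passphrase or uses your key)
duplicity ~/Data sftp://user@myserver/backups/Data
# two-way synchronization between this computer and another one
unison ~/Data ssh://othercomputer//home/user/Data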
Examples and suggestions:
- For websites: use rdiff-backup for a snapshot of the files plus a dump copy of the database; these are 2 different things, but with this method you can easily restore an entire state of a website (see the sketch after this list)
- For personal work:
- Structure your files correctly on your computer (a directory for code, another for images, another for bigger files, one for backups, etc...), and then set up unison to share the specific dirs of your important data between computers (or to an external HD, a server, a remote machine over the network, etc...)
- Use duplicity to send the most important data to a remote server; since it's encrypted, you are safe
- Always use your hard disks with encryption
- Use an rsync or rdiff-backup command, run from a cronjob, to send important data to your external disk each day, so you don't need to think about it and the backups happen all the time (see the crontab sketch below)
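A possible sketch of the website case (the paths, database name, and user are hypothetical; adapt them, and use pg_dump instead of mysqldump if your site runs on PostgreSQL):
# snapshot of the website files
rdiff-backup /var/www/mysite/ /backups/mysite-files/
# dump of the database into a dated file
mysqldump --single-transaction -u backupuser -p mysitedb > /backups/mysite-db-$(date +%F).sql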
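And a possible crontab entry for the daily automatic copy to the external disk (again, the paths are only examples, and the disk is assumed to be already mounted):
# every day at 03:00, mirror the Data directory to the external disk
0 3 * * * rsync -a --delete /home/user/Data/ /media/external/Data/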
Tips
Unison configuration
Update your system (apug) and use a demo configuration for unison, since it can be painful and slow to set up and review all the files by hand:
cd ~
elive-skel upgrade .unison
Then read and edit the files inside ~/.unison
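For reference, a unison profile is just a small text file inside ~/.unison; this is not the Elive demo configuration but a generic sketch of the format, with example hosts and paths:
# ~/.unison/work.prf  (example profile, run it with: unison work)
root = /home/user
root = ssh://othercomputer//home/user
path = Data
path = Git
ignore = Name *.tmp
auto = true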
Home structure
Some years ago I started using my own home structure for my files; since then, it has worked very well and stayed compatible with any OS (especially Elive :P), so let me share this concept here, as it may help others
So, the main structure is separated into 3 main directories:
- Git: small size: this directory contains all the files that require a history of their changes: mostly scripts, confs, and small files
- Inside there's another "git" directory which does automated git commits every 30 minutes, so there's an automated git history and I can go back and check old states at any moment
- Inside that, there's a "home_linked" directory which contains all the files from my home whose changes I want saved, for example:
- ~/bin: symbolic link to Git/git/home_linked/bin: all the scripts in my home
- ~/.ssh: symbolic link to Git/git/home_linked/.ssh: my ssh dir and confs
- .crontab, .zshrc, .Xdefaults, .gitconfig, etc... many conf files linked into my home
- You can use the stow command to automate this setup (a manual sketch is shown after this list)
- Data: medium size: this directory contains all the files that my environment needs to work, including source files and projects; think of it as "everything that is not needed in the previous one or is too big to be in the previous one"
- Inside, there's another "home_linked" dir for things that don't need a git history, like .config, Downloads, .gimp, .local, .wine, .zsh_history, etc...
- There are also other folders, e.g. for source code, temporary files, and chroots that I constantly use
- DataExtra: bigger size: this directory contains all the non-essential data that is very big, like music, big downloads, backups, chroots, mirrors, virtual machines, etc...
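As an illustration of how the Git part can be wired up, here is one possible way to do it; the directory names follow the structure above, but the exact links and the commit message are just examples:
# link the tracked files from the Git repo into the home directory
ln -s ~/Git/git/home_linked/bin    ~/bin
ln -s ~/Git/git/home_linked/.ssh   ~/.ssh
ln -s ~/Git/git/home_linked/.zshrc ~/.zshrc
# crontab entry for the automated commits every 30 minutes
*/30 * * * * cd ~/Git/git && git add -A && git commit -q -m "automatic snapshot" >/dev/null 2>&1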
I made this structure in my home in order to correctly differentiate what each thing is and where it goes, because many applications and other things bloat your home directory with files that you are not meant to know about or keep track of, so this structure keeps things much cleaner and correctly updated / backed up
The Git and Data directories are of course nicely shared between computers with unison
Both the first and the second one are meant to be located on my SSD disk (which is 256 GB), so they fit comfortably along with the rest of the OS. DataExtra, instead, is meant to contain links to an external or second hard disk, probably mechanical since it is cheaper for that amount of data
Final notes
Backups are important, and we need to think about "how" they need to be done, for example:
- Is one backup (so, two copies) enough? What if both disks break? If our data is really important, we need to have at least a "small" third copy of the most important parts
- Where are your backups located? You should not keep all of them "in your house": imagine you get robbed (meaning: your computer and your external hard disk), all your data would be gone; if your data is important you need to think of a way to have at least one backup in a remote place
- Are we limited in some way? For example, if we don't have a network connection (or it is very expensive), we need to use smart tools like rsync which only sends the modified data (and not big useless files that we don't need to back up), or physically place (and update) a copy of our backups once per month in a remote place (like the house of a friend or family member, etc..)