Rsync Backup System [3]: Using Rsync For Full Backup

So last time we talked about how to set up a single disk with ZFS to use compression. Having got our backup disks sorted, the next step is to work out how to use rsync to do the actual backups.

rsync was written by the same people who devised Samba and is a very powerful piece of software. The first thing we need to do is set up a dedicated backup user, with only read access to the filesystem, on the system that is to be backed up. Having this is extremely important in case we make a mistake and accidentally overwrite files on the source filesystem (which is quite possible with such a powerful tool).
Using useradd we can set up backupuser, in this case with a very simple password consisting of four consecutive digits. By default it will have read-only access to other users' home directories, which means it can read my home directory (how much it can actually read depends on the permissions set inside that directory). backupuser is then used over SSH to read the home directory across the network when creating the backup.
The next step is to look at the required form of the rsync command, and there are certainly a lot of options for this. Assuming the backup destination zpool has been mounted at /mnt/backup/fullbackup and the source is on 192.168.x.y at /home/patrick, a good starting point for the backup would be along the following lines:
rsync -arXvz --progress --delete backupuser@192.168.x.y:/home/patrick/ /mnt/backup/fullbackup/patrick --log-file=/home/patrick/rsync.log
This looks to cover all the bases. The options are: -a, which means copy in archive mode (recursing into directories and preserving permissions, ownership, timestamps and symlinks); -r, which means recurse into source directories (already implied by -a, so it is harmless but redundant here); -v, which means verbose; and -z, which means compress data during the transfer (which can speed things up on a slow network link). --progress means that during each file transfer you will see the bytes transferred, speed, percentage and ETA, which is useful for large files. --delete removes extraneous files from the destination directories (that is, files that aren't in the source), which is obviously useful when a file gets deleted or moved in the source. -X means preserve extended file attributes if any exist.
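Because --delete removes files from the destination, it can be worth previewing a run before doing it for real. As a minimal sketch (not something tested in this post), rsync's --dry-run option reports what would be transferred or deleted without changing anything. The function name preview_backup is made up for illustration; the host and paths in the example call are the ones used above.

```shell
# preview_backup: hypothetical helper that shows what a backup run would do.
# --dry-run makes rsync list transfers and deletions without touching the
# destination. All arguments (source, destination, extra options) are
# passed straight through to rsync.
preview_backup() {
    rsync -aXvz --delete --dry-run "$@"
}

# Example call (host and paths as used in this post):
# preview_backup backupuser@192.168.x.y:/home/patrick/ /mnt/backup/fullbackup/patrick
```

The trailing slash on the source matters: /home/patrick/ copies the contents of the directory, whereas /home/patrick would create a patrick directory inside the destination.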
At the moment I am testing this with a full backup of serverpc, with the command running in a terminal window. There are some issues with directories that backupuser does not have permission to read. So far this has only affected a few hidden folders, but it will have to be looked into further. Previous backups always used patrick as the user, but having a special backup user with restricted permissions is really a best practice for any kind of professional computer setup, because a mistake with the rsync command could wipe out the source directory if the user had read-write access to it.
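One way to track down the folders that backupuser cannot get into is to look for anything under the source that is not readable by "other" users. This is only a sketch, and it assumes backupuser is neither the owner of the files nor in their group, so that the "other" permission bits are the ones that apply. The function name is made up.

```shell
# unreadable_by_others: hypothetical helper that lists paths under $1 which
# a user with only "other" permissions could not back up.
unreadable_by_others() {
    # Directories need both read and execute (search) for others: bits 005.
    find "$1" -type d ! -perm -005 -print
    # Regular files need read for others: bit 004.
    find "$1" -type f ! -perm -004 -print
}

# Example call, run on the source machine as patrick or root:
# unreadable_by_others /home/patrick
```

Running it as the owner (or root) matters, because find itself cannot descend into directories that the calling user is not allowed to read.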

After looking at the log file options, I found that rsync lets us set the format of each log line with an extra parameter. This is what I came up with after deciding what information would be useful on each line of the rsync log:

rsync -arXvz --progress --delete backupuser@192.168.x.y:/home/patrick/ /mnt/backup/fullbackup/patrick --log-file=/home/patrick/rsync.log --log-file-format "%a|%f|%M|%l|%b|%o|%U"

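With the log file format option in particular, rsync seems to be usefully logging the information we need for each file. However, the %a field does not actually seem to produce anything. According to the rsync documentation, %a is the remote IP address and is only available when rsync is running as a daemon, which would explain why it comes out empty over SSH.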
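Since each log line is pipe-delimited, it is easy to summarise with awk. The following is a rough sketch under some assumptions: that rsync prefixes each line with a date, time and process ID before the formatted part, that %a is empty (so the first field just holds that prefix), and that no filename contains a | character. The function name is made up. Per the rsync documentation, %o is one of send, recv or del. (with the trailing full stop).

```shell
# summarise_rsync_log: hypothetical helper that counts operations and totals
# the bytes transferred (%b) in a log written with the format
# "%a|%f|%M|%l|%b|%o|%U".
summarise_rsync_log() {
    awk -F'|' '
        # Only per-file lines have exactly 7 fields; skip other rsync lines.
        NF == 7 {
            ops[$6]++      # field 6 is %o (the operation)
            bytes += $5    # field 5 is %b (bytes actually transferred)
        }
        END {
            printf "recv=%d send=%d del=%d bytes=%d\n",
                ops["recv"], ops["send"], ops["del."], bytes
        }
    ' "$1"
}

# Example call:
# summarise_rsync_log /home/patrick/rsync.log
```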

There is one thing that rsync does not do by itself, and that is incremental backups to a separate disk. rsync can create incremental backups in a separate directory from the one where the full backup is stored, but the full backup directory must be online at the time the incremental is done so that rsync knows which files have changed. This is a problem when, as I intend, you want to use separate disks for the incrementals. Here you see a fundamental limitation of Linux filesystems (at least ext4): they do not have the separate archive flag for a file that NTFS has. The argument goes that it isn't necessary, but that archive bit is how you can tell that a file has been modified since the backup last ran, and therefore needs to go into the next incremental.
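For reference, one rsync option that works this way is --compare-dest, which copies into a new directory only the files that differ from the ones in an existing backup directory. Whether this is the exact mechanism meant above is an assumption; the function name and the incremental path in the example call are made up for illustration.

```shell
# incremental_backup: hypothetical helper.
#   $1 = existing full backup directory (absolute path, must be online)
#   $2 = source (note the trailing slash, as with the full backup)
#   $3 = directory that receives only the new or changed files
incremental_backup() {
    rsync -a --compare-dest="$1" "$2" "$3"
}

# Example call (the incremental path is made up):
# incremental_backup /mnt/backup/fullbackup/patrick \
#     backupuser@192.168.x.y:/home/patrick/ /mnt/backup/incremental/patrick
```

This makes the limitation concrete: rsync decides what has changed by comparing against the directory given in $1, so that directory has to be reachable while the incremental runs.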
There are several possible solutions to this, and one of them is to create an extended attribute on each source file, which is possible using the setfattr command. We could set this on each source file following the full backup, in a separate script that runs after rsync. Maybe the script would be part of a verify process that we run after the backup, checking that the source and destination files both exist and then writing the extended attribute to the source file. The difficulty is being sure that each source file really was backed up; one way would be to take the list of source files from the rsync log and feed that into the script that writes the extended attribute.
Either way, the point is to write an extended attribute to each source file that says when that file was last backed up. An incremental script could use this, together with find, to select files by when they were last backed up. Or we can just give find a range of dates and build the incremental file list from the last-modified time of each file. This will all be looked at in more detail when I start working on incrementals, because for now I am just doing a full backup like I did before with rdiff-backup.
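As a placeholder until the incrementals are worked out properly, here is a minimal sketch of both ideas. The first function records a backup time in an extended attribute with setfattr; the attribute name user.last_backup is made up, and writing it needs write access to the source file, which a read-only backupuser would not have. The second uses find with a timestamp file, which is an assumption about how the date range might be tracked: touch the stamp file when a backup starts, then list everything modified after it.

```shell
# mark_backed_up: store the current time (seconds since the epoch) in a
# hypothetical extended attribute on each file given as an argument.
mark_backed_up() {
    now=$(date +%s)
    for f in "$@"; do
        setfattr -n user.last_backup -v "$now" "$f"
    done
}

# changed_since: list regular files under $1 modified after stamp file $2.
changed_since() {
    find "$1" -type f -newer "$2" -print
}

# Example calls (the stamp file path is made up):
# touch /home/patrick/.last_full_backup     # at the start of a full backup
# changed_since /home/patrick /home/patrick/.last_full_backup
```

The stored value can be read back with getfattr -n user.last_backup --only-values on the same file.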