Playing With ZFS (on Linux) Encryption
Setup
In order to have a simple way to play with the new features of ZFS, it makes sense to have a safe "sandbox". You could pick an old computer, but in my case I decided to use a VM. It is tempting to use Docker, but that won't work: the zfs tools need a special kernel module, and a container runs on the host's kernel rather than its own.
For the setup, I decided to use VirtualBox and Archlinux, since those are the tools I'm most familiar with. Also, modifying the zfs-dkms package to build from the branch that hosts the encryption PR is really simple there.
VirtualBox/vagrant
In order to make this even simpler, we're going to use vagrant to spin up a ready-to-use archlinux box. I'm going to assume that you have vagrant installed on your system. If you don't know what vagrant is, it is "a tool for building and managing virtual machine environments in a single workflow" (vagrantup.com).
Once you have vagrant, getting the archlinux box running is very simple: you can use this Vagrantfile to get a box running zfs with encryption in a few minutes.
Vagrant.configure("2") do |config|
  config.vm.box = "archlinux/archlinux"

  config.vm.provision "shell", privileged: false, inline: <<-SHELL
    sudo pacman -Sy --noconfirm yajl base-devel
    curl -LO https://aur.archlinux.org/cgit/aur.git/snapshot/yaourt.tar.gz
    curl -LO https://aur.archlinux.org/cgit/aur.git/snapshot/package-query.tar.gz
    tar xzf yaourt.tar.gz && tar xzf package-query.tar.gz
    cd /home/vagrant/package-query && makepkg -i --noconfirm .
    cd /home/vagrant/yaourt && makepkg -i --noconfirm .
    yaourt -Sy --noconfirm spl-dkms-git
    yaourt -Sy --noconfirm zfs-dkms-git

    # install zfs-dkms-git from tcaputi's branch
    cd /home/vagrant
    yaourt -G zfs-dkms-git
    cd /home/vagrant/zfs-dkms-git
    # use '|' as the sed delimiter so no backslash escapes are needed inside the Ruby heredoc
    sed -i 's|github.com/zfsonlinux/zfs.git|github.com/tcaputi/zfs.git|' PKGBUILD
    sed -i '24 d' PKGBUILD
    sed -i '23 a \ \ \ \ echo "${pkgver%%_*}"' PKGBUILD
    makepkg -i --noconfirm .
  SHELL
end
Save that file as Vagrantfile in an empty directory and then run vagrant up:
$ vagrant up --provider virtualbox
... wait
$ vagrant ssh
[vagrant@archlinux ~]$ uname -a
Linux archlinux 4.11.3-1-ARCH #1 SMP PREEMPT Sun May 28 10:40:17 CEST 2017 x86_64 GNU/Linux
The vagrant ssh command connects you to the already provisioned box. Now that we have zfs running, we can start playing with it. For the remainder of this post, I'll assume you're connected to your new archlinux VM.
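Before going further, it can help to confirm that the zfs kernel module is actually loaded inside the VM. The sketch below is a hypothetical helper (ensure_zfs_module is my name for it, not part of any package): it looks for a line starting with "zfs " in /proc/modules and runs sudo modprobe zfs if it is missing. The optional argument only exists so the check can be pointed at a different file.

```shell
# Hypothetical helper: make sure the zfs kernel module is loaded.
# $1 (optional) is the modules list to inspect; defaults to /proc/modules.
ensure_zfs_module() {
    local modules="${1:-/proc/modules}"
    # Each line of /proc/modules starts with the module name followed by a space.
    if ! grep -q '^zfs ' "$modules"; then
        sudo modprobe zfs
    fi
}
```

After running ensure_zfs_module inside the VM, lsmod | grep zfs should list the module.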
Creating a pool and a dataset
Let's create a pool and mount a dataset (roughly a "partition" in zfs lingo). As an example, I'll create the pool on two simulated disks backed by two 500M files. On a real computer, you would probably use real devices.
NOTE: this section assumes you know how to manage a zfs pool. If you want a more detailed introduction, you can read the docs from Oracle or the FreeBSD handbook. Both are great resources.
# create two 500M files, we will use these files as storage for the
# pool, you could use raw devices as well if you want.
$ truncate -s 500M zfs_test_1.raw
$ truncate -s 500M zfs_test_2.raw
$ sudo zpool create zfs_test raidz $PWD/zfs_test_1.raw $PWD/zfs_test_2.raw
$ sudo zpool status
  pool: zfs_test
 state: ONLINE
  scan: none requested
config:

        NAME                              STATE     READ WRITE CKSUM
        zfs_test                          ONLINE       0     0     0
          raidz1-0                        ONLINE       0     0     0
            /home/vagrant/zfs_test_1.raw  ONLINE       0     0     0
            /home/vagrant/zfs_test_2.raw  ONLINE       0     0     0

errors: No known data errors
$ mount | grep zfs
zfs_test on /zfs_test type zfs (rw,xattr,noacl)
$ sudo zfs create -o compression=on -o atime=off zfs_test/aaa
$ sudo chown -R vagrant: /zfs_test/aaa
$ mount | grep zfs
zfs_test on /zfs_test type zfs (rw,xattr,noacl)
zfs_test/aaa on /zfs_test/aaa type zfs (rw,noatime,xattr,noacl)
# add some data to the new dataset
$ cd /zfs_test/aaa && git clone https://github.com/zfsonlinux/zfs.git
$ sudo zfs list -t all
NAME           USED  AVAIL  REFER  MOUNTPOINT
zfs_test       153K   360M    24K  /zfs_test
zfs_test/aaa    24K   360M    24K  /zfs_test/aaa
$ sudo zfs snapshot zfs_test/aaa@$(date +%s)
$ sudo zfs list -t all
NAME                      USED  AVAIL  REFER  MOUNTPOINT
zfs_test                 49.1M   311M    25K  /zfs_test
zfs_test/aaa             48.9M   311M  48.9M  /zfs_test/aaa
zfs_test/aaa@1497383599     0B      -  48.9M  -
In this quick zfs session, we created a pool using two files as "devices", with raidz distributing the data across them.
After that, we created a new dataset inside the pool and added some data to it. Finally, we took a snapshot of the dataset so we can back it up later.
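If you repeat this setup often, the file-creation step can be wrapped in a small function. This is a hypothetical helper (make_backing_files is an illustrative name): it creates COUNT sparse files with truncate and prints their absolute paths, since zpool create expects file vdevs to be given with a full path (which is why the session above uses $PWD).

```shell
# Hypothetical helper: create COUNT sparse files of SIZE inside DIR and print
# their absolute paths, one per line, ready to be passed to `zpool create`.
make_backing_files() {
    local dir size count i path
    dir=$(realpath -- "$1") || return 1
    size=$2
    count=$3
    i=1
    while [ "$i" -le "$count" ]; do
        path="$dir/zfs_test_$i.raw"
        # truncate only sets the file length, so no disk space is used yet.
        truncate -s "$size" "$path" || return 1
        printf '%s\n' "$path"
        i=$((i + 1))
    done
}
```

For example, sudo zpool create zfs_test raidz $(make_backing_files "$PWD" 500M 2) should reproduce the pool above, as long as the directory path contains no spaces (the command substitution relies on word splitting).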
Now that we have a working zfs playground, let's add the encryption part.
ZFS: using tcaputi's branch for encryption
Since native encryption has not been merged into zfsonlinux yet (see PR#5769), we need to use the current working branch, which lives in Tom Caputi's repo.
Fortunately, if you used the Vagrantfile from the beginning of this post, you should already be using this branch. You can check it by looking for the encryption-related properties:
$ sudo zpool get all zfs_test | grep encryption
zfs_test  feature@encryption  disabled  local
$ sudo zfs get all zfs_test/aaa | grep encryption
zfs_test/aaa  encryption  off  default
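To script that check instead of grepping the full property list, you could ask zpool get for the single feature property. A minimal sketch, with encryption_supported as a hypothetical helper name: -H removes the header, -o value prints only the value, and any of the feature states disabled, enabled or active means the build knows about encryption. I'm assuming a build without the feature either fails or prints some other value, in which case the helper reports failure.

```shell
# Hypothetical helper: succeed if POOL ($1) knows about the encryption feature,
# whatever its current state.
encryption_supported() {
    local state
    # -H: no header, -o value: print only the property value.
    state=$(sudo zpool get -H -o value feature@encryption "$1" 2>/dev/null) || return 1
    case $state in
        disabled|enabled|active) return 0 ;;
        *) return 1 ;;
    esac
}
```

Running encryption_supported zfs_test && echo "encryption branch in use" in the VM should print the message.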
Create an encrypted dataset
Ok, now we're finally ready to create an encrypted dataset. For the encryption key, you have several options. From the man page:
keyformat=raw | hex | passphrase
    Controls what format the user's encryption key will be provided as. This property is only set when the dataset is encrypted. Raw keys and hex keys must be 32 bytes long (regardless of the chosen encryption suite) and must be randomly generated. Passphrases must be between 8 and 64 bytes long and will be processed through PBKDF2 before being used (see the pbkdf2iters property). Even though the encryption suite cannot be changed after dataset creation, the keyformat can be with zfs change-key.
For this example, we'll use a 32-byte file of random data as a raw key:
$ dd if=/dev/urandom of=zfs.key.raw bs=1 count=32
32+0 records in
32+0 records out
32 bytes copied, 0.000479432 s, 66.7 kB/s
$ hexdump -C zfs.key.raw
00000000  44 88 d2 a7 fd 44 5a 09  f6 5b bc 5b 67 b4 43 52  |D....DZ..[.[g.CR|
00000010  85 a5 c7 59 20 fc 34 bc  49 2b d9 65 8c e4 d9 5b  |...Y .4.I+.e...[|
00000020
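The same kind of key can be generated without dd, and the man page also allows a hex key. A minimal sketch with two hypothetical helpers: gen_raw_key writes 32 random bytes for keyformat=raw, and gen_hex_key writes 32 random bytes as 64 hex characters for keyformat=hex (without a trailing newline, which I'm assuming is the safest format for a key file). Both run under umask 077, so a newly created key file is readable only by its owner.

```shell
# Hypothetical helper: write a 32-byte raw key (keyformat=raw) to $1.
# umask 077 in the subshell makes the new file mode 600.
gen_raw_key() {
    ( umask 077 && head -c 32 /dev/urandom > "$1" )
}

# Hypothetical helper: write 32 random bytes as 64 hex characters
# (keyformat=hex) to $1. -N32 reads exactly 32 bytes, -v stops od from
# collapsing repeated lines, and tr strips od's spaces and newlines.
gen_hex_key() {
    ( umask 077 && od -An -v -tx1 -N32 /dev/urandom | tr -d ' \n' > "$1" )
}
```

For example, gen_raw_key /home/vagrant/zfs.key.raw produces a file equivalent to the one created with dd above.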
Now that we have the key, we can create our shiny new encrypted dataset.
$ sudo zfs create \
    -o compression=on \
    -o encryption=on \
    -o keyformat=raw \
    -o keylocation=file:///home/vagrant/zfs.key.raw \
    zfs_test/bbb
$ sudo zfs list
NAME           USED  AVAIL  REFER  MOUNTPOINT
zfs_test      50.4M   310M    25K  /zfs_test
zfs_test/aaa  48.9M   310M  48.9M  /zfs_test/aaa
zfs_test/bbb  1.33M   310M  1.33M  /zfs_test/bbb
$ sudo zfs get encryption zfs_test/aaa
NAME          PROPERTY    VALUE  SOURCE
zfs_test/aaa  encryption  off    default
$ sudo zfs get encryption zfs_test/bbb
NAME          PROPERTY    VALUE        SOURCE
zfs_test/bbb  encryption  aes-256-ccm  -
$ sudo chown -R vagrant: /zfs_test/bbb
$ cd /zfs_test/bbb && git clone ...
$ sudo zfs list
NAME           USED  AVAIL  REFER  MOUNTPOINT
zfs_test      94.6M   265M    25K  /zfs_test
zfs_test/aaa  48.9M   265M  48.9M  /zfs_test/aaa
zfs_test/bbb  45.5M   265M  45.5M  /zfs_test/bbb
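Since a bad key file only shows up as an error from zfs create, it can be worth validating it first. The wrapper below is a hypothetical sketch (create_encrypted_dataset is not a zfs command): it resolves the key path to an absolute one, because keylocation takes a file:// URI, checks that the file is exactly 32 bytes as required for keyformat=raw, and only then runs the same zfs create call as above.

```shell
# Hypothetical wrapper: create encrypted dataset $1 using raw key file $2,
# refusing to run unless the key file exists and is exactly 32 bytes.
create_encrypted_dataset() {
    local dataset=$1 keyfile
    keyfile=$(realpath -- "$2") || return 1
    if [ ! -f "$keyfile" ] || [ "$(stat -c %s "$keyfile")" -ne 32 ]; then
        echo "raw key must be a 32-byte file: $2" >&2
        return 1
    fi
    # keylocation expects a file:// URI with an absolute path.
    sudo zfs create \
        -o compression=on \
        -o encryption=on \
        -o keyformat=raw \
        -o keylocation="file://$keyfile" \
        "$dataset"
}
```

Calling create_encrypted_dataset zfs_test/bbb zfs.key.raw from /home/vagrant should be equivalent to the command above.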
We can now unmount zfs_test/bbb, unload its key and try to mount it again. Without the key, it should not be possible:
$ sudo zfs unmount zfs_test/bbb
$ sudo zfs unload-key zfs_test/bbb
$ sudo zfs mount zfs_test/bbb
cannot mount '/zfs_test/bbb': encryption key not loaded
There are two ways to solve this: pass the -l argument to zfs mount, or load the key beforehand. With -l, zfs uses the keylocation property that was specified when creating the dataset to find the key and load it, so you can keep the key somewhere secure (for instance, on a USB drive plugged in while booting). If the key is a passphrase with keylocation=prompt, you will be asked for it at this point. The other solution is to use the zfs load-key command to load the key and then mount the dataset. Either way, the key must be loaded before you can access the filesystem.
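The load-then-mount sequence can be written as a small helper. This is a sketch under one assumption: that the branch exposes a keystatus property with the values available and unavailable, as released ZFS on Linux versions do. mount_encrypted is a hypothetical name; it loads the key only when needed and then mounts the dataset.

```shell
# Hypothetical helper: mount encrypted dataset $1, loading its key first
# if it is not loaded yet. Assumes the `keystatus` property exists.
mount_encrypted() {
    local dataset=$1 status
    status=$(sudo zfs get -H -o value keystatus "$dataset") || return 1
    if [ "$status" = unavailable ]; then
        # Reads the key from the dataset's keylocation (or prompts for it).
        sudo zfs load-key "$dataset" || return 1
    fi
    sudo zfs mount "$dataset"
}
```

For a single dataset, sudo zfs mount -l zfs_test/bbb achieves the same in one step; the helper mainly makes the two stages explicit.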
Behavior changes for send and recv
The final part is trying send and recv and seeing how their behavior has changed compared to a regular dataset. Basically, they follow these rules:
- if you send a raw (--raw or -w) stream, it stays encrypted. This means the receiving dataset/pool does not need to be encrypted, but you will need the same key that was used to encrypt the original dataset if you want to read the data on the other side. This seems to be the recommended way to create backups (at least for now).
- if you send a regular stream, it will be sent unencrypted. If you want it encrypted on the other side, the receiving dataset needs to have an encryption root; the data will then be encrypted there, but with that root's key instead of the original one. This might be useful when storing sensitive information.
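Side by side, the two cases could look like this. The helper names and the target datasets in the usage example are hypothetical; the only difference between the two functions is the -w flag on zfs send.

```shell
# Hypothetical helper: raw send. Blocks travel still encrypted, and reading
# them on the receiving side requires the original dataset's key.
send_raw() {
    sudo zfs send -w "$1" | sudo zfs recv "$2"
}

# Hypothetical helper: regular send. The stream itself is unencrypted; if the
# target sits under an encryption root, it is encrypted there with that key.
send_plain() {
    sudo zfs send "$1" | sudo zfs recv "$2"
}
```

For example, send_raw zfs_test/bbb@snap1 backup/bbb, or send_plain zfs_test/aaa@1497383599 backup/encrypted/aaa where backup/encrypted is an encrypted dataset with its key loaded.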
In my case, I'm testing the first option for my personal backup. My current setup is as follows:
- laptop running an encrypted /home on a single disk.
- snapshots are sent incrementally (through ssh) to my backup server, as frequently as I want.
- snapshots are sent as a raw stream, so they stay encrypted. The first encrypted stream was sent in full, not incrementally, because the previous snapshots were unencrypted. I'll keep the unencrypted snapshots around for a bit before deleting them.
- the receiving pool was upgraded and supports encryption, but the new streams are received in a different tree (it used to be backup/rolando, now it's backup/encrypted/rolando).
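The incremental, raw, over-ssh step of that setup could be scripted roughly like this. It is a sketch, not my actual backup script: send_incremental_raw, the source dataset and the host name are hypothetical, and it assumes the older of the two newest snapshots has already been received on the backup server and that the remote user is allowed to run zfs recv.

```shell
# Hypothetical helper: send the newest snapshot of dataset $1 to host $2 as a
# raw incremental stream, received into dataset $3 on the remote side.
send_incremental_raw() {
    local dataset=$1 remote=$2 target=$3 snaps prev latest
    # Snapshots of this dataset only (-d 1), oldest first (-s creation).
    snaps=$(sudo zfs list -H -t snapshot -o name -s creation -d 1 "$dataset" | tail -n 2) || return 1
    prev=$(printf '%s\n' "$snaps" | sed -n 1p)
    latest=$(printf '%s\n' "$snaps" | sed -n 2p)
    if [ -z "$latest" ]; then
        echo "need at least two snapshots of $dataset" >&2
        return 1
    fi
    # -w keeps the stream encrypted; -i sends only the changes since $prev.
    sudo zfs send -w -i "$prev" "$latest" | ssh "$remote" zfs recv "$target"
}
```

For example, send_incremental_raw tank/home backup-server backup/encrypted/rolando. Note that without pipefail, the function's exit status is that of the remote zfs recv.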
Conclusion
Finally, we are able to enjoy native zfs encryption on Linux. This is a long overdue feature. The good thing is that this new implementation fixes a few of the problems the original one had, especially around key management. It is not binary compatible with the original implementation, which is fine in most cases, and it is still not ready to be used in production, but so far I really like what I see.
If you want to follow progress, you can watch the current PR in the official git repo of the project. If everything keeps going well, I hope this feature lands in version 0.7.1 (this is just my estimate; I'm not a developer on the project, just a very interested outsider :P).
Update: I just found this great post by Philipp C. Heckel that explains the crypto aspects of the implementation in more detail. You should definitely read it if you want to learn more.