The special thing about Node is that it is image based. And we would like to keep it this way.
An image - or the rootfs in the image - is a set of packages which can be tested as a whole and delivered as a whole. So we can be sure that the package combination on the host where Node is running is the same combination of packages we tested.
imgbased is now an idea where we get a similar functionality to Node’s read-only rootfs but one layer down, using LVM.
Previously we operated on files, allowing change sby using bind-mounts to a writable partition.
By using LVM we just provide a write-able volume atop the read-only volume carrying the rootfs. At boot time you then decide into which (write-able) volume you would like to boot. In ascii art:
+--+ Config (LV)
+--+ Base-0 (LV, ro)
| + Base-0.1 (LV, rw)
| + Base-0.2 (LV, rw)
+ Base-1 (LV, ro)
| + Base-1.1 (LV, rw)
This makes much stuff easier. Yum can work. No need for bind mounts, selinux has no hassles, and the persistence idea also works between “bases”.
Having the original rootfs around makes it easy to create a delta between the original rootfs and the write-able volume.
Upgrades work by adding a new volume with the contents of a new rootfs, which then get’s a new “layer” by adding the write-able volume atop. Partially changes can be persisted by copying files (in a whitelist fashion) between the current and new write-able layer.
With Node we also keep the last image around, to provide a fallback in case that the new image has some kind of regression.
This is also possible with imgbased. You can basically keep as many bases (and their write-able layers) around and discard them at some point later on.
The drawback so far is, that you’ve got a slightly higher space requirement at runtime, because we don’t store the “squashed” rootfs (~250MB), but the sparse rootfs (~800MB). But maybe we can mitigate this a bit my optimizing our minimizer, or maybe it’s even a viable way to go, as the deployment size (the size of the ISO) does not change.
There are still unsolved issues
- Getting anaconda to really do the installation
- Verify that all parts work as imagined
- Make the user aware that not all changes are persisted between upgrades
What I like most about this solution is, is that it is so - upstream. It only use mature available technologies.
LVM thin volumes, ext4 with the discard option (this will free space again as soon as you delete a file) and the BootLoaderSpec.
Well, we will need to go with real grub later on, because the BLS has no way to define the default boot entry, which is needed to always boot into the latest layer.
In general this approach also works for regular hosts. And if you are brave you can even look at the sources and setup your host in a similar fashion :)
For the interested reader - some more background.
Our current implementation is using bind mounts to make the roots partially write-able - well, we are actually bind mounting write-able places to targets in the rootfs - which is technically more correct. But this approach has limitations:
- It doesn’t work in the early boot process e.g. with /etc/fstab or with systemd
- We maintain our own installer to handle the image based installation
- We need to maintain custom SELinux rules because we bind-mount many files on tmpfs or somewhere else
- It works well for individual files, but is becoming messy for many files.
The first three items are actually causing the most pain. Lately I started to look at the second point. The initial idea was to use python-livet for storage handling. I continued to surf and thought that we could reuse even larger parts of anaconda … I surfed on the waves of trying to find ways were we could build our Node on the shoulders of others, so we can concentrate on other part sof Node - making it solid.
Once I landed there I tried to rethink the whoel Node, but based on available concepts which are already there in the Fedora land, which we can reuse - and don’t have to reinvent.