imgbased - or keeping the nature of Node

The special thing about Node is that it is image based. And we would like to keep it this way.

An image - or the rootfs in the image - is a set of packages which can be tested as a whole and delivered as a whole. So we can be sure that the package combination on the host where Node is running is the same combination of packages we tested.

imgbased is an idea to get functionality similar to Node’s read-only rootfs, but one layer down, using LVM.

Previously we operated on files, allowing changes by using bind-mounts to a writable partition.

By using LVM we just provide a write-able volume atop the read-only volume carrying the rootfs. At boot time you then decide into which (write-able) volume you would like to boot. In ascii art:

+ VG
|
+--+ Config (LV)
|
+--+ Base-0 (LV, ro)
   |\
   | \
   |  + Base-0.1 (LV, rw)
   |  |
   |  + Base-0.2 (LV, rw)
   |
   |
   + Base-1 (LV, ro)
   |\
   | \
   |  + Base-1.1 (LV, rw)
   :

This makes a lot of things easier. Yum can work. There is no need for bind mounts, SELinux causes no hassles, and the persistence idea also works between “bases”.
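
The layering in the diagram maps to a handful of LVM commands. The following is a minimal sketch, assuming a thin pool already exists in the volume group. The helper names and the VG, pool and LV names are hypothetical, and this is not necessarily how imgbased itself does it. The functions only build argument lists; nothing is executed.

```python
def add_base_cmds(vg, pool, base, size):
    """Commands to create a thin volume for a new base and make it read-only."""
    return [
        ["lvcreate", "--virtualsize", size, "--thinpool", pool,
         "--name", base, vg],
        # ... the new rootfs would be written to /dev/<vg>/<base> here ...
        ["lvchange", "--permission", "r", f"{vg}/{base}"],
    ]


def add_layer_cmds(vg, base, layer):
    """Commands to put a write-able thin snapshot (a "layer") atop a base."""
    return [
        ["lvcreate", "--snapshot", "--name", layer, f"{vg}/{base}"],
        ["lvchange", "--permission", "rw", f"{vg}/{layer}"],
        # thin snapshots are flagged to skip activation, so override that
        ["lvchange", "--activate", "y", "--ignoreactivationskip",
         f"{vg}/{layer}"],
    ]
```

Each list could be passed to subprocess.check_call() on a test machine, e.g. add_layer_cmds("vg", "Base-0", "Base-0.1") for the first layer in the diagram.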

Having the original rootfs around makes it easy to create a delta between the original rootfs and the write-able volume.

Upgrades work by adding a new volume with the contents of a new rootfs, which then gets a new “layer” by adding a write-able volume atop. Changes can be partially persisted by copying files (in a whitelist fashion) from the current to the new write-able layer.
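
The whitelist-style persistence could look roughly like this. It is only a sketch under the assumption that both layers are mounted somewhere; the function name, the mount points and the whitelist entries are made up for illustration.

```python
import os
import shutil


def persist(old_root, new_root, whitelist):
    """Copy whitelisted files from the old layer's tree into the new one.

    Returns the list of entries that were actually copied.
    """
    copied = []
    for rel in whitelist:
        src = os.path.join(old_root, rel.lstrip("/"))
        dst = os.path.join(new_root, rel.lstrip("/"))
        if not os.path.isfile(src):
            continue  # nothing to carry over for this entry
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also keeps mode and timestamps
        copied.append(rel)
    return copied
```

A call like persist("/mnt/old", "/mnt/new", ["/etc/hostname"]) would then carry just that one file over to the new layer.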

With Node we also keep the last image around, to provide a fallback in case the new image has some kind of regression.

This is also possible with imgbased. You can basically keep as many bases (and their write-able layers) around as you like and discard them at some point later on.

The drawback so far is that you’ve got a slightly higher space requirement at runtime, because we don’t store the “squashed” rootfs (~250MB), but the sparse rootfs (~800MB). But maybe we can mitigate this a bit by optimizing our minimizer, or maybe it’s even a viable way to go, as the deployment size (the size of the ISO) does not change.

There are still unsolved issues:

  • Getting anaconda to really do the installation
  • Verify that all parts work as imagined
  • Make the user aware that not all changes are persisted between upgrades

What I like most about this solution is that it is so - upstream. It only uses mature, available technologies: LVM thin volumes, ext4 with the discard option (this will free space again as soon as you delete a file) and the BootLoaderSpec.

Well, we will need to go with real grub later on, because the BLS has no way to define the default boot entry, which is needed to always boot into the latest layer.
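
Whatever ends up writing the default boot entry has to know which layer is the latest one. Here is a small sketch of that decision, assuming the Base-N.M naming from the ascii art above; the function names are hypothetical.

```python
import re


def layer_key(name):
    """Turn 'Base-1.2' into (1, 2); a bare base like 'Base-1' becomes (1, 0)."""
    m = re.fullmatch(r"Base-(\d+)(?:\.(\d+))?", name)
    if m is None:
        raise ValueError(f"not a base or layer name: {name!r}")
    return int(m.group(1)), int(m.group(2) or 0)


def latest_layer(names):
    """Pick the newest write-able layer from a list of LV names."""
    # only names with a '.M' suffix are write-able layers, bases are read-only
    layers = [n for n in names if "." in n]
    return max(layers, key=layer_key)
```

Comparing the numbers as integers (and not the names as strings) keeps Base-0.10 newer than Base-0.9.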

In general this approach also works for regular hosts. And if you are brave you can even look at the sources and set up your host in a similar fashion :)

For the interested reader - some more background.

Our current implementation is using bind mounts to make the rootfs partially write-able - well, we are actually bind mounting write-able places to targets in the rootfs - which is technically more correct. But this approach has limitations:

  • It doesn’t work in the early boot process e.g. with /etc/fstab or with systemd
  • We maintain our own installer to handle the image based installation
  • We need to maintain custom SELinux rules because we bind-mount many files on tmpfs or somewhere else
  • It works well for individual files, but is becoming messy for many files.

The first three items are actually causing the most pain. Lately I started to look at the second point. The initial idea was to use python-blivet for storage handling. I continued to surf and thought that we could reuse even larger parts of anaconda … I surfed on the waves of trying to find ways where we could build our Node on the shoulders of others, so we can concentrate on other parts of Node - making it solid.

Once I landed there I tried to rethink the whole Node, but based on concepts which are already available in the Fedora land, which we can reuse - and don’t have to reinvent.

Automatically testing VMs using pexpect and qemu

Igor does his job, testing Node images on virtual machines and real hardware. But one drawback is that it’s “big”. It is not huge, but it is not directly usable out of the box.

Out of the blue - while playing with some ideas - I coupled pexpect (now updated to 3.1 in Fedora) with qemu (and a stdio serial).

What I got is a simple mechanism to control a VM, with far fewer assumptions than Igor has.

The drawback is obviously that only VMs can be tested, but the benefit is that it is very simple, and there are no high requirements. And this is also limited to the console - which is optimal for Node.

Long story short, the snippet below takes an image, boots it and tries to log in via the serial console.
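
The original snippet did not survive here, so what follows is a reconstruction of the idea and not the original code. It is a minimal sketch, assuming a guest that offers a login prompt on its first serial port; the function names, the prompts and the default user are assumptions.

```python
def qemu_command(image, memory_mb=1024):
    """Build a qemu command line with the guest's serial port on stdio."""
    return [
        "qemu-system-x86_64",
        "-m", str(memory_mb),
        "-snapshot",          # do not write changes back to the image
        "-display", "none",   # no graphical window
        "-serial", "stdio",   # first serial port <-> our stdin/stdout
        "-hda", image,
    ]


def try_login(image, user="root", password="", timeout=300):
    """Boot the image and try to log in on the serial console."""
    import pexpect  # third-party, imported here so the helper above works without it

    argv = qemu_command(image)
    child = pexpect.spawn(argv[0], argv[1:], timeout=timeout)
    try:
        child.expect("login: ")
        child.sendline(user)
        child.expect("Password: ")
        child.sendline(password)
        child.expect(r"[#$] ")  # assumed shell prompt
        return True
    except (pexpect.TIMEOUT, pexpect.EOF):
        return False
    finally:
        child.close(force=True)
```

try_login("node.img") then returns True if a shell prompt shows up within the timeout, and False otherwise.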

This can further be extended by using 9pfs to exchange data between the host and guest.

A minimal functional test is now also in imgbased to test some basic functionality at runtime.

This doesn’t seem to be a new idea.

Drop down terminal for GNOME and taskwarrior

There are a couple of terminals out there which slide down from the top of the screen.

But this one was amazingly easy to install and performs very well on my Fedora 19 machine. It’s a GNOME Shell extension so will - obviously - not work with any other DE than GNOME.

My use case is mainly to have quick access to a shell, and it’s very nice to have this in conjunction with taskwarrior (yum install -y task), which is quite a nice alternative to the GUI based todo lists.

Amazing is the tab-completion on tags and arguments.

Raising the append line char limit by upgrading pxelinux

IIUIC the number of chars appended to the kernel during PXE boot is limited by two things: The kernel itself and the bootloader - pxelinux in my case.

Ancient kernels had a limit of 256 chars, but this limit was raised and doesn’t exist with 2.6 anymore.

What I did not know was that the pxelinux limitation was also raised. Now it is possible to pass up to 2048 chars - I also read 4096 chars somewhere else. At least many more than previously (1024 or so, at least lower).

So updating pxelinux to some recent version will allow you to pass more arguments to the booted kernel. This is especially relevant for igor, when testing Node.
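
When generating append lines automatically it can help to check their length before booting. A tiny sketch: the function name is made up, and the default limit of 2048 is just the figure quoted above, not a verified value for any particular pxelinux version.

```python
def check_append(append, limit=2048):
    """Return how many characters are left, or raise if the line is too long."""
    used = len(append)
    if used > limit:
        raise ValueError(f"append line is {used} chars, limit is {limit}")
    return limit - used
```

Passing the limit explicitly, e.g. check_append(line, limit=1024), models an older pxelinux.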

Insight in how the early boot process works

Being in the situation to write a new man page, I remembered that dracut is using asciidoc. Looking at dracut’s man pages I found this great diagram illustrating the early boot process in - at least - Fedora.

It seems an artist did some work.

                                    systemd-journal.socket
                                               |
                                               v
                                    dracut-cmdline.service
                                               |
                                               v
                                    dracut-pre-udev.service
                                               |
                                               v
                                     systemd-udevd.service
                                               |
                                               v
local-fs-pre.target                dracut-pre-trigger.service
         |                                     |
         v                                     v
 (various mounts)  (various swap  systemd-udev-trigger.service
         |           devices...)               |             (various low-level   (various low-level
         |               |                     |             services: seed,       API VFS mounts:
         v               v                     v             tmpfiles, random     mqueue, configfs,
  local-fs.target   swap.target     dracut-initqueue.service    sysctl, ...)        debugfs, ...)
         |               |                     |                    |                    |
         \_______________|____________________ | ___________________|____________________/
                                              \|/
                                               v
                                        sysinit.target
                                               |
                             _________________/|\___________________
                            /                  |                    \
                            |                  |                    |
                            v                  |                    v
                        (various               |              rescue.service
                       sockets...)             |                    |
                            |                  |                    v
                            v                  |              rescue.target
                     sockets.target            |
                            |                  |
                            \_________________ |                                 emergency.service
                                              \|                                         |
                                               v                                         v
                                         basic.target                             emergency.target
                                               |
                        ______________________/|
                       /                       |
                       |                       v
                       |            dracut-pre-mount.service
                       |                       |
                       |                       v
                       |                  sysroot.mount
                       |                       |
                       |                       v
                       |             initrd-root-fs.target
           (custom initrd services)            |
                       |                       v
                       |             dracut-mount.service
                       |                       |
                       |                       v
                       |            initrd-parse-etc.service
                       |                       |
                       |                       v
                       |            (sysroot-usr.mount and
                       |             various mounts marked
                       |               with fstab option
                       |                x-initrd.mount)
                       |                       |
                       |                       v
                       |                initrd-fs.target
                       \______________________ |
                                              \|
                                               v
                                          initrd.target
                                               |
                                               v
                                    dracut-pre-pivot.service
                                               |
                                               v
                                     initrd-cleanup.service
                                          isolates to
                                    initrd-switch-root.target
                                               |
                                               v
                        ______________________/|
                       /                       |
                       |        initrd-udevadm-cleanup-db.service
                       |                       |
           (custom initrd services)            |
                       |                       |
                       \______________________ |
                                              \|
                                               v
                                   initrd-switch-root.target
                                               |
                                               v
                                   initrd-switch-root.service
                                               |
                                               v
                                          switch-root
----

AUTHOR
------
Harald Hoyer

Clover / Mesa OpenCL for AMD / ATI r600 lands in rawhide

This patch enables libOpenCL in mesa, which means that libOpenCL from mesa will land in rawhide soon.

    # To test libOpenCL
    $ yum install -y clinfo mesa-libOpenCL
    $ clinfo

Time to really get the showcase applications into place to test the implementations.

rust-0.9 (as in Mozilla’s rust-lang) in Fedora

rust-0.9 has been released some time ago and now also an updated package is available for Fedora through Copr.

Go and grab it.

Mesa’s OpenCL state tracker on Fedora is around the corner

Thanks Igor. (Not the oVirt Node related one.) ignatenkobrain of Fedora’s mesa fame.

He invested some time and prepared a build with the correct build switches to build mesa-10 with OpenCL support. After a bit of fiddling (installing necessary deps [libclc-devel]) the result was this:

Number of platforms                               1
  Platform Name                                   Default
  Platform Vendor                                 Mesa
  Platform Version                                OpenCL 1.1 MESA 10.0.2
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd

  Platform Name                                   Default
Number of devices                                 1
  Device Name                                     AMD PALM
  Device Vendor                                   X.Org
  Device Version                                  OpenCL 1.1 MESA 10.0.2
  Driver Version                                  10.0.2
  Device OpenCL C Version                         OpenCL C 1.1
  Device Type                                     GPU
  Device Profile                                  FULL_PROFILE
  Max compute units                               1
  Max clock frequency                             0MHz
  Device Partition                                (n/a)
  Max work item dimensions                        3
    Max work item size[0]                         256
    Max work item size[1]                         256
    Max work item size[2]                         256
  Max work group size                             256
  Preferred work group size multiple              1
  Preferred / native vector sizes                 
    char                                                16 / 16      
    short                                                8 / 8       
    int                                                  4 / 4       
    long                                                 2 / 2       
    half                                                 0 / 0        (n/a)
    float                                                4 / 4       
    double                                               2 / 2        (n/a)
  Half-precision   Floating-point support         (n/a)
  Single-precision Floating-point support        
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Correctly-rounded divide and sqrt operations  No
    Support is emulated in software               No
  Double-precision Floating-point support         (n/a)
  Address bits                                    32, Little-Endian
  Global memory size                              201326592 (   192MB)
  Error Correction support                        No
  Max memory allocation                           50331648 (    48MB)
  Unified memory for Host and Device              Yes
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       128 bits (16 bytes)
  Global Memory cache type                        None
  Image support                                   Yes
    Max number of samplers per kernel             16
    Max 2D image size                             32768x32768 pixels
    Max 3D image size                             32768x32768x32768 pixels
    Max number of read image args                 32
    Max number of write image args                32
  Local memory type                               Local
  Local memory size                               32768 (    32KB)
  Max constant buffer size                        65536 (    64KB)
  Max number of constant args                     13
  Max size of kernel argument                     1024
  Queue properties                                
    Out-of-order execution                        No
    Profiling                                     Yes
  Profiling timer resolution                      0ns
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            No
  Device Available                                Yes
  Compiler Available                              Yes
  Device Extensions                               

This is amazing. The first run of clinfo on Fedora with Mesa’s OpenCL implementation.

  • Upstream - fix mesa.icd to contain absolute path to .so
  • Add correct mesa-libOpenCL dependencies on libclc-devel, mesa-libgbm
  • Discuss necessary global mesa changes
  • Koji builds

Spec changes are on github.

Mesa’s clover has now ICD support

Thrilled! After a long time - and much, much work by curros and others - ICD support has landed for Mesa’s clover OpenCL state tracker. This will allow us to install Mesa as an OpenCL provider in parallel to others (which support OpenCL ICD) - like pocl.

fn main() // Rust’s entry point - and an update to 0.8

Rust 0.8 was released some days ago. The spec is already updated and a build was triggered in Copr. Sadly Copr currently fails to build rust on all arches. But I’ve got faith that there will be a complete build ready within the next days. Besides that there are builds for some arches which can be used.

Anyhow, besides 0.8 being ready I’ve also played with rustpkg - a high-level tool for rustc and friends.

If you follow the intended filesystem layout you get benchmarks and tests for free - well, you’ll still have to write them, but they - benchmark and tests - are part of the regular build.

And that is what the example.rs layout is about: to practice what rustpkg expects.

And never ever underestimate decent tests. Start with them right from the beginning - and you’ll have an easier life :) That’s actually a nice habit I’ve seen in software around all those popular languages in popular niches (aka mobile or nodejs): quite a few projects really take care to write tests.

Having fun with exporting automake variables - or how it was obviously a pyflakes bug (no, it wasn’t) …

Each line of a Makefile target is executed in a sub-shell, this has pros and cons, but at least the following side-effect can be observed:

fun:
    export FOO="wunder bar"
    echo $$FOO

This doesn’t echo “wunder bar” to the cmdline when you run make because the export happens in a different shell than the echo.

You can, however, set environment variables using make’s export keyword:

export FOO="wunder bar"
fun:
    echo $$FOO

This now echoes "wunder bar" to the cmdline when you run make.

So what is problematic with this? The quotes. The environment variable FOO will contain the string "wunder bar" - including the quotes! This happens because export is now the make keyword, not the shell builtin, and that means the whole part to the right of the equals sign (=) is the value of the variable named on the left-hand side (FOO). And in the example above this includes the quotes.

What happened to me earlier on was that I tried to set an environment variable (PYFLAKES_BUILTINS) which got picked up by pyflakes. I used:

export PYFLAKES_BUILTINS="_"
fun:
    pyflakes file.py

Now, spot the problem.

Solution: pyflakes (correctly) didn’t behave as expected (handle _ as a built-in), because the value of the variable was “_”, instead of _.

$ export PYFLAKES_BUILTINS="_"
$ pyflakes file.py

The above snippet on the other hand works in the shell, because the quotes (in the export line) are interpreted by the shell, and are not part of the PYFLAKES_BUILTINS variable.
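
The difference between the two export flavours can be modelled in a few lines. This is a simplified sketch and not a real make or shell parser: make is assumed to take everything after the equals sign verbatim, while shlex stands in for the shell’s quote handling. The function names are made up.

```python
import shlex


def make_export(line):
    """Simplified model of make's `export NAME=value`: the value stays verbatim."""
    name, _, value = line[len("export "):].partition("=")
    return name.strip(), value.lstrip()


def shell_export(line):
    """Model of the shell's `export NAME=value`: quoting is resolved first."""
    _, assignment = shlex.split(line)
    name, _, value = assignment.partition("=")
    return name, value
```

For the line export PYFLAKES_BUILTINS="_" the first model keeps the quotes in the value and the second one drops them, which is exactly the difference pyflakes stumbled over.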

Binary diff between libc from ScientificLinux and CentOS

In our IRC channel the question arose how we can see the difference between binaries built on different systems (SL and CentOS in this case).

Our initial test binary was created using this snippet:

yum install vala glib2-devel
echo 'print("Hello");' > a.vala
valac a.vala && objdump -D a

We then came up with this small script to automate the diff creation:

dump() {
  # Disassemble everything and strip the leading addresses,
  # so that shifted offsets alone do not show up in the diff
  objdump -D "$1" | sed "s/^[[:space:]]\+[0-9a-f]\+//"
}

cmp_dump() {
  dump "$1" > dump_a
  dump "$2" > dump_b
  diff -u dump_a dump_b
  rm dump_a dump_b
}

# Usage: bash cmp_dump.sh first-binary second-binary
cmp_dump "$@"

So to compare two binaries and return the diff between their objdump output (without addresses) you run:

bash cmp_dump.sh libc-sl libc-centos

This led us to this result for libc from CentOS and ScientificLinux:

    --- dump_a      2013-09-11 12:39:30.799452944 +0200
    +++ dump_b      2013-09-11 12:39:32.061467634 +0200
    @@ -1,5 +1,5 @@
     
    -libc-2.12.so-centos6:     file format elf64-x86-64
    +libc-2.12.so-sl6:     file format elf64-x86-64
     
     
     Disassembly of section .note.gnu.build-id:
    @@ -13,13 +13,18 @@
     :      00 00                   add    %al,(%rax)
     :      47                      rex.RXB
     :      4e 55                   rex.WRX push %rbp
    -:      00 34 95 f7 bc a6 d9    add    %dh,-0x26594309(,%rdx,4)
    -:      9e                      sahf  
    -:      30 1d 7c 9f 95 9a       xor    %bl,-0x656a6084(%rip)        # ffffffff9a95a209 <_end+0xffffffff9a5c7961>
    -:      87 0f                   xchg   %ecx,(%rdi)
    -:      8e 77 c0                mov    -0x40(%rdi),%?
    -:      f0                      lock
    -:      b7                      .byte 0xb7
    +:      00 9a df 3d a2 9c       add    %bl,-0x635dc221(%rdx)
    +:      0c 20                   or     $0x20,%al
    +:      70 7f                   jo     308 <data.10540+0x2a8>
    +:      ed                      in     (%dx),%eax
    +:      6d                      insl   (%dx),%es:(%rdi)
    +:      49 9a                   rex.WB (bad)
    +:      2b 20                   sub    (%rax),%esp
    +:      0e                      (bad)  
    +:      67                      addr32
    +:      6b                      .byte 0x6b
    +:      68                      .byte 0x68
    +:      b1                      .byte 0xb1
     
     Disassembly of section .note.ABI-tag:
     
    @@ -486756,6 +486761,5 @@
     :      62                      (bad)  
     :      75 67                   jne    79 <data.10540+0x19>
     :      00 00                   add    %al,(%rax)
    -:      c7                      (bad)  
    -:      fc                      cld    
    -:      87 37                   xchg   %esi,(%rdi)
    +:      da 38                   fidivrl (%rax)
    +:      73 e0                   jae    fffffffffffffff8 <_end+0xffffffffffc6d750>

This ain’t a big difference for a 2M+ file, is it?

So is this a valuable way to compare binaries?

At the end there is the question: Can this be used to check whether the binaries from an RPM were really created from the sources of its SRPM?
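
To judge such a diff it helps to know which sections differ at all. The sketch below groups objdump -D output by its "Disassembly of section …" headers and applies the same address stripping as the sed expression above. The function names are made up, and this is only one possible way to look at it.

```python
import re

SECTION = re.compile(r"^Disassembly of section (\S+):$")
ADDRESS = re.compile(r"^\s+[0-9a-f]+")


def by_section(dump):
    """Map section name -> list of its lines, with leading addresses removed."""
    sections = {}
    current = None
    for line in dump.splitlines():
        m = SECTION.match(line.strip())
        if m:
            current = m.group(1)
            sections[current] = []
        elif current is not None and line.strip():
            sections[current].append(ADDRESS.sub("", line))
    return sections


def differing_sections(dump_a, dump_b):
    """Names of all sections whose (address-free) content differs."""
    a, b = by_section(dump_a), by_section(dump_b)
    return sorted(n for n in a.keys() | b.keys() if a.get(n) != b.get(n))
```

Feeding it the two dumps from above would show at a glance whether the code sections are identical and only notes like .note.gnu.build-id differ.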

FrOSCon, St. Augustin, 2013

It’s the first time that I’m attending FrOSCon - a nice and small conference in St. Augustin, near Bonn in Germany.

Fedora booth with Felix and Aleksandra - right beside our friends from CentOS.

In the front - and a closeup right below - you can see the RepRap which Miro used to create the nice Fedora key fobs.


On Sunday - right now - there is also a Fedora devroom, with some Fedora related talks. Sadly I missed Felix’s talk “The Cat Is Alive And Running Out Of The Box”.

PHP still seems to be very popular - at least it has a very prominent presence at the conference. In general a nice small conference.

I also gave a talk about Igor - a tool to test a distribution. The corresponding slides can be found here.

Secure Simple Pairing with IPSec - that would be nice.

Why can’t establishing a local (or global) IPSec connection be as easy as Bluetooth’s Secure Simple Pairing?

This would be ideal for quickly establishing a secure connection between two devices. After the pairing, the last challenge in keeping the global connection is traversing the middle boxes. But at least the secure key exchange has already happened, and with numeric comparison you can even prevent MITM attacks.

This can happen at e.g. conferences. Anytime we meet face to face, or have another out-of-band channel.