Legato doesn't clean up after itself. Let's teach it some manners


#1

tl;dr Legato changes the Smack access label of the crucial /etc/passwd and /etc/group files. Plus, legato stop does not unmount all the directories that Legato mounts. Because of this, you can’t shut down and reboot gracefully.

Long Version

I’ve made a lot of progress on getting Legato to run on a generic Linux system, largely as a by-product of improvements in Legato 17. These days I’m mainly using an Ubuntu server on a very generic VM instance. I can build the Legato system, install it, and run it without much difficulty. This is pretty nice for developing apps that don’t depend on hardware. If people are interested, I can post notes on how to do this, plus a small patch.

The main problem I’m having now is that Legato’s cleanup leaves a lot to be desired. After you stop Legato, the system is so badly out of whack that you can’t even shut down properly. You have to power off the machine, and then it doesn’t start back up cleanly either. There aren’t really that many issues; they’re just weird.

Important files have their Smack access labels changed to “framework”

root@leguntu:~# chsmack /etc/* | grep "framework"
/etc/group access="framework"
/etc/passwd access="framework"
/etc/ld.so.cache access="framework"
/etc/ld.so.conf access="framework"

As a result of the changed labels, some startup processes fail and non-root processes run into constant problems.

The label changes on the /etc/ld.so.* files are intermittent. I suspect they only happen when the updateDaemon is used.
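To keep an eye on which files pick up the label, the chsmack listing above can be filtered in a script. This is only a sketch: `files_with_label` is a hypothetical helper name, it assumes every line has the `path access="label"` shape shown above, and the `/etc/hostname` line in the sample is made up to show a file that does not match.

```python
import re

# Matches the access="..." attribute as printed in the chsmack listing above.
ACCESS_RE = re.compile(r'\saccess="([^"]*)"')


def files_with_label(chsmack_output, label):
    """Return the paths whose Smack access label equals `label`."""
    matches = []
    for line in chsmack_output.splitlines():
        found = ACCESS_RE.search(line)
        if found and found.group(1) == label:
            # Everything before the attribute is the file path.
            matches.append(line[:found.start()].strip())
    return matches


# First two lines are from the listing above; the third is a made-up example.
sample = '''/etc/group access="framework"
/etc/passwd access="framework"
/etc/hostname access="_"
'''
print(files_with_label(sample, "framework"))
```

Running this before and after an updateDaemon operation would be one way to test the suspicion about the /etc/ld.so.* files.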

Legato does not unmount all of its file systems

I took a snapshot of the /etc/mtab file before, during, and after running Legato. The diffs are educational:

In the “before vs during” comparison, you can see a bunch of cgroups, /home, /legato, a couple of smackfs instances, and others.

root@leguntu:~# diff mtab-before mtab-during
16c16
< cgroup /sys/fs/cgroup/freezer cgroup rw,nosuid,nodev,noexec,relatime,freezer 0 0
---
> cgroup /sys/fs/cgroup/freezer cgroup rw,nosuid,nodev,noexec,relatime,freezer,release_agent=/legato/systems/current/bin/_appStopClient 0 0
32a33,42
> /dev/mapper/leguntu--vg-root /legato ext4 rw,relatime,errors=remount-ro,data=ordered 0 0
> /dev/mapper/leguntu--vg-root /home ext4 rw,relatime,errors=remount-ro,data=ordered 0 0
> smack /legato/smack smackfs rw,relatime 0 0
> smack /mnt/flash/legato/smack smackfs rw,relatime 0 0
> cgroupsRoot /sys/fs/cgroup tmpfs rw,relatime 0 0
> cpu,cpuacct /sys/fs/cgroup/cpu,cpuacct cgroup rw,relatime,cpu,cpuacct 0 0
> memory /sys/fs/cgroup/memory cgroup rw,relatime,memory 0 0
> freezer /sys/fs/cgroup/freezer cgroup rw,relatime,freezer,release_agent=/legato/systems/current/bin/_appStopClient 0 0
> /dev/mapper/leguntu--vg-root /legato/systems/current ext4 rw,relatime,errors=remount-ro,data=ordered 0 0
> /dev/mapper/leguntu--vg-root /mnt/flash/legato/systems/current ext4 rw,relatime,errors=remount-ro,data=ordered 0 0

Then, in the “before vs after” comparison, you can see what legato stop didn’t clean up: /home and /legato are still hanging around, along with all the new cgroup mounts. I don’t think the cgroups are a problem, but /home and /legato are.

root@leguntu:~# diff mtab-before mtab-after
16c16
< cgroup /sys/fs/cgroup/freezer cgroup rw,nosuid,nodev,noexec,relatime,freezer 0 0
---
> cgroup /sys/fs/cgroup/freezer cgroup rw,nosuid,nodev,noexec,relatime,freezer,release_agent=/legato/systems/current/bin/_appStopClient 0 0
32a33,38
> /dev/mapper/leguntu--vg-root /legato ext4 rw,relatime,errors=remount-ro,data=ordered 0 0
> /dev/mapper/leguntu--vg-root /home ext4 rw,relatime,errors=remount-ro,data=ordered 0 0
> cgroupsRoot /sys/fs/cgroup tmpfs rw,relatime 0 0
> cpu,cpuacct /sys/fs/cgroup/cpu,cpuacct cgroup rw,relatime,cpu,cpuacct 0 0
> memory /sys/fs/cgroup/memory cgroup rw,relatime,memory 0 0
> freezer /sys/fs/cgroup/freezer cgroup rw,relatime,freezer,release_agent=/legato/systems/current/bin/_appStopClient 0 0
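The same comparison can be done without reading diff output by hand. The sketch below is not part of Legato; `mount_entries` and `leftover_mounts` are hypothetical helper names. It compares only the device, mount point, and filesystem type of each entry, so an option change such as the added release_agent is not reported as a new mount. It also assumes mount points contain no spaces (mtab escapes those).

```python
from collections import Counter


def mount_entries(mtab_text):
    """Count (device, mount point, fs type) triples in an mtab snapshot."""
    entries = Counter()
    for line in mtab_text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            entries[tuple(fields[:3])] += 1
    return entries


def leftover_mounts(before_text, after_text):
    """Return mount entries that exist after `legato stop` but not before."""
    extra = mount_entries(after_text) - mount_entries(before_text)
    return sorted(extra.elements())


# A few lines copied from the diffs above, shortened to the relevant entries.
before = (
    "cgroup /sys/fs/cgroup/freezer cgroup "
    "rw,nosuid,nodev,noexec,relatime,freezer 0 0\n"
)
after = (
    "cgroup /sys/fs/cgroup/freezer cgroup rw,nosuid,nodev,noexec,relatime,"
    "freezer,release_agent=/legato/systems/current/bin/_appStopClient 0 0\n"
    "/dev/mapper/leguntu--vg-root /legato ext4 "
    "rw,relatime,errors=remount-ro,data=ordered 0 0\n"
    "/dev/mapper/leguntu--vg-root /home ext4 "
    "rw,relatime,errors=remount-ro,data=ordered 0 0\n"
    "freezer /sys/fs/cgroup/freezer cgroup rw,relatime,freezer,"
    "release_agent=/legato/systems/current/bin/_appStopClient 0 0\n"
)

for device, mount_point, fs_type in leftover_mounts(before, after):
    print(mount_point, fs_type, device)
```

With the real snapshots, the two arguments would be the contents of mtab-before and mtab-after.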

Not quite a graceful shutdown

I have a script that does some cleanup after running legato stop. It sets the Smack labels of those files in /etc back to floor, and unmounts the /home and /legato directories. It almost works! I end up with two errors that keep the machine from shutting down gracefully. I’m still working on these.
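The script itself isn’t posted here, so the following is only a guess at its shape, not the actual script. It assumes the chsmack tool seen earlier is on the PATH, uses `_` as the Smack floor label, and takes the file and mount lists from the listings above. `cleanup_commands` and `run_cleanup` are hypothetical names. It defaults to a dry run that only prints the commands, since really running them needs root.

```python
import subprocess

FLOOR_LABEL = "_"  # Smack's floor label

# Files and mount points taken from the listings earlier in this post.
RELABEL_FILES = [
    "/etc/passwd",
    "/etc/group",
    "/etc/ld.so.cache",
    "/etc/ld.so.conf",
]
LEFTOVER_MOUNTS = ["/home", "/legato"]


def cleanup_commands(files=RELABEL_FILES, mounts=LEFTOVER_MOUNTS):
    """Build the commands to run after `legato stop`."""
    commands = [["chsmack", "-a", FLOOR_LABEL, path] for path in files]
    commands += [["umount", mount] for mount in mounts]
    return commands


def run_cleanup(dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for command in cleanup_commands():
        print(" ".join(command))
        if not dry_run:
            # check=False: keep going so one failure doesn't skip the rest.
            subprocess.run(command, check=False)


if __name__ == "__main__":
    run_cleanup(dry_run=True)
```

Keeping the command list separate from the code that runs it makes it easy to review what will happen before touching a live system.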

[FAILED] failed unmounting /run/user/0
[FAILED] failed unmounting /smack

The good news is that the machine will boot cleanly, since we fixed the passwd and group file labels.

What next?

The calls to umount during shutdown might be easy to add to the source code, and I’ll be happy to push the changes upstream if Sierra wants them. The Smack label changes to /etc/passwd and /etc/group are another matter, as I suspect they are really a side effect of the supervisor playing games with users and groups while managing applications.

If someone wants to offer a hint, I’m all ears. Otherwise, I’m back to digging in the source…


#2

Hi @cholmes,

That’s some very nice analysis!

Any chance you can push some changes (even uncleaned) to a fork on GitHub so I can have a look?
I might be able to review and possibly check them in for you.

The only potential issue that I see with a proper clean-up sequence is a potentially longer shutdown time.
On target, things like restoring the Smack label for /etc/passwd could be considered a bit of a waste of time. The same goes for umount, although that’s more debatable.
We could imagine having legato stop --fast and legato stop --clean or something like that, though.


#3

I can branch the code easily enough, but where should I push it? I assume my GitHub account doesn’t have access to the legato-af repo. (On GitHub I am slashingweapon.)

Changes so far:

  • Fix the way mktools assembles the compiler path from the x_TOOLCHAIN_DIR and x_TOOLCHAIN_PREFIX. This used to break if the prefix was empty.
  • Add a function to smack.c for retrieving smack labels of files.
  • Update start.c:WriteToFile() to always preserve the Smack label of the modified file.
  • Fix smack label of /etc/ld.so.cache after start is done with it. (modified via ldconfig)
  • legato stop unmounts certain directories. I’m hoping to move this code closer to the entities responsible for mounting them in the first place. (Processes should clean up their own messes.)
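The first item in the list above is about joining a toolchain directory and prefix into a compiler path. The mktools code isn’t shown in this thread, so the sketch below is not the actual fix. It is a hypothetical illustration of the idea, with made-up names (`compiler_path`, `tool`) and example paths, showing a join that still gives a usable path when the prefix is empty.

```python
import os


def compiler_path(toolchain_dir, toolchain_prefix, tool="gcc"):
    """Join directory, optional prefix, and tool name into one path."""
    # An empty or missing prefix simply contributes nothing to the name.
    name = (toolchain_prefix or "") + tool
    if not toolchain_dir:
        # No directory given: rely on the tool being found via PATH.
        return name
    return os.path.join(toolchain_dir, name)


# Example values are hypothetical, not taken from a real toolchain.
print(compiler_path("/opt/toolchain/bin", "arm-linux-"))
print(compiler_path("/usr/bin", ""))
```

The second call is the localhost case: no cross-compiler prefix, so the result is just the plain compiler in the given directory.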

In Progress:

  • Update the atomic file library to preserve Smack labels. This is where the /etc/passwd and /etc/group file labels are being modified.
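For anyone following along, here is roughly what preserving a label across an atomic replace looks like. It is a sketch under assumptions, not the Legato atomic file library (which is C): it assumes the label lives in the `security.SMACK64` extended attribute, that the update is done by writing a temporary file and renaming it over the original, and that the temporary file would otherwise take on the label of the process that created it. All function names are hypothetical. The label reader and writer are passed in as parameters so the logic can be exercised on a machine without Smack.

```python
import os
import tempfile

SMACK_ATTR = "security.SMACK64"


def read_label(path):
    """Return the Smack access label of `path`, or None if unavailable."""
    try:
        return os.getxattr(path, SMACK_ATTR)
    except OSError:
        return None


def write_label(path, label):
    """Set the Smack access label of `path` (needs privileges)."""
    os.setxattr(path, SMACK_ATTR, label)


def atomic_write_preserving_label(path, data,
                                  get_label=read_label,
                                  set_label=write_label):
    """Replace `path` with `data`, carrying over the original label."""
    label = get_label(path) if os.path.exists(path) else None
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must be in the same directory for an atomic rename.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        if label is not None:
            # Label the new file before it becomes visible under `path`.
            set_label(tmp_path, label)
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

The sketch leaves out file mode and ownership, which a real replacement for /etc/passwd would also need to carry over.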

#4

Fork legato-af and then push the changes to your GitHub fork. Once you think they are ready, update this forum post with links to the relevant commits.


#5

Here is a commit for the /etc/passwd problem.

https://github.com/slashingweapon/legato-af/commit/61089af7bf589c1b512c092203fe066460793e37


#6

Minor update to the targetFiles/shared/bin/saveLogs script.

https://github.com/slashingweapon/legato-af/commit/35a0544c7d1ba5dfe97729d82787f5badcbfeecc


#7

Thanks for the changes. As we don’t really have an external submission process yet, I’ll cherry-pick them onto our internal Gerrit and push them through review and such.

BTW could you please take care of https://github.com/legatoproject/legato-af/blob/master/CONTRIB_INDIVIDUAL.md, as I would need that to accept your contribution.

Thanks!


#8

https://github.com/slashingweapon/legato-af/commit/41e5872f25c9f21a4e8e548d3bfd40f087e44ff7


#9

Finally, here are the build-system changes. These are the ones I’m less sure of, but they’re important if you want to build a system you can actually deploy and run on localhost. I think the ultimate goal should be to make building for localhost a lot less special.

https://github.com/slashingweapon/legato-af/commit/8868e80d35cde193a225e2ce0b7a7d450cfab2c8