Checking in on my Stratux project this morning, I found it unresponsive. A physical check showed a flashing red light on the Pi… something had gone haywire. I couldn’t SSH into the little thing, so a hard cold restart was in order.
Following the restart, I quickly shut Stratux back down to start performing diagnostics:
pi@raspberrypi:~ $ sudo su -
root@raspberrypi:~# service stratux stop
Diagnostics 101
Check Filesystem Health
The last time the little Stratux suddenly had problems, it was a filesystem space issue (see the previous article). This time df showed that the boot drive space was OK, but the new logging partition was, once again, completely consumed:
Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/root        1815440 1391284    331164  81% /
devtmpfs          469688       0    469688   0% /dev
tmpfs             474004       0    474004   0% /dev/shm
tmpfs             474004    6340    467664   2% /run
tmpfs               5120       4      5116   1% /run/lock
tmpfs             474004       0    474004   0% /sys/fs/cgroup
/dev/mmcblk0p1     61384   20400     40984  34% /boot
/dev/mmcblk0p4   8125880 8109496         0 100% /var/log
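Since this is the second time a full partition has taken the box down, a small check that complains before things hit 100% seems worthwhile. Here is a minimal sketch of one; the function names and the 80% threshold are my own placeholders, not anything Stratux provides.

```shell
# Print the usage percentage (digits only) of the filesystem holding $1.
usage_pct() {
    # -P forces the POSIX one-line-per-filesystem layout; field 5 is "Use%".
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Warn and return non-zero when usage is at or above the threshold in $2.
warn_if_full() {
    pct=$(usage_pct "$1")
    if [ "$pct" -ge "$2" ]; then
        echo "WARNING: $1 is at ${pct}% (threshold: $2%)"
        return 1
    fi
    return 0
}

# Hypothetical usage, for example from a cron job:
# warn_if_full /var/log 80
```

Something like this could run from cron and send its output wherever you will actually see it.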
Locate the Culprit
There are a number of ways to locate large files on a *NIX system. My favorite tool is find. The first thing I want to do is locate any file larger than 1 gigabyte, and sure enough, it turned up a massive SQLite database file: the same one that ate up all the space on the boot drive last time. So this is going to require some more extraordinary measures to maintain 100%, 24×7 operational status.
root@raspberrypi:~# cd /var/log
root@raspberrypi:/var/log# find . -size +1G -exec ls -l {} \;
-rw-r--r-- 1 root root 7463211008 Apr 29 21:48 ./stratux.sqlite
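The +1G threshold worked here because the offender was huge. When you don't know what size to look for, ranking files by size avoids the guesswork. A rough sketch, assuming GNU find (for -printf); the function name is just something I made up:

```shell
# List the N largest regular files under a directory, biggest first.
# Assumes GNU find, which provides -printf.
largest_files() {
    dir=$1
    count=${2:-10}
    # -xdev stays on one filesystem; %s is the size in bytes, %p the path.
    find "$dir" -xdev -type f -printf '%s %p\n' | sort -rn | head -n "$count"
}

# Hypothetical usage:
# largest_files /var/log 5
```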
Start Solving
The first order of business is to move the massive database aside while preserving the data for examination. Since the filesystem is full, I can’t zip this thing in place, so some space needs to be cleared on the device first. The things I’m least interested in go first, like the zipped syslogs and any other ‘archived’ file (those with a .# suffix).
-rw-r----- 1 root adm 728756 Apr 25 06:25 syslog.5.gz
-rw-r----- 1 root adm 2552845 Apr 26 06:25 syslog.4.gz
-rw-r----- 1 root adm 2447263 Apr 27 06:25 syslog.3.gz
-rw-r----- 1 root adm 2498089 Apr 28 06:25 syslog.2.gz
-rw-r----- 1 root adm 33735478 Apr 29 06:25 syslog.1
-rw-r----- 1 root adm 54307 Apr 30 05:19 debug.1
-rw-r----- 1 root adm 380928 Apr 30 05:19 kern.log.1
-rw-r----- 1 root adm 332365824 Apr 30 06:18 daemon.log.1
-rw-r----- 1 root adm 48723 Apr 30 06:25 auth.log.1
-rw-r----- 1 root adm 0 Apr 30 06:25 syslog.1.gz
-rw-r----- 1 root adm 1032192 Apr 30 06:25 messages.1
root@raspberrypi:/var/log# rm -f *.gz *\.[0-9]
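Before running an rm like that, it can be useful to know how much space it would actually give back. This is a sketch that adds up the sizes of roughly the same set of files (anything ending in .gz or in a single digit); it assumes GNU find, and the function name is hypothetical.

```shell
# Sum the sizes (in bytes) of files in directory $1 that end in .gz
# or in a single-digit suffix such as .1.
# Assumes GNU find (-maxdepth, -printf).
reclaimable_bytes() {
    find "$1" -maxdepth 1 -type f \( -name '*.gz' -o -name '*.[0-9]' \) \
        -printf '%s\n' | awk '{ total += $1 } END { printf "%.0f\n", total }'
}

# Hypothetical usage:
# reclaimable_bytes /var/log
```

On a partition this full, a few hundred megabytes of rotated logs is only a small dent next to a multi-gigabyte database.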
But that’s not going to do the complete trick, especially if df still shows 100% utilization after deleting files. In that case some process may still be holding the deleted files open, and you need to figure out which process is holding which deleted file(s).
Normally, the best way to do that on *NIX is with lsof. Much to my chagrin, it was not available on the OS, so I had to go grab it. Thankfully I had the main filesystem on a different partition that still had enough space to install more tools! Once lsof is installed, grep through the list of open file handles for those marked as deleted; the processes listed are the ones holding handles to those files. However, this didn’t help me on Raspbian Jessie: the search came back empty. I know that a reboot will recover the space, so that was the next step: a brutal warm boot.
root@raspberrypi:/var/log# lsof
-su: lsof: command not found
root@raspberrypi:/# apt-get install lsof
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following extra packages will be installed:
libperl4-corelibs-perl
The following NEW packages will be installed:
libperl4-corelibs-perl lsof
...
root@raspberrypi:/var/log# lsof | grep deleted
root@raspberrypi:/var/log#
root@raspberrypi:/var/log# init 6
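If lsof can't be installed (say, because the root partition is full too), the same information is available straight from /proc on Linux. The sketch below is my own stand-in for lsof | grep deleted, not something I ran during this incident; it relies on the kernel appending " (deleted)" to the symlink target of an unlinked open file.

```shell
# Print "fd-path -> target" for every open descriptor whose file was deleted.
# Linux-specific: relies on /proc/<pid>/fd symlinks ending in " (deleted)".
# Run as root to see descriptors belonging to other users' processes.
list_deleted_open_files() {
    for fd in /proc/[0-9]*/fd/*; do
        # Descriptors can vanish or be unreadable; skip those quietly.
        target=$(readlink "$fd" 2>/dev/null) || continue
        case "$target" in
            *" (deleted)") printf '%s -> %s\n' "$fd" "$target" ;;
        esac
    done
}

# Hypothetical usage:
# list_deleted_open_files | grep /var/log
```

The PID is the number in the /proc path, so restarting that one process may be enough to release the space without a reboot.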
Archiving Massive SQLite Database
The first thing was to move the current database aside, then restart Stratux to verify that it can create a new, empty database for its purposes, and then shut it right back down again.
root@raspberrypi:/var/log# mv stratux.sqlite stratux.sqlite.1
root@raspberrypi:/var/log# service stratux start
root@raspberrypi:/var/log# ls -l
total 7291480
[...]
-rw-r--r-- 1 root root 4096 Apr 30 14:27 stratux.sqlite
-rw-r--r-- 1 root root 7466160128 Apr 30 14:24 stratux.sqlite.1
[...]
root@raspberrypi:/var/log# service stratux stop
A new empty database file has been created [ 4096 Apr 30 14:27 stratux.sqlite ]. This tells me that periodically moving the current database file aside, then compressing and archiving it, should be sufficient to maintain operational status.
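The move-aside-and-compress routine could be scripted along these lines. This is only a sketch: the function name and the timestamp suffix are my own choices, and it assumes Stratux is stopped first so nothing is writing to the database while it is moved.

```shell
# Move a database file aside under a timestamped name and compress it.
# Prints the path of the resulting .gz archive.
rotate_db() {
    db=$1
    if [ ! -f "$db" ]; then
        echo "rotate_db: no such file: $db" >&2
        return 1
    fi
    archived="$db.$(date +%Y%m%d-%H%M%S)"
    mv -- "$db" "$archived" && gzip -- "$archived" && echo "$archived.gz"
}

# Hypothetical usage, mirroring the manual steps:
# service stratux stop
# rotate_db /var/log/stratux.sqlite
# service stratux start
```

Keep in mind that gzip writes the compressed copy before it removes the original, so the partition needs enough free space for the archive. Rotating well before the disk fills up avoids that trap.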
Logging Insanity
Before restarting Stratux, I zeroed out these log files. After running for just a few moments, they were already eating up a lot of space. Tailing one of them, I see that Stratux, with my current settings, is logging A LOT of data to these log files. I feel this was my first error: enabling too much logging. My settings look like this:
With those settings enabled, there are a lot of GPS and other events that I don’t really have a use for being dumped into the Stratux log.
-rw-r--r-- 1 root root 1361778 Apr 30 14:39 stratux.log
Turning OFF ‘Verbose Message Log’ made that insanity stop.
Replay logging is what writes to the SQLite database. So the question is: how much of that data do I want to keep, and how much will I lose if I turn off the replay logs? I think that will be research for another day… right now the goal is to recover disk space by compressing the massive database file that was moved aside, and to get Stratux stabilized again. Once it is compressed, check the filesystem and file sizes!
root@raspberrypi:/var/log# gzip stratux.sqlite.1
root@raspberrypi:/var/log# df
Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/root        1815440 1392384    330064  81% /
devtmpfs          469688       0    469688   0% /dev
tmpfs             474004       0    474004   0% /dev/shm
tmpfs             474004   12272    461732   3% /run
tmpfs               5120       4      5116   1% /run/lock
tmpfs             474004       0    474004   0% /sys/fs/cgroup
/dev/mmcblk0p4   8125880  681792   7008276   9% /var/log
/dev/mmcblk0p1     61384   20400     40984  34% /boot
root@raspberrypi:/var/log# ls -lktr --color --block-size=M stratux.sqlite*
-rw-r--r-- 1 root root 643M Apr 30 14:24 stratux.sqlite.1.gz
-rw-r--r-- 1 root root 8M Apr 30 15:21 stratux.sqlite
-rw-r--r-- 1 root root 1M Apr 30 15:22 stratux.sqlite-shm
-rw-r--r-- 1 root root 5M Apr 30 15:22 stratux.sqlite-wal
With the file compressed, it could be copied elsewhere for analysis.