STRATUX – Filesystem Full; Managing disk space redux

Checking in with my Starux project this morning, I found it unresponsive. A physical check shows a flashing red light on the Pi… something has gone haywire, and I couldn’t SSH into the little thing, so a really hard cold restart was in order.

Following the restart I quickly shut Stratux back down to start performing diagnostics:


pi@raspberrypi:~ $ sudo su -

root@raspberrypi:~# service stratux stop

Diagnostics 101

Check Filesystem Health

The last time the little Stratux suddenly had problems it was a filesystem space issue ( previous article ). It turns out the boot drive space was OK but the new Logging partition was again, completely consumed:

Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/root        1815440 1391284    331164  81% /
devtmpfs          469688       0    469688   0% /dev
tmpfs             474004       0    474004   0% /dev/shm
tmpfs             474004    6340    467664   2% /run
tmpfs               5120       4      5116   1% /run/lock
tmpfs             474004       0    474004   0% /sys/fs/cgroup
/dev/mmcblk0p1     61384   20400     40984  34% /boot
/dev/mmcblk0p4   8125880 8109496         0 100% /var/log

Locate the Culprit

There are a number of ways to locate large file on a *NIX system. My favorite tool is find. First thing I want to do is locate any file that is larger than 1 Gigabyte, and sure enough it located a massive sqlite database file. The same one that ate up all the space on the boot drive. So.. this is going to require some more extrodinary measures to maintain 100% 24×7 operational status.


root@raspberrypi:~# cd /var/log
root@raspberrypi:/var/log# find . -size +1G -exec ls -l {} \;
-rw-r--r-- 1 root root 7463211008 Apr 29 21:48 ./stratux.sqlite
Start Solving

First order of business is to move aside the massive database, but try to preserve the data for examination. Since the filesystem is full, I can’t zip this thing in place, so first some space needs to cleared on the device. The things I’m least interested in go first.. like the zipped syslogs and any other ‘archived’ file (those with a .# suffix).

-rw-r----- 1 root adm      728756 Apr 25 06:25 syslog.5.gz
-rw-r----- 1 root adm     2552845 Apr 26 06:25 syslog.4.gz
-rw-r----- 1 root adm     2447263 Apr 27 06:25 syslog.3.gz
-rw-r----- 1 root adm     2498089 Apr 28 06:25 syslog.2.gz
-rw-r----- 1 root adm    33735478 Apr 29 06:25 syslog.1
-rw-r----- 1 root adm       54307 Apr 30 05:19 debug.1
-rw-r----- 1 root adm      380928 Apr 30 05:19 kern.log.1
-rw-r----- 1 root adm   332365824 Apr 30 06:18 daemon.log.1
-rw-r----- 1 root adm       48723 Apr 30 06:25 auth.log.1
-rw-r----- 1 root adm           0 Apr 30 06:25 syslog.1.gz
-rw-r----- 1 root adm     1032192 Apr 30 06:25 messages.1

root@raspberrypi:/var/log# rm -f *.gz *\.[0-9]

But.. that’s not going to doe the complete trick, especially if after deleting files df still shows 100% utilization. You need to figure out what is holding which deleted file(s).

Normallyh, the best way to do that on *NIX is with lsof. Much to my chagrin, it was not available on the OS… so I had to go grab it. Thankfully I had the main filesystem on a different partition that still had enough space to install more tools! Once lsof was installed, grep through the list of open file handles and find those marked for delete.. and those are the processes that have handles pointing to those files. However, this didn’t help me with Rasperian Jessie. I know that a reboot will recover the space so.. that was the next step.. a brutal warm-boot.


root@raspberrypi:/var/log# lsof
-su: lsof: command not found

root@raspberrypi:/# apt-get install lsof
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following extra packages will be installed:
libperl4-corelibs-perl
The following NEW packages will be installed:
libperl4-corelibs-perl lsof
...

root@raspberrypi:/var/log# lsof | grep deleted
root@raspberrypi:/var/log#

root@raspberrypi:/var/log# init 6

Archiving Massive SQLite Database

The first thing was to move aside the current database, and then restart stratux to verify it can created a new empty database for it’s purposes.. then shut it right back down again.


root@raspberrypi:/var/log# mv stratux.sqlite stratux.sqlite.1
root@raspberrypi:/var/log# service stratux start
root@raspberrypi:/var/log# ls -l
total 7291480
[...]
-rw-r--r-- 1 root root 4096 Apr 30 14:27 stratux.sqlite
-rw-r--r-- 1 root root 7466160128 Apr 30 14:24 stratux.sqlite.1
[...]
root@raspberrypi:/var/log# service stratux stop

A new empty database file has been created [ 4096 Apr 30 14:27 stratux.sqlite ]. This tells me that moving aside the current database file on a periodic basis, compressing and then archiving it should be sufficient to maintain operational status.

Logging Insanity

Before restarting Stratux, I zerod out these log files. Running for first a few moments these files were already reading up a lot of space. Tailing one of them I see that Stratux, with my current settings is logging A LOT of data to these log files. I feel this was my first error.. enabling too much logging. My settings look like this:

With those settings enabled, there are a lot of GPS and other events that I don’t really have a use for, being dumped into the Statux log.


-rw-r--r-- 1 root root 1361778 Apr 30 14:39 stratux.log

Turning OFF ‘Verbose Message Log’ made that insanity stop.

Replay logging is what is writing to the SQLite database. So the question is. how much of that data do I want to keep, and how much will I lose if I turn off the replay logs. I think that will be research for another day… right now the goal is to recover disk space by compressing the massive database file that was moved aside, and get Stratux stabilized again. Once compressed, check filesystem and file size!


root@raspberrypi:/var/log# gzip stratux.sqlite.1

root@raspberrypi:/var/log# df

Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/root        1815440 1392384    330064  81% /
devtmpfs          469688       0    469688   0% /dev
tmpfs             474004       0    474004   0% /dev/shm
tmpfs             474004   12272    461732   3% /run
tmpfs               5120       4      5116   1% /run/lock
tmpfs             474004       0    474004   0% /sys/fs/cgroup
/dev/mmcblk0p4   8125880  681792   7008276   9% /var/log
/dev/mmcblk0p1     61384   20400     40984  34% /boot

root@raspberrypi:/var/log# ls -lktr --color --block-size=M stratux.sqlite*

-rw-r--r-- 1 root root 643M Apr 30 14:24 stratux.sqlite.1.gz
-rw-r--r-- 1 root root   8M Apr 30 15:21 stratux.sqlite
-rw-r--r-- 1 root root   1M Apr 30 15:22 stratux.sqlite-shm
-rw-r--r-- 1 root root   5M Apr 30 15:22 stratux.sqlite-wal

With the file compressed, it could be copied elsewhere for analysis.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.