If you have multiple computers running some simple, naive software that writes a small statistics log file to a write-only shared directory every minute, it is not hard to get there.
Note that this is obviously not a good solution for logging, and I don't endorse this practice, but this is what we had, and simple commands like the following were no longer completing in reasonable time:
cp /mnt/stat/data/2019_06_23_* /home/kalmi/incident_06_23/
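(The wildcard is the real problem here: the shell has to read the entire directory to expand the glob before cp even starts.)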
Running ls or find took forever and was affecting the performance of the rest of the system. I had never considered before that ls, by default, has to load the whole directory into memory (or swap 👻) in order to sort the entries.
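With GNU ls, the sorting can be turned off: -U lists entries in directory order, and -f goes further by also implying -a. The difference is easy to feel on a huge directory:

ls -U | head    # starts printing almost immediately
ls | head       # must read and sort every entry before the first line appears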
It was getting quite inconvenient. Something needed to be done: I wanted to do some processing on the existing files, and for that I needed commands to complete in reasonable time.
I decided to move the existing files into the following folder structure: year/month/day
I decided I would write a Python script to do the job of moving the existing files. The original filenames contained a timestamp, and that's all that's needed to determine the destination. But I couldn't find a Python library that could list such a large directory unsorted, in reasonable time, without buffering the whole thing. I expected to be able to just iterate with glob.glob, but it wasn't performing as expected.
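In hindsight, os.scandir (in the standard library since Python 3.5) might have done the job: it returns a lazy iterator over unsorted directory entries rather than building a full list up front. A minimal sketch, assuming the same directory:

import os

# Iterates entries in directory order, without sorting or
# materializing the full listing in memory
with os.scandir('/mnt/stat/data') as it:
    for entry in it:
        if entry.is_file():
            print(entry.name)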
I had already figured out earlier that ls with the right flags can do that, so I just piped the output of ls into the Python script that actually moved the files, and the following solution was born:
cd /mnt/stat/data && ls -p -1 -f | grep -v / | python3 ../iter.py3
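(For reference: -f disables sorting, -1 prints one entry per line, and -p appends a slash to directory names, which is what lets grep -v / filter the directories out before the names reach the script.)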
That ls-to-Python pipeline got added to cron, so that we wouldn't get to a billion files again.
The contents of iter.py3:
import fileinput
import os
import errno

for line in fileinput.input():
    filename = line.strip()
    print(filename)
    # Expected filename format: name_YYYY-MM-DD_hour_ns
    if len(filename.split('_')) != 4:
        print('SKIPPED')
        continue
    name, date, hour, ns = filename.split('_')
    if len(date.split('-')) != 3:
        print('SKIPPED')
        continue
    year, month, day = date.split('-')
    if not year.isdigit() or not month.isdigit() or not day.isdigit():
        print('SKIPPED')
        continue
    # Create the year/month/day directories, ignoring "already exists" errors
    try:
        os.mkdir(year)
    except OSError as e:
        if e.errno != errno.EEXIST:
            raise
    try:
        os.mkdir(os.path.join(year, month))
    except OSError as e:
        if e.errno != errno.EEXIST:
            raise
    try:
        os.mkdir(os.path.join(year, month, day))
    except OSError as e:
        if e.errno != errno.EEXIST:
            raise
    # A rename within the same filesystem is cheap; no data is copied
    os.rename(filename, os.path.join(year, month, day, filename))
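Today I would probably reach for os.makedirs with exist_ok=True instead of the three mkdir/EEXIST blocks, and validate the date with datetime.strptime. A sketch of the same move logic (move_to_tree is just an illustrative name, and the filename format is the one assumed by the script above):

import os
from datetime import datetime

def move_to_tree(filename):
    # Assumed format, as parsed by iter.py3 above: name_YYYY-MM-DD_hour_ns
    parts = filename.split('_')
    if len(parts) != 4:
        return False
    date = parts[1]
    try:
        datetime.strptime(date, '%Y-%m-%d')  # validates the whole date in one step
    except ValueError:
        return False
    year, month, day = date.split('-')
    dest_dir = os.path.join(year, month, day)
    os.makedirs(dest_dir, exist_ok=True)  # replaces the three mkdir/EEXIST blocks
    os.rename(filename, os.path.join(dest_dir, filename))
    return True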
After running the script, the directory only contained a few folders, but running ls in it still took a few seconds. It's interesting to me that the directory remained affected in this way; I guess ZFS didn't clean up its now-empty internal structures.