The earlier problem was caused by the cron job that builds the Policy zone running multiple copies concurrently. This corrupted the generated zone files, which in turn caused rbldnsd to exit when it attempted to load them. The corrupted zone files were also deployed to our rsync mirrors.
I have put additional measures in place to prevent this from recurring, and we will be making further changes based on what we learned from this outage:
- migrate away from Kubernetes Cronjobs to regular cron.
- add an additional switch to rbldnsd to prevent it from exiting on reload if syntax errors are found and instead continue serving old data and log a warning.
- add an additional switch to rbldnsd to allow it to syntax check a file and exit, so we can use this to check the created zone files before they are uploaded.
- improve our rbldnsd containers for a faster start-up and to stagger their initial rsync.
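
For readers running a similar pipeline, here is a minimal sketch of the two safeguards implied above: allow only one build at a time, and publish a zone file only after it passes a check. It is an illustration under stated assumptions, not our actual build script. The names `single_instance`, `publish_zone`, `build_once` and the `validate` callback are hypothetical; `validate` stands in for whatever syntax check is available and is not an rbldnsd option.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def single_instance(lock_path):
    """Yield True if this run holds the exclusive lock, False if another run does."""
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        try:
            # Non-blocking: fail immediately instead of queueing behind another run.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            acquired = True
        except BlockingIOError:
            acquired = False
        yield acquired
    finally:
        os.close(fd)  # closing the descriptor also releases the lock


def publish_zone(lines, dest_path, validate):
    """Write lines to a temp file, validate it, then atomically replace dest_path."""
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    # The temp file lives in the destination directory so the rename stays atomic.
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, prefix=".zone-")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.writelines(line + "\n" for line in lines)
            tmp.flush()
            os.fsync(tmp.fileno())
        # Hypothetical check: the caller supplies whatever validation it has.
        if not validate(tmp_path):
            raise ValueError("generated zone failed validation; keeping old file")
        os.replace(tmp_path, dest_path)
    except BaseException:
        # On any failure, remove the temp file and leave the old zone untouched.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def build_once(lock_path, dest_path, lines, validate):
    """Run one build; return False without touching anything if a build is running."""
    with single_instance(lock_path) as acquired:
        if not acquired:
            return False
        publish_zone(lines, dest_path, validate)
        return True
```

With this shape, an overlapping run exits early instead of writing into the same file, and readers such as rsync or a reloading daemon only ever see either the previous complete file or the new complete file, never a partially written one.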
Sorry for any inconvenience this caused.
Kind regards, Steve.
--
Steve Freegard
Senior Product Owner
Abusix Intelligence
Posted Aug 22, 2019 - 20:34 UTC
A fix has been implemented and we are monitoring the results.
Posted Aug 22, 2019 - 18:30 UTC
We are continuing to work on a fix for this issue.
Posted Aug 22, 2019 - 18:21 UTC
The issue has been identified and a fix is being implemented.