Forever-growing Zabbix database
Zabbix 6.0 added some new features, for example audit logging. This post shows the downsides and limitations of audit logging in this release and how to avoid its negative impact.
We use Zabbix heavily for internal monitoring. Over the last few months, the growth of our database has been accelerating. We use TimescaleDB as the database backend so that we can reach the ingestion rates our setup requires (we persist about 3,000 to 4,000 values per second).
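To give a rough sense of scale, this sketch turns the ingestion rate into daily and monthly value counts. It is plain arithmetic on the 3,000 to 4,000 values per second quoted above. The 30-day month is a simplifying assumption, and the function names are illustrative only.

```python
# Back-of-envelope: how many values a constant ingestion rate produces.
# The 3,000-4,000 values/s range comes from the text; the 30-day month
# is a simplifying assumption.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def values_per_day(values_per_second: float) -> float:
    """Number of values persisted in one day at a constant rate."""
    return values_per_second * SECONDS_PER_DAY


def values_per_month(values_per_second: float, days: int = 30) -> float:
    """Number of values persisted in a month of `days` days (assumed 30)."""
    return values_per_day(values_per_second) * days


if __name__ == "__main__":
    for rate in (3000, 4000):
        print(f"{rate} values/s -> {values_per_day(rate):,.0f} values/day, "
              f"{values_per_month(rate):,.0f} values/month")
```

At 3,000 values per second this comes to 259.2 million values per day. At 4,000 values per second it comes to 345.6 million.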
At first, we assumed the cause was that we keep adding items to our templates, since we are continuously investing in improvements there. However, our storage allocation kept climbing (the database grew by about 1.2 TB in 6 months). This happened even though we are quite nitpicky about collecting only necessary values and recording them in an optimal format (for example, numbers with value mapping instead of strings).
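Growth figures measured over different periods are easier to compare after normalising them to a daily rate. The sketch below assumes decimal units (1.2 TB = 1,200 GB) and an average month of 365.25 / 12 = 30.4375 days. `average_growth_gb_per_day` is a hypothetical helper name.

```python
# Convert "grew by X GB over N months" into an average daily growth rate.
# Assumptions: decimal units (1 TB = 1000 GB) and an average month of
# 365.25 / 12 = 30.4375 days.
DAYS_PER_MONTH = 365.25 / 12


def average_growth_gb_per_day(growth_gb: float, months: float) -> float:
    """Average growth in GB per day over a period of `months` months."""
    if months <= 0:
        raise ValueError("months must be positive")
    return growth_gb / (months * DAYS_PER_MONTH)


if __name__ == "__main__":
    # ~1.2 TB in 6 months, as observed in the text
    rate = average_growth_gb_per_day(1200, 6)
    print(f"~{rate:.1f} GB/day, ~{rate * DAYS_PER_MONTH:.0f} GB/month")
```

For 1.2 TB in six months, this gives roughly 6.6 GB per day, or about 200 GB per month.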
So I checked the database tables to see whether a specific table (plain PostgreSQL table or hypertable) was responsible for the growth.
Listing all PostgreSQL relations by size (which, in our case, does not include hypertables) revealed the culprit: the auditlog table alone had allocated 721 GB of disk space.
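The query used for this listing is not reproduced here. A common way to get such a list is to sort the relations in `pg_class` by `pg_total_relation_size()`. The sketch below keeps that query as a string and adds a small helper that sorts the returned `(relation, bytes)` rows and shows each relation's share of the total. TimescaleDB stores hypertable data in chunk tables, so per-hypertable totals are better checked with TimescaleDB's `hypertable_size()` function. `BIGGEST_RELATIONS_SQL`, `human_size` and `report` are hypothetical names.

```python
from typing import Iterable, List, Tuple

# Run this with any PostgreSQL client. It returns one row per ordinary table
# or materialized view, with its total size in bytes (including indexes and
# TOAST data), largest first.
BIGGEST_RELATIONS_SQL = """
SELECT n.nspname || '.' || c.relname AS relation,
       pg_total_relation_size(c.oid) AS total_bytes
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname NOT IN ('pg_catalog', 'information_schema')
  AND n.nspname !~ '^pg_toast'
  AND c.relkind IN ('r', 'm')
ORDER BY total_bytes DESC
LIMIT 20;
"""


def human_size(num_bytes: int) -> str:
    """Format a byte count with 1024-based units (similar in spirit to pg_size_pretty)."""
    units = ["B", "kB", "MB", "GB", "TB"]
    size = float(num_bytes)
    unit = units[0]
    for unit in units:
        if size < 1024 or unit == units[-1]:
            break
        size /= 1024
    return f"{size:.1f} {unit}"


def report(rows: Iterable[Tuple[str, int]], top: int = 10) -> List[str]:
    """Sort (relation, bytes) rows by size and show each one's share of the total."""
    ordered = sorted(rows, key=lambda r: r[1], reverse=True)
    total = sum(b for _, b in ordered) or 1  # avoid division by zero
    return [f"{name:<40} {human_size(b):>10} {100 * b / total:5.1f}%"
            for name, b in ordered[:top]]
```

With a driver such as psycopg2 (an assumption, since any PostgreSQL client works), the rows would come from `cur.execute(BIGGEST_RELATIONS_SQL)` followed by `cur.fetchall()`.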
The audit log is a new feature in Zabbix 6.0 and had been running in our environment for about three to four months.
This growth is most likely caused by our heavy use of low-level discovery. Discovery automatically creates and updates item definitions, and each of those changes in turn creates audit log records. So we do have an audit log, but it is full of records from the SYSTEM user.
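A hypothetical back-of-envelope model shows why discovery can inflate the audit log. If every item that discovery creates or updates produces one audit record, then the daily record count is the product of hosts, discovered items per host, and changes per item per day. All parameters, including the average record size, are assumptions to be replaced with figures from your own setup. This is an illustration, not how Zabbix itself computes anything.

```python
from dataclasses import dataclass


# Hypothetical model: every create/update of a discovered item is assumed
# to write exactly one audit log record. All fields are user-supplied guesses.
@dataclass
class DiscoverySetup:
    hosts: int                       # hosts using templates with discovery rules
    discovered_items_per_host: int   # items created from item prototypes
    changes_per_item_per_day: float  # how often a discovered item is created/updated
    bytes_per_record: int            # assumed average size of one audit record


def audit_records_per_day(s: DiscoverySetup) -> float:
    """Audit records per day under the one-record-per-change assumption."""
    return s.hosts * s.discovered_items_per_host * s.changes_per_item_per_day


def audit_bytes_per_day(s: DiscoverySetup) -> float:
    """Estimated audit log growth in bytes per day."""
    return audit_records_per_day(s) * s.bytes_per_record
```

Because the terms multiply, doubling the number of discovered items per host doubles the modelled audit volume, even if no human ever touches the configuration.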
Disable audit log
To avoid generating huge amounts of data we don't use, I decided to disable the audit log. Perhaps the Zabbix developers will add an option to skip logging activity in the SYSTEM context. That would be much appreciated in a case like ours.
For now, it is disabled in the settings under Administration -> General -> Audit log.
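The same switch can probably be flipped through the Zabbix API instead of the UI, which helps when several instances need the same change. This is a sketch only. It assumes the Zabbix 6.0 JSON-RPC API, its `settings.update` method with the `auditlog_enabled` property, and an API token passed in the `auth` field (newer releases prefer an Authorization header). The call likely needs a Super admin account. Check the API reference for your version before relying on it. The URL and token are placeholders.

```python
import json
import urllib.request


def build_disable_auditlog_request(auth_token: str, request_id: int = 1) -> dict:
    """JSON-RPC payload for settings.update (assumed Zabbix 6.0 API)."""
    return {
        "jsonrpc": "2.0",
        "method": "settings.update",
        "params": {"auditlog_enabled": 0},  # assumed: 0 = disabled, 1 = enabled
        "auth": auth_token,                 # 6.0-style authentication (assumption)
        "id": request_id,
    }


def send(api_url: str, payload: dict) -> dict:
    """POST the payload to the API endpoint and return the decoded JSON reply."""
    req = urllib.request.Request(
        api_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json-rpc"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    payload = build_disable_auditlog_request("YOUR_API_TOKEN")  # placeholder token
    print(json.dumps(payload, indent=2))
    # Hypothetical URL; uncomment to actually send the request:
    # print(send("https://zabbix.example.com/api_jsonrpc.php", payload))
```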
Clear audit log history
With the audit log disabled, these records no longer take up new space. Now it's time to clear the existing history.
Because we don't need to keep any of the logs, the fastest and most efficient way to get rid of the data is to empty the auditlog table in a single statement instead of deleting individual records.
This will effectively remove the data and free up quite a lot of disk space.
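The exact command was not preserved in this post. PostgreSQL's `TRUNCATE` is the usual way to empty a whole table. It removes all rows at once and releases the table's disk space immediately. With `DELETE`, the dead rows are only reclaimed by `VACUUM`, and they are only returned to the operating system by `VACUUM FULL`. The minimal sketch below runs it through any DB-API 2.0 connection (for example one from psycopg2, which is an assumption). `clear_auditlog` is a hypothetical helper. `TRUNCATE` takes an exclusive lock on the table and cannot be undone, so take a backup first if the data might still be needed.

```python
# Irreversible: removes every row from Zabbix's auditlog table.
TRUNCATE_AUDITLOG_SQL = "TRUNCATE TABLE auditlog;"


def clear_auditlog(conn) -> None:
    """Truncate the auditlog table using a DB-API 2.0 connection, then commit."""
    cur = conn.cursor()
    try:
        cur.execute(TRUNCATE_AUDITLOG_SQL)
        conn.commit()
    except Exception:
        conn.rollback()  # leave the connection usable if the statement fails
        raise
    finally:
        cur.close()
```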
In our case, disabling the audit log reduced the database's growth by around 58%. Spending more than half of the disk space of a metrics-based monitoring system on auditing is clearly not the goal.
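As a rough consistency check, the 721 GB audit log can be compared with the roughly 1.2 TB of total growth in six months. This assumes 1.2 TB = 1,200 GB, and `share_percent` is an illustrative helper. The two figures measure different things: one is a share of accumulated growth, the other a reduction in the growth rate. Only rough agreement with the 58% should be expected.

```python
def share_percent(part_gb: float, total_gb: float) -> float:
    """Percentage of `total_gb` accounted for by `part_gb`."""
    if total_gb <= 0:
        raise ValueError("total_gb must be positive")
    return 100.0 * part_gb / total_gb


if __name__ == "__main__":
    # 721 GB audit log vs. ~1.2 TB (assumed 1,200 GB) of six-month growth
    print(f"{share_percent(721, 1200):.1f}%")  # 60.1%
```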
Let's hope future releases add options to compress the audit log and/or to disable logging for the SYSTEM user.