Migrating zabbix from mysql to timescaledb - migration
This series describes how to migrate a mysql database to timescaledb - use case is to migrate a zabbix application database. This part handles the migration and switchover of the server environment up to completion of the migration.
This series of posts describes how to migrate a certainly large database from mysql to timescaleDB - with near zero downtime. The use case is to port a zabbix database to another engine and gain performance benefits as a result.
The build of the container (Dockerfile) is shown here:
This contains some adjustments (like DYNSIZE) that are required when running in a container.
Just build the image (docker build -t pgloader .) and proceed.
Prepare pgloader configurations
pgloader is applying the copy using a given configuration file. We are using three of them:
The definitions are shown here:
Adjust the files with the values matching your environment (especially line 2 & 3 where the hosts & credentials are configured).
Having prepared quite some stuff, it's time to get serious and start the data migration.
Migrate history data
The first data we are transferring will be history tables - these are usually the largest data set.
This command assumes that /var/docker/pgloader/data/ contains the pgloader config files from above.
Background process This container is spawned detached, so you can use docker logs to view the current output. The migration will continue in case you disconnect your shell. Stopping the container will remove it's definition and logs - if you want to keep it, remove the --rm flag.
Depending on the size of your chunks (set in the preparation step) this will take some time.
Only one instance Make sure to not run the migration multiple times at the same time. This might generated duplicate data (if using the same pgloader job) and slow down the overall process.
Repeat this step until you reach a point which is quite near to now (like yesterday midnight). The last chunk will not yet be migrated!
History is the largest amount of data (probably). Now it's time to copy the trends.
So - this is the same as above. Wait for completion, select the next source chunk and repeat. Repeat this step also until you reach a point which is quite near to now (like yesterday midnight). The last chunk will not yet be migrated!
Having migrated timeseries data (up to a mostly recent point in time), we continue and migrate the configuration.
This step will take mostly only a few minutes (except you have a very high amount of alerts or events). Wait for it's completion.
When it's done, fire up a zabbix frontend that points to the new database and see if you can access all hosts, templates, alerts & events, etc. - everything is up to date until now, except history & trend data which is available up to the cutoff date for our last migration step.
Make sure to verify that everything is replicated - we're approaching go-live.
Having migrated most of the data (except the last batch), it's time to prepare for handover to the new environment.
Compress data in timescaleDB
Before switching the active environment, we want to make sure to apply any larger operations on our destination database to avoid downtime due high pressure on the system when going live.
The first step is to re-enable indices right now.
This will take some time.
You probably want timescaleDB to compress old data. The amount of space saving is very high, so I absolutely recommend it.
In our case we had the following gains in data efficiency
So you can see that we shrunk down the allocation to less than 10% of the original data size - as it's time series old data won't change and we don't have any reasonable downside of doing this.
Enabling compression on the hypertables is done using the following commands:
Having run this, existing data won't yet be touched - triggering compression is a (until zabbix takes over) manual process. To perform a seamless switch to the new systems we kick of this resource intensive process (of compressing the old data) manually.
Adjust date in query These command will effectively start compressing all existing data up to the given timestamp. Make sure to find a proper timestamp here (like now - 1week).
Failover zabbix server
Having migrated data, applied operations on database, we are finally set for the actual switch on the production system.
The steps will be:
stop zabbix server (which points to mysql)
migrate last batches of data
run some adjustments
verify application (zabbix frontend)
start zabbix server (on timescaleDB)
I assume that you now stopped the active zabbix server instance and the old database remains therefore unchanged.
Migrate last batch & config
The last batch of data migration consists of:
Update views on history
Run pgloader with zabbix_history.load
Update views on trends
Run pgloader with zabbix_trends.load
Run pgloader with zabbix_config.load
Upon completion, the new database contains all data from the existing one.
Final adjustments on data
During preparation we disabled all triggers on timescaleDB to avoid foreign key violations when inserting data. Now it's time to enable the triggers again.
This looks good, we're nearly there.
To handover compression of hypertables to zabbix, we can enable this using the following configuration change (which can also be done in the UI).
Verify data & enable server
At this point we have migrated all data and done all adjustments on it - the system is ready to go into production. Before actually enabling zabbix server (which potentially collects data buffered on proxies), go the zabbix frontend and verify one last time that everything is behaving as expected, you can see the latest collected metrics, events, etc. It is expected that you transferred all data from the old system, so even small gaps might raise your attention.
If you got green light, switch on the zabbix server which now points to the timescaleDB (see zabbix manual for the corresponding database driver).
During the first start, keep an eye on the behaviour, logs and utilization of the database & server itself. Hopefully everything works out of the box as expected.
If yes: Good job, feel free to clap on your shoulder :-)
If you run kubernetes on your own, you need to provide a storage solution with it. We are using ceph (operated through rook). This article gives some short overview about it's benefits and some pro's and con's of it.