Migrating Zabbix from MySQL to TimescaleDB - database setup
This part continues our MySQL to TimescaleDB migration. We prepare our target database and add some helpers on the source to make the process resumable.
Short summary: Zabbix collects time series data, TimescaleDB is optimized for exactly that and much faster than MySQL. Period.
Procedure overview
We're using pgloader to copy the data over from our existing (Zabbix) database. We start with the largest amount of data, the time series, copying it up to a recent cutoff such as yesterday midnight; then we copy all other tables and verify the application using a new instance of the Zabbix frontend. (A rough sketch of a pgloader command file follows below.)
If everything is fine, we proceed to the final phase:
Remove the modifications from the database (re-enable the triggers, recreate the indices, etc.)
Stop zabbix-server
Migrate the remaining data (prepare the chunks so that this operation completes in a short amount of time)
Migrate all non-timeseries data
Switch the server to the PostgreSQL database
This process goes quite smoothly and has no point of no return until the very last step.
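Just to give an idea of the shape of such a run, here is a rough sketch of a pgloader command file for a data-only copy. Host names, credentials, the table selection and the WITH options are placeholders; the actual command files used for the chunked copy are shown in the next part.

```
-- zabbix.load - rough sketch of a pgloader command file (placeholders only)
LOAD DATABASE
     FROM mysql://zabbix:secret@mysql-host/zabbix
     INTO postgresql://zabbix:secret@timescale-host/zabbix

     -- the target schema already exists, we only want the rows
     WITH data only, truncate, workers = 4

     -- restrict the run to the tables (or migration views) we want to copy
     INCLUDING ONLY TABLE NAMES MATCHING 'history', 'trends'

     -- MySQL has no separate schemas, so map the database name onto public
     ALTER SCHEMA 'zabbix' RENAME TO 'public';
```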
Use of Zabbix proxies
Using Zabbix proxies is generally encouraged once you monitor a certain number of hosts. In that case the proxies buffer incoming data, so we can even switch over without losing any incoming data.
Prepare destination database - TimescaleDB
I assume that you already have a TimescaleDB server (PostgreSQL with the TimescaleDB extension) installed and the accounts created. If not, there's plenty of information on this topic on the web :-)
Having completed this, you also need to create hypertables for the relevant relations/tables. This is documented in the Zabbix manual as well - as a short reference, it will look similar to this:
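A minimal sketch, assuming the standard Zabbix schema; the chunk intervals are only examples and the remaining history_* tables are converted the same way:

```sql
-- Convert the time series tables into hypertables.
-- clock is a unix timestamp (integer), so chunk_time_interval is given in seconds.
-- Repeat for history_uint, history_str, history_log, history_text and trends_uint.
SELECT create_hypertable('history', 'clock', chunk_time_interval => 86400,   migrate_data => true);
SELECT create_hypertable('trends',  'clock', chunk_time_interval => 2592000, migrate_data => true);
```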
Drop indices on timeseries tables
To make sure ingestion runs at maximum speed, we drop the indices in the destination database and recreate them afterwards. This speeds up the overall process and makes the operation mostly I/O bound.
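A minimal sketch, assuming a stock Zabbix PostgreSQL schema; the index names can differ between Zabbix versions, so check the actual names (e.g. with \d history) first:

```sql
-- Drop the secondary indices on the time series tables before the bulk load;
-- they get recreated once the migration is done.
DROP INDEX IF EXISTS history_1;
DROP INDEX IF EXISTS history_uint_1;
DROP INDEX IF EXISTS history_str_1;
DROP INDEX IF EXISTS history_log_1;
DROP INDEX IF EXISTS history_text_1;
```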
One more preparation step is needed on the destination side before we turn to the source database.
Disable triggers in destination database
We are also migrating tables that have constraints. A common issue in such a migration is that the copy order does not follow the dependency graph, so constraints get violated during the migration. This applies here as well, so we disable the triggers for the duration of the process.
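Instead of writing an ALTER TABLE ... DISABLE TRIGGER ALL statement for every table by hand, the statements can also be generated from the catalog. A sketch, assuming the Zabbix tables live in the public schema (run it as a superuser, since this also disables the internal foreign key triggers):

```sql
-- Disable all triggers (including the internal constraint triggers) on every
-- table in the public schema; re-enable them later with ENABLE TRIGGER ALL.
DO $$
DECLARE
    t text;
BEGIN
    FOR t IN SELECT tablename FROM pg_tables WHERE schemaname = 'public'
    LOOP
        EXECUTE format('ALTER TABLE %I DISABLE TRIGGER ALL', t);
    END LOOP;
END $$;
```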
Yep, quite some of them.
Having completed these steps, the TimescaleDB side is ready for data ingestion.
Preparation of source database - MySQL
As your existing database is most likely changing constantly (history & trends tables), you cannot easily copy all the data to a new database without either losing data because of a cutoff or losing data because you stopped the Zabbix server.
The solution: split the time series data (which is append-only, except for the edge case of late incoming data from proxies) into views and migrate the views using pgloader.
The procedure is then as follows:
Create a view for your first chunk (oldest data first)
Migrate data
Alter the view to select the next chunk
Migrate data
etc.
To create a view, you can use SQL commands like the following. (Note: we used partitioning in MySQL before, so selecting ranges was very cheap for us; you might need to filter on the clock column instead.)
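A minimal sketch with hypothetical view names (history_mig, trends_mig) and an example time range; repeat this for history_uint and the other history tables:

```sql
-- Migration views covering one chunk of data (names and time range are examples).
-- clock holds unix timestamps; move the boundaries forward for each chunk.
CREATE OR REPLACE VIEW history_mig AS
    SELECT * FROM history
    WHERE clock >= UNIX_TIMESTAMP('2020-01-01 00:00:00')
      AND clock <  UNIX_TIMESTAMP('2020-02-01 00:00:00');

CREATE OR REPLACE VIEW trends_mig AS
    SELECT * FROM trends
    WHERE clock >= UNIX_TIMESTAMP('2020-01-01 00:00:00')
      AND clock <  UNIX_TIMESTAMP('2020-02-01 00:00:00');
```

For the next chunk, simply re-run CREATE OR REPLACE VIEW (or ALTER VIEW) with the next time range - that is the "alter the view" step from the list above.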
As you can see, views are created for the history & trends tables.
That is already everything we need to start with the migration. The next part will describe the actual migration of data.