Migrating zabbix from mysql to timescaledb - migration

This series of posts describes how to migrate a fairly large database from MySQL to TimescaleDB - with near zero downtime. The use case is porting a Zabbix database to another engine and gaining performance benefits as a result.

In the previous parts of this series we prepared our databases (MySQL being the source, TimescaleDB the destination), see: Migrating zabbix from mysql to timescaledb - database setup

In this part we focus on the actual migration.

Create the container image

The migration is done using pgloader (https://github.com/dimitri/pgloader), which is executed inside a container.

The build of the container (Dockerfile) is shown here:

FROM debian:stable-slim as builder

RUN apt-get update && apt-get install -y --no-install-recommends \
      bzip2 ca-certificates curl freetds-dev gawk git libsqlite3-dev libssl1.1 libzip-dev \
      make openssl patch sbcl time unzip wget cl-ironclad cl-babel \
    && rm -rf /var/lib/apt/lists/*

RUN mkdir -p /opt/src \
    && cd /opt/src \
    && git clone https://github.com/dimitri/pgloader.git \
    && mkdir -p /opt/src/pgloader/build/bin \
    && cd /opt/src/pgloader \
    && make DYNSIZE=12288 pgloader

FROM debian:stable-slim

RUN apt-get update && apt-get install -y --no-install-recommends \
      curl freetds-dev gawk libsqlite3-dev libzip-dev \
      make sbcl unzip \
    && rm -rf /var/lib/apt/lists/*

COPY --from=builder /opt/src/pgloader/build/bin/pgloader /usr/local/bin
Dockerfile

This contains some adjustments (like DYNSIZE) that are required when running in a container.

Just build the image (docker build -t pgloader .) and proceed.

Prepare pgloader configurations

pgloader performs the copy according to a given configuration file. We use three of them:

  • history
  • trends
  • config

The definitions are shown here:

LOAD DATABASE
  FROM  mysql://pgloader:PASSWORD@MYSQLHOST/zabbix
  INTO	postgresql://zabbix:PASSWORD@PSQLHOST:5432/zabbix

  WITH workers = 4, concurrency = 4, multiple readers per thread, rows per range = 500000, batch rows = 500000, prefetch rows = 1000000,
	include no drop, create no indexes, create no tables, data only, no truncate

  MATERIALIZE VIEWS history_view, history_str_view, history_text_view, history_uint_view
  INCLUDING ONLY TABLE NAMES MATCHING 'history_view', 'history_str_view', 'history_text_view', 'history_uint_view'

  alter schema 'zabbix' rename to 'public'
  alter table names matching 'history_view' rename to 'history'
  alter table names matching 'history_str_view' rename to 'history_str'
  alter table names matching 'history_text_view' rename to 'history_text'
  alter table names matching 'history_uint_view' rename to 'history_uint'

  set work_mem to '4096 MB', maintenance_work_mem to '8192 MB'
;
zabbix_history.load
LOAD DATABASE
  FROM  mysql://pgloader:PASSWORD@MYSQLHOST/zabbix
  INTO	postgresql://zabbix:PASSWORD@PSQLHOST:5432/zabbix

  WITH workers = 4, concurrency = 4, multiple readers per thread, rows per range = 200000, batch rows = 200000, prefetch rows = 200000,
	include no drop, create no indexes, create no tables, data only, no truncate

  MATERIALIZE VIEWS trends_view, trends_uint_view
  INCLUDING ONLY TABLE NAMES MATCHING 'trends_view', 'trends_uint_view'

  alter schema 'zabbix' rename to 'public'
  alter table names matching 'trends_view' rename to 'trends'
  alter table names matching 'trends_uint_view' rename to 'trends_uint'

  set work_mem to '4096 MB', maintenance_work_mem to '8192 MB'
;
zabbix_trends.load
LOAD DATABASE
  FROM  mysql://pgloader:PASSWORD@MYSQLHOST/zabbix
  INTO	postgresql://zabbix:PASSWORD@PSQLHOST:5432/zabbix

  WITH workers = 4, concurrency = 4, multiple readers per thread, rows per range = 25000, batch rows = 25000, prefetch rows = 25000,
	include no drop, create no indexes, create no tables, data only, reset sequences, truncate

  alter schema 'zabbix' rename to 'public'
  set work_mem to '4096 MB', maintenance_work_mem to '8192 MB'

  EXCLUDING TABLE NAMES MATCHING 'history','history_str','history_text','history_uint','history_view','history_str_view','history_text_view','history_uint_view','trends','trends_uint','trends_view','trends_uint_view'
;
zabbix_config.load

Adjust the files with the values matching your environment (especially the FROM and INTO lines, where the hosts and credentials are configured).

Run migration

Having prepared quite a bit, it's time to get serious and start the actual data migration.

Migrate history data

The first data we transfer are the history tables - these are usually the largest data set.

docker run --rm --name pgloader -v /var/docker/pgloader/data/:/data --network host pgloader pgloader /data/zabbix_history.load
run pgloader (history)

This command assumes that /var/docker/pgloader/data/ contains the pgloader config files from above.

Background process: If you spawn this container detached (add -d to the docker run command), you can use docker logs to view the current output, and the migration continues even if you disconnect your shell. Because of --rm, stopping the container removes its definition and logs - if you want to keep them, remove the --rm flag.

Depending on the size of your chunks (set in the preparation step) this will take some time.
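
If you want a rough idea of how far the copy has progressed, you can check approximate row counts and sizes on the destination. This is just a sketch and assumes TimescaleDB 2.x; approximate_row_count() relies on planner statistics, so the figure only becomes accurate after an ANALYZE.

-- rough progress check on the destination (TimescaleDB 2.x assumed)
SELECT approximate_row_count('history')           AS history_rows_approx,
       pg_size_pretty(hypertable_size('history')) AS history_size;
check migration progress (optional)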

Only one instance: Make sure not to run the migration multiple times in parallel. This might generate duplicate data (if using the same pgloader job) and slows down the overall process.

Upon completion of this run, adjust the VIEW on your source database (see the setup here: Migrating zabbix from mysql to timescaledb - database setup) so that it selects the next chunk. Then start the migration again.

Repeat this step until you reach a point in time quite close to now (like yesterday at midnight). The last chunk will not be migrated yet!
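
How exactly you advance the view depends on the definitions from the database setup post. Assuming the chunk views are plain time-range filters on the clock column, moving the window forward could look roughly like this on the MySQL side (the boundary timestamps are placeholder values):

-- hypothetical sketch: redefine the source view for the next chunk
-- (replace the clock boundaries with your actual next time range)
CREATE OR REPLACE VIEW history_view AS
  SELECT * FROM history
  WHERE clock >= 1609459200
    AND clock <  1612137600;
advance the source view to the next chunk (sketch)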

Migrate trends data

History is (probably) the largest amount of data. Now it's time to copy the trends.

docker run --rm --name pgloader -v /var/docker/pgloader/data/:/data --network host pgloader pgloader /data/zabbix_trends.load
run pgloader (trends)

This works the same as above: wait for completion, select the next source chunk, and run it again. Repeat until you reach a point in time close to now (like yesterday at midnight). The last chunk will not be migrated yet!

Migrate config

Having migrated the time series data (up to a fairly recent point in time), we continue and migrate the configuration.

docker run --rm --name pgloader -v /var/docker/pgloader/data/:/data --network host pgloader pgloader /data/zabbix_config.load
migrate config

This step will mostly take only a few minutes (unless you have a very large amount of alerts or events). Wait for its completion.

When it's done, fire up a Zabbix frontend that points to the new database and check whether you can access all hosts, templates, alerts & events, etc. - everything is up to date as of now, except history & trend data, which is available up to the cutoff date of our last migration step.

Make sure to verify that everything is replicated - we're approaching go-live.
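
A simple plausibility check is to compare row counts of a few configuration tables between the old and the new database; the same query works on MySQL and PostgreSQL, so run it on both and compare:

-- run on both databases and compare the results
SELECT
  (SELECT count(*) FROM hosts)    AS hosts,
  (SELECT count(*) FROM items)    AS items,
  (SELECT count(*) FROM triggers) AS triggers,
  (SELECT count(*) FROM events)   AS events;
compare row counts (source vs. destination)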

Finalize migration

Having migrated most of the data (except the last batch), it's time to prepare for handover to the new environment.

Prepare the data in TimescaleDB

Before switching the active environment, we want to apply any larger operations on our destination database now, to avoid downtime due to high load on the system when going live:

  • Create indices
  • Compress data

Reenable indices

The first step is to re-enable indices right now.

CREATE INDEX history_log_1 ON public.history_log USING btree (itemid, clock);
CREATE INDEX history_str_1 ON public.history_str USING btree (itemid, clock);
CREATE INDEX history_text_1 ON public.history_text USING btree (itemid, clock);
CREATE INDEX history_1 ON public.history USING btree (itemid, clock);
CREATE INDEX history_uint_1 ON public.history_uint USING btree (itemid, clock);
create indices

This will take some time.
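
If you are curious about the progress, PostgreSQL 12 and newer expose running index builds in pg_stat_progress_create_index; a quick look from another session:

-- watch running index builds (PostgreSQL 12+)
SELECT relid::regclass AS table_name, phase,
       blocks_done, blocks_total, tuples_done, tuples_total
FROM pg_stat_progress_create_index;
watch index creation progress (optional)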


Compress data

You probably want TimescaleDB to compress old data. The amount of space saved is very high, so I absolutely recommend it.

In our case we saw the following gains in storage efficiency:

SELECT pg_size_pretty(before_compression_total_bytes) as "before compression", pg_size_pretty(after_compression_total_bytes) as "after compression" FROM hypertable_compression_stats('history');

 before compression | after compression
--------------------+-------------------
 879 GB             | 73 GB
compression gains

So we shrank the allocation to less than 10% of the original data size. As this is time series data, old data won't change, so there is no reasonable downside to doing this.

Enabling compression on the hypertables is done using the following commands:

ALTER TABLE history SET (timescaledb.compress,timescaledb.compress_segmentby='itemid',timescaledb.compress_orderby='clock,ns');
ALTER TABLE history_str SET (timescaledb.compress,timescaledb.compress_segmentby='itemid',timescaledb.compress_orderby='clock,ns');
ALTER TABLE history_text SET (timescaledb.compress,timescaledb.compress_segmentby='itemid',timescaledb.compress_orderby='clock,ns');
ALTER TABLE history_uint SET (timescaledb.compress,timescaledb.compress_segmentby='itemid',timescaledb.compress_orderby='clock,ns');
ALTER TABLE trends SET (timescaledb.compress,timescaledb.compress_segmentby='itemid',timescaledb.compress_orderby='clock');
ALTER TABLE trends_uint SET (timescaledb.compress,timescaledb.compress_segmentby='itemid',timescaledb.compress_orderby='clock');
enable compression

Having run this, existing data is not touched yet - triggering compression is (until Zabbix takes over) a manual process. To allow a seamless switch to the new system, we kick off this resource-intensive process (compressing the old data) manually.

SELECT compress_chunk(i) FROM show_chunks('history', older_than => 1622563841, newer_than => 0) i;
SELECT compress_chunk(i) FROM show_chunks('history_str', older_than => 1622563841, newer_than => 0) i;
SELECT compress_chunk(i) FROM show_chunks('history_text', older_than => 1622563841, newer_than => 0) i;
SELECT compress_chunk(i) FROM show_chunks('history_uint', older_than => 1622563841, newer_than => 0) i;

SELECT compress_chunk(i) FROM show_chunks('trends', older_than => 1622563841, newer_than => 0) i;
SELECT compress_chunk(i) FROM show_chunks('trends_uint', older_than => 1622563841, newer_than => 0) i;
compress existing data
Adjust date in query: These commands will effectively compress all existing data up to the given timestamp. Make sure to pick a proper timestamp here (like now - 1 week); a helper query for that follows below.
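
If you prefer to derive the cutoff from the current time instead of hard-coding an epoch value, a small helper query does the trick:

-- epoch timestamp for "one week ago", usable as older_than above
SELECT extract(epoch FROM now() - interval '1 week')::bigint AS cutoff;
compute compression cutoff timestamp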

Failover Zabbix server

Having migrated the data and applied all operations on the database, we are finally set for the actual switch of the production system.

The steps will be:

  • stop zabbix server (which points to mysql)
  • migrate last batches of data
  • migrate config
  • run some adjustments
  • verify application (zabbix frontend)
  • start zabbix server (on timescaleDB)

I assume that you have now stopped the active Zabbix server instance, so the old database remains unchanged from here on.

Migrate last batch & config

The last batch of data migration consists of:

  • Update views on history
  • Run pgloader with zabbix_history.load
  • Update views on trends
  • Run pgloader with zabbix_trends.load
  • Run pgloader with zabbix_config.load

Upon completion, the new database contains all data from the existing one.

Final adjustments on data

During preparation we disabled all triggers on the TimescaleDB side to avoid foreign key violations while inserting data. Now it's time to enable the triggers again.

ALTER TABLE acknowledges ENABLE TRIGGER ALL;
ALTER TABLE actions ENABLE TRIGGER ALL;
ALTER TABLE alerts ENABLE TRIGGER ALL;
ALTER TABLE auditlog ENABLE TRIGGER ALL;
ALTER TABLE auditlog_details ENABLE TRIGGER ALL;
ALTER TABLE autoreg_host ENABLE TRIGGER ALL;
ALTER TABLE conditions ENABLE TRIGGER ALL;
ALTER TABLE config ENABLE TRIGGER ALL;
ALTER TABLE config_autoreg_tls ENABLE TRIGGER ALL;
ALTER TABLE corr_condition ENABLE TRIGGER ALL;
ALTER TABLE corr_condition_group ENABLE TRIGGER ALL;
ALTER TABLE corr_condition_tag ENABLE TRIGGER ALL;
ALTER TABLE corr_condition_tagpair ENABLE TRIGGER ALL;
ALTER TABLE corr_condition_tagvalue ENABLE TRIGGER ALL;
ALTER TABLE corr_operation ENABLE TRIGGER ALL;
ALTER TABLE correlation ENABLE TRIGGER ALL;
ALTER TABLE dashboard ENABLE TRIGGER ALL;
ALTER TABLE dashboard_page ENABLE TRIGGER ALL;
ALTER TABLE dashboard_user ENABLE TRIGGER ALL;
ALTER TABLE dashboard_usrgrp ENABLE TRIGGER ALL;
ALTER TABLE dbversion ENABLE TRIGGER ALL;
ALTER TABLE dchecks ENABLE TRIGGER ALL;
ALTER TABLE dhosts ENABLE TRIGGER ALL;
ALTER TABLE drules ENABLE TRIGGER ALL;
ALTER TABLE dservices ENABLE TRIGGER ALL;
ALTER TABLE escalations ENABLE TRIGGER ALL;
ALTER TABLE event_recovery ENABLE TRIGGER ALL;
ALTER TABLE event_suppress ENABLE TRIGGER ALL;
ALTER TABLE event_tag ENABLE TRIGGER ALL;
ALTER TABLE events ENABLE TRIGGER ALL;
ALTER TABLE expressions ENABLE TRIGGER ALL;
ALTER TABLE functions ENABLE TRIGGER ALL;
ALTER TABLE globalmacro ENABLE TRIGGER ALL;
ALTER TABLE globalvars ENABLE TRIGGER ALL;
ALTER TABLE graph_discovery ENABLE TRIGGER ALL;
ALTER TABLE graph_theme ENABLE TRIGGER ALL;
ALTER TABLE graphs ENABLE TRIGGER ALL;
ALTER TABLE graphs_items ENABLE TRIGGER ALL;
ALTER TABLE group_discovery ENABLE TRIGGER ALL;
ALTER TABLE group_prototype ENABLE TRIGGER ALL;
ALTER TABLE host_discovery ENABLE TRIGGER ALL;
ALTER TABLE host_inventory ENABLE TRIGGER ALL;
ALTER TABLE host_tag ENABLE TRIGGER ALL;
ALTER TABLE hostmacro ENABLE TRIGGER ALL;
ALTER TABLE hosts ENABLE TRIGGER ALL;
ALTER TABLE hosts_groups ENABLE TRIGGER ALL;
ALTER TABLE hosts_templates ENABLE TRIGGER ALL;
ALTER TABLE housekeeper ENABLE TRIGGER ALL;
ALTER TABLE hstgrp ENABLE TRIGGER ALL;
ALTER TABLE httpstep ENABLE TRIGGER ALL;
ALTER TABLE httpstep_field ENABLE TRIGGER ALL;
ALTER TABLE httpstepitem ENABLE TRIGGER ALL;
ALTER TABLE httptest ENABLE TRIGGER ALL;
ALTER TABLE httptest_field ENABLE TRIGGER ALL;
ALTER TABLE httptest_tag ENABLE TRIGGER ALL;
ALTER TABLE httptestitem ENABLE TRIGGER ALL;
ALTER TABLE icon_map ENABLE TRIGGER ALL;
ALTER TABLE icon_mapping ENABLE TRIGGER ALL;
ALTER TABLE ids ENABLE TRIGGER ALL;
ALTER TABLE images ENABLE TRIGGER ALL;
ALTER TABLE interface ENABLE TRIGGER ALL;
ALTER TABLE interface_discovery ENABLE TRIGGER ALL;
ALTER TABLE interface_snmp ENABLE TRIGGER ALL;
ALTER TABLE item_condition ENABLE TRIGGER ALL;
ALTER TABLE item_discovery ENABLE TRIGGER ALL;
ALTER TABLE item_parameter ENABLE TRIGGER ALL;
ALTER TABLE item_preproc ENABLE TRIGGER ALL;
ALTER TABLE item_rtdata ENABLE TRIGGER ALL;
ALTER TABLE item_tag ENABLE TRIGGER ALL;
ALTER TABLE items ENABLE TRIGGER ALL;
ALTER TABLE lld_macro_path ENABLE TRIGGER ALL;
ALTER TABLE lld_override ENABLE TRIGGER ALL;
ALTER TABLE lld_override_condition ENABLE TRIGGER ALL;
ALTER TABLE lld_override_opdiscover ENABLE TRIGGER ALL;
ALTER TABLE lld_override_operation ENABLE TRIGGER ALL;
ALTER TABLE lld_override_ophistory ENABLE TRIGGER ALL;
ALTER TABLE lld_override_opinventory ENABLE TRIGGER ALL;
ALTER TABLE lld_override_opperiod ENABLE TRIGGER ALL;
ALTER TABLE lld_override_opseverity ENABLE TRIGGER ALL;
ALTER TABLE lld_override_opstatus ENABLE TRIGGER ALL;
ALTER TABLE lld_override_optag ENABLE TRIGGER ALL;
ALTER TABLE lld_override_optemplate ENABLE TRIGGER ALL;
ALTER TABLE lld_override_optrends ENABLE TRIGGER ALL;
ALTER TABLE maintenance_tag ENABLE TRIGGER ALL;
ALTER TABLE maintenances ENABLE TRIGGER ALL;
ALTER TABLE maintenances_groups ENABLE TRIGGER ALL;
ALTER TABLE maintenances_hosts ENABLE TRIGGER ALL;
ALTER TABLE maintenances_windows ENABLE TRIGGER ALL;
ALTER TABLE media ENABLE TRIGGER ALL;
ALTER TABLE media_type ENABLE TRIGGER ALL;
ALTER TABLE media_type_message ENABLE TRIGGER ALL;
ALTER TABLE media_type_param ENABLE TRIGGER ALL;
ALTER TABLE module ENABLE TRIGGER ALL;
ALTER TABLE opcommand ENABLE TRIGGER ALL;
ALTER TABLE opcommand_grp ENABLE TRIGGER ALL;
ALTER TABLE opcommand_hst ENABLE TRIGGER ALL;
ALTER TABLE opconditions ENABLE TRIGGER ALL;
ALTER TABLE operations ENABLE TRIGGER ALL;
ALTER TABLE opgroup ENABLE TRIGGER ALL;
ALTER TABLE opinventory ENABLE TRIGGER ALL;
ALTER TABLE opmessage ENABLE TRIGGER ALL;
ALTER TABLE opmessage_grp ENABLE TRIGGER ALL;
ALTER TABLE opmessage_usr ENABLE TRIGGER ALL;
ALTER TABLE optemplate ENABLE TRIGGER ALL;
ALTER TABLE problem ENABLE TRIGGER ALL;
ALTER TABLE problem_tag ENABLE TRIGGER ALL;
ALTER TABLE profiles ENABLE TRIGGER ALL;
ALTER TABLE proxy_autoreg_host ENABLE TRIGGER ALL;
ALTER TABLE proxy_dhistory ENABLE TRIGGER ALL;
ALTER TABLE proxy_history ENABLE TRIGGER ALL;
ALTER TABLE regexps ENABLE TRIGGER ALL;
ALTER TABLE report ENABLE TRIGGER ALL;
ALTER TABLE report_param ENABLE TRIGGER ALL;
ALTER TABLE report_user ENABLE TRIGGER ALL;
ALTER TABLE report_usrgrp ENABLE TRIGGER ALL;
ALTER TABLE rights ENABLE TRIGGER ALL;
ALTER TABLE role ENABLE TRIGGER ALL;
ALTER TABLE role_rule ENABLE TRIGGER ALL;
ALTER TABLE script_param ENABLE TRIGGER ALL;
ALTER TABLE scripts ENABLE TRIGGER ALL;
ALTER TABLE service_alarms ENABLE TRIGGER ALL;
ALTER TABLE services ENABLE TRIGGER ALL;
ALTER TABLE services_links ENABLE TRIGGER ALL;
ALTER TABLE services_times ENABLE TRIGGER ALL;
ALTER TABLE sessions ENABLE TRIGGER ALL;
ALTER TABLE sysmap_element_trigger ENABLE TRIGGER ALL;
ALTER TABLE sysmap_element_url ENABLE TRIGGER ALL;
ALTER TABLE sysmap_shape ENABLE TRIGGER ALL;
ALTER TABLE sysmap_url ENABLE TRIGGER ALL;
ALTER TABLE sysmap_user ENABLE TRIGGER ALL;
ALTER TABLE sysmap_usrgrp ENABLE TRIGGER ALL;
ALTER TABLE sysmaps ENABLE TRIGGER ALL;
ALTER TABLE sysmaps_element_tag ENABLE TRIGGER ALL;
ALTER TABLE sysmaps_elements ENABLE TRIGGER ALL;
ALTER TABLE sysmaps_link_triggers ENABLE TRIGGER ALL;
ALTER TABLE sysmaps_links ENABLE TRIGGER ALL;
ALTER TABLE tag_filter ENABLE TRIGGER ALL;
ALTER TABLE task ENABLE TRIGGER ALL;
ALTER TABLE task_acknowledge ENABLE TRIGGER ALL;
ALTER TABLE task_check_now ENABLE TRIGGER ALL;
ALTER TABLE task_close_problem ENABLE TRIGGER ALL;
ALTER TABLE task_data ENABLE TRIGGER ALL;
ALTER TABLE task_remote_command ENABLE TRIGGER ALL;
ALTER TABLE task_remote_command_result ENABLE TRIGGER ALL;
ALTER TABLE task_result ENABLE TRIGGER ALL;
ALTER TABLE timeperiods ENABLE TRIGGER ALL;
ALTER TABLE token ENABLE TRIGGER ALL;
ALTER TABLE trigger_depends ENABLE TRIGGER ALL;
ALTER TABLE trigger_discovery ENABLE TRIGGER ALL;
ALTER TABLE trigger_queue ENABLE TRIGGER ALL;
ALTER TABLE trigger_tag ENABLE TRIGGER ALL;
ALTER TABLE triggers ENABLE TRIGGER ALL;
ALTER TABLE users ENABLE TRIGGER ALL;
ALTER TABLE users_groups ENABLE TRIGGER ALL;
ALTER TABLE usrgrp ENABLE TRIGGER ALL;
ALTER TABLE valuemap ENABLE TRIGGER ALL;
ALTER TABLE valuemap_mapping ENABLE TRIGGER ALL;
ALTER TABLE widget ENABLE TRIGGER ALL;
ALTER TABLE widget_field ENABLE TRIGGER ALL;
enable database triggers

This looks good, we're nearly there.
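
If you don't want to maintain this list by hand, the statements can also be generated from the catalog. This is just a sketch; review the generated output before executing it (for example with \gexec in psql):

-- generate ENABLE TRIGGER statements for all tables in the public schema
SELECT format('ALTER TABLE %I ENABLE TRIGGER ALL;', tablename)
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY tablename;
generate trigger statements (optional)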

To hand over compression of the hypertables to Zabbix, we enable it with the following configuration change (which can also be done via the UI).

UPDATE config SET db_extension='timescaledb',hk_history_global=1,hk_trends_global=1;
UPDATE config SET compression_status=1,compress_older='7d';
enable compression in zabbix server
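
A quick check confirms that the settings landed in the config table:

-- verify the housekeeping / compression settings
SELECT db_extension, compression_status, compress_older, hk_history_global, hk_trends_global FROM config;
verify compression settings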

Verify data & enable server

At this point we have migrated all data and applied all adjustments - the system is ready to go into production. Before actually starting the Zabbix server (which will potentially ingest data buffered on proxies), go to the Zabbix frontend and verify one last time that everything behaves as expected and that you can see the latest collected metrics, events, etc. You should have transferred all data from the old system, so even small gaps should raise your attention.
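
Besides clicking through the frontend, you can also compare the most recent timestamps in the migrated tables against what the old system shows; for example:

-- most recent values per table - compare with the old system
SELECT 'history' AS table_name, to_timestamp(max(clock)) AS latest FROM history
UNION ALL
SELECT 'history_uint', to_timestamp(max(clock)) FROM history_uint
UNION ALL
SELECT 'trends', to_timestamp(max(clock)) FROM trends;
check most recent data points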

If you got the green light, start the Zabbix server that now points to TimescaleDB (see the Zabbix manual for the corresponding database configuration).

During the first start, keep an eye on the behaviour, logs and utilization of the database & server itself. Hopefully everything works out of the box as expected.

If yes: good job, feel free to give yourself a pat on the back :-)
