postgres - upgrading postgres with timescaledb running in a container
One of my recent activities has been upgrading our monitoring database (we are using Zabbix for this) from postgres 13 to postgres 15. To achieve high performance on time series data we are using the timescaledb extension on postgres.
The database holds around 3TB of data and therefore runs on a dedicated machine. To avoid installing packages directly on the host we are running it in a container environment - in this case we use docker to manage it.
Failing with my own guide
So - I actually went and used my own guide to upgrade postgres (https://blog.nuvotex.de/upgrade-timescaledbs-postgres/) and I failed. But why?
The problem is quite nasty: The timescaledb container image builds on top of the postgres container image which in turn builds on top of alpine.
Using my guide, the migration runs on ubuntu. And running pg_upgrade on ubuntu seemingly records the collation version provided by the operating system (which in turn comes from glibc).
Long story short: Using another (non-musl) libc during the upgrade records a collation version in the database which will then cause issues when starting the upgraded database on alpine. The error message shown is:
database "zabbix" has no actual collation version, but a version was recorded
I will make a dedicated blog post on this.
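Until that post is ready, a quick pointer: postgres 15 can refresh the recorded collation version to whatever the running system actually reports. A minimal sketch, assuming the affected database is zabbix as in the error above:
ALTER DATABASE zabbix REFRESH COLLATION VERSION;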
How to upgrade using containers - approach number two
I will try to give a compact guide on what the upgrade procedure looks like in general (a sketch of the corresponding docker commands follows right after the list).
- Run an ephemeral container using the current version of your database. Attach a volume (at /pgbin) to it where you can store some files
- Copy the bin directory of the current postgres installation over to the attached volume (on alpine just take: /usr/local/bin)
- Stop the ephemeral container
- Stop the container running the production server
- Start a new container using the target version of your database. Attach the volume from above (/pgbin) and the existing data volume (/pgdata)
- Follow the procedure to upgrade postgres (shown below)
- Start the new database server
- Upgrade the extensions
- Rebuild statistics on the new server
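A minimal sketch of the container part, assuming docker volumes named pgbin and pgdata, the timescale/timescaledb images and an existing cluster living at /pgdata/pg13 - adjust names, tags and volumes to your setup:
# ephemeral container on the current version: copy its binaries to the scratch volume
docker run --rm -v pgbin:/pgbin timescale/timescaledb:latest-pg13 \
    sh -c 'mkdir -p /pgbin/pg13 && cp -a /usr/local/bin /pgbin/pg13/bin'
# after stopping the production container: open a shell in the target version,
# with the binaries volume and the existing data volume attached
docker run --rm -it -v pgbin:/pgbin -v pgdata:/pgdata timescale/timescaledb:latest-pg15 sh
# inside the target container: stage the new binaries next to the old ones
mkdir -p /pgbin/pg15 && cp -a /usr/local/bin /pgbin/pg15/bin
# run the following initdb/pg_upgrade steps as the postgres user, e.g. via: su postgres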
Postgres upgrade procedure in more detail
The steps to perform the upgrade are shown in the following snippets. Basically it's preparing the new database directory and running pg_upgrade - if possible use the --link option, because it avoids copying over vast amounts of data.
The output in my case looked like this.
First step is to init the new database directory:
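A sketch - run this as the postgres user and double-check that encoding and locale match the old cluster, since pg_upgrade requires compatible settings:
/pgbin/pg15/bin/initdb -D /pgdata/pg15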
Then do the dry run of the upgrade
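This is the same invocation as the final upgrade below, just with --check added, which validates both clusters without modifying anything:
/pgbin/pg15/bin/pg_upgrade -b /pgbin/pg13/bin -B /pgbin/pg15/bin -d /pgdata/pg13/ -D /pgdata/pg15 --link --check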
And finally the upgrade itself
/pgbin/pg15/bin/pg_upgrade -b /pgbin/pg13/bin -B /pgbin/pg15/bin -d /pgdata/pg13/ -D /pgdata/pg15 --link
Performing Consistency Checks
-----------------------------
Checking cluster versions ok
Checking database user is the install user ok
Checking database connection settings ok
Checking for prepared transactions ok
Checking for system-defined composite types in user tables ok
Checking for reg* data types in user tables ok
Checking for contrib/isn with bigint-passing mismatch ok
Checking for user-defined encoding conversions ok
Checking for user-defined postfix operators ok
Checking for incompatible polymorphic functions ok
Creating dump of global objects ok
Creating dump of database schemas
ok
Checking for presence of required libraries ok
Checking database user is the install user ok
Checking for prepared transactions ok
Checking for new cluster tablespace directories ok
If pg_upgrade fails after this point, you must re-initdb the
new cluster before continuing.
Performing Upgrade
------------------
Analyzing all rows in the new cluster ok
Freezing all rows in the new cluster ok
Deleting files from new pg_xact ok
Copying old pg_xact to new server ok
Setting oldest XID for new cluster ok
Setting next transaction ID and epoch for new cluster ok
Deleting files from new pg_multixact/offsets ok
Copying old pg_multixact/offsets to new server ok
Deleting files from new pg_multixact/members ok
Copying old pg_multixact/members to new server ok
Setting next multixact ID and offset for new cluster ok
Resetting WAL archives ok
Setting frozenxid and minmxid counters in new cluster ok
Restoring global objects in the new cluster ok
Restoring database schemas in the new cluster
ok
Adding ".old" suffix to old global/pg_control ok
If you want to start the old cluster, you will need to remove
the ".old" suffix from /pgdata/pg13/global/pg_control.old.
Because "link" mode was used, the old cluster cannot be safely
started once the new cluster has been started.
Linking user relation files
ok
Setting next OID for new cluster ok
Sync data directory to disk ok
Creating script to delete old cluster ok
Checking for extension updates notice
Your installation contains extensions that should be updated
with the ALTER EXTENSION command. The file
update_extensions.sql
when executed by psql by the database superuser will update
these extensions.
Upgrade Complete
----------------
Optimizer statistics are not transferred by pg_upgrade.
Once you start the new server, consider running:
/pgbin/pg15/bin/vacuumdb --all --analyze-in-stages
Running this script will delete the old cluster's data files:
./delete_old_cluster.sh
Upgrade extensions
If you are using extensions, you should (or might even need to) upgrade them as well. The command in my case (shown in update_extensions.sql, see output above) is:
\connect template1
ALTER EXTENSION "timescaledb" UPDATE;
\connect postgres
ALTER EXTENSION "timescaledb" UPDATE;
\connect zabbix
ALTER EXTENSION "timescaledb" UPDATE;
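Instead of pasting the statements manually, you can also execute the generated file directly - assuming the superuser is postgres:
/pgbin/pg15/bin/psql -U postgres -f update_extensions.sql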
Rebuilding the statistics
When you've reached this point and everything is fine - you've done the upgrade.
Now you need to rebuild the statistics, because they are not copied during the upgrade - unless you want to wait for autovacuum to kick in (which I strongly advise against; the database should not be used without statistics).
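This is just the command pg_upgrade already suggested in its output:
/pgbin/pg15/bin/vacuumdb --all --analyze-in-stages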
That's it - you have successfully upgraded your server!