Scale Sensu Go with Enterprise datastore
COMMERCIAL FEATURE: Access the datastore feature in the packaged Sensu Go distribution. For more information, read Get started with commercial features.
Sensu Go’s datastore feature enables scaling your monitoring to many thousands of events per second.
For each unique entity/check pair, Sensu records the latest event object in its datastore. By default, Sensu uses the embedded etcd datastore for event storage. The embedded etcd datastore helps you get started, but as the number of entities and checks in your Sensu implementation grows, so does the rate of events being written to the datastore. In a clustered deployment of etcd, whether embedded or external to Sensu, each event received by a member of the cluster must be replicated to other members, increasing network and disk IO utilization.
Our team documented configuration and testing of Sensu running on bare metal infrastructure in the sensu/sensu-perf project. This configuration comfortably handled 12,000 Sensu agent connections (and their keepalives) and processed more than 8,500 events per second.
This rate of events should be sufficient for many installations but assumes an ideal scenario where Sensu backend nodes use direct-attached, dedicated non-volatile memory express (NVMe) storage and are connected to a dedicated LAN. Deployments on public cloud providers are not likely to achieve similar results because they share both disk and network bandwidth with other tenants. Adhering to the cloud provider’s recommended practices may also become a factor because many operators are inclined to deploy a cluster across multiple availability zones. In such a deployment, cluster communication happens over shared WAN links, which are subject to uncontrolled variability in throughput and latency.
The Enterprise datastore can help operators achieve much higher rates of event processing and minimize the replication communication between etcd peers.
With the Enterprise datastore, the sensu-perf test environment comfortably handles 40,000 Sensu agent connections (and their keepalives) and processes more than 36,000 events per second under ideal conditions.
IMPORTANT: PostgreSQL configuration file locations differ depending on platform. The steps in this guide use common paths for RHEL-family distributions, but the files may be stored elsewhere on your system. Learn more about PostgreSQL configuration file locations.
Prerequisites
- Database server running Postgres 9.5 or later
- Postgres database (or administrative access to create one)
- Postgres user with permissions to the database (or administrative access to create such a user)
- Licensed Sensu Go backend
For optimal performance, we recommend the following PostgreSQL configuration parameters and settings as a starting point for your postgresql.conf file:
max_connections = 200
shared_buffers = 10GB
maintenance_work_mem = 1GB
vacuum_cost_delay = 10ms
vacuum_cost_limit = 10000
bgwriter_delay = 50ms
bgwriter_lru_maxpages = 1000
max_worker_processes = 8
max_parallel_maintenance_workers = 2
max_parallel_workers_per_gather = 2
max_parallel_workers = 8
synchronous_commit = off
wal_sync_method = fdatasync
wal_writer_delay = 5000ms
max_wal_size = 5GB
min_wal_size = 1GB
checkpoint_completion_target = 0.9
autovacuum_naptime = 10s
autovacuum_vacuum_scale_factor = 0.05
autovacuum_analyze_scale_factor = 0.025
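Adjust the parameters and settings as needed based on your hardware and the performance you observe. Some of these parameters do not exist in older PostgreSQL releases (for example, max_parallel_maintenance_workers was added in PostgreSQL 11), so remove any parameter that your server version does not recognize. Read the PostgreSQL parameters documentation for information about setting parameters.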
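When a parameter appears more than once in postgresql.conf, PostgreSQL uses the last occurrence, so it can help to check which value a file actually yields after you edit it. The following is a minimal sketch rather than an official tool: effective_setting is a hypothetical helper name, and it assumes plain "name = value" lines (include directives, entries without "=", and quoted values that contain "#" are not handled).

```shell
#!/usr/bin/env bash
# effective_setting FILE NAME (hypothetical helper, not part of PostgreSQL or Sensu)
# Prints the value from the last uncommented "NAME = value" line in FILE,
# with any trailing "# comment" removed.
effective_setting() {
  local file=$1 name=$2
  sed -n -E "s/^[[:space:]]*${name}[[:space:]]*=[[:space:]]*([^#]*[^#[:space:]])[[:space:]]*(#.*)?\$/\\1/p" "$file" | tail -n 1
}

# Demonstration against a throwaway file rather than the live configuration.
conf=$(mktemp)
printf '%s\n' \
  '#max_wal_size = 1GB' \
  'max_wal_size = 2GB   # earlier entry' \
  'max_wal_size = 5GB' > "$conf"
effective_setting "$conf" max_wal_size   # prints 5GB
rm -f "$conf"
```

On a running server, SHOW max_wal_size; at the psql prompt reports the value PostgreSQL is actually using.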
Configure Postgres
Before Sensu can start writing events to Postgres, you need a database and an account with permissions to write to that database. To provide consistent event throughput, we recommend exclusively dedicating your Postgres instance to storage of Sensu events.
If you have administrative access to Postgres, you can create the database and user.
- Change to the postgres user and open the Postgres prompt (postgres=#):
    sudo -u postgres psql
- Create the sensu_events database:
    CREATE DATABASE sensu_events;
  PostgreSQL will return a confirmation message: CREATE DATABASE.
- Create the sensu role with a password:
    CREATE USER sensu WITH ENCRYPTED PASSWORD 'mypass';
  PostgreSQL will return a confirmation message: CREATE ROLE.
- Grant the sensu role all privileges for the sensu_events database:
    GRANT ALL PRIVILEGES ON DATABASE sensu_events TO sensu;
  PostgreSQL will return a confirmation message: GRANT.
- Type \q to exit the PostgreSQL prompt.
With this configuration complete, PostgreSQL will have a sensu_events database for storing Sensu events and a sensu user with permissions to that database.
By default, the Postgres user you’ve just added will not be able to authenticate via password, so you’ll also need to make a change to the pg_hba.conf file. The required change will depend on how Sensu will connect to Postgres. In this case, you’ll configure Postgres to allow the sensu user to connect to the sensu_events database from any host using an md5-encrypted password:
- Make a copy of the current pg_hba.conf file:
    sudo cp /var/lib/pgsql/data/pg_hba.conf /var/tmp/pg_hba.conf.bak
- Give the Sensu user permissions to connect to the sensu_events database from any IP address:
    echo 'host sensu_events sensu 0.0.0.0/0 md5' | sudo tee -a /var/lib/pgsql/data/pg_hba.conf
- Restart the postgresql service to activate the pg_hba.conf changes:
    sudo systemctl restart postgresql
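Because the pg_hba.conf step appends with tee -a, running it twice leaves a duplicate rule in the file. If you script this setup, one option is to append a rule only when it is not already present. This is a minimal sketch under that assumption; append_once is a hypothetical helper name, and the demonstration works on a scratch file instead of the real pg_hba.conf.

```shell
#!/usr/bin/env bash
# append_once FILE LINE (hypothetical helper)
# Appends LINE to FILE only if an identical line is not already present,
# so that rerunning a setup script does not create duplicate rules.
append_once() {
  local file=$1 line=$2
  # -q: quiet, -x: match the whole line, -F: treat LINE as a fixed string
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Demonstration on a scratch file. Point the helper at your real
# pg_hba.conf (with sufficient privileges) only after reviewing the rule.
hba=$(mktemp)
append_once "$hba" 'host sensu_events sensu 0.0.0.0/0 md5'
append_once "$hba" 'host sensu_events sensu 0.0.0.0/0 md5'   # no-op the second time
cat "$hba"
rm -f "$hba"
```

The same helper would work for the replication rule added later in this guide, because it compares whole lines literally.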
With this configuration complete, you can configure Sensu to store events in your Postgres database.
To secure communication between Sensu and the PostgreSQL event store using certificate authentication, read Secure PostgreSQL.
Configure Sensu
If your Sensu backend is already licensed, the configuration for routing events to Postgres is relatively straightforward.
Create a PostgresConfig resource that describes the database connection as a data source name (DSN):
---
type: PostgresConfig
api_version: store/v1
metadata:
  name: my-postgres
spec:
  dsn: "postgresql://sensu:mypass@10.0.2.15:5432/sensu_events?sslmode=disable"
  pool_size: 20
{
  "type": "PostgresConfig",
  "api_version": "store/v1",
  "metadata": {
    "name": "my-postgres"
  },
  "spec": {
    "dsn": "postgresql://sensu:mypass@10.0.2.15:5432/sensu_events?sslmode=disable",
    "pool_size": 20
  }
}
Save this configuration as my-postgres.yml or my-postgres.json and install it with sensuctl:
sensuctl create --file my-postgres.yml
sensuctl create --file my-postgres.json
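If you would rather not keep the database password in a file that you edit by hand, you can generate the resource definition from a script. The following is a minimal sketch, not a Sensu-provided tool: render_postgres_config is a hypothetical helper, PG_PASSWORD is an assumed environment variable, and the sketch assumes the DSN contains no double quotes.

```shell
#!/usr/bin/env bash
# render_postgres_config NAME DSN [POOL_SIZE] (hypothetical helper)
# Prints a PostgresConfig resource in YAML on standard output.
render_postgres_config() {
  local name=$1 dsn=$2 pool_size=${3:-20}
  cat <<EOF
---
type: PostgresConfig
api_version: store/v1
metadata:
  name: ${name}
spec:
  dsn: "${dsn}"
  pool_size: ${pool_size}
EOF
}

# PG_PASSWORD is an assumed environment variable; the fallback is the
# example password used in this guide.
dsn="postgresql://sensu:${PG_PASSWORD:-mypass}@10.0.2.15:5432/sensu_events?sslmode=disable"
render_postgres_config my-postgres "$dsn"
```

Redirect the output to my-postgres.yml and pass that file to sensuctl create --file as shown above.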
The Sensu backend is now configured to use Postgres for event storage!
In the web UI and in sensuctl, event history will appear incomplete. When Postgres configuration is provided and the backend successfully connects to the database, etcd event history is not migrated. New events will be written to Postgres as they are processed, with the Postgres datastore ultimately being brought up to date with the current state of your monitored infrastructure.
Aside from event history, which is not migrated from etcd, there’s no observable difference when using Postgres as the event store, and neither interface supports displaying the PostgresConfig type.
To verify that the change was effective and your connection to Postgres was successful, look at the sensu-backend log:
{"component":"store","level":"warning","msg":"trying to enable external event store","time":"2019-10-02T23:31:38Z"}
{"component":"store","level":"warning","msg":"switched event store to postgres","time":"2019-10-02T23:31:38Z"}
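To check the log from a script, you can extract the store named in the most recent "switched event store to ..." message. This is a minimal sketch that relies only on the message format shown in this guide; current_event_store is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# current_event_store (hypothetical helper)
# Reads sensu-backend log lines on standard input and prints the store
# named in the most recent "switched event store to ..." message, if any.
current_event_store() {
  sed -n -E 's/.*"msg":"switched event store to ([a-z]+)".*/\1/p' | tail -n 1
}

# Demonstration with the two sample log lines from this guide.
printf '%s\n' \
  '{"component":"store","level":"warning","msg":"trying to enable external event store","time":"2019-10-02T23:31:38Z"}' \
  '{"component":"store","level":"warning","msg":"switched event store to postgres","time":"2019-10-02T23:31:38Z"}' \
  | current_event_store   # prints postgres
```

Assuming the backend runs as a systemd unit named sensu-backend and logs to the journal, you could feed it real log lines with journalctl -u sensu-backend --no-pager | current_event_store.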
You can also use psql to verify that events are being written to the sensu_events database.
- Change to the postgres user and open the Postgres prompt (postgres=#):
    sudo -u postgres psql
- Connect to the sensu_events database:
    \c sensu_events
  PostgreSQL will return a confirmation message:
    You are now connected to database "sensu_events" as user "postgres".
  The prompt will change to sensu_events=#.
- List the tables in the sensu_events database:
    \dt
  PostgreSQL will list the tables:
                    List of relations
     Schema |       Name        | Type  | Owner
    --------+-------------------+-------+-------
     public | events            | table | sensu
     public | migration_version | table | sensu
    (2 rows)
- Request a list of all entities reporting keepalives:
    select sensu_entity from events where sensu_check = 'keepalive';
  PostgreSQL will return a list of the entities:
     sensu_entity
    --------------
     i-414141
     i-424242
     i-434343
    (3 rows)
Revert to the built-in datastore
If you want to revert to the default etcd event store, delete the PostgresConfig resource.
In this example, my-postgres.yml or my-postgres.json contains the same configuration you used to configure the Enterprise event store earlier in this guide:
sensuctl delete --file my-postgres.yml
sensuctl delete --file my-postgres.json
To verify that the change was effective, look for messages similar to these in the sensu-backend log:
{"component":"store","level":"warning","msg":"store configuration deleted","store":"/sensu.io/api/enterprise/store/v1/provider/postgres01","time":"2019-10-02T23:29:06Z"}
{"component":"store","level":"warning","msg":"switched event store to etcd","time":"2019-10-02T23:29:06Z"}
Similar to enabling PostgreSQL, switching back to the etcd datastore does not migrate current observability event data from one store to another. The web UI or sensuctl output may list outdated events until the etcd datastore catches up with the current state of your monitored infrastructure.
Configure Postgres streaming replication
Postgres supports an active standby with streaming replication. Configure streaming replication to replicate all Sensu events written to the primary Postgres server to the standby server.
Follow the steps in this section to create and add the replication role, set streaming replication configuration parameters, bootstrap the standby host, and confirm successful Postgres streaming replication.
Create and add the replication role
If you have administrative access to Postgres, you can create the replication role. Follow these steps to create and add the replication role on the primary Postgres host:
- Change to the postgres user and open the Postgres prompt (postgres=#):
    sudo -u postgres psql
- Create the repl role:
    CREATE ROLE repl PASSWORD '<your-password>' LOGIN REPLICATION;
  PostgreSQL will return a confirmation message: CREATE ROLE.
- Type \q to exit the PostgreSQL prompt.
- Add the replication role to pg_hba.conf using an md5-encrypted password. First, make a copy of the current pg_hba.conf:
    sudo cp /var/lib/pgsql/data/pg_hba.conf /var/tmp/pg_hba.conf.bak
- In the following command, replace <standby_ip> with the IP address of your standby Postgres host and run the command:
    export STANDBY_IP=<standby_ip>
- Give the repl user permissions to replicate from the standby host:
    echo "host replication repl ${STANDBY_IP}/32 md5" | sudo tee -a /var/lib/pgsql/data/pg_hba.conf
- Restart the PostgreSQL service to activate the pg_hba.conf changes:
    sudo systemctl restart postgresql
Set streaming replication configuration parameters
Follow these steps to set streaming replication configuration parameters on the primary Postgres host:
- Make a copy of postgresql.conf:
    sudo cp -a /var/lib/pgsql/data/postgresql.conf /var/lib/pgsql/data/postgresql.conf.bak
- Set the WAL level to replica so that the primary writes enough information to the write-ahead log (WAL) to support replication:
    echo 'wal_level = replica' | sudo tee -a /var/lib/pgsql/data/postgresql.conf
- Set the maximum number of concurrent connections from the standby servers:
    echo 'max_wal_senders = 5' | sudo tee -a /var/lib/pgsql/data/postgresql.conf
- To prevent the primary server from removing the WAL segments required for the standby server before shipping them, set the minimum number of segments retained in the pg_wal directory (named pg_xlog before PostgreSQL 10):
    echo 'wal_keep_segments = 32' | sudo tee -a /var/lib/pgsql/data/postgresql.conf
  At minimum, the number of wal_keep_segments should be larger than the number of segments generated between the beginning of online backup and the startup of streaming replication. PostgreSQL 13 and later replace wal_keep_segments with wal_keep_size, which is expressed as a size rather than a segment count.
  NOTE: If you enable WAL archiving to an archive directory accessible from the standby, this step may not be necessary.
- Restart the PostgreSQL service to activate the postgresql.conf changes:
    sudo systemctl restart postgresql
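To get a feel for how much disk space a given wal_keep_segments value reserves, multiply the segment count by the WAL segment size. The sketch below assumes the default segment size of 16 MB unless you pass a different one; wal_keep_mb is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# wal_keep_mb SEGMENTS [SEGMENT_SIZE_MB] (hypothetical helper)
# Prints the minimum amount of WAL, in megabytes, that the primary retains
# for a given wal_keep_segments value. The segment size defaults to 16 MB.
wal_keep_mb() {
  echo $(( $1 * ${2:-16} ))
}

wal_keep_mb 32   # prints 512
```

With the value used in this guide, the primary keeps at least 512 MB of WAL available for the standby. The same number is a reasonable starting point for wal_keep_size on servers that use that parameter instead.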
Bootstrap the standby host
Follow these steps on the standby Postgres host to bootstrap it:
- If the standby host has ever run Postgres, stop Postgres and move the existing data directory aside:
    sudo systemctl stop postgresql
    sudo mv /var/lib/pgsql/data /var/lib/pgsql/data.bak
- Make the standby data directory:
    sudo install -d -o postgres -g postgres -m 0700 /var/lib/pgsql/data
- In the following command, replace <primary_ip> with the IP address of your primary Postgres host and run the command:
    export PRIMARY_IP=<primary_ip>
- Bootstrap the standby data directory:
    sudo -u postgres pg_basebackup -h $PRIMARY_IP -D /var/lib/pgsql/data -P -U repl -R --wal-method=stream
- Enter the password for the repl role at the prompt:
    Password:
  After you enter your password, PostgreSQL will list database copy progress:
    30318/30318 kB (100%), 1/1 tablespace
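The -R option asks pg_basebackup to write the standby's recovery configuration into the new data directory. PostgreSQL 12 and later create a standby.signal file for this purpose, while earlier versions write a recovery.conf file. As a quick sanity check before you start the service, you can look for either file. This is a minimal sketch; standby_marker is a hypothetical helper name, and the demonstration uses a scratch directory rather than the real data directory.

```shell
#!/usr/bin/env bash
# standby_marker DATADIR (hypothetical helper)
# Prints the name of the file that marks DATADIR as a standby, or returns
# a non-zero status if neither marker file exists.
standby_marker() {
  local datadir=$1
  if [ -f "$datadir/standby.signal" ]; then
    echo standby.signal      # PostgreSQL 12 and later
  elif [ -f "$datadir/recovery.conf" ]; then
    echo recovery.conf       # PostgreSQL 11 and earlier
  else
    return 1
  fi
}

# Demonstration on a scratch directory.
demo=$(mktemp -d)
touch "$demo/standby.signal"
standby_marker "$demo"   # prints standby.signal
rm -rf "$demo"
```

To check the real data directory, run the function as a user who can read /var/lib/pgsql/data, such as postgres or root.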
Confirm replication
Follow these steps to confirm replication:
- On the standby Postgres host, remove primary-only configurations:
    sudo sed -r -i.bak '/^(wal_level|max_wal_senders|wal_keep_segments).*/d' /var/lib/pgsql/data/postgresql.conf
- Start the PostgreSQL service on the standby host:
    sudo systemctl start postgresql
- On the primary Postgres host, check the commit log location:
    sudo -u postgres psql -c "select pg_current_wal_lsn()"
  PostgreSQL will list the primary host commit log location:
     pg_current_wal_lsn
    --------------------
     0/3000568
    (1 row)
- On the standby Postgres host, check the commit log location:
    sudo -u postgres psql -c "select pg_last_wal_receive_lsn()"
    sudo -u postgres psql -c "select pg_last_wal_replay_lsn()"
  PostgreSQL will list the standby host commit log location:
     pg_last_wal_receive_lsn
    -------------------------
     0/3000568
    (1 row)
     pg_last_wal_replay_lsn
    ------------------------
     0/3000568
    (1 row)
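The locations match in this example, which means the standby has received and replayed everything the primary has written. On a busy system the values will often differ slightly, so it can be useful to express the gap in bytes. An LSN is two hexadecimal numbers separated by a slash: the high and low 32 bits of a 64-bit byte position. The following is a minimal sketch of that conversion; lsn_to_bytes and replication_lag_bytes are hypothetical helper names.

```shell
#!/usr/bin/env bash
# lsn_to_bytes LSN (hypothetical helper)
# Converts a PostgreSQL LSN such as 0/3000568 to a byte position. The part
# before the slash is the high 32 bits; the part after it is the low 32 bits.
lsn_to_bytes() {
  local hi lo
  hi=$(printf '%d' "0x${1%%/*}")
  lo=$(printf '%d' "0x${1##*/}")
  echo $(( hi * 4294967296 + lo ))
}

# replication_lag_bytes PRIMARY_LSN STANDBY_LSN (hypothetical helper)
# Prints how many bytes the standby is behind the primary.
replication_lag_bytes() {
  echo $(( $(lsn_to_bytes "$1") - $(lsn_to_bytes "$2") ))
}

replication_lag_bytes 0/3000568 0/3000568   # prints 0
replication_lag_bytes 0/3000668 0/3000568   # prints 256
```

PostgreSQL can do the same arithmetic on the server side with the pg_wal_lsn_diff function, which is usually the more convenient choice when you already have a psql session open.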
With this configuration complete, your Sensu events will be replicated to the standby host.