Statistics Configuration

Arcion Replicant records the full statistical history of an ongoing replication. This page describes how to set up and configure statistics logging.

Overview #

Replicant reads its statistics settings from a YAML configuration file. It creates a table, replicate_io_replication_statistics_history, to log the full history of insert, update, delete, and upsert operations across all Replicant jobs. After each successful write to a target table, Replicant adds an entry to this table with the following fields:

  • replication_id
  • catalog_name
  • schema_name
  • table_name
  • snapshot_start_range
  • snapshot_end_range
  • start_time
  • end_time
  • insert_count
  • update_count
  • upsert_count
  • delete_count
  • elapsed_time_sec
  • replicant_lag [v20.10.07.10]
  • total_lag [v20.10.07.10]

Statistics configuration file #

The statistics configuration file controls how Replicant logs statistics, including how long it keeps the statistics history and where it stores the data. The configuration file uses YAML syntax. If you’re new to YAML and want to learn more, see Learn YAML in Y minutes. For a sample configuration, see statistics.yaml in the conf/statistics/ directory of your Replicant self-hosted CLI download folder.

You can define and configure the following parameters in the statistics configuration file:

enable #

{true|false}.

Enables or disables statistics logging.

purge-statistics #

Specifies the purge rules for the statistics history.

purge-statistics.enable #

{true|false}.

Enables or disables purging of the replication statistics history.

purge-statistics.purge-stats-before-days #

Number of days to keep the statistics. For example, set this parameter to 30 to keep the statistics history for the last 30 days.
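
As a minimal sketch of the purge settings, the fragment below keeps one week of history instead of 30 days. The value 7 is an illustrative choice, not a documented default.

```yaml
# Enable statistics logging and purge old history.
enable: true
purge-statistics:
  enable: true
  # Illustrative value: keep only the last 7 days of statistics history.
  purge-stats-before-days: 7
```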

storage [v20.10.07.16] #

Storage configuration for statistics.

storage.stats-archive-type #

Specifies how Replicant archives the statistics data. The following values are supported:

  • METADATA_DB: Stores statistics data in the metadata database.
  • FILE_SYSTEM: Stores statistics data in a file.
  • DST_DB: Stores statistics data in the target database.
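
For illustration, a fragment that keeps statistics in the metadata database might look like the following. It assumes that this archive type needs no further storage parameters, which this page does not state explicitly.

```yaml
enable: true
storage:
  # Archive statistics in the metadata database.
  stats-archive-type: METADATA_DB
```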

storage.storage-location #

Directory location where Replicant stores statistics files when storage.stats-archive-type is FILE_SYSTEM.

storage.format #

The format of the statistics file when storage.stats-archive-type is FILE_SYSTEM.

The following formats are supported:

  • CSV
  • JSON

Default: CSV.
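
The file-based parameters above could be combined as in the following sketch. The directory path is a hypothetical placeholder, and JSON is chosen only to show how to override the CSV default.

```yaml
enable: true
storage:
  stats-archive-type: FILE_SYSTEM
  # Hypothetical directory; replace with a location Replicant can write to.
  storage-location: "/path/to/replicant/statistics"
  # Optional; defaults to CSV when omitted.
  format: JSON
```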

storage.catalog [v20.12.04.2] #

The catalog to store statistics in when storage.stats-archive-type is DST_DB.

storage.schema [v20.12.04.2] #

The schema to store statistics in when storage.stats-archive-type is DST_DB.

Sample configuration #

enable: true
purge-statistics:
  enable: true
  purge-stats-before-days: 30
storage:
  stats-archive-type: DST_DB
  catalog: "io"
  schema: "replicate"