Replica Health Check
This article explains how to check the health of a replica to decide whether to initiate a replica rebuild.
Check for any DN PODs that are not in the Running state.
kubectl get pods -l xstore/name --show-labels | grep -v Running
Check whether the unhealthy POD in the xstore is undergoing an expected routine operation, such as an upgrade or a downgrade. If it is not, and the POD cannot recover to a ready state, consider starting a replica rebuild task.
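The POD check above can also be scripted. The following is a minimal sketch, assuming the POD list has been exported with kubectl get pods -l xstore/name -o json and loaded into Python. The function name unhealthy_pods and the POD names in the sample are hypothetical and used only for illustration.

```python
import json


def unhealthy_pods(pod_list):
    """Return (name, phase, ready) for PODs that are not Running or not fully ready."""
    result = []
    for pod in pod_list.get("items", []):
        name = pod["metadata"]["name"]
        status = pod.get("status", {})
        phase = status.get("phase", "Unknown")
        containers = status.get("containerStatuses", [])
        # A POD counts as ready only if it reports containers and all of them are ready.
        ready = bool(containers) and all(c.get("ready", False) for c in containers)
        if phase != "Running" or not ready:
            result.append((name, phase, ready))
    return result


# Hypothetical sample shaped like `kubectl get pods -l xstore/name -o json` output.
sample = json.loads("""
{"items": [
  {"metadata": {"name": "demo-dn-0-cand-0"},
   "status": {"phase": "Running", "containerStatuses": [{"ready": true}]}},
  {"metadata": {"name": "demo-dn-0-cand-1"},
   "status": {"phase": "Pending"}},
  {"metadata": {"name": "demo-dn-0-log-0"},
   "status": {"phase": "Running", "containerStatuses": [{"ready": false}]}}
]}
""")

for name, phase, ready in unhealthy_pods(sample):
    print(name, phase, "ready" if ready else "not ready")
```

Unlike the grep on the STATUS column, this sketch also flags a POD whose phase is Running but whose containers are not ready, which may or may not be what you want during an upgrade.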
Replication Thread and Lag Check
Due to software bugs, it's possible for the replication thread on the replica to be interrupted. Execute the following statement on the replica to view the replication status.
show slave status
A Slave_SQL_Running value of 'No' together with a non-empty Last_Error indicates that replication has been interrupted. In this case, first clarify the cause of the replication interruption, and then initiate a replica rebuild to recover the replica.
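As an illustration, the rule above can be written as a small check over one row of show slave status output. This is a sketch only: it assumes the row has already been fetched into a dictionary keyed by column name, and the function name replication_interrupted and the sample error text are hypothetical.

```python
def replication_interrupted(row):
    """Return True when the SQL thread is stopped and an error has been recorded."""
    sql_running = (row.get("Slave_SQL_Running") or "").strip().lower()
    last_error = (row.get("Last_Error") or "").strip()
    return sql_running == "no" and last_error != ""


# Hypothetical rows; the error text is a placeholder, not real server output.
healthy = {"Slave_SQL_Running": "Yes", "Last_Error": ""}
broken = {"Slave_SQL_Running": "No", "Last_Error": "placeholder error text"}
stopped_cleanly = {"Slave_SQL_Running": "No", "Last_Error": ""}

print(replication_interrupted(healthy))
print(replication_interrupted(broken))
print(replication_interrupted(stopped_cleanly))
```

A stopped SQL thread with an empty Last_Error is deliberately not reported here, because it may simply mean that replication was stopped on purpose.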
For various reasons (such as the replica being down for a long time or issues with the replica host machine), the replica lag may grow so large that catching up with the primary would take a long time, such as several hours. In these cases, consider a replica rebuild.
How to Check Replica Lag?
Run show slave status on the replica and review the reported replication delay.
Run select * from information_schema.alisql_cluster_global on the primary to compare the APPLIED_INDEX values of the replica and the primary. The rate at which APPLIED_INDEX increases on the primary and on the replica can be used to estimate how long the replica will take to catch up with the primary's logs.
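The catch-up estimate can be worked through with two samples of APPLIED_INDEX taken a known interval apart. The sketch below assumes both rates stay roughly constant, which is a simplification. The function name estimate_catch_up_seconds and all sample numbers are hypothetical.

```python
def estimate_catch_up_seconds(primary_t0, replica_t0, primary_t1, replica_t1, interval_seconds):
    """Estimate the seconds until the replica's APPLIED_INDEX reaches the primary's.

    Returns 0.0 if the replica has already caught up, and None if the gap
    is not shrinking (the replica applies logs no faster than the primary).
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    gap = primary_t1 - replica_t1
    if gap <= 0:
        return 0.0
    primary_rate = (primary_t1 - primary_t0) / interval_seconds
    replica_rate = (replica_t1 - replica_t0) / interval_seconds
    closing_rate = replica_rate - primary_rate
    if closing_rate <= 0:
        return None
    return gap / closing_rate


# Hypothetical samples taken 60 seconds apart:
# the primary advances 100 entries/s, the replica 500 entries/s.
eta = estimate_catch_up_seconds(1_000_000, 400_000, 1_006_000, 430_000, 60)
print(eta)  # 1440.0 seconds, i.e. about 24 minutes
```

If the function returns None, or an estimate of several hours, that supports considering a replica rebuild instead of waiting.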