TiDB Cloud Built-in Alerting

April 21, 2026

TiDB Cloud provides you with an easy way to view alerts, edit alert rules, and subscribe to alert notifications.

This document describes how to do these operations and provides the TiDB Cloud built-in alert conditions for your reference.

Note:

Currently, alert subscription is available for TiDB Cloud Essential instances and TiDB Cloud Dedicated clusters.

View alerts

In TiDB Cloud, you can view both active and closed alerts on the Alerts page.

  1. In the TiDB Cloud console, navigate to the My TiDB page.

    Tip:

    If you are in multiple organizations, use the combo box in the upper-left corner to switch to your target organization first.

  2. Click the name of the target TiDB Cloud Essential instance or TiDB Cloud Dedicated cluster to go to its overview page.

  3. Click Alerts in the left navigation pane.

  4. The Alerts page displays the active alerts by default. You can view the information of each active alert such as the alert name, trigger time, and duration.

  5. To also view closed alerts, click the Status drop-down list and select Closed or All.

Edit alert rules

In TiDB Cloud, you can edit the alert rules by disabling or enabling the alerts or updating the alert threshold.

  1. On the Alerts page, click Edit Rules.

  2. Disable or enable alert rules as needed.

  3. Click Edit to update the threshold of an alert rule.

    Tip:

    Currently, TiDB Cloud provides limited capabilities for alert rule editing. Some alert rules do not support editing. If you would like to configure different trigger conditions or frequency, or have alerts automatically trigger actions in downstream services like PagerDuty, consider using a third-party monitoring and alerting integration.

Subscribe to alert notifications

In TiDB Cloud, you can subscribe to alert notifications via one of the following methods:

TiDB Cloud built-in alert conditions

The following table provides the TiDB Cloud built-in alert conditions and the corresponding recommended actions.

Note:

  • While these alert conditions do not necessarily mean there is a problem, they are often early warning indicators of emerging issues. Therefore, taking the recommended action is advised.
  • You can edit the thresholds of the alerts on the TiDB Cloud console.
  • Some alert rules are disabled by default. You can enable them as needed.

TiDB Cloud provides different alert rules for each TiDB Cloud plan, based on the features available in that plan.

Resource usage alerts

Condition | Recommended Action
Total TiDB node memory utilization across cluster exceeded 70% for 10 minutes | Consider increasing the node number or node size for TiDB to reduce the memory usage percentage of the current workload.
Total TiKV node memory utilization across cluster exceeded 70% for 10 minutes | Consider increasing the node number or node size for TiKV to reduce the memory usage percentage of the current workload.
Total TiFlash node memory utilization across cluster exceeded 70% for 10 minutes | Consider increasing the node number or node size for TiFlash to reduce the memory usage percentage of the current workload.
Total TiDB node CPU utilization exceeded 80% for 10 minutes | Consider increasing the node number or node size for TiDB to reduce the CPU usage percentage of the current workload.
Total TiKV node CPU utilization exceeded 80% for 10 minutes | Consider increasing the node number or node size for TiKV to reduce the CPU usage percentage of the current workload.
Total TiFlash node CPU utilization exceeded 80% for 10 minutes | Consider increasing the node number or node size for TiFlash to reduce the CPU usage percentage of the current workload.
TiKV storage utilization exceeds 80% | Consider increasing the node number or node storage size for TiKV to increase your storage capacity. When the storage usage of TiKV exceeds 80%, latency spikes might occur, and higher usage might cause requests to fail.
TiFlash storage utilization exceeds 80% | Consider increasing the node number or node storage size for TiFlash to increase your storage capacity. When the storage usage of all TiFlash nodes reaches 80%, any DDL statement that adds a TiFlash replica hangs indefinitely.
Max memory utilization across TiDB nodes exceeded 70% for 10 minutes | Consider checking if there is any hotspot in the cluster or increasing the node number or node size for TiDB to reduce the memory usage percentage of the current workload.
Max memory utilization across TiKV nodes exceeded 70% for 10 minutes | Consider checking if there is any hotspot in the cluster or increasing the node number or node size for TiKV to reduce the memory usage percentage of the current workload.
Max CPU utilization across TiDB nodes exceeded 80% for 10 minutes | Consider checking if there is any hotspot in the cluster or increasing the node number or node size for TiDB to reduce the CPU usage percentage of the current workload.
Max CPU utilization across TiKV nodes exceeded 80% for 10 minutes | Consider checking if there is any hotspot in the cluster or increasing the node number or node size for TiKV to reduce the CPU usage percentage of the current workload.
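
The resource usage conditions combine two ideas: an aggregate ("Total") versus a per-node ("Max") view of utilization, and a threshold that must hold for a sustained period. The following Python sketch illustrates one possible reading of those conditions. It is not how TiDB Cloud evaluates alerts internally: the `NodeSample` structure, the function names, the one-sample-per-minute cadence, and the interpretation of "Total" as summed usage over summed capacity are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NodeSample:
    """One utilization reading for a single node (hypothetical structure)."""
    used: float      # resource in use, for example GiB of memory
    capacity: float  # resource available on the node, same unit


def total_utilization(nodes: List[NodeSample]) -> float:
    """Aggregate view: all usage divided by all capacity (assumed meaning of "Total")."""
    return sum(n.used for n in nodes) / sum(n.capacity for n in nodes)


def max_utilization(nodes: List[NodeSample]) -> float:
    """Per-node view: utilization of the busiest node (assumed meaning of "Max")."""
    return max(n.used / n.capacity for n in nodes)


def sustained_breach(samples: List[float], threshold: float, window: int) -> bool:
    """Return True when the last `window` samples all exceed `threshold`.

    With one sample per minute (an assumption), window=10 models
    "exceeded ... for 10 minutes".
    """
    if len(samples) < window:
        return False
    return all(s > threshold for s in samples[-window:])


# Three TiKV nodes, one of them hot: the aggregate looks fine, the max does not.
nodes = [NodeSample(30, 100), NodeSample(35, 100), NodeSample(85, 100)]
total = total_utilization(nodes)  # 150 / 300
peak = max_utilization(nodes)     # 85 / 100

# Ten one-minute readings held at these levels, checked against the 70% threshold.
total_alert = sustained_breach([total] * 10, threshold=0.70, window=10)
max_alert = sustained_breach([peak] * 10, threshold=0.70, window=10)
```

In this example only the "Max" condition would fire, which matches the recommended action for the "Max" alerts: check for a hotspot before adding nodes.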

Data migration alerts

Condition | Recommended Action
Data migration job met error during data export | Check the error and see Troubleshoot data migration for help.
Data migration job met error during data import | Check the error and see Troubleshoot data migration for help.
Data migration job met error during incremental migration | Check the error and see Troubleshoot data migration for help.
Data migration job has been paused for more than 6 hours during incremental migration | While the job is paused, the binlog in the upstream database might be purged (depending on your database binlog purge strategy), which might cause incremental migration to fail. See Troubleshoot data migration for help.
Replication lag is larger than 10 minutes and still increasing for more than 20 minutes | See Troubleshoot data migration for help.
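
The replication lag condition has two parts: the lag is above 10 minutes, and it keeps growing for more than 20 minutes. The sketch below shows one way to express that in Python. The function name, the sampling interval, and the choice to treat "still increasing" as strictly increasing from sample to sample are assumptions for illustration, not the rule TiDB Cloud actually applies.

```python
from typing import List


def lag_alert(lag_seconds: List[float], sample_interval_s: int = 60) -> bool:
    """Check a lag series (oldest reading first) against the replication lag condition.

    Assumed interpretation: every reading in the last 20 minutes is above
    600 seconds, and each reading is larger than the one before it.
    """
    # Number of readings needed to span a full 20 minutes.
    needed = (20 * 60) // sample_interval_s + 1
    if len(lag_seconds) < needed:
        return False

    window = lag_seconds[-needed:]
    above_threshold = all(lag > 600 for lag in window)
    still_increasing = all(b > a for a, b in zip(window, window[1:]))
    return above_threshold and still_increasing


# Lag that starts above 10 minutes and grows by 10 seconds every minute.
growing = [600 + 10 * i for i in range(1, 22)]
# Lag that is high but stable.
flat = [900.0] * 21

growing_fires = lag_alert(growing)
flat_fires = lag_alert(flat)
```

Under these assumptions, a high but stable lag does not trigger the alert, while a lag that is both high and growing does.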

Changefeed alerts for TiDB Cloud Dedicated

Condition | Recommended Action
The changefeed latency exceeds 600 seconds. | Check the changefeed status on the Changefeed page and Changefeed Detail page of the TiDB Cloud console, where you can find some error messages to help diagnose this issue.
    Possible reasons that can trigger this alert include:
      • The overall traffic in the upstream has increased, causing the existing changefeed specification to be insufficient to handle it. If the traffic increase is temporary, the changefeed latency will automatically recover after the traffic returns to normal. If the traffic increase is continuous, you need to scale up the changefeed.
      • The downstream or network is abnormal. In this case, resolve this abnormality first.
      • If the downstream is RDS, tables that lack indexes might cause low write performance and high latency. In this case, you need to add the necessary indexes to the upstream or downstream.
    If the problem cannot be fixed from your side, contact TiDB Cloud Support for further assistance.
The changefeed status is FAILED. | Check the changefeed status on the Changefeed page and Changefeed Detail page of the TiDB Cloud console, where you can find some error messages to help diagnose this issue.
    If the problem cannot be fixed from your side, contact TiDB Cloud Support for further assistance.
The changefeed status is WARNING. | Check the changefeed status on the Changefeed page and Changefeed Detail page of the TiDB Cloud console, where you can find some error messages to help diagnose this issue.
    If the problem cannot be fixed from your side, contact TiDB Cloud Support for further assistance.

Performance overview alerts

Condition | Recommended Action
Request units per second (RU/s) exceed 80% of the maximum RCU |
    1. Review RU metrics to determine whether the increase is gradual or a sudden spike.
    2. If the increase is gradual, check whether query duration has increased. If so, the current maximum RCU might be insufficient.
    3. Scale capacity by manually increasing the maximum RCU in the TiDB Cloud console.
    If you cannot resolve the issue, contact TiDB Cloud Support.
QPS drops by 80% |
    1. Check whether the drop is caused by increasing query latency.
    2. Verify that your application is operating normally. If the drop is intentional, ignore this alert. If the drop is unintentional and you cannot identify the root cause, contact TiDB Cloud Support immediately.
Query P99 latency exceeds 200 ms |
    1. Investigate slow queries: go to the Slow Query page and filter by a recent time range to identify newly introduced or slower-running queries.
    2. Review recent changes, such as application deployments, schema changes, or data import jobs, that might have affected traffic patterns.
    If you cannot identify the root cause, contact TiDB Cloud Support immediately.
Query P95 latency exceeds 200 ms |
    1. Investigate slow queries: go to the Slow Query page and filter by a recent time range to identify newly introduced or slower-running queries.
    2. Review recent changes, such as application deployments, schema changes, or data import jobs, that might have affected traffic patterns.
    If you cannot identify the root cause, contact TiDB Cloud Support immediately.
Request error rate exceeds 10% | Review recent errors and the overall statement execution status for the cluster.
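
Several of the performance conditions are simple ratios. The Python sketch below shows the arithmetic behind three of them. It is only an illustration: the function names and parameters are hypothetical, and the baseline used for the QPS drop is an assumption, because this document does not state what the drop is measured against.

```python
def ru_near_limit(ru_per_sec: float, max_rcu: float, ratio: float = 0.80) -> bool:
    """RU/s exceeds the given share (80% by default) of the maximum RCU."""
    return ru_per_sec > ratio * max_rcu


def qps_dropped(baseline_qps: float, current_qps: float, drop: float = 0.80) -> bool:
    """QPS fell by at least `drop` relative to a baseline.

    The baseline (for example, a recent average) is an assumption.
    """
    if baseline_qps <= 0:
        return False
    return (baseline_qps - current_qps) / baseline_qps >= drop


def error_rate_high(failed: int, total: int, limit: float = 0.10) -> bool:
    """The share of failed requests exceeds the limit (10% by default)."""
    if total == 0:
        return False
    return failed / total > limit


# 850 RU/s against a maximum of 1000 RCU is above the 80% mark.
ru_alert = ru_near_limit(850, 1000)
# Falling from 1000 QPS to 200 QPS is an 80% drop.
qps_alert = qps_dropped(1000, 200)
# 11 failed requests out of 100 is above 10%.
error_alert = error_rate_high(11, 100)
```

Running the same checks against your own metrics before editing a threshold can help you estimate how often a rule would fire at a given setting.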

Changefeed alerts for TiDB Cloud Essential

Condition | Recommended Action
The changefeed latency exceeds 600 seconds. | Check the changefeed status on the Changefeed page and Changefeed Detail page of the TiDB Cloud console, where you can find some error messages to help diagnose this issue.
    Possible reasons that can trigger this alert include:
      • The overall traffic in the upstream has increased, causing the existing changefeed specification to be insufficient to handle it. If the traffic increase is temporary, the changefeed latency will automatically recover after the traffic returns to normal. If the traffic increase is continuous, you need to scale up the changefeed.
      • The downstream or network is abnormal. In this case, resolve this abnormality first.
      • If the downstream is RDS, tables that lack indexes might cause low write performance and high latency. In this case, you need to add the necessary indexes to the upstream or downstream.
    If the problem cannot be fixed from your side, contact TiDB Cloud Support for further assistance.
The changefeed status is FAILED. | Check the changefeed status on the Changefeed page and Changefeed Detail page of the TiDB Cloud console, where you can find some error messages to help diagnose this issue.
    If the problem cannot be fixed from your side, contact TiDB Cloud Support for further assistance.
The changefeed status is WARNING. | Check the changefeed status on the Changefeed page and Changefeed Detail page of the TiDB Cloud console, where you can find some error messages to help diagnose this issue.
    If the problem cannot be fixed from your side, contact TiDB Cloud Support for further assistance.