Resume from Checkpoint

October 30, 2025 ยท View on GitHub

When starting a task, you can retrieve the latest task position from the previous execution based on the configuration, allowing you to continue the task without starting from scratch.

Supported Sources

  • MySQL source
  • Postgres source
  • Mongo source

Position Recording

Task progress is recorded in two types: SnapshotDoing and SnapshotFinished. The recording frequency of SnapshotDoing depends on the pipeline.checkpoint_interval_secs configuration, with a default value of 10s.

Whether or not you enable resume from checkpoint, position information will be recorded as logs during task execution, located in the logs directory (runtime.log_dir) as position.log and finished.log.

In addition, you can persist task positions in the target database or a specified database through configuration. Recording positions to the database will consume approximately 150 bytes * 2 * number of synced tables of storage. For example, syncing 10,000 tables will result in the position table occupying about 3MB. When recording positions to a database, the task will generate a task_id that is as unique as possible based on the configuration. To ensure position information is not affected by other tasks, it is recommended to specify the task_id through the configuration file: global.task_id This feature only supports MySQL or PG as the target database/specified database The target account needs to have permissions to create MySQL database/PG schema and create tables

Position Reading

You can choose to read positions from logs, from target database, or from a specified database through configuration.

Progress Logs

For detailed explanations, please refer to Position Information

position.log

2024-10-10 04:04:08.152044 | current_position | {"type":"RdbSnapshot","db_type":"mysql","schema":"test_db","tb":"b","order_col":"id","value":"6"}
2024-10-10 04:04:08.152181 | checkpoint_position | {"type":"None"}

finished.log

2024-10-10 04:04:07.803422 | {"type":"RdbSnapshotFinished","db_type":"mysql","schema":"test_db","tb":"a"}
2024-10-10 04:04:08.844988 | {"type":"RdbSnapshotFinished","db_type":"mysql","schema":"test_db","tb":"b"}

Configuration

Resume from Target Database

[global]
//[Optional]
task_id=task1

[resumer]
resume_type=from_target
//[Optional] Default value is apecloud_metadata.apedts_task_position
table_full_name=apecloud_resumer_test.ape_task_position
max_connections=1

When the task starts, it will automatically ensure that the target has the apecloud_resumer_test.ape_task_position database table configured, and initialize a connection pool with a maximum of 1 connection for subsequent resume-related position recording and querying.

Resume from Specified Database

[global]
//[Optional]
task_id=task1

[resumer]
resume_type=from_db
url=mysql://xxx:xxx@127.0.0.1:3306
db_type=mysql
//[Optional] Default value is apecloud_metadata.apedts_task_position
table_full_name=apecloud_resumer_test.ape_task_position
max_connections=1

When the task starts, it will initialize a connection pool with a maximum of 1 connection and automatically ensure that the configured database instance has the apecloud_resumer_test.ape_task_position database table for subsequent resume-related position recording and querying.

Resume from Log

[runtime]
log_dir=/logs

[resumer]
resume_type=from_log
//[Optional] Uses runtime.log_dir by default
log_dir=/other_logs

Looks for position.log and finished.log in /other_logs to resume from checkpoint.

Reference Test Cases

  • dt-tests/tests/mysql_to_mysql/snapshot/resume_log_test
  • dt-tests/tests/mysql_to_mysql/snapshot/resume_db_test
  • dt-tests/tests/pg_to_pg/snapshot/resume_log_test
  • dt-tests/tests/pg_to_pg/snapshot/resume_db_test
  • dt-tests/tests/mongo_to_mongo/snapshot/resume_log_test