TiDB 2.0 Release Notes

April 29, 2024 · View on GitHub

On April 27, 2018, TiDB 2.0 GA is released! Compared with TiDB 1.0, this release has great improvement in MySQL compatibility, SQL optimizer, executor, and stability.

TiDB

  • SQL Optimizer
    • Use more compact data structure to reduce the memory usage of statistics information
    • Speed up loading statistics information when starting a tidb-server process
    • Support updating statistics information dynamically [experimental]
    • Optimize the cost model to provide more accurate query cost evaluation
    • Use Count-Min Sketch to estimate the cost of point queries more accurately
    • Support analyzing more complex conditions to make full use of indexes
    • Support manually specifying the Join order using the STRAIGHT_JOIN syntax
    • Use the Stream Aggregation operator when the GROUP BY clause is empty to improve the performance
    • Support using indexes for the MAX/MIN function
    • Optimize the processing algorithms for correlated subqueries to support decorrelating more types of correlated subqueries and transform them to Left Outer Join
    • Extend IndexLookupJoin to be used in matching the index prefix
  • SQL Execution Engine
    • Refactor all operators using the Chunk architecture, improve the execution performance of analytical queries, and reduce memory usage. There is a significant improvement in the TPC-H benchmark result.
    • Support the Streaming Aggregation operators pushdown
    • Optimize the Insert Into Ignore statement to improve the performance by over 10 times
    • Optimize the Insert On Duplicate Key Update statement to improve the performance by over 10 times
    • Optimize Load Data to improve the performance by over 10 times
    • Push down more data types and functions to TiKV
    • Support computing the memory usage of physical operators, and specifying the processing behavior in the configuration file and system variables when the memory usage exceeds the threshold
    • Support limiting the memory usage by a single SQL statement to reduce the risk of OOM
    • Support using implicit RowID in CRUD operations
    • Improve the performance of point queries
  • Server
    • Support the Proxy Protocol
    • Add more monitoring metrics and refine the log
    • Support validating the configuration files
    • Support obtaining the information of TiDB parameters through HTTP API
    • Resolve Lock in the Batch mode to speed up garbage collection
    • Support multi-threaded garbage collection
    • Support TLS
  • Compatibility
    • Support more MySQL syntaxes
    • Support modifying the lower_case_table_names system variable in the configuration file to support the OGG data replication tool
    • Improve compatibility with the Navicat management tool
    • Support displaying the table creating time in Information_Schema
    • Fix the issue that the return types of some functions/expressions differ from MySQL
    • Improve compatibility with JDBC
    • Support more SQL Modes
  • DDL
    • Optimize the Add Index operation to greatly improve the execution speed in some scenarios
    • Attach a lower priority to the Add Index operation to reduce the impact on online business
    • Output more detailed status information of the DDL jobs in Admin Show DDL Jobs
    • Support querying the original statements of currently running DDL jobs using Admin Show DDL Job Queries JobID
    • Support recovering the index data using Admin Recover Index for disaster recovery
    • Support modifying Table Options using the Alter statement

PD

  • Support Region Merge, to merge empty Regions after deleting data [experimental]
  • Support Raft Learner [experimental]
  • Optimize the scheduler
    • Make the scheduler to adapt to different Region sizes
    • Improve the priority and speed of restoring data during TiKV outage
    • Speed up data transferring when removing a TiKV node
    • Optimize the scheduling policies to prevent the disks from becoming full when the space of TiKV nodes is insufficient
    • Improve the scheduling efficiency of the balance-leader scheduler
    • Reduce the scheduling overhead of the balance-region scheduler
    • Optimize the execution efficiency of the hot-region scheduler
  • Operations interface and configuration
    • Support TLS
    • Support prioritizing the PD leaders
    • Support configuring the scheduling policies based on labels
    • Support configuring stores with a specific label not to schedule the Raft leader
    • Support splitting Region manually to handle the hotspot in a single Region
    • Support scattering a specified Region to manually adjust Region distribution in some cases
    • Add check rules for configuration parameters and improve validity check of the configuration items
  • Debugging interface
    • Add the Drop Region debugging interface
    • Add the interfaces to enumerate the health status of each PD
  • Statistics
    • Add statistics about abnormal Regions
    • Add statistics about Region isolation level
    • Add scheduling related metrics
  • Performance
    • Keep the PD leader and the etcd leader together in the same node to improve write performance
    • Optimize the performance of Region heartbeat

TiKV

  • Features
    • Protect critical configuration from incorrect modification
    • Support Region Merge [experimental]
    • Add the Raw DeleteRange API
    • Add the GetMetric API
    • Add Raw Batch Put, Raw Batch Get, Raw Batch Delete and Raw Batch Scan
    • Add Column Family options for the RawKV API and support executing operation on a specific Column Family
    • Support Streaming and Streaming Aggregation in Coprocessor
    • Support configuring the request timeout of Coprocessor
    • Carry timestamps with Region heartbeats
    • Support modifying some RocksDB parameters online, such as block-cache-size
    • Support configuring the behavior of Coprocessor when it encounters some warnings or errors
    • Support starting in the importing data mode to reduce write amplification during the data importing process
    • Support manually splitting Region in halves
    • Improve the data recovery tool tikv-ctl
    • Return more statistics in Coprocessor to guide the behavior of TiDB
    • Support the ImportSST API to import SST files [experimental]
    • Add the TiKV Importer binary to integrate with TiDB Lightning to import data quickly [experimental]
  • Performance
    • Optimize read performance using ReadPool and increase the raw_get/get/batch_get by 30%
    • Improve metrics performance
    • Inform PD immediately once the Raft snapshot process is completed to speed up balancing
    • Solve performance jitter caused by RocksDB flushing
    • Optimize the space reclaiming mechanism after deleting data
    • Speed up garbage cleaning while starting the server
    • Reduce the I/O overhead during replica migration using DeleteFilesInRanges
  • Stability
    • Fix the issue that gRPC call does not get returned when the PD leader switches
    • Fix the issue that it is slow to offline nodes caused by snapshots
    • Limit the temporary space usage consumed by migrating replicas
    • Report the Regions that cannot elect a leader for a long time
    • Update the Region size information in time according to compaction events
    • Limit the size of scan lock to avoid request timeout
    • Limit the memory usage when receiving snapshots to avoid OOM
    • Increase the speed of CI test
    • Fix the OOM issue caused by too many snapshots
    • Configure keepalive of gRPC
    • Fix the OOM issue caused by an increase of the Region number

TiSpark

TiSpark uses a separate version number. The current TiSpark version is 1.0 GA. The components of TiSpark 1.0 provide distributed computing of TiDB data using Apache Spark.

  • Provide a gRPC communication framework to read data from TiKV
  • Provide encoding and decoding of TiKV component data and communication protocol
  • Provide calculation pushdown, which includes:
    • Aggregate pushdown
    • Predicate pushdown
    • TopN pushdown
    • Limit pushdown
  • Provide index related support
    • Transform predicate into Region key range or secondary index
    • Optimize Index Only queries    - Adaptively downgrade index scan to table scan per Region
  • Provide cost-based optimization
    • Support statistics
    • Select index
    • Estimate broadcast table cost
  • Provide support for multiple Spark interfaces
    • Support Spark Shell
    • Support ThriftServer/JDBC
    • Support Spark-SQL interaction
    • Support PySpark Shell
    • Support SparkR