MyRocks

MyRocks storage architecture
- Log-Structured Merge-Tree (LSM-Tree)
  - Multi-level LSM-Tree
  - Level-based compaction
- Column Family (CF)
  - One LSM-Tree per CF
  - Several MemTables: active MemTable, immutable MemTable
- Global WAL files

MyRocks write path
- WriteBatch
  - Every transaction has a WriteBatch
  - DML is buffered in the WriteBatch
- Transaction commit
  - WriteBatch -> WAL, flush
  - WriteBatch -> active MemTable
  - When full, active -> immutable
- Data flush and compaction
  - Immutable -> L0 SST file, async
  - L1 to Ln compaction, async

MyRocks read path
1. WriteBatch of the transaction
2. Active MemTable of the CF
3. Immutable MemTable of the CF
4. Global block cache
5. SST files of the CF, from L0 to Ln

MyRocks features
- RC and RR isolation levels
- Row-level locking, MVCC
- WAL-based crash safety
- Powerful compression
- Physical backup (local and remote)
- More efficient slave replication

Shortcomings and limits
- Online DDL is slow and memory-costly
- Only row-based replication
- No spatial or fulltext indexes
- Less stable than InnoDB; bugs on TTL
- Read performance, range

InnoDB vs RocksDB
- RocksDB: append-only, 10%
- InnoDB: 15/16
- MyRocks, RocksDB: 16
- Per-row overhead: RocksDB 7+1 bytes, InnoDB 6+7 bytes; Ln SST: seq id 0
- InnoDB page, file block; RocksDB SST file, file block
- RocksDB SST, InnoDB, MyRocks, SSD
- Write amplification
  - InnoDB: Rows -> Page + Doublewrite
  - RocksDB: 1 + 1 + fanout * (n - 2) / 2, where n is the number of LSM levels and fanout is the LSM fanout

1.
- /InnoDB1014GB DDB16+ MySQL mysqld 1TB+ InnoDB, key_block_size=81 RocksDB916GB100GB
- MyRocks slave, MyRocks, RocksDB, Snappy322GB700GB2/320TB SSD
- Write Buffer (MemTable): table/index, CF
- Block Cache: RocksDB, InnoDB
  - Block Cache vs Buffer Pool: rocksdb_block_cache_size, rocksdb_cache_index_and_filter_blocks
  - MemTable: max_write_buffer_number, max_write_buffer_number_to_maintain, min_write_buffer_number_to_merge
  - Write Buffer (CF based) vs Change Buffer (global)
  - tcmalloc/jemalloc
  - Per-CF memory usage, global memory usage

2.
- TPS 5k+
  1. rocksdb_rpl_skip_tx_api; DDB + MyRocks: rocksdb_read_free_rpl_tables
  2. LOGICAL_LOCK DATABASE
- Redis; 100k+ TPS, 40k+ QPS; Redis
- Topology: Flink; DDB1, DDB2; myrocks1 to myrocks8; myrocks9 to myrocks16

2. Write Stall, pending-compaction
- Compaction: rocksdb_max_background_jobs 8-16, CPU
- Pending bytes limit
  - soft_pending_compaction_bytes_limit -> 512G
  - hard_pending_compaction_bytes_limit -> 768G
  - Stall: L0 SST
- L0 SST
  - level0_slowdown_writes_trigger 100 -> 500
  - level0_stop_writes_trigger 100 -> 500
  - Stall: pending bytes limits
- Settings
  - level0_slowdown_writes_trigger 500
  - level0_stop_writes_trigger 500
  - soft_pending_compaction_bytes_limit 1024G
  - hard_pending_compaction_bytes_limit 1536G
  - compression_per_level=kLZ4Compression (pending compaction)
- 5k/s, 1:1, 15k/s
- 2ms, 128-70G, lz4, 50%+, SSD

3. MyRocks XA
- DDB XA
  1. MyRocks XA, Binlog, InnoDB
  2. RocksDB
  3. MySQL
- MyRocks XA, MySQL 5.7 XA, MyRocks Transaction_ctx
  - Master: XA PREPARE, session, rocksdb_close_connection
  - Slave: detach/reattach, replace_native_transaction_in_thd, trx_t (InnoDB) or rocksdb::Transaction
  - XA START; detach; worker; XA PREPARE; reattach
  - WL#6860: Binlogging XA-prepared transaction
- MyRocks XA bugfix
- XA, InnoDB -> RocksDB
- MyRocks Online DDL, NDC, XA PREPARE
  1. InnoDB, RocksDB
     - alter table xxx ENGINE=ROCKSDB
  2. RocksDB, gtid_executed
     - InnoDB DDL, XA RECOVER, XA Commit; disabled_storage_engines=MyISAM,InnoDB
     - ENGINE=ROCKSDB, gtid_executed, Commit, XA PREPARE: set session gtid_next=the gtid of create table sql; create table
- 70%; 8G, CPU 30+; MyRocks; MyRocks 1M+

MyRocks
- MyRocks on RDS
- Redis, InnoDB
- Schema, TTL

MyRocks roadmap
- MyRocks 8.0
- MyRocks Fast DDL
- MyRocks skip TRX API (Master/Slave)
- TTL replication
- Bugfix XA
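The five-step MyRocks read path listed earlier can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not how RocksDB is implemented: MemTables, SST files and the block cache are plain dictionaries, the block cache is keyed by row key rather than by SST block, and tombstones and snapshots are ignored. The names `ColumnFamily` and `read` are hypothetical.

```python
from typing import Dict, List, Optional


class ColumnFamily:
    """Toy stand-in for one CF: its own MemTables and its own LSM levels."""

    def __init__(self) -> None:
        self.active_memtable: Dict[str, str] = {}
        # Immutable MemTables waiting for flush, newest first.
        self.immutable_memtables: List[Dict[str, str]] = []
        # levels[0] is L0, levels[-1] is Ln; each level is a list of SST "files".
        self.levels: List[List[Dict[str, str]]] = []


def read(key: str,
         write_batch: Dict[str, str],
         cf: ColumnFamily,
         block_cache: Dict[str, str]) -> Optional[str]:
    """Look a key up in the order given on the read-path slide."""
    # 1. WriteBatch of the transaction (its own uncommitted changes).
    if key in write_batch:
        return write_batch[key]
    # 2. Active MemTable of the CF.
    if key in cf.active_memtable:
        return cf.active_memtable[key]
    # 3. Immutable MemTables of the CF.
    for memtable in cf.immutable_memtables:
        if key in memtable:
            return memtable[key]
    # 4. Global block cache (simplified here to a key -> value map).
    if key in block_cache:
        return block_cache[key]
    # 5. SST files of the CF, from L0 down to Ln.
    for level in cf.levels:
        for sst in level:
            if key in sst:
                block_cache[key] = sst[key]  # populate the cache on a miss
                return sst[key]
    return None
```

The point of the ordering is that newer data always shadows older data: a value in the transaction's own WriteBatch wins over the MemTables, which win over anything already flushed to SST files.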
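The write-amplification expression from the InnoDB vs RocksDB comparison can be checked with a small calculation. This sketch assumes the operator lost between n and 2 in the source is a minus sign, that n counts LSM levels, and that fanout is the size ratio between adjacent levels; the function name `lsm_write_amp` and the sample values (fanout 10, 7 levels) are illustrative choices, not figures from the talk.

```python
def lsm_write_amp(fanout: float, n: int) -> float:
    """Estimate 1 + 1 + fanout * (n - 2) / 2 for an n-level LSM tree.

    Assumption: the formula is read as written on the slide, with a minus
    sign restored between n and 2.
    """
    if n < 2:
        raise ValueError("the formula assumes at least two levels")
    return 1 + 1 + fanout * (n - 2) / 2


if __name__ == "__main__":
    # Hypothetical example values: fanout 10 and 7 levels.
    print(lsm_write_amp(10, 7))
```

With these sample inputs the estimate is 1 + 1 + 10 * 5 / 2 = 27, which shows how quickly the fanout term dominates as levels are added.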
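The final write-stall settings from the pending-compaction slides can be rendered into a single column-family option string. This is a hedged sketch: it assumes the options are applied through the `rocksdb_default_cf_options` server variable (the talk does not say how they were set), that "G" means GiB, and that sizes are written out in bytes. The helper `render_cf_options` is hypothetical.

```python
GIB = 1024 ** 3  # assumption: "G" on the slides means GiB


def render_cf_options(options: dict) -> str:
    """Join key=value pairs with ';', the separator used in RocksDB option strings."""
    return ";".join(f"{name}={value}" for name, value in options.items())


# Values taken from the slides; only the byte conversion is added here.
cf_options = {
    "level0_slowdown_writes_trigger": 500,
    "level0_stop_writes_trigger": 500,
    "soft_pending_compaction_bytes_limit": 1024 * GIB,
    "hard_pending_compaction_bytes_limit": 1536 * GIB,
    "compression_per_level": "kLZ4Compression",
}

config_line = "rocksdb_default_cf_options=" + render_cf_options(cf_options)

if __name__ == "__main__":
    print(config_line)
```

Raising both L0 triggers and both pending-bytes limits together matters because, as the slides note, relaxing one stall condition alone simply moves the stall to the other.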