
A Brief Introduction of TiDB (Percona Live)

This is the speech Edward Huang gave at Percona Live - Open Source Database Conference 2017.

Published in: Technology


  1. A Brief Introduction of TiDB. Dongxu (Edward) Huang, CTO, PingCAP
  2. About me ● Dongxu (Edward) Huang, co-founder & CTO of PingCAP ● PingCAP, based in Beijing, China ● Infrastructure software engineer, open source hacker ● Codis / TiDB / TiKV ● Golang / Python / Rust
  3. What would you do when… ● Your RDBMS is becoming the performance bottleneck of your backend service ● The amount of data stored in the RDBMS is overwhelming ● You want to run complex queries on a sharded cluster ○ e.g. a simple JOIN or GROUP BY ● Your application needs ACID transactions on a sharded cluster
  4. TiDB Project - Goals ● SQL is necessary ● Transparent sharding and data movement ● 100% OLTP + 80% OLAP ○ Transactions + complex queries ● Compatible with MySQL in most cases ● 24/7 availability, even in the case of datacenter outages ○ Thanks to the Raft consensus algorithm ● Open source, of course
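Because TiDB targets MySQL compatibility at the wire-protocol level, an existing MySQL driver should usually connect without changes. A minimal sketch in Go with database/sql and go-sql-driver/mysql, assuming a local TiDB server on its default port 4000 and a root user with an empty password (both assumptions made for the example):

    package main

    import (
        "database/sql"
        "fmt"
        "log"

        _ "github.com/go-sql-driver/mysql" // standard MySQL driver; TiDB speaks the MySQL protocol
    )

    func main() {
        // Assumed DSN: local TiDB on its default port 4000, user "root", empty password.
        db, err := sql.Open("mysql", "root:@tcp(127.0.0.1:4000)/test")
        if err != nil {
            log.Fatal(err)
        }
        defer db.Close()

        var version string
        if err := db.QueryRow("SELECT VERSION()").Scan(&version); err != nil {
            log.Fatal(err)
        }
        fmt.Println("connected, server version:", version)
    }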
  5. Agenda ● Technical overview of TiDB / TiKV ○ Storage ○ Distributed SQL ○ Tools ● Real-world cases and benchmarks ● Demo
  6. Architecture (diagram): a stateless SQL layer of TiDB servers on top of a distributed storage layer of TiKV nodes replicated with Raft; the Placement Driver (PD) handles the control flow (balancing, failover, metadata, and timestamp requests); all components communicate over gRPC.
  7. Storage stack 1/2 ● TiKV is the underlying storage layer ● Physically, data is stored in RocksDB ● We build a Raft layer on top of RocksDB ○ What is Raft? ● Written in Rust! (diagram, top to bottom: TiKV API (gRPC), Transaction, MVCC, Raft (gRPC), RocksDB; Raw KV API: https://github.com/pingcap/tidb/blob/master/cmd/benchraw/main.go, Transactional KV API: https://github.com/pingcap/tidb/blob/master/cmd/benchkv/main.go)
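To make "a Raft layer on top of RocksDB" concrete, here is a purely illustrative Go sketch (not TiKV code, which is written in Rust): a write is first proposed through Raft, and only a committed entry is applied to the local key-value store. The interfaces below are hypothetical stand-ins.

    // kvStore is a hypothetical stand-in for RocksDB.
    type kvStore interface {
        Put(key, value []byte) error
    }

    // raftLayer is a hypothetical stand-in for the Raft layer: Propose blocks
    // until the entry has been committed by a quorum of replicas.
    type raftLayer interface {
        Propose(entry []byte) error
    }

    // replicatedPut illustrates the write path: replicate first, apply second.
    // (The entry encoding is deliberately naive; a real system encodes commands.)
    func replicatedPut(r raftLayer, local kvStore, key, value []byte) error {
        entry := append(append([]byte{}, key...), value...)
        if err := r.Propose(entry); err != nil {
            return err
        }
        // Every replica applies the committed entry from its own copy of the log;
        // only the local apply is shown here.
        return local.Put(key, value)
    }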
  8. Storage stack 2/2 ● Data is organized by Regions ● Region: a set of contiguous key-value pairs (diagram: Regions such as Region 1:[a-e] and Region 2:[f-j] are each replicated across several RocksDB instances, and the replicas of one Region form a Raft group)
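As a rough illustration of the Region idea above (again a hypothetical sketch, not TiKV's routing code): Regions partition the key space into contiguous ranges, so locating a key is a binary search over the sorted start keys.

    package main

    import (
        "fmt"
        "sort"
    )

    // Region describes one contiguous key range [StartKey, EndKey).
    type Region struct {
        ID       int
        StartKey string
        EndKey   string // "" means "up to +infinity"
    }

    // locate returns the Region whose range contains key; regions must be
    // sorted by StartKey and together cover the whole key space.
    func locate(regions []Region, key string) Region {
        i := sort.Search(len(regions), func(i int) bool { return regions[i].StartKey > key })
        return regions[i-1]
    }

    func main() {
        regions := []Region{
            {ID: 1, StartKey: "", EndKey: "f"},  // roughly [a-e]
            {ID: 2, StartKey: "f", EndKey: "k"}, // roughly [f-j]
            {ID: 3, StartKey: "k", EndKey: ""},  // [k-...]
        }
        fmt.Println(locate(regions, "g").ID) // prints 2
    }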
  9. Dynamic Multi-Raft ● What's Dynamic Multi-Raft? ○ Dynamic split / merge ● Safe split / merge (diagram: Region 1:[a-e] splits into Region 1.1:[a-c] and Region 1.2:[d-e])
  10. Safe Split: 1/4 (diagram): Region 1:[a-e] is replicated on TiKV1 (Leader), TiKV2 (Follower), and TiKV3 (Follower), forming one Raft group.
  11. Safe Split: 2/4 (diagram): the Leader on TiKV1 splits its copy into Region 1.1:[a-c] and Region 1.2:[d-e]; TiKV2 and TiKV3 still hold Region 1:[a-e].
  12. Safe Split: 3/4 (diagram): the split log is replicated to the Followers on TiKV2 and TiKV3 through Raft.
  13. Safe Split: 4/4 (diagram): all three TiKV nodes now hold Region 1.1:[a-c] and Region 1.2:[d-e], each kept in sync by its own Raft replication.
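A toy continuation of the Region sketch above, showing why a split that travels through the Raft log is "safe": every replica applies the same cut at the same log position, so all copies end up with identical ranges. This is an illustration only, not TiKV's implementation.

    // splitRegion applies a replicated "split" command: it cuts one Region into
    // two at splitKey. Because the command is an ordinary Raft log entry, the
    // leader and both followers execute exactly the same function with the same
    // arguments and end up with identical Region boundaries.
    func splitRegion(r Region, splitKey string, newID int) (left, right Region) {
        left = Region{ID: r.ID, StartKey: r.StartKey, EndKey: splitKey}
        right = Region{ID: newID, StartKey: splitKey, EndKey: r.EndKey}
        return left, right
    }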
  14. Scale-out (initial state) (diagram): Regions 1, 2, and 3 each have three replicas spread across Node A, Node B, Node C, and Node D; the leader of Region 1 (*) is on Node A.
  15. Scale-out (add new node) (diagram): Node E joins the cluster; step 1) transfer the leadership of Region 1 from Node A to Node B.
  16. Scale-out (balancing) (diagram): step 2) add a replica of Region 1 on Node E.
  17. Scale-out (balancing) (diagram): step 3) remove the replica of Region 1 from Node A.
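The three steps above are scheduling decisions made by PD. As a hypothetical sketch of the underlying idea (not PD's actual scheduler), balancing amounts to picking the most loaded node as the source and the least loaded node as the target, then adding a replica on the target before removing one from the source:

    // pickRebalance is an illustrative, simplified balancing rule: move one
    // replica from the node holding the most replicas to the node holding the
    // fewest. The real Placement Driver also weighs leaders, Region sizes,
    // labels, and datacenter topology.
    func pickRebalance(replicasPerNode map[string]int) (source, target string) {
        for node, n := range replicasPerNode {
            if source == "" || n > replicasPerNode[source] {
                source = node
            }
            if target == "" || n < replicasPerNode[target] {
                target = node
            }
        }
        return source, target // add the replica on target first, then remove it from source
    }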
  18. ACID Transaction ● Based on Google Percolator ● 'Almost' decentralized two-phase commit ○ Timestamp Allocator ● Optimistic transaction model ● Default isolation level: Repeatable Read ● External consistency: Snapshot Isolation + Lock ■ SELECT … FOR UPDATE
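From the client's point of view this is ordinary MySQL-style transaction handling. A minimal Go sketch with database/sql, reusing the db handle and imports from the connection sketch earlier; the accounts table and its columns are invented for the example, and because of the optimistic model a COMMIT can fail on a write conflict and should be retried by the caller.

    // transfer moves amount from one account to another in a single transaction.
    // The table accounts(id, balance) is hypothetical. SELECT ... FOR UPDATE
    // locks the row; with the optimistic model, tx.Commit can still return a
    // conflict error, which the caller is expected to retry.
    func transfer(db *sql.DB, from, to int64, amount int64) error {
        tx, err := db.Begin()
        if err != nil {
            return err
        }
        defer tx.Rollback() // ignored if Commit has already succeeded

        var balance int64
        err = tx.QueryRow("SELECT balance FROM accounts WHERE id = ? FOR UPDATE", from).Scan(&balance)
        if err != nil {
            return err
        }
        if balance < amount {
            return fmt.Errorf("insufficient balance")
        }
        if _, err := tx.Exec("UPDATE accounts SET balance = balance - ? WHERE id = ?", amount, from); err != nil {
            return err
        }
        if _, err := tx.Exec("UPDATE accounts SET balance = balance + ? WHERE id = ?", amount, to); err != nil {
            return err
        }
        return tx.Commit()
    }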
  19. Distributed SQL ● Full-featured SQL layer ● Predicate pushdown ● Distributed join ● Distributed cost-based optimizer (distributed CBO)
  20. TiDB SQL Layer overview
  21. What happens behind a query: CREATE TABLE t (c1 INT, c2 TEXT, KEY idx_c1(c1)); SELECT COUNT(c1) FROM t WHERE c1 > 10 AND c2 = 'percona';
  22. Query Plan (diagram): on TiKV (index scan), each node reads the index range (10, +∞), fetches the row data by RowID, filters on c2 = 'percona', and computes a partial aggregate COUNT(c1); on TiDB, a DistSQL scan collects the partial results from the TiKV nodes and computes the final aggregate SUM(COUNT(c1)).
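The reason COUNT can be pushed down is that it distributes over a partition of the data: each TiKV node returns a partial count for its Regions, and the final aggregate on TiDB is just a sum. A one-function illustration (plain Go, nothing TiDB-specific):

    // finalCount mirrors the plan above: every storage node computes a partial
    // COUNT(c1) over the rows it owns, and the SQL layer sums those partials.
    func finalCount(partialCounts []int64) int64 {
        var total int64
        for _, c := range partialCounts {
            total += c
        }
        return total
    }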
  23. What happens behind a query: CREATE TABLE left (id INT, email TEXT, KEY idx_id(id)); CREATE TABLE right (id INT, email TEXT, KEY idx_id(id)); SELECT * FROM left JOIN right WHERE left.id = right.id;
  24. Distributed Join (HashJoin)
  25. Supported Distributed Join Types ● Hash join ● Sort-merge join ● Index-lookup join
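As a reminder of what the hash join on the previous slide does, here is a minimal single-machine sketch; in the distributed version both inputs are first partitioned by the join key across workers and each partition is then joined like this. The row type is made up for the example.

    // row is a toy schema matching the earlier left/right tables.
    type row struct {
        ID    int
        Email string
    }

    // hashJoin joins left and right on ID: build a hash table over one side,
    // then probe it with the other side.
    func hashJoin(left, right []row) [][2]row {
        build := make(map[int][]row, len(left))
        for _, l := range left {
            build[l.ID] = append(build[l.ID], l) // build phase
        }
        var out [][2]row
        for _, r := range right {
            for _, l := range build[r.ID] { // probe phase
                out = append(out, [2]row{l, r})
            }
        }
        return out
    }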
  26. No silver bullet (anti-patterns for TiDB SQL) ● Joins between large tables without an index or any hints ● Getting distinct values from large tables without an index ● Sorting without an index ● Result set too large (forgot LIMIT N?)
  27. Best practices ● Random, massive read / write workloads ● No hot small tables ● Use transactions, but avoid heavy conflicts
  28. Tools matter ● Syncer ● TiDB-Binlog ● MyDumper / MyLoader (loader) All open source, too: https://github.com/pingcap/tidb-tools
  29. Syncer ● Synchronizes data from MySQL in real time ● Hooks up as a MySQL replica (diagram: Syncer acts as a fake slave pulling the binlog from the MySQL master, keeps a save point on disk, applies a rule filter, and replicates to MySQL or to one or more TiDB clusters through further Syncer instances)
  30. TiDB-Binlog ● Subscribe to the incremental data from TiDB ● Output Protobuf-formatted data or MySQL Binlog format (WIP) (diagram: each TiDB server writes its binlog to a Pumper; Cistern sorts the streams and outputs Protobuf to third-party applications, or MySQL Binlog to MySQL / another TiDB cluster)
  31. MyDumper / Loader ● Backup / restore in parallel ● Works for TiDB too ● Actually, we don't have our own data migration tool for now
  32. Use case 1: OLTP + OLAP ● One of the most popular bike-sharing companies in China ● A 7-node TiDB cluster for order storage (OLTP) ● Hooked up as a MySQL replica, synchronizing data from the MySQL masters through syncer into a 10-node TiDB slave cluster for ad-hoc OLAP (diagram: multiple masters, each feeding the slave cluster via its own syncer)
  33. Use case 1: Ad-hoc OLAP, query elapsed times:
      TiDB elapsed (3 nodes)    MySQL elapsed
      5.07699437 s              19.93 s
      10.524703077 s            43.23 s
      10.077812714 s            43.33 s
      10.285957629 s            > 20 min
      10.462306097 s            36.81 s
      9.968078965 s             1 min 0.27 s
      9.998030375 s             44.05 s
      10.866549284 s            43.18 s
  34. Use case 2: Distributed OLTP ● One of the biggest MMORPG games in China ● 2.2 TB of data, 18 nodes ● Drop-in replacement for MySQL ● Distributed OLTP
  35. Sysbench test environment: OS: Linux (Ubuntu 14.04); CPU: 28 ECUs, 8 vCPUs, 2.8 GHz, Intel Xeon E5-2680 v2; RAM: 16 GB; Disk: 80 GB (SSD). Note: 3 replicas.
  36. Sysbench (Read):
      cluster   table count   table size   sysbench threads   QPS        latency (avg / .95)
      3 nodes   16            1M rows      256                21899.59   11.69 ms / 19.87 ms
      6 nodes   16            1M rows      256                41928.84   6.10 ms / 10.96 ms
      9 nodes   16            1M rows      256                58044.80   4.41 ms / 7.36 ms
  37. Sysbench (Read)
  38. Sysbench (Insert):
      cluster   table count   table size   sysbench threads   TPS        latency (avg / .95)
      3 nodes   16            1M rows      256                6686.59    38.28 ms / 78.21 ms
      6 nodes   16            1M rows      256                11448.08   22.36 ms / 44.61 ms
      9 nodes   16            1M rows      512                14977.01   34.18 ms / 86.85 ms
  39. Sysbench (Insert)
  40. Roadmap ● TiSpark: integrate TiKV with Spark SQL ● Better optimizer (statistics && CBO) ● JSON type and document store for TiDB ○ MySQL 5.7.12+ X-Plugin ● Integration with Kubernetes ○ Operator by CoreOS
  41. Thanks! https://github.com/pingcap/tidb https://github.com/pingcap/tikv Contact me: huang@pingcap.com
