APACHE BOOKKEEPER KV STORE AND USE CASES
SHIVJI KUMAR JHA
@ShivjiJha
TRACK: MESSAGING
in/shivjijha
About Me
• Senior MTS at Nutanix
• Platform Engineer
– DBs, SOA, Infra, Streams
• Love
– Distributed data systems
– Open-source software (OSS)
• OSS Contributions
– Apache Pulsar
– MySQL
Contents
• Why KV store?
• What is Bookkeeper?
• How to use Bookkeeper?
History of Data Stores
A Brief History… of Databases
• 1960: Flat Files
• 1960s: Hierarchical Databases
• 1980: SQL / Relational Databases
– High-level language
– Abstractions: Schema, Transactions, Indexes
• 2004: NoSQL
– Scale & Availability above all
– No relational model
• 2010s: Distributed SQL
Image source: https://commons.wikimedia.org/wiki/File:Human_evolution.svg
A Brief History… of Data Streams
• Apache Kafka:
– Built inside LinkedIn
– 2011: Kafka becomes open source
– 2012: Graduated from Apache incubator
• Apache Pulsar
– Built at Yahoo
– 2016: Contributed to Open source
– 2018: Top-level Apache project
A Brief History… of Apache Bookkeeper
• Born at Yahoo! Research
• Evolved from Apache Zookeeper (ZK)
• 2011: Incubated as subproject under ZK
• 2015: Top level Apache Project
Apache Bookkeeper
What is Bookkeeper?
• Infinite Stream of log records
• Horizontally scalable storage
• Fault-tolerant
• Low latency writes
• Offers
– Durability
– Tunable replication
– Strong consistency
Use cases
• As write ahead log (WAL) in
– HDFS namenode (first use case)
– Twitter’s Manhattan : distributed KV
– HerdDB : JVM embeddable distributed database
• Apache Pulsar : Message & Offset store
• Salesforce : Internal database of application storage
• Pravega (DellEMC) : Message store
• Bytedance : Internal metadata store
B-tree vs LSM
• Primary data structures for storage engines.
• B-trees behind traditional databases
– MySQL, PostgreSQL
– Indexing for expensive random access on HDD
• Log structured Merge (LSM) trees
– Good write throughput
– Behind a variety of modern workloads
• Stream : Apache Bookkeeper, Kafka Streams, Apache Pulsar, Flink
• OLTP : MyRocks, MongoRocks, Rocksandra, YugaByte, CockroachDB
• TSDB : InfluxDB
– Take advantage of SSD throughput
Key Value stores
• KV stores as common core behind:
– Key Value databases
– Relational databases
• Key : Primary Key, Value: Complete row
– Document databases
• Key : Primary Key (internal?), Value: document
– Streaming Platforms
• RocksDB based : Apache Pulsar, Kafka Streams, Flink
• Good idea to have fewer clusters!
• Good idea to have the same base (KV) across clusters!
Bookkeeper = ZK + RocksDB
RocksDB
• Implements LSM
• Embeddable
• Key Value store
• Append only
– Low latency
– High throughput
• Duplicate record for update / delete (a new record is appended instead of modifying in place)
• Compaction to remove stale / deleted records
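The append-only behaviour above (updates and deletes add records, compaction removes stale ones) can be sketched with a toy in-memory log. This is a minimal illustration, not how RocksDB is implemented: the class `AppendOnlyKV`, the `TOMBSTONE` marker and the linear scan in `get` are all assumptions made for the example.

```python
from typing import Dict, List, Optional, Tuple

TOMBSTONE = None  # hypothetical marker meaning "this key was deleted"


class AppendOnlyKV:
    """Toy append-only key-value log (illustrative only, not RocksDB)."""

    def __init__(self) -> None:
        # Every write is appended; nothing is modified in place.
        self.log: List[Tuple[str, Optional[str]]] = []

    def put(self, key: str, value: str) -> None:
        self.log.append((key, value))

    def delete(self, key: str) -> None:
        # A delete is just another appended record (a tombstone).
        self.log.append((key, TOMBSTONE))

    def get(self, key: str) -> Optional[str]:
        # The newest record for a key wins, so scan from the end.
        for k, v in reversed(self.log):
            if k == key:
                return v
        return None

    def compact(self) -> None:
        # Keep only the latest record per key, then drop tombstones.
        latest: Dict[str, Optional[str]] = {}
        for k, v in self.log:
            latest.pop(k, None)  # re-insert so order follows the last write
            latest[k] = v
        self.log = [(k, v) for k, v in latest.items() if v is not TOMBSTONE]


store = AppendOnlyKV()
store.put("a", "1")
store.put("b", "2")
store.put("a", "3")   # update: a second record for "a"
store.delete("b")     # delete: a tombstone for "b"
records_before = len(store.log)
store.compact()
records_after = len(store.log)
```

Before compaction the log holds one record per operation; afterwards only the live value of each surviving key remains, which is the space that compaction reclaims.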
Zookeeper
• Metadata store
• Cluster coordination
• Service discovery
• Leader election
• Dynamic configurations
• Feature flags
Bookkeeper Internals
Bookkeeper Cluster : Replication
https://medium.com/streamnative/why-apache-bookkeeper-part-1-consistency-durability-availability-ac697a3cf7a1
Bookkeeper : Typical Usage
https://medium.com/streamnative/why-apache-bookkeeper-part-1-consistency-durability-availability-ac697a3cf7a1
Bookkeeper Glossary
Entries
• Actual data (bytes) written to ledgers, plus metadata.
• Entry: [ledgerId, entryId, Checksum…]
Entry Log File
• Actual physical file with entries.
• Offsets indexed for fast lookup.
• Asynchronous garbage collection of deleted and stale entries.
Journal
• Transaction logs (write ahead log).
• Append-only semantics.
• Low latency, high throughput writes.
• Turn on / off (durability vs throughput).
Ledger
• Logical unit of storage for APIs in Bookkeeper.
• Append-only semantics.
• Indexed & cached for faster lookups.
• Includes: [Status, lastEntryId, [entries], replication factors…]
Bookkeeper : Client & Server
Client Based Replication
• Bookkeeper has no leader / follower.
• Same responsibility across nodes.
• Thick bookie client implements replication, coordination, consistency.
• A separate auto-detection and restore module handles lost entries.
Bookkeeper APIs
• Create ledger (sync / async)
• Append entry to ledger
• Read entry from ledger
• Delete ledger (sync / async)
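The four ledger operations and the idea of client-based replication can be sketched with in-memory stand-ins. This is a toy model, not the BookKeeper client API: `ToyBookie`, `ToyLedgerClient` and their methods are hypothetical names, and the round-robin placement of each entry on `write_quorum` consecutive bookies is a simplifying assumption about how a thick client might spread replicas.

```python
from typing import Dict, List


class ToyBookie:
    """In-memory stand-in for a storage node (hypothetical)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.entries: Dict[int, Dict[int, bytes]] = {}  # ledger_id -> entry_id -> data

    def add_entry(self, ledger_id: int, entry_id: int, data: bytes) -> bool:
        self.entries.setdefault(ledger_id, {})[entry_id] = data
        return True  # acknowledge the write

    def read_entry(self, ledger_id: int, entry_id: int) -> bytes:
        return self.entries[ledger_id][entry_id]


class ToyLedgerClient:
    """Thick client: it, not the servers, decides where replicas go."""

    def __init__(self, ledger_id: int, bookies: List[ToyBookie],
                 write_quorum: int, ack_quorum: int) -> None:
        self.ledger_id = ledger_id          # "create ledger"
        self.bookies = bookies
        self.write_quorum = write_quorum    # copies written per entry
        self.ack_quorum = ack_quorum        # acks needed before success
        self.next_entry_id = 0

    def _replicas(self, entry_id: int) -> List[ToyBookie]:
        n = len(self.bookies)
        return [self.bookies[(entry_id + i) % n] for i in range(self.write_quorum)]

    def append(self, data: bytes) -> int:
        entry_id = self.next_entry_id
        acks = sum(1 for b in self._replicas(entry_id)
                   if b.add_entry(self.ledger_id, entry_id, data))
        if acks < self.ack_quorum:
            raise IOError("not enough acknowledgements")
        self.next_entry_id += 1
        return entry_id

    def read(self, entry_id: int) -> bytes:
        return self._replicas(entry_id)[0].read_entry(self.ledger_id, entry_id)

    def delete(self) -> None:
        for b in self.bookies:
            b.entries.pop(self.ledger_id, None)


bookies = [ToyBookie(f"bookie-{i}") for i in range(3)]
client = ToyLedgerClient(ledger_id=1, bookies=bookies, write_quorum=2, ack_quorum=2)
first_id = client.append(b"e0")
client.append(b"e1")
client.append(b"e2")
second_entry = client.read(1)
copies_per_bookie = [len(b.entries.get(1, {})) for b in bookies]
client.delete()
copies_after_delete = [len(b.entries.get(1, {})) for b in bookies]
```

With three bookies and two copies per entry, each node ends up holding two of the three entries, so no single node is a leader for the ledger.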
Bookkeeper Server : Write Path
[Diagram: Bookkeeper Client, Ledger APIs, Bookkeeper Server with Journal (WAL) and Write Cache; flow labelled "Writes"]
Bookkeeper Server : Append only
[Diagram: Bookkeeper Client, Ledger APIs, Bookkeeper Server with Journal (WAL); flow labelled "Writes"]
Bookkeeper Server : Read-Write
[Diagram: Bookkeeper Client, Ledger APIs, Bookkeeper Server with Journal (WAL), Write Cache, Read Cache and Entry Log Files; flows labelled "Writes" and "Reads"]
Bookkeeper Server : IO isolation
[Diagram: same components as Read-Write, with two separate disks shown]
Bookkeeper Server : Read Path
[Diagram: Ledger APIs, Journal (WAL), Write Cache, Read Cache, index and Entry Log Files; flow labelled "Reads"]
Bookkeeper Server : Flush
[Diagram: Journal (WAL), Write Cache, Read Cache and Entry Log Files]
• Asynchronous, batched flush!
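The write, flush and read paths named on these slides can be pictured with a small in-memory model. It is a sketch under stated assumptions, not the bookie's storage code: `ToyBookieStorage` and its fields are hypothetical, the flush is called explicitly rather than asynchronously, sorting by key during flush is an assumption of the example, and clearing the journal after a flush is a simplification.

```python
from typing import Dict, List, Tuple

Key = Tuple[int, int]  # (ledger_id, entry_id)


class ToyBookieStorage:
    """Toy model of journal, caches, index and entry log (illustrative only)."""

    def __init__(self) -> None:
        self.journal: List[Tuple[Key, bytes]] = []  # write-ahead log, append only
        self.write_cache: Dict[Key, bytes] = {}     # recent writes, not yet flushed
        self.read_cache: Dict[Key, bytes] = {}      # entries recently read from the entry log
        self.entry_log: List[bytes] = []            # stand-in for entry log files
        self.index: Dict[Key, int] = {}             # key -> offset in entry_log

    def add_entry(self, key: Key, data: bytes) -> None:
        self.journal.append((key, data))  # 1. append to the journal
        self.write_cache[key] = data      # 2. keep it readable from memory

    def flush(self) -> int:
        # Batched flush: move everything in the write cache to the entry log.
        flushed = 0
        for key in sorted(self.write_cache):
            self.index[key] = len(self.entry_log)
            self.entry_log.append(self.write_cache[key])
            flushed += 1
        self.write_cache.clear()
        self.journal.clear()  # simplification: flushed entries leave the journal
        return flushed

    def read_entry(self, key: Key) -> Tuple[bytes, str]:
        # Returns the data and the place it was served from.
        if key in self.write_cache:
            return self.write_cache[key], "write_cache"
        if key in self.read_cache:
            return self.read_cache[key], "read_cache"
        data = self.entry_log[self.index[key]]
        self.read_cache[key] = data
        return data, "entry_log"


storage = ToyBookieStorage()
storage.add_entry((1, 0), b"x")
storage.add_entry((2, 0), b"y")
storage.add_entry((1, 1), b"z")
before_flush = storage.read_entry((1, 1))
flushed = storage.flush()
first_read = storage.read_entry((1, 1))
second_read = storage.read_entry((1, 1))
```

Reads of fresh data never touch the entry log, and writes never wait for it, which is the separation the IO isolation slide points at.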
Bookkeeper : Offsets
Last add confirmed (LAC)
• Sent in response to write()
• Cumulative ack
• Readers can read until LAC
Last add pushed (LAP)
• Last entry client requested to write.
• Write in progress, not acked yet.
[Diagram: Entries with LAC and LAP markers; READERS and WRITER]
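The relationship between LAP and LAC can be sketched as a pair of counters. This is a toy model only: `ToyLedgerWriter` is a hypothetical class, and it assumes acknowledgements may arrive out of order while LAC advances only over a gap-free prefix, which is one reading of "cumulative ack".

```python
class ToyLedgerWriter:
    """Tracks LAP and LAC for a single writer (illustrative only)."""

    def __init__(self) -> None:
        self.lap = -1        # last add pushed: highest entry id sent
        self.lac = -1        # last add confirmed: highest contiguous acked id
        self._acked = set()  # acks received ahead of the contiguous prefix

    def push(self) -> int:
        # The client requests a write; it is in progress until acked.
        self.lap += 1
        return self.lap

    def ack(self, entry_id: int) -> None:
        self._acked.add(entry_id)
        # Cumulative: advance only while the next id has been acked.
        while self.lac + 1 in self._acked:
            self.lac += 1
            self._acked.discard(self.lac)

    def readable(self) -> range:
        # Readers may read entries up to and including LAC.
        return range(0, self.lac + 1)


writer = ToyLedgerWriter()
pushed = [writer.push() for _ in range(3)]
writer.ack(1)                 # entry 0 is still unacked, so LAC cannot move
lac_with_gap = writer.lac
writer.ack(0)                 # gap closed: LAC jumps over 0 and 1
lac_after = writer.lac
visible = list(writer.readable())
```

Entry 2 has been pushed but not confirmed, so it sits between LAC and LAP and stays invisible to readers.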
Bookkeeper : Recovery
Bookkeeper Failure
• Writer crashed / network partition
• Client retries / fails
• Retry reaches new bookkeeper node
New Bookkeeper owner
• Put ledger state in recovery
• Fences old file with consensus.
• Write to new file
• New owner back? Split brain?
[Diagram: Entries with LAC and LAP markers; READERS, WRITER and NEW WRITER]
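Fencing can be sketched as a flag that storage nodes check before accepting a write. The model below is a deliberately small assumption-laden sketch: `ToyBookie`, `FencedError` and `recover` are hypothetical names, the new owner fences every node rather than a quorum, and ledger metadata updates are left out.

```python
from typing import Dict, List, Set, Tuple


class FencedError(Exception):
    """Raised when a write targets a ledger that has been fenced."""


class ToyBookie:
    """In-memory storage node that honours fencing (illustrative only)."""

    def __init__(self) -> None:
        self.fenced: Set[int] = set()
        self.entries: Dict[Tuple[int, int], bytes] = {}

    def fence(self, ledger_id: int) -> None:
        self.fenced.add(ledger_id)

    def add_entry(self, ledger_id: int, entry_id: int, data: bytes) -> None:
        if ledger_id in self.fenced:
            raise FencedError(f"ledger {ledger_id} is fenced")
        self.entries[(ledger_id, entry_id)] = data


def recover(bookies: List[ToyBookie], old_ledger_id: int) -> int:
    """Fence the old ledger everywhere and return its last known entry id."""
    for bookie in bookies:
        bookie.fence(old_ledger_id)
    return max(
        (entry_id for b in bookies for (lid, entry_id) in b.entries if lid == old_ledger_id),
        default=-1,
    )


bookies = [ToyBookie() for _ in range(3)]
for entry_id in (0, 1):                      # old writer, ledger 7
    for bookie in bookies:
        bookie.add_entry(7, entry_id, b"old")

last_entry = recover(bookies, old_ledger_id=7)   # new owner takes over

old_writer_blocked = False
try:
    bookies[0].add_entry(7, 2, b"late write")    # old writer comes back
except FencedError:
    old_writer_blocked = True

bookies[0].add_entry(8, 0, b"new")               # new owner writes to a new ledger
```

Because the returning writer is rejected at the storage layer, two writers can never both extend the same ledger, which is what keeps a split brain from corrupting it.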
Bookkeeper: A Pulsar Use case
Apache Pulsar 101
[Diagram: PRODUCER and CONSUMER]
Apache Pulsar
• Cloud-native, distributed messaging and streaming platform
Highlights
• Modular design
• Horizontally scalable
• Low latency & high throughput
• Multi-tenancy
• Geo-replication
Apache Pulsar 101
[Diagram: PRODUCER, CONSUMER, BROKER, BOOKKEEPER, ZOOKEEPER]
Pulsar Broker & Bookkeeper
[Diagram: Pulsar Broker (holding the Bookkeeper client), Ledger APIs, Bookkeeper Server with Journal (WAL), Write Cache, Read Cache and Entry Log Files; flows labelled "Writes" and "Reads"; the BROKER hosts TOPIC1, TOPIC2, TOPIC3 between PRODUCER and CONSUMER]
Pulsar Broker & Bookkeeper : Topic Ledger Mapping
[Diagram: BROKER with TOPIC1, TOPIC2, TOPIC3 between PRODUCER and CONSUMER; TOPIC 3 maps to a MANAGED LEDGER]
• Managed ledger: Ledgers[], schemaLedgers[], compactedLedgers[]
• Per ledger: ledgerId, entriesRange, ledger size, metadata, offloaded?
• Cursors: CURSOR 1, CURSOR 2 (labelled CONSUMER 1, CONSUMER 1)
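The topic-to-ledger mapping can be sketched as a small data structure. This is an illustration only, not Pulsar's managed ledger implementation: `LedgerInfo`, `ToyManagedLedger`, the `locate` helper and the idea of a topic-wide message index are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class LedgerInfo:
    ledger_id: int
    entries: int            # number of entries held by this ledger
    size_bytes: int
    offloaded: bool = False


@dataclass
class ToyManagedLedger:
    """A topic backed by an ordered list of ledgers (illustrative only)."""
    topic: str
    ledgers: List[LedgerInfo] = field(default_factory=list)
    # cursor name -> (ledger_id, entry_id) of the next message to read
    cursors: Dict[str, Tuple[int, int]] = field(default_factory=dict)

    def locate(self, message_index: int) -> Tuple[int, int]:
        """Map a topic-wide message index to (ledger_id, entry_id)."""
        remaining = message_index
        for info in self.ledgers:
            if remaining < info.entries:
                return info.ledger_id, remaining
            remaining -= info.entries
        raise IndexError("message index beyond the last ledger")


managed = ToyManagedLedger(
    topic="topic3",
    ledgers=[LedgerInfo(10, 100, 4096), LedgerInfo(11, 50, 2048)],
)
first_position = managed.locate(0)
later_position = managed.locate(120)
managed.cursors["cursor1"] = later_position  # a consumer's read position
```

A cursor is then just a saved position in this mapping, so each subscription can progress through the same ledgers independently.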
Cluster Coordination: Zookeeper
• Pointers to data
– Topic ledgers mapping
– Ledger topics mapping
– Topic schema mapping
• Service Discovery
– List of available bookies
– List of available brokers
– Which broker owns which topic
– How much load on which topic etc
• Distributed coordination
– Locks
– Leader election
• System Configuration
– Dynamic configs for hot reload
– Feature flags
• Provisioning Configuration
– Metadata for tenants, namespaces
– Namespace policies
Summary
• Plethora of databases, workloads, use cases.
– Too many clusters – difficult to operate
• RocksDB : very popular LSM implementation
– High write throughput, leverages SSD throughput
– Varied workloads on RocksDB : databases, queues, streams
• Bookkeeper : Consistent distributed KV base
– Infinite commit log
– Can be used in many different ways
– Apache Pulsar is one example, with many more being built!
– Fault tolerant, horizontally scalable store behind Pulsar
References
1. Mark Callaghan: Choosing between Efficiency and Performance with RocksDB
2. FoundationDB Record Layer: White paper
3. Sijie Guo: Why Apache Bookkeeper part 1: consistency, durability, availability
4. Jack Vanlightly: Understanding How Apache Pulsar Works
5. Shivji Kumar Jha: How Pulsar stores your data, Pulsar Summit NA 2021
6. Sijie Guo: Convergence of Messaging, streaming and storage
THANK YOU
QUESTIONS?
@ShivjiJha
shiv4289
in/shivjijha/
ShivjiKumarJha

Open source India - MySQL Labs: Multi-Source ReplicationOpen source India - MySQL Labs: Multi-Source Replication
Open source India - MySQL Labs: Multi-Source Replication
 
MySQL User Camp: Multi-threaded Slaves
MySQL User Camp: Multi-threaded SlavesMySQL User Camp: Multi-threaded Slaves
MySQL User Camp: Multi-threaded Slaves
 



Apache Con 2021 : Apache Bookkeeper Key Value Store and use cases

  • 1. APACHE BOOKKEEPER KV STORE AND USE CASES SHIVJI KUMAR JHA @ShivjiJha TRACK: MESSAGING in/shivjijha
  • 2. About Me • Senior MTS at Nutanix • Platform Engineer – DBs, SOA, Infra, Streams • Love – Distributed data systems – Open-source software (OSS) • OSS Contributions – Apache Pulsar – MySQL
  • 4. History of Data Stores 4
  • 5. A Brief History… Of Databases • 1960: Flat Files • 1960s: Hierarchical Databases • 1980: SQL / Relational Databases – High-level language – Abstractions: Schema, Transactions, Indexes • 2004: NoSQL – Scale & Availability above all – No relational model • 2010s: Distributed SQL Image source: https://commons.wikimedia.org/wiki/File:Human_evolution.svg
  • 6. A Brief History… Of Data Streams • Apache Kafka: – Built inside LinkedIn – 2011: Kafka becomes open source – 2012: Graduated from Apache incubator • Apache Pulsar – Built at Yahoo – 2016: Contributed to Open source – 2018: Top-level Apache project Image source: https://commons.wikimedia.org/wiki/File:Human_evolution.svg
  • 7. A Brief History… Of Apache Bookkeeper • Born at Yahoo! Research • Evolved from Apache Zookeeper (ZK) • 2011: Incubated as subproject under ZK • 2015: Top level Apache Project
  • 8. Apache Bookkeeper What is Bookkeeper? • Infinite Stream of log records • Horizontally scalable storage • Fault-tolerant • Low latency writes • Offers – Durability – Tunable replication – Strong consistency Use cases • As write ahead log (WAL) in – HDFS namenode (first use case) – Twitter’s Manhattan : distributed KV – HerdDB : JVM embeddable distributed database • Apache Pulsar : Message & Offset store • Salesforce : Internal database of application storage • Pravega (DellEMC) : Message store • Bytedance : Internal metadata store
  • 9. B-tree vs LSM • Primary data structures for storage engines. • B-trees behind traditional databases – MySQL, PostgreSQL – Indexing for expensive random access on HDD • Log structured Merge (LSM) trees – Good write throughput – Behind a variety of modern workloads • Stream : Apache Bookkeeper, Kafka Streams, Apache Pulsar, Flink • OLTP : MyRocks, MongoRocks, Rocksandra, YugaByte, CockroachDB • TSDB : influxDB – Take advantage of SSD throughput
  • 10. Key Value stores • KV stores as common core behind: – Key Value databases – Relational databases • Key : Primary Key, Value: Complete row – Document databases • Key : Primary Key (internal?), Value: document – Streaming Platforms • RocksDB based : Apache Pulsar, Kafka Streams, Flink • Good idea to have fewer clusters! • Good idea to have the same base (KV) across clusters!
  • 11. Bookkeeper = ZK + rocksDB RocksDB • Implements LSM • Embeddable • Key Value store • Append only – Low latency – High throughput • Duplicate record for update / delete • Compaction to remove stale / deleted records Zookeeper • Metadata store • Cluster coordination • Service discovery • Leader election • Dynamic configurations • Feature flags
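The append-only behaviour listed for RocksDB on slide 11 (a duplicate record for each update or delete, with compaction removing stale records later) can be illustrated with a toy in-memory log. This is a minimal sketch, not RocksDB's actual design: the class `AppendOnlyKV`, the `TOMBSTONE` marker, and the single flat log are assumptions made for illustration only.

```python
# Toy append-only key-value log (hypothetical names, illustration only).
TOMBSTONE = object()  # marker record that stands for "this key was deleted"


class AppendOnlyKV:
    def __init__(self):
        self.log = []  # ordered list of (key, value) records; never edited in place

    def put(self, key, value):
        # An update is just another record appended after the old one.
        self.log.append((key, value))

    def delete(self, key):
        # A delete is also an append: a tombstone record for the key.
        self.log.append((key, TOMBSTONE))

    def get(self, key):
        # The newest record for a key wins, so scan from the tail.
        for k, v in reversed(self.log):
            if k == key:
                return None if v is TOMBSTONE else v
        return None

    def compact(self):
        # Keep only the latest record per key and drop deleted keys.
        latest = {}
        for k, v in self.log:
            latest[k] = v
        self.log = [(k, v) for k, v in latest.items() if v is not TOMBSTONE]
```

Writes stay cheap because they never touch existing records; the cost is paid later, in bulk, when compaction rewrites the log.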
  • 13. Bookkeeper Cluster : Replication https://medium.com/streamnative/why-apache-bookkeeper-part-1-consistency-durability-availability-ac697a3cf7a1
  • 14. Bookkeeper : Typical Usage https://medium.com/streamnative/why-apache-bookkeeper-part-1-consistency-durability-availability-ac697a3cf7a1
  • 15. Bookkeeper Glossary • Entries: Actual data (bytes) written to ledgers, plus metadata. Entry: [ledgerId, entryId, Checksum…] • Entry Log File: Actual physical file with entries. Offsets indexed for fast lookup. Asynchronous garbage collection of deleted and stale entries.
  • 16. Bookkeeper Glossary • Journal: Transaction logs (write ahead log). Append-only semantics. Low latency, high throughput writes. Turn on / off (durability vs throughput). • Ledger: Logical unit of storage for APIs in bookkeeper. Append-only semantics. Indexed & cached for faster lookups. Includes: [Status, lastEntryId, [entries] replication factors…]
  • 17. Bookkeeper : Client & Server. Client Based Replication: • Bookkeeper has no leader / follower. • Same responsibility across nodes. • Thick bookie client implements replication, coordination, consistency. • Separate auto-detection and restore module for lost entries. Bookkeeper APIs: • Create ledger (sync / async) • Append entry to ledger • Read entry from ledger • Delete ledger (sync / async)
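Client-based replication, as described on slide 17, can be sketched with a toy model in which the client (not a leader node) picks the replicas for each entry and counts acknowledgements. This does not use the real BookKeeper client library: `ToyBookie`, `ToyLedgerClient`, and the round-robin placement are illustrative assumptions, and the sketch omits what a real client does when a bookie fails (replacing it in the ensemble).

```python
# Toy model of client-driven replication (hypothetical classes, not the BookKeeper API).
class ToyBookie:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.entries = {}  # (ledger_id, entry_id) -> bytes

    def add_entry(self, ledger_id, entry_id, data):
        if not self.up:
            return False  # no ack from a dead node
        self.entries[(ledger_id, entry_id)] = data
        return True

    def read_entry(self, ledger_id, entry_id):
        if not self.up:
            return None
        return self.entries.get((ledger_id, entry_id))


class ToyLedgerClient:
    def __init__(self, ledger_id, ensemble, write_quorum, ack_quorum):
        self.ledger_id = ledger_id
        self.ensemble = ensemble          # all bookies storing this ledger
        self.write_quorum = write_quorum  # replicas written per entry
        self.ack_quorum = ack_quorum      # acks needed before an append succeeds
        self.next_entry_id = 0

    def _replicas(self, entry_id):
        # Stripe entries round-robin across the ensemble.
        n = len(self.ensemble)
        return [self.ensemble[(entry_id + i) % n] for i in range(self.write_quorum)]

    def append(self, data):
        entry_id = self.next_entry_id
        acks = sum(b.add_entry(self.ledger_id, entry_id, data)
                   for b in self._replicas(entry_id))
        if acks < self.ack_quorum:
            raise IOError(f"entry {entry_id}: {acks} acks, need {self.ack_quorum}")
        self.next_entry_id += 1
        return entry_id

    def read(self, entry_id):
        # Any live replica can serve the read; all bookies are equal.
        for b in self._replicas(entry_id):
            data = b.read_entry(self.ledger_id, entry_id)
            if data is not None:
                return data
        raise KeyError(entry_id)
```

Raising `write_quorum` trades storage for durability, while lowering `ack_quorum` below it trades durability for write latency; that is one way to read the "tunable replication" bullet on slide 8.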
  • 18. Bookkeeper Server : Write Path [diagram labels: Bookkeeper Client, Bookkeeper Server, Journal (WAL)]
  • 19. Bookkeeper Server : Write Path [diagram labels: Bookkeeper Client, Bookkeeper Server, Ledger APIs, Writes, Journal (WAL)]
  • 20. Bookkeeper Server : Append only [diagram labels: Bookkeeper Client, Bookkeeper Server, Ledger APIs, Writes, Journal (WAL)]
  • 21. Bookkeeper Server : Write Path [diagram labels: Bookkeeper Client, Bookkeeper Server, Ledger APIs, Writes, Journal (WAL), Write Cache]
  • 22. Bookkeeper Server : Read-Write [diagram labels: Bookkeeper Client, Bookkeeper Server, Ledger APIs, Writes, Reads, Journal (WAL), Write Cache, Read Cache, Entry Log Files]
  • 23. Bookkeeper Server : IO isolation [diagram labels: Bookkeeper Client, Bookkeeper Server, Ledger APIs, Writes, Reads, Journal (WAL), Write Cache, Read Cache, Entry Log Files, two disks]
  • 24. Bookkeeper Server : Read Path [diagram labels: Bookkeeper Client, Bookkeeper Server, Ledger APIs, Reads, Journal (WAL), Write Cache, Read Cache, Entry Log Files, index]
  • 25. Bookkeeper Server : Flush [diagram labels: Bookkeeper Client, Bookkeeper Server, Ledger APIs, Reads, Journal (WAL), Write Cache, Read Cache, Entry Log Files] Asynchronous, batched flush!
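The write, read, and flush paths built up across slides 18 to 25 can be traced with a small in-memory model. It is a sketch under stated assumptions: `ToyBookieStorage` and `flush_threshold` are hypothetical names, the flush runs synchronously once the write cache reaches a threshold (the slides describe it as asynchronous and batched), and truncating the journal after a flush is a simplification.

```python
# Toy single-bookie storage model (hypothetical names, illustration only).
class ToyBookieStorage:
    def __init__(self, flush_threshold=2):
        self.journal = []      # write-ahead log: appended first, sequentially
        self.write_cache = {}  # recent entries held in memory
        self.entry_log = []    # stands in for the entry log files
        self.index = {}        # (ledger_id, entry_id) -> offset in entry_log
        self.read_cache = {}   # entries recently read back from the entry log
        self.flush_threshold = flush_threshold

    def add_entry(self, ledger_id, entry_id, data):
        key = (ledger_id, entry_id)
        self.journal.append((key, data))  # 1. durable append to the journal
        self.write_cache[key] = data      # 2. keep in memory for fast reads
        if len(self.write_cache) >= self.flush_threshold:
            self.flush()                  # toy stand-in for the background flush
        return True                       # 3. ack to the client

    def flush(self):
        # Write cached entries to the entry log in one batch and index their offsets.
        for key in sorted(self.write_cache):
            self.index[key] = len(self.entry_log)
            self.entry_log.append(self.write_cache[key])
        self.write_cache.clear()
        self.journal.clear()  # flushed entries no longer need the journal

    def read_entry(self, ledger_id, entry_id):
        key = (ledger_id, entry_id)
        if key in self.write_cache:   # tailing reads hit the write cache
            return self.write_cache[key]
        if key in self.read_cache:    # repeated catch-up reads hit the read cache
            return self.read_cache[key]
        data = self.entry_log[self.index[key]]  # otherwise: index lookup, then entry log
        self.read_cache[key] = data
        return data
```

Because the journal is only ever appended to and the entry log is only read on cache misses, placing them on separate disks (slide 23) keeps reads from competing with the latency-sensitive write path.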
  • 26. Bookkeeper : Offsets. Last add confirmed (LAC): • Sent in response to write() • Cumulative ack • Readers can read until LAC. Last add pushed (LAP): • Last entry client requested to write. • Write in progress, not acked yet. [diagram labels: Readers, LAC, LAP, Writer, Entries]
  • 27. Bookkeeper : Recovery [diagram labels: Readers, LAC, LAP, Writer, Entries]
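The relationship between the two offsets on slide 26 can be made concrete with a toy writer. This is a minimal sketch with assumed names (`ToyWriter`, `send`, `on_ack`); it only models the bookkeeping of LAP and LAC, not the network or the bookies.

```python
# Toy model of LAP / LAC tracking on the writer side (hypothetical names).
class ToyWriter:
    def __init__(self):
        self.entries = []   # entries in the order the client asked to write them
        self.acked = set()  # entry ids acknowledged so far
        self.lap = -1       # last add pushed: highest entry id sent
        self.lac = -1       # last add confirmed: end of the fully acked prefix

    def send(self, data):
        self.lap += 1
        self.entries.append(data)
        return self.lap

    def on_ack(self, entry_id):
        self.acked.add(entry_id)
        # The ack is cumulative: LAC advances only while there is no gap.
        while self.lac + 1 in self.acked:
            self.lac += 1

    def readable(self):
        # Readers may see entries up to and including LAC, never beyond it.
        return self.entries[: self.lac + 1]
```

For example, if entries 0, 1, and 2 are sent and only entry 1 is acknowledged, LAC stays at -1 and readers see nothing; once entry 0 is acknowledged, LAC jumps to 1 while LAP remains 2.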
  • 28. Bookkeeper : Recovery. Bookkeeper Failure: • Writer crashed / network partition • Client retries / fails • Retry reaches new bookkeeper node. New Bookkeeper owner: • Put Ledger state in recovery • Fences old file with consensus. • Write to new file • New owner back? Split brain? [diagram labels: Readers, LAC, LAP, Writer, New Writer, Entries]
  • 30. Apache Pulsar 101 [diagram labels: Producer, Consumer] Apache Pulsar: • Cloud-native, distributed messaging and distributed streaming platform. Highlights: • Modular Design • Horizontally scalable • Low latency & high throughput • Multi-tenancy • Geo Replication
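The fencing step on slide 28 is what prevents split brain when an old writer reappears. The sketch below is a heavily simplified illustration with assumed names (`FencingBookie`, `recover_and_close`, `FencedError`): it fences every bookie rather than a quorum and skips re-replication of the tail entries.

```python
# Toy model of ledger fencing during recovery (hypothetical names, simplified).
class FencedError(Exception):
    pass


class FencingBookie:
    def __init__(self):
        self.fenced = set()  # ledger ids that no longer accept writes
        self.entries = {}    # (ledger_id, entry_id) -> data

    def fence(self, ledger_id):
        self.fenced.add(ledger_id)

    def add_entry(self, ledger_id, entry_id, data):
        if ledger_id in self.fenced:
            raise FencedError(ledger_id)  # the old writer learns it lost ownership
        self.entries[(ledger_id, entry_id)] = data


def recover_and_close(ledger_id, bookies):
    # 1. Fence first, so no further entries can be added behind our back.
    for b in bookies:
        b.fence(ledger_id)
    # 2. Find the last entry that was stored and seal the ledger at that point.
    last = max((eid for b in bookies for (lid, eid) in b.entries if lid == ledger_id),
               default=-1)
    return {"ledger_id": ledger_id, "state": "CLOSED", "last_entry_id": last}
```

After recovery the new owner appends to a fresh ledger, so a returning old writer can neither extend the closed ledger nor interleave with the new one.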
  • 31. Apache Pulsar 101 [diagram labels: Producer, Consumer, Broker, Bookkeeper, Zookeeper]
  • 32. Bookkeeper Server : Read-Write [diagram labels: Bookkeeper Client, Bookkeeper Server, Ledger APIs, Writes, Reads, Journal (WAL), Write Cache, Read Cache, Entry Log Files]
  • 33. Pulsar Broker & Bookkeeper [diagram labels: Pulsar Broker (Bookkeeper client), Bookkeeper Server, Ledger APIs, Writes, Reads, Journal (WAL), Write Cache, Read Cache, Entry Log Files]
  • 34. Pulsar Broker & Bookkeeper [diagram labels: as slide 33, plus Topic1, Topic2, Topic3 on the broker]
  • 35. Pulsar Broker & Bookkeeper [diagram labels: as slide 34, plus Producer, Consumer]
  • 36. Pulsar Broker & Bookkeeper [diagram labels: Pulsar Broker (Bookkeeper client), Topic Ledger Mapping, Topic1, Topic2, Topic3, Topic 3 Managed Ledger, Producer, Consumer]
  • 37. Pulsar Broker & Bookkeeper [diagram labels: as slide 36, plus Ledgers[], schemaLedgers[], compactedLedgers[]]
  • 38. Pulsar Broker & Bookkeeper [diagram labels: as slide 37, plus per-ledger fields: ledgerId, entriesRange, Ledger size, metadata]
  • 39. Pulsar Broker & Bookkeeper [diagram labels: as slide 37, plus per-ledger fields: ledgerId, entriesRange, Ledger size, offloaded?; Cursor 1, Cursor 2, Consumer 1, Consumer 1]
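The topic-to-ledger mapping and cursors shown on slides 36 to 39 can be sketched as a topic that is an ordered list of ledgers, with each cursor remembering how far one subscription has read. This is a toy model, not Pulsar's managed-ledger code: `ToyManagedLedger`, `max_entries_per_ledger`, and the cursor names are assumptions for illustration.

```python
# Toy model of a topic backed by a sequence of ledgers (hypothetical names).
class ToyManagedLedger:
    def __init__(self, max_entries_per_ledger=2):
        self.ledgers = []  # ordered list of (ledger_id, [entries])
        self.max_entries_per_ledger = max_entries_per_ledger
        self.next_ledger_id = 0
        self.cursors = {}  # cursor name -> count of messages already read

    def publish(self, msg):
        # Roll over to a new ledger once the current one is full.
        if not self.ledgers or len(self.ledgers[-1][1]) >= self.max_entries_per_ledger:
            self.ledgers.append((self.next_ledger_id, []))
            self.next_ledger_id += 1
        ledger_id, entries = self.ledgers[-1]
        entries.append(msg)
        return (ledger_id, len(entries) - 1)  # a message position is (ledgerId, entryId)

    def _positions(self):
        return [(lid, i) for lid, entries in self.ledgers for i in range(len(entries))]

    def read_next(self, cursor):
        positions = self._positions()
        idx = self.cursors.setdefault(cursor, 0)
        if idx >= len(positions):
            return None  # this cursor has caught up with the topic
        lid, eid = positions[idx]
        self.cursors[cursor] = idx + 1
        return (lid, eid), dict(self.ledgers)[lid][eid]
```

Each cursor advances independently, so two subscriptions on the same topic read the same stored entries at their own pace without the data being copied.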
  • 40. Cluster Coordination: Zookeeper • Pointers to data – Topic ledgers mapping – Ledger topics mapping – Topic schema mapping • Service Discovery – List of available bookies – List of available brokers – Which broker owns which topic – How much load on which topic etc • Distributed coordination – Locks – Leader election • System Configuration – Dynamic configs for hot reload – Feature flags • Provisioning Configuration – Metadata for tenants, namespaces – Namespace policies
  • 41. Summary • Plethora of databases, workloads, use cases. – Too many clusters – difficult to operate • RocksDB : very popular LSM implementation – High write throughput, leverages SSD throughput – Varied workloads on RocksDB : databases, queues, streams • Bookkeeper : Consistent distributed KV base – Infinite commit log – Can be used in many different ways – Apache Pulsar is one example, with many more being built – Fault tolerant, horizontally scalable store behind Pulsar
  • 42. References 1. Mark Callaghan - Choosing between Efficiency and Performance with RocksDB 2. FoundationDB Record Layer – White paper 3. Why Apache Bookkeeper part 1 : consistency,durability,availability By Sijie Guo 4. Understanding How Apache Pulsar works By Jack Vanlightly 5. How Pulsar stores your data – Pulsar Summit NA 2021 By Shivji Kumar Jha 6. Convergence of Messaging, streaming and storage By Sijie Guo