SlideShare ist ein Scribd-Unternehmen logo
1 von 39
Luxun - A Persistent Messaging
 System Tailored for Big Data
    Collecting & Analytics
                              By William
           http://bulldog2011.github.com
Performance Highlight

   On Single Server Grade Machine with Single
    Topic
       Average producing throughput > 100MBps, peak
        > 200MBps
       Average consuming throughput > 100MBps, peak
        > 200MBps
   In Networking Case,
       Throughput only limited by network and disk IO
        bandwidth
Typical Big Data or Activity Stream

   Logs generated by frontend applications or
    backend services
   User behavior data
   Application or system performance trace
   Business, application or system metrics data
   Events that need immediate action
Unified Big Data Pipeline
Luxun Design Objectives
   Fast & High-Throughput
       Top priority, close to O(1) memory access
   Persistent & Durable
       All data is persistent on disk and is crash resistant
   Producer & Consumer Separation
       Each one can work without knowing the existence of the other
   Realtime
       Produced message will be immediately visible to consumer
   Distributed
       Horizontal scalable with commodity machines
   Multiple Clients Support
       Easy integration with clients from different platforms, such as Java, C#, PHP, Ruby,
        Python, C++…
   Flexible consuming semantics
       Consume once, fanout, can even consume by index
   Light Weight
       Small footprint binary, no Zookeeper coordination
Basic Concepts
   Topic
     Logically it’s a named place to send messages to or to consume
      messages from, physically it’s a persistent queue
   Broker
     Aka Luxun server

   Message
     Datum to produce or consume

   Producer
     A role which will send messages to topics

   Consumer
     A role which will consume messages from topics

   Consumer Group
     A group of consumers that will receive only one copy of a
      message from a topic
Overall Architecture
Core Principle

   Sequential disk
    read can be
    comparable to
    or even faster
    than random
    memory read
Core Technology – Memory Mapped File
   Map files into memory, persisted by OS
     OS will be responsible to persist messages even the process
       crashes
   Can be shared between processes/threads
     Produced message immediately visible to consumer threads

   Limited by the amount of disk space you have
     Can scale very well when exceeding your main memory size

     In Java implementation, does not use heap memory directly, GC
       impact limited
Persistent Queue – Logic View

   Just like a big array or circular array, message
    appended/read by index
Persistent Queue – Consume Once &
Fanout Queue
Persistent Queue – Physical View

   Paged index file and data file
Persistent Queue - Concurrency

   Append operation is synchronized in queue
    implementation
   Read operation is already thread safe
   Array Header Index Pointer is a read/write
    barrier
Persistent Queue – Components View
Persistent Queue – Dynamic View
   Memory Mapped Sliding Window
       Leverage locality of rear append and front read access mode of queue
Communication Layer – Why Thrift

   Stable & Mature
       Created by Facebook, used in Cassandra and HBase
   High Performance
       Binary serialization protocol and non-blocking server model
   Simple & Light-Weight
       IDL driven development, auto-generate client & server side
        proxy
   Cross-Language
       Auto-generate clients for Java, C#, C++, PHP, Ruby,
        Python, …
   Flexible & Pluggable Architecture
       Programming like playing with building blocks, components
        are replaceable as needed.
Communication Layer – Components
View
Communication Layer – Luxun Thrift
IDL
Producer – The Interface
Producing Partitioning on Producer Side
Producing Partitioning through VIP
Producing Compression

   Current Support
       0 – No compression
       1 – GZip compression
       2 – Snappy compression
   Enable Compression for better utilization of
       Network bandwidth
       Disk space
Sync Producing
   Better real-time, worse throughput
       Use this mode only if real-time is the top priority
Async & Batch Producing
   Better Throughput, sacrifice a little real-time
     Should be enabled whenever possible for higher throughput
Simple Consumer
Advanced Stream Style Consumer
Advanced Consumer Internals
Consumer Group
   Within same group, consume once sematics
   Among different groups, fanout semantics
JMX Based Monitoring
Key Performance Test Observations

   On single machine, throughput is only limited
    by disk IO bandwidth
   In networking case, throughput is only limited
    by network bandwidth
       1Gbps network is ok, 10Gbps network is
        recommended
   Not sensitive to JVM heap setting,
       Memory mapped file uses off-heap memory
       4GB is ok, >8GB is recommended
   Throughput > 50 MBps even on normal PC
Key Performance Test Observations
Continue
   Performs good on both Windows and Linux
    platforms
   The throughput of async batch producing is order of
    magnitude better than sycn producing
   Flush on broker has negative impact on throughput,
    recommend to disable flush because of unique
    feature of memoy mapped file
   The throughput of one way not confirmed produce
    interface is 3 times better than two way confirmed
    produce interface
Key Performance Test Observations
Continue
   The overall performance will not change as
    number of topics increase, the throughput will
    be shared among different topics.
   Compression should be enabled for better
    network bandwidth and disk space utilization,
    Snappy has better efficiency than GZip.
Operation - Most Important Performance
Configurations
   On broker
       Flush has negative impact to performance,
        recommend to turn it off.
   On producer side
       Compression
       Sync vs async producing
       Batch size
   On consumer side
       Fetch size
Operation – Log Cleanup Configurations

   Expired log cleanup can be configured with:
       log.retention.hours – old back log page files
        outside of the retention window will be deleted
        periodically.
       log.retention.size – old back log page files
        outside of the retention size will be deleted
        periodically
Luxun vs Apache Kafka – the Main
Difference
    Luxun is inspired by Kafka, however, they have
     following main differences:
                       Luxun                  Kafka
    Persistent         Memory Mapped File     Filesystem & OS page cache
    Queue
    Communcation Thrift RPC                   Custom NIO and messaging
    layer                                     protocol

    Message            Index Based            Offset based
    access mode
    Distribution for   Random distribution    Zookeeper for distributed
    scalability                               coordination

    Partitioning       Only on server level   Partition within a topic
Credits

   Luxun borrowed design ideas and adapted source
    from following open source projects:
       Apache Kafka - http://kafka.apache.org/index.html
       Jafka - https://github.com/adyliu/jafka
       Java Chronicle - https://github.com/peter-lawrey/Java-
        Chronicle
       Fqueue - http://code.google.com/p/fqueue/
       Ashes-queue - http://code.google.com/p/ashes-queue/
       Kestrel - https://github.com/robey/kestrel
Next Steps

   Add a sharding layer for distribution and
    replication
   More clients, C#, PHP, Ruby, Python, C++,
    etc
   Big data apps, such as centralized logging,
    tracing, metrics and events systems based
    on Luxun.
Origin of the Name

   In memorial of LuXun, a great Chinese writer
Source, Docs and Downloadable



   https://github.com/bulldog2011/luxun

Weitere ähnliche Inhalte

Was ist angesagt?

Messaging With Apache ActiveMQ
Messaging With Apache ActiveMQMessaging With Apache ActiveMQ
Messaging With Apache ActiveMQBruce Snyder
 
Webinar Sept 22: Gluster Partners with Redapt to Deliver Scale-Out NAS Storage
Webinar Sept 22: Gluster Partners with Redapt to Deliver Scale-Out NAS StorageWebinar Sept 22: Gluster Partners with Redapt to Deliver Scale-Out NAS Storage
Webinar Sept 22: Gluster Partners with Redapt to Deliver Scale-Out NAS StorageGlusterFS
 
The Chubby lock service for loosely- coupled distributed systems
The Chubby lock service for loosely- coupled distributed systems The Chubby lock service for loosely- coupled distributed systems
The Chubby lock service for loosely- coupled distributed systems Ioanna Tsalouchidou
 
Distributed Caching Essential Lessons (Ts 1402)
Distributed Caching   Essential Lessons (Ts 1402)Distributed Caching   Essential Lessons (Ts 1402)
Distributed Caching Essential Lessons (Ts 1402)Yury Kaliaha
 
Postgres & Red Hat Cluster Suite
Postgres & Red Hat Cluster SuitePostgres & Red Hat Cluster Suite
Postgres & Red Hat Cluster SuiteEDB
 
PostgreSQL replication
PostgreSQL replicationPostgreSQL replication
PostgreSQL replicationMasao Fujii
 
Scaling Out Tier Based Applications
Scaling Out Tier Based ApplicationsScaling Out Tier Based Applications
Scaling Out Tier Based ApplicationsYury Kaliaha
 
Deploying Maximum HA Architecture With PostgreSQL
Deploying Maximum HA Architecture With PostgreSQLDeploying Maximum HA Architecture With PostgreSQL
Deploying Maximum HA Architecture With PostgreSQLDenish Patel
 
Ensuring performance for real time packet processing in open stack white paper
Ensuring performance for real time packet processing in open stack white paperEnsuring performance for real time packet processing in open stack white paper
Ensuring performance for real time packet processing in open stack white paperhptoga
 
HBase 2.0 cluster topology
HBase 2.0 cluster topologyHBase 2.0 cluster topology
HBase 2.0 cluster topologyMikhail Antonov
 
Kafka syed academy_v1_introduction
Kafka syed academy_v1_introductionKafka syed academy_v1_introduction
Kafka syed academy_v1_introductionSyed Hadoop
 
GPUDirect RDMA and Green Multi-GPU Architectures
GPUDirect RDMA and Green Multi-GPU ArchitecturesGPUDirect RDMA and Green Multi-GPU Architectures
GPUDirect RDMA and Green Multi-GPU Architecturesinside-BigData.com
 
J2EE Performance And Scalability Bp
J2EE Performance And Scalability BpJ2EE Performance And Scalability Bp
J2EE Performance And Scalability BpChris Adkin
 
Messaging With ActiveMQ
Messaging With ActiveMQMessaging With ActiveMQ
Messaging With ActiveMQBruce Snyder
 
Migrating Novell GroupWise to Linux
Migrating Novell GroupWise to LinuxMigrating Novell GroupWise to Linux
Migrating Novell GroupWise to LinuxNovell
 
Ibm spectrum scale fundamentals workshop for americas part 4 spectrum scale_r...
Ibm spectrum scale fundamentals workshop for americas part 4 spectrum scale_r...Ibm spectrum scale fundamentals workshop for americas part 4 spectrum scale_r...
Ibm spectrum scale fundamentals workshop for americas part 4 spectrum scale_r...xKinAnx
 
PostgreSQL and Linux Containers
PostgreSQL and Linux ContainersPostgreSQL and Linux Containers
PostgreSQL and Linux ContainersJignesh Shah
 
Jug Lugano - Scale over the limits
Jug Lugano - Scale over the limitsJug Lugano - Scale over the limits
Jug Lugano - Scale over the limitsDavide Carnevali
 
Linux container & docker
Linux container & dockerLinux container & docker
Linux container & dockerejlp12
 

Was ist angesagt? (20)

Messaging With Apache ActiveMQ
Messaging With Apache ActiveMQMessaging With Apache ActiveMQ
Messaging With Apache ActiveMQ
 
Webinar Sept 22: Gluster Partners with Redapt to Deliver Scale-Out NAS Storage
Webinar Sept 22: Gluster Partners with Redapt to Deliver Scale-Out NAS StorageWebinar Sept 22: Gluster Partners with Redapt to Deliver Scale-Out NAS Storage
Webinar Sept 22: Gluster Partners with Redapt to Deliver Scale-Out NAS Storage
 
The Chubby lock service for loosely- coupled distributed systems
The Chubby lock service for loosely- coupled distributed systems The Chubby lock service for loosely- coupled distributed systems
The Chubby lock service for loosely- coupled distributed systems
 
Distributed Caching Essential Lessons (Ts 1402)
Distributed Caching   Essential Lessons (Ts 1402)Distributed Caching   Essential Lessons (Ts 1402)
Distributed Caching Essential Lessons (Ts 1402)
 
Postgres & Red Hat Cluster Suite
Postgres & Red Hat Cluster SuitePostgres & Red Hat Cluster Suite
Postgres & Red Hat Cluster Suite
 
PostgreSQL replication
PostgreSQL replicationPostgreSQL replication
PostgreSQL replication
 
Multicore Processors
Multicore ProcessorsMulticore Processors
Multicore Processors
 
Scaling Out Tier Based Applications
Scaling Out Tier Based ApplicationsScaling Out Tier Based Applications
Scaling Out Tier Based Applications
 
Deploying Maximum HA Architecture With PostgreSQL
Deploying Maximum HA Architecture With PostgreSQLDeploying Maximum HA Architecture With PostgreSQL
Deploying Maximum HA Architecture With PostgreSQL
 
Ensuring performance for real time packet processing in open stack white paper
Ensuring performance for real time packet processing in open stack white paperEnsuring performance for real time packet processing in open stack white paper
Ensuring performance for real time packet processing in open stack white paper
 
HBase 2.0 cluster topology
HBase 2.0 cluster topologyHBase 2.0 cluster topology
HBase 2.0 cluster topology
 
Kafka syed academy_v1_introduction
Kafka syed academy_v1_introductionKafka syed academy_v1_introduction
Kafka syed academy_v1_introduction
 
GPUDirect RDMA and Green Multi-GPU Architectures
GPUDirect RDMA and Green Multi-GPU ArchitecturesGPUDirect RDMA and Green Multi-GPU Architectures
GPUDirect RDMA and Green Multi-GPU Architectures
 
J2EE Performance And Scalability Bp
J2EE Performance And Scalability BpJ2EE Performance And Scalability Bp
J2EE Performance And Scalability Bp
 
Messaging With ActiveMQ
Messaging With ActiveMQMessaging With ActiveMQ
Messaging With ActiveMQ
 
Migrating Novell GroupWise to Linux
Migrating Novell GroupWise to LinuxMigrating Novell GroupWise to Linux
Migrating Novell GroupWise to Linux
 
Ibm spectrum scale fundamentals workshop for americas part 4 spectrum scale_r...
Ibm spectrum scale fundamentals workshop for americas part 4 spectrum scale_r...Ibm spectrum scale fundamentals workshop for americas part 4 spectrum scale_r...
Ibm spectrum scale fundamentals workshop for americas part 4 spectrum scale_r...
 
PostgreSQL and Linux Containers
PostgreSQL and Linux ContainersPostgreSQL and Linux Containers
PostgreSQL and Linux Containers
 
Jug Lugano - Scale over the limits
Jug Lugano - Scale over the limitsJug Lugano - Scale over the limits
Jug Lugano - Scale over the limits
 
Linux container & docker
Linux container & dockerLinux container & docker
Linux container & docker
 

Andere mochten auch

Empresa manufacturera
Empresa manufactureraEmpresa manufacturera
Empresa manufacturerayuridia987
 
What is wisdom_7 zafer oter
What is wisdom_7 zafer oterWhat is wisdom_7 zafer oter
What is wisdom_7 zafer oterZafer OTER
 
Principal's Talk at Yew Tee Primary School
Principal's Talk at Yew Tee Primary SchoolPrincipal's Talk at Yew Tee Primary School
Principal's Talk at Yew Tee Primary SchoolYuliana Tjioe
 
Frisbee unit plan__revised
Frisbee unit plan__revisedFrisbee unit plan__revised
Frisbee unit plan__revisedZach Taylor
 
anohana
anohanaanohana
anohanaybenjo
 
Lodチャレンジ2014の紹介
Lodチャレンジ2014の紹介Lodチャレンジ2014の紹介
Lodチャレンジ2014の紹介Yasuhiro Wada
 
Thermodynamics (2013 new edition) copy
Thermodynamics (2013 new edition)   copyThermodynamics (2013 new edition)   copy
Thermodynamics (2013 new edition) copyYuri Melliza
 
13. Effectiveness Of Erp Systems
13. Effectiveness Of Erp Systems13. Effectiveness Of Erp Systems
13. Effectiveness Of Erp SystemsDonovan Mulder
 
Marketing to children china - november 2012
Marketing to children   china - november 2012Marketing to children   china - november 2012
Marketing to children china - november 2012yuwanzi
 
Cavum septum pellucidum (CSP), Cavum vergae and Cavum veli interpositi
Cavum septum pellucidum (CSP), Cavum vergae and Cavum veli interpositiCavum septum pellucidum (CSP), Cavum vergae and Cavum veli interpositi
Cavum septum pellucidum (CSP), Cavum vergae and Cavum veli interpositiProfessor Yasser Metwally
 
7 Days of Creation, Free Bible Chart from Word Of God Team
7 Days of Creation, Free Bible Chart from Word Of God Team7 Days of Creation, Free Bible Chart from Word Of God Team
7 Days of Creation, Free Bible Chart from Word Of God Teamyesudas.rs
 
Bab 16 17 18 19 penjas kelas 11 sma (xi)
Bab 16 17 18 19 penjas kelas 11 sma (xi)Bab 16 17 18 19 penjas kelas 11 sma (xi)
Bab 16 17 18 19 penjas kelas 11 sma (xi)wah yuni
 
Nutricion edad preescolar(1)
Nutricion edad preescolar(1)Nutricion edad preescolar(1)
Nutricion edad preescolar(1)ximenalauraquispe
 
Yara Fertilizer Industry Handbook
Yara Fertilizer Industry HandbookYara Fertilizer Industry Handbook
Yara Fertilizer Industry HandbookYara International
 

Andere mochten auch (19)

Tourism 2
Tourism 2Tourism 2
Tourism 2
 
Sql dbx
Sql dbxSql dbx
Sql dbx
 
Empresa manufacturera
Empresa manufactureraEmpresa manufacturera
Empresa manufacturera
 
What is wisdom_7 zafer oter
What is wisdom_7 zafer oterWhat is wisdom_7 zafer oter
What is wisdom_7 zafer oter
 
Principal's Talk at Yew Tee Primary School
Principal's Talk at Yew Tee Primary SchoolPrincipal's Talk at Yew Tee Primary School
Principal's Talk at Yew Tee Primary School
 
Frisbee unit plan__revised
Frisbee unit plan__revisedFrisbee unit plan__revised
Frisbee unit plan__revised
 
anohana
anohanaanohana
anohana
 
Lodチャレンジ2014の紹介
Lodチャレンジ2014の紹介Lodチャレンジ2014の紹介
Lodチャレンジ2014の紹介
 
Unit 3 environment
Unit 3 environmentUnit 3 environment
Unit 3 environment
 
Thermodynamics (2013 new edition) copy
Thermodynamics (2013 new edition)   copyThermodynamics (2013 new edition)   copy
Thermodynamics (2013 new edition) copy
 
Internet fraud
Internet fraudInternet fraud
Internet fraud
 
13. Effectiveness Of Erp Systems
13. Effectiveness Of Erp Systems13. Effectiveness Of Erp Systems
13. Effectiveness Of Erp Systems
 
Marketing to children china - november 2012
Marketing to children   china - november 2012Marketing to children   china - november 2012
Marketing to children china - november 2012
 
Cavum septum pellucidum (CSP), Cavum vergae and Cavum veli interpositi
Cavum septum pellucidum (CSP), Cavum vergae and Cavum veli interpositiCavum septum pellucidum (CSP), Cavum vergae and Cavum veli interpositi
Cavum septum pellucidum (CSP), Cavum vergae and Cavum veli interpositi
 
7 Days of Creation, Free Bible Chart from Word Of God Team
7 Days of Creation, Free Bible Chart from Word Of God Team7 Days of Creation, Free Bible Chart from Word Of God Team
7 Days of Creation, Free Bible Chart from Word Of God Team
 
Pemakanan sihat 1
Pemakanan sihat 1Pemakanan sihat 1
Pemakanan sihat 1
 
Bab 16 17 18 19 penjas kelas 11 sma (xi)
Bab 16 17 18 19 penjas kelas 11 sma (xi)Bab 16 17 18 19 penjas kelas 11 sma (xi)
Bab 16 17 18 19 penjas kelas 11 sma (xi)
 
Nutricion edad preescolar(1)
Nutricion edad preescolar(1)Nutricion edad preescolar(1)
Nutricion edad preescolar(1)
 
Yara Fertilizer Industry Handbook
Yara Fertilizer Industry HandbookYara Fertilizer Industry Handbook
Yara Fertilizer Industry Handbook
 

Ähnlich wie Luxun a Persistent Messaging System Tailored for Big Data Collecting & Analytics

GFS - Google File System
GFS - Google File SystemGFS - Google File System
GFS - Google File Systemtutchiio
 
Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?Anton Nazaruk
 
Near Real time Indexing Kafka Messages to Apache Blur using Spark Streaming
Near Real time Indexing Kafka Messages to Apache Blur using Spark StreamingNear Real time Indexing Kafka Messages to Apache Blur using Spark Streaming
Near Real time Indexing Kafka Messages to Apache Blur using Spark StreamingDibyendu Bhattacharya
 
AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...
AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...
AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...GeeksLab Odessa
 
Experience In Building Scalable Web Sites Through Infrastructure's View
Experience In Building Scalable Web Sites Through Infrastructure's ViewExperience In Building Scalable Web Sites Through Infrastructure's View
Experience In Building Scalable Web Sites Through Infrastructure's ViewPhuwadon D
 
Open HFT libraries in @Java
Open HFT libraries in @JavaOpen HFT libraries in @Java
Open HFT libraries in @JavaPeter Lawrey
 
Redpanda and ClickHouse
Redpanda and ClickHouseRedpanda and ClickHouse
Redpanda and ClickHouseAltinity Ltd
 
DevOps Fest 2020. Сергій Калінець. Building Data Streaming Platform with Apac...
DevOps Fest 2020. Сергій Калінець. Building Data Streaming Platform with Apac...DevOps Fest 2020. Сергій Калінець. Building Data Streaming Platform with Apac...
DevOps Fest 2020. Сергій Калінець. Building Data Streaming Platform with Apac...DevOps_Fest
 
Clustered Architecture Patterns Delivering Scalability And Availability
Clustered Architecture Patterns Delivering Scalability And AvailabilityClustered Architecture Patterns Delivering Scalability And Availability
Clustered Architecture Patterns Delivering Scalability And AvailabilityConSanFrancisco123
 
EQR Reporting: Rails + Amazon EC2
EQR Reporting:  Rails + Amazon EC2EQR Reporting:  Rails + Amazon EC2
EQR Reporting: Rails + Amazon EC2jeperkins4
 
Scale your Alfresco Solutions
Scale your Alfresco Solutions Scale your Alfresco Solutions
Scale your Alfresco Solutions Alfresco Software
 
Arch linux and whole security concepts in linux explained
Arch linux and whole security concepts in linux explained Arch linux and whole security concepts in linux explained
Arch linux and whole security concepts in linux explained krishna kakade
 
Timothy Spann: Apache Pulsar for ML
Timothy Spann: Apache Pulsar for MLTimothy Spann: Apache Pulsar for ML
Timothy Spann: Apache Pulsar for MLEdunomica
 
HPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journeyHPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journeyPeter Clapham
 
Monitoring&Logging - Stanislav Kolenkin
Monitoring&Logging - Stanislav Kolenkin  Monitoring&Logging - Stanislav Kolenkin
Monitoring&Logging - Stanislav Kolenkin Kuberton
 

Ähnlich wie Luxun a Persistent Messaging System Tailored for Big Data Collecting & Analytics (20)

GFS - Google File System
GFS - Google File SystemGFS - Google File System
GFS - Google File System
 
Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?
 
January 2011 HUG: Kafka Presentation
January 2011 HUG: Kafka PresentationJanuary 2011 HUG: Kafka Presentation
January 2011 HUG: Kafka Presentation
 
Near Real time Indexing Kafka Messages to Apache Blur using Spark Streaming
Near Real time Indexing Kafka Messages to Apache Blur using Spark StreamingNear Real time Indexing Kafka Messages to Apache Blur using Spark Streaming
Near Real time Indexing Kafka Messages to Apache Blur using Spark Streaming
 
AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...
AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...
AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...
 
Experience In Building Scalable Web Sites Through Infrastructure's View
Experience In Building Scalable Web Sites Through Infrastructure's ViewExperience In Building Scalable Web Sites Through Infrastructure's View
Experience In Building Scalable Web Sites Through Infrastructure's View
 
Open HFT libraries in @Java
Open HFT libraries in @JavaOpen HFT libraries in @Java
Open HFT libraries in @Java
 
PROSE
PROSEPROSE
PROSE
 
Redpanda and ClickHouse
Redpanda and ClickHouseRedpanda and ClickHouse
Redpanda and ClickHouse
 
tittle
tittletittle
tittle
 
Apache Kafka Best Practices
Apache Kafka Best PracticesApache Kafka Best Practices
Apache Kafka Best Practices
 
DevOps Fest 2020. Сергій Калінець. Building Data Streaming Platform with Apac...
DevOps Fest 2020. Сергій Калінець. Building Data Streaming Platform with Apac...DevOps Fest 2020. Сергій Калінець. Building Data Streaming Platform with Apac...
DevOps Fest 2020. Сергій Калінець. Building Data Streaming Platform with Apac...
 
Libra Library OS
Libra Library OSLibra Library OS
Libra Library OS
 
Clustered Architecture Patterns Delivering Scalability And Availability
Clustered Architecture Patterns Delivering Scalability And AvailabilityClustered Architecture Patterns Delivering Scalability And Availability
Clustered Architecture Patterns Delivering Scalability And Availability
 
EQR Reporting: Rails + Amazon EC2
EQR Reporting:  Rails + Amazon EC2EQR Reporting:  Rails + Amazon EC2
EQR Reporting: Rails + Amazon EC2
 
Scale your Alfresco Solutions
Scale your Alfresco Solutions Scale your Alfresco Solutions
Scale your Alfresco Solutions
 
Arch linux and whole security concepts in linux explained
Arch linux and whole security concepts in linux explained Arch linux and whole security concepts in linux explained
Arch linux and whole security concepts in linux explained
 
Timothy Spann: Apache Pulsar for ML
Timothy Spann: Apache Pulsar for MLTimothy Spann: Apache Pulsar for ML
Timothy Spann: Apache Pulsar for ML
 
HPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journeyHPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journey
 
Monitoring&Logging - Stanislav Kolenkin
Monitoring&Logging - Stanislav Kolenkin  Monitoring&Logging - Stanislav Kolenkin
Monitoring&Logging - Stanislav Kolenkin
 

Mehr von William Yang

MicroServices architecture @ Ctrip v1.1
MicroServices architecture @ Ctrip v1.1MicroServices architecture @ Ctrip v1.1
MicroServices architecture @ Ctrip v1.1William Yang
 
Antifragile, Microservices and DevOps - A Study
Antifragile, Microservices and DevOps - A StudyAntifragile, Microservices and DevOps - A Study
Antifragile, Microservices and DevOps - A StudyWilliam Yang
 
A Big, Fast and Persistent Queue
A Big, Fast and Persistent QueueA Big, Fast and Persistent Queue
A Big, Fast and Persistent QueueWilliam Yang
 
Easy Web Serivce on iOS with Pico
Easy Web Serivce on iOS with PicoEasy Web Serivce on iOS with Pico
Easy Web Serivce on iOS with PicoWilliam Yang
 

Mehr von William Yang (6)

MicroServices architecture @ Ctrip v1.1
MicroServices architecture @ Ctrip v1.1MicroServices architecture @ Ctrip v1.1
MicroServices architecture @ Ctrip v1.1
 
Antifragile, Microservices and DevOps - A Study
Antifragile, Microservices and DevOps - A StudyAntifragile, Microservices and DevOps - A Study
Antifragile, Microservices and DevOps - A Study
 
From SOA to MSA
From SOA to MSAFrom SOA to MSA
From SOA to MSA
 
Nano
NanoNano
Nano
 
A Big, Fast and Persistent Queue
A Big, Fast and Persistent QueueA Big, Fast and Persistent Queue
A Big, Fast and Persistent Queue
 
Easy Web Serivce on iOS with Pico
Easy Web Serivce on iOS with PicoEasy Web Serivce on iOS with Pico
Easy Web Serivce on iOS with Pico
 

Kürzlich hochgeladen

Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 

Kürzlich hochgeladen (20)

Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 

Luxun a Persistent Messaging System Tailored for Big Data Collecting & Analytics

  • 1. Luxun - A Persistent Messaging System Tailored for Big Data Collecting & Analytics By William http://bulldog2011.github.com
  • 2. Performance Highlight  On Single Server Grade Machine with Single Topic  Average producing throughput > 100MBps, peak > 200MBps  Average consuming throughput > 100MBps, peak > 200MBps  In Networking Case,  Throughput only limited by network and disk IO bandwidth
  • 3. Typical Big Data or Activity Stream  Logs generated by frontend applications or backend services  User behavior data  Application or system performance trace  Business, application or system metrics data  Events that need immediate action
  • 4. Unified Big Data Pipeline
  • 5. Luxun Design Objectives  Fast & High-Throughput  Top priority, close to O(1) memory access  Persistent & Durable  All data is persistent on disk and is crash resistant  Producer & Consumer Separation  Each one can work without knowing the existence of the other  Realtime  Produced message will be immediately visible to consumer  Distributed  Horizontal scalable with commodity machines  Multiple Clients Support  Easy integration with clients from different platforms, such as Java, C#, PHP, Ruby, Python, C++…  Flexible consuming semantics  Consume once, fanout, can even consume by index  Light Weight  Small footprint binary, no Zookeeper coordination
  • 6. Basic Concepts  Topic  Logically it’s a named place to send messages to or to consume messages from, physically it’s a persistent queue  Broker  Aka Luxun server  Message  Datum to produce or consume  Producer  A role which will send messages to topics  Consumer  A role which will consume messages from topics  Consumer Group  A group of consumers that will receive only one copy of a message from a topic
  • 8. Core Principle  Sequential disk read can be comparable to or even faster than random memory read
  • 9. Core Technology – Memory Mapped File  Map files into memory, persisted by OS  OS will be responsible to persist messages even the process crashes  Can be shared between processes/threads  Produced message immediately visible to consumer threads  Limited by the amount of disk space you have  Can scale very well when exceeding your main memory size  In Java implementation, does not use heap memory directly, GC impact limited
  • 10. Persistent Queue – Logic View  Just like a big array or circular array, message appended/read by index
  • 11. Persistent Queue – Consume Once & Fanout Queue
  • 12. Persistent Queue – Physical View  Paged index file and data file
  • 13. Persistent Queue - Concurrency  Append operation is synchronized in queue implementation  Read operation is already thread safe  Array Header Index Pointer is a read/write barrier
  • 14. Persistent Queue – Components View
  • 15. Persistent Queue – Dynamic View  Memory Mapped Sliding Window  Leverage locality of rear append and front read access mode of queue
  • 16. Communication Layer – Why Thrift  Stable & Mature  Created by Facebook, used in Cassandra and HBase  High Performance  Binary serialization protocol and non-blocking server model  Simple & Light-Weight  IDL driven development, auto-generate client & server side proxy  Cross-Language  Auto-generate clients for Java, C#, C++, PHP, Ruby, Python, …  Flexible & Pluggable Architecture  Programming like playing with building blocks, components are replaceable as needed.
  • 17. Communication Layer – Components View
  • 18. Communication Layer – Luxun Thrift IDL
  • 19. Producer – The Interface
  • 20. Producing Partitioning on Producer Side
  • 22. Producing Compression  Current Support  0 – No compression  1 – GZip compression  2 – Snappy compression  Enable Compression for better utilization of  Network bandwidth  Disk space
  • 23. Sync Producing  Better real-time, worse throughput  Use this mode only if real-time is the top priority
  • 24. Async & Batch Producing  Better Throughput, sacrifice a little real-time  Should be enabled whenever possible for higher throughput
  • 28. Consumer Group  Within same group, consume once sematics  Among different groups, fanout semantics
  • 30. Key Performance Test Observations  On single machine, throughput is only limited by disk IO bandwidth  In networking case, throughput is only limited by network bandwidth  1Gbps network is ok, 10Gbps network is recommended  Not sensitive to JVM heap setting,  Memory mapped file uses off-heap memory  4GB is ok, >8GB is recommended  Throughput > 50 MBps even on normal PC
  • 31. Key Performance Test Observations Continue  Performs good on both Windows and Linux platforms  The throughput of async batch producing is order of magnitude better than sycn producing  Flush on broker has negative impact on throughput, recommend to disable flush because of unique feature of memoy mapped file  The throughput of one way not confirmed produce interface is 3 times better than two way confirmed produce interface
  • 32. Key Performance Test Observations Continue  The overall performance will not change as number of topics increase, the throughput will be shared among different topics.  Compression should be enabled for better network bandwidth and disk space utilization, Snappy has better efficiency than GZip.
  • 33. Operation - Most Important Performance Configurations  On broker  Flush has negative impact to performance, recommend to turn it off.  On producer side  Compression  Sync vs async producing  Batch size  On consumer side  Fetch size
  • 34. Operation – Log Cleanup Configurations  Expired log cleanup can be configured with:  log.retention.hours – old back log page files outside of the retention window will be deleted periodically.  log.retention.size – old back log page files outside of the retention size will be deleted periodically
  • 35. Luxun vs Apache Kafka – the Main Difference  Luxun is inspired by Kafka, however, they have following main differences: Luxun Kafka Persistent Memory Mapped File Filesystem & OS page cache Queue Communcation Thrift RPC Custom NIO and messaging layer protocol Message Index Based Offset based access mode Distribution for Random distribution Zookeeper for distributed scalability coordination Partitioning Only on server level Partition within a topic
  • 36. Credits  Luxun borrowed design ideas and adapted source from following open source projects:  Apache Kafka - http://kafka.apache.org/index.html  Jafka - https://github.com/adyliu/jafka  Java Chronicle - https://github.com/peter-lawrey/Java- Chronicle  Fqueue - http://code.google.com/p/fqueue/  Ashes-queue - http://code.google.com/p/ashes-queue/  Kestrel - https://github.com/robey/kestrel
  • 37. Next Steps  Add a sharding layer for distribution and replication  More clients, C#, PHP, Ruby, Python, C++, etc  Big data apps, such as centralized logging, tracing, metrics and events systems based on Luxun.
  • 38. Origin of the Name  In memorial of LuXun, a great Chinese writer
  • 39. Source, Docs and Downloadable https://github.com/bulldog2011/luxun