SlideShare a Scribd company logo
1 of 66
Writing Scalable
Software in Java
From multi-core to grid-computing
Me

•   Ruben Badaró
•   Dev Expert at Changingworlds/Amdocs
•   PT.JUG Leader
•   http://www.zonaj.org
What this talk is not
       about
• Sales pitch
• Cloud Computing
• Service Oriented Architectures
• Java EE
• How to write multi-threaded code
Summary

• Define Performance and Scalability
• Vertical Scalability - scaling up
• Horizontal Scalability - scaling out
• Q&A
Performance != Scalability
Performance


 Amount of useful work accomplished by a
computer system compared to the time and
             resources used
Scalability


Capability of a system to increase the amount of
useful work as resources and load are added to
                    the system
Scalability

•   A system that performs fast with 10 users
    might not do so with 1000 - it doesn’t scale
•   Designing for scalability always decreases
    performance
Linear Scalability
Throughput




                          Resources
Reality is sub-linear
Throughput




                     Resources
Amdahl’s Law
Scalability is about
       parallelizing
• Parallel decomposition allows division of
  work
• Parallelizing might mean more work
• There’s almost always a part of serial
  computation
Vertical Scalability
Vertical Scalability
      Somewhat hard
Vertical Scalability
                      Scale Up

•   Bigger, meaner machines
    -   More cores (and more powerful)
    -   More memory
    -   Faster local storage
•   Limited
    -   Technical constraints
    -   Cost - big machines get exponentially
        expensive
Shared State

• Need to use those cores
• Java - shared-state concurrency
 - Mutable state protected with locks
 - Hard to get right
 - Most developers don’t have experience
    writing multithreaded code
This is how they look
              like
public static synchronized SomeObject getInstance() {

    return instance;

}



public SomeObject doConcurrentThingy() {

    synchronized(this) {

        //...

    }

    return ..;

}
Single vs Multi-threaded
 •   Single-threaded

     -   No scheduling cost
     -   No synchronization cost
 •   Multi-threaded
     -   Context Switching (high cost)
     -   Memory Synchronization (memory barriers)
     -   Blocking
Lock Contention
             Little’s Law


The average number of customers in a stable
 system is equal to their average arrival rate
multiplied by their average time in the system
Reducing Contention
•   Reduce lock duration
•   Reduce frequency with which locks are
    requested (stripping)
•   Replace exclusive locks with other mechanisms
    -   Concurrent Collections
    -   ReadWriteLocks
    -   Atomic Variables
    -   Immutable Objects
Concurrent Collections

 • Use lock stripping
 • Includes    putIfAbsent()   and replace()
     methods
 •   ConcurrentHashMap   has 16 separate locks by
     default
 • Don’t reinvent the wheel
ReadWriteLocks

• Pair of locks
• Read lock can be held by multiple
  threads if there are no writers
• Write lock is exclusive
• Good improvements if object as fewer
  writers
Atomic Variables

• Allow to make check-update type of
  operations atomically
• Without locks - use low-level CPU
  instructions
• It’s   volatile   on steroids (visibility +
  atomicity)
Immutable Objects
•   Immutability makes concurrency simple - thread-
    safety guaranteed
•   An immutable object is:
    - final
    -   fields are final and private
    -   Constructor constructs the object completely
    -   No state changing methods
    -   Copy internal mutable objects when receiving
        or returning
JVM issues
•   Caching is useful - storing stuff in memory
•   Larger JVM heap size means longer garbage
    collection times
•   Not acceptable to have long pauses
•   Solutions
    -   Maximum size for heap 2GB/4GB
    -   Multiple JVMs per machine
    -   Better garbage collectors: G1 might help
Scaling Up: Other
        Approaches
• Change the paradigm
 - Actors (Erlang and Scala)
 - Dataflow programming (GParallelizer)
 - Software Transactional Memory
     (Pastrami)
 -   Functional languages, such as Clojure
Scaling Up: Other
        Approaches
• Dedicated JVM-friendly hardware
 - Azul Systems is amazing
 - Hundreds of cores
 - Enormous heap sizes with negligible gc
     pauses
 -   HTM included
 -   Built-in lock elision mechanism
Horizontal Scalability
Horizontal Scalability
       The hard part
Horizontal Scalability
               Scale Out


• Big machines are expensive - 1 x 32 core
  normally much more expensive than 4 x
  8 core
• Increase throughput by adding more
  machines
• Distributed Systems research revisited -
  not new
Requirements

• Scalability
• Availability
• Reliability
• Performance
Typical Server Architecture
... # of users increases
... and increases
... too much load
... and we loose availability
... so we add servers
... and a load balancer
... and another one rides the bus
... we create a DB cluster
... and we cache wherever we can



              Cache




              Cache
Challenges

• How do we route requests to servers?
• How do distribute data between servers?
• How do we handle failures?
• How do we keep our cache consistent?
• How do we handle load peaks?
Technique #1: Partitioning



  A     F      K      P     U
  ...   ...    ...    ...   ...
  E      J     O      T     Z

              Users
Technique #1: Partitioning

  • Each server handles a subset of data
  • Improves scalability by parallelizing
  • Requires predictable routing
  • Introduces problems with locality
  • Move work to where the data is!
Technique #2: Replication

Active




Backup
Technique #2: Replication

  • Keep copies of data/state in multiple
    servers
  • Used for fail-over - increases availability
  • Requires more cold hardware
  • Overhead of replicating might reduce
    performance
Technique #3: Messaging
Technique #3: Messaging
 • Use message passing, queues and pub/sub
   models - JMS
 • Improves reliability easily
 • Helps deal with peaks
  - The queue keeps filling
  - If it gets too big, extra requests are
     rejected
Solution #1: De-
      normalize DB
• Faster queries
• Additional work to generate tables
• Less space efficiency
• Harder to maintain consistency
Solution #2: Non-SQL
       Database

• Why not remove the relational part
  altogether
• Bad for complex queries
• Berkeley DB is a prime example
Solution #3: Distributed
       Key/Value Stores
•   Highly scalable - used in the largest websites in the
    world, based on Amazon’s Dynamo and Google’s
    BigTable
•   Mostly open source
•   Partitioned
•   Replicated
•   Versioned
•   No SPOF
•   Voldemort (LinkedIn), Cassandra (Facebook) and HBase
    are written in Java
Solution #4:
MapReduce




    Map...
Solution #4:
MapReduce




    Map...
Solution #4:
MapReduce


               Divide Work




    Map...
Solution #4:
MapReduce


               Divide Work




    Map...
Solution #4:
MapReduce


               Divide Work




    Map...
Solution #4:
MapReduce




    Map...
Solution #4:
MapReduce


               Compute




    Map...
Solution #4:
MapReduce

               Return and
                aggregate




   Reduce...
Solution #4:
MapReduce

               Return and
                aggregate




   Reduce...
Solution #4:
MapReduce

               Return and
                aggregate




   Reduce...
Solution #4:
          MapReduce
• Google’s algorithm to split work, process it
  and reduce to an answer
• Used for offline processing of large
  amounts of data
• Hadoop is used everywhere! Other options
  such as GridGain exist
Solution #5: Data Grid
• Data (and computations)
• In-memory - low response times
• Database back-end (SQL or not)
• Partitioned - operations on data executed in
  specific partition
• Replicated - handles failover automatically
• Transactional
Solution #5: Data Grid
• It’s a distributed cache + computational
  engine
• Can be used as a cache with JPA and the like
• Oracle Coherence is very good.
• Terracotta, Gridgain, Gemfire, Gigaspaces,
  Velocity (Microsoft) and Websphere
  extreme scale (IBM)
Retrospective

• You need to scale up and out
• Write code thinking of hundreds of cores
• Relational might not be the way to go
• Cache whenever you can
• Be aware of data locality
Q &A
Thanks for listening!




                       Ruben Badaró
                http://www.zonaj.org

More Related Content

What's hot

Stability Patterns for Microservices
Stability Patterns for MicroservicesStability Patterns for Microservices
Stability Patterns for Microservicespflueras
 
JVM JIT compilation overview by Vladimir Ivanov
JVM JIT compilation overview by Vladimir IvanovJVM JIT compilation overview by Vladimir Ivanov
JVM JIT compilation overview by Vladimir IvanovZeroTurnaround
 
Cloud dw benchmark using tpd-ds( Snowflake vs Redshift vs EMR Hive )
Cloud dw benchmark using tpd-ds( Snowflake vs Redshift vs EMR Hive )Cloud dw benchmark using tpd-ds( Snowflake vs Redshift vs EMR Hive )
Cloud dw benchmark using tpd-ds( Snowflake vs Redshift vs EMR Hive )SANG WON PARK
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to RedisDvir Volk
 
nexus helm 설치, docker/helm repo 설정과 예제
nexus helm 설치, docker/helm repo 설정과 예제nexus helm 설치, docker/helm repo 설정과 예제
nexus helm 설치, docker/helm repo 설정과 예제choi sungwook
 
VMware vSphere vsan EN.pptx
VMware vSphere vsan EN.pptxVMware vSphere vsan EN.pptx
VMware vSphere vsan EN.pptxCH431
 
Monitoring in CloudStack
Monitoring in CloudStackMonitoring in CloudStack
Monitoring in CloudStackShapeBlue
 
Implementing Microservices with NATS
Implementing Microservices with NATSImplementing Microservices with NATS
Implementing Microservices with NATSApcera
 
Introduction to Storm
Introduction to Storm Introduction to Storm
Introduction to Storm Chandler Huang
 
Secrets of Performance Tuning Java on Kubernetes
Secrets of Performance Tuning Java on KubernetesSecrets of Performance Tuning Java on Kubernetes
Secrets of Performance Tuning Java on KubernetesBruno Borges
 
VNF Integration and Support in CloudStack - Wei Zhou - ShapeBlue
VNF Integration and Support in CloudStack - Wei Zhou - ShapeBlueVNF Integration and Support in CloudStack - Wei Zhou - ShapeBlue
VNF Integration and Support in CloudStack - Wei Zhou - ShapeBlueShapeBlue
 
vSAN architecture components
vSAN architecture componentsvSAN architecture components
vSAN architecture componentsDavid Pasek
 
Apache kafka performance(latency)_benchmark_v0.3
Apache kafka performance(latency)_benchmark_v0.3Apache kafka performance(latency)_benchmark_v0.3
Apache kafka performance(latency)_benchmark_v0.3SANG WON PARK
 
Quarkus - a next-generation Kubernetes Native Java framework
Quarkus - a next-generation Kubernetes Native Java frameworkQuarkus - a next-generation Kubernetes Native Java framework
Quarkus - a next-generation Kubernetes Native Java frameworkSVDevOps
 
Virtual SAN 6.2, hyper-converged infrastructure software
Virtual SAN 6.2, hyper-converged infrastructure softwareVirtual SAN 6.2, hyper-converged infrastructure software
Virtual SAN 6.2, hyper-converged infrastructure softwareDuncan Epping
 
Using Spark Streaming and NiFi for the Next Generation of ETL in the Enterprise
Using Spark Streaming and NiFi for the Next Generation of ETL in the EnterpriseUsing Spark Streaming and NiFi for the Next Generation of ETL in the Enterprise
Using Spark Streaming and NiFi for the Next Generation of ETL in the EnterpriseDataWorks Summit
 
Redis vs Infinispan | DevNation Tech Talk
Redis vs Infinispan | DevNation Tech TalkRedis vs Infinispan | DevNation Tech Talk
Redis vs Infinispan | DevNation Tech TalkRed Hat Developers
 
Hive spark-s3acommitter-hbase-nfs
Hive spark-s3acommitter-hbase-nfsHive spark-s3acommitter-hbase-nfs
Hive spark-s3acommitter-hbase-nfsYifeng Jiang
 

What's hot (20)

Stability Patterns for Microservices
Stability Patterns for MicroservicesStability Patterns for Microservices
Stability Patterns for Microservices
 
JVM JIT compilation overview by Vladimir Ivanov
JVM JIT compilation overview by Vladimir IvanovJVM JIT compilation overview by Vladimir Ivanov
JVM JIT compilation overview by Vladimir Ivanov
 
Cloud dw benchmark using tpd-ds( Snowflake vs Redshift vs EMR Hive )
Cloud dw benchmark using tpd-ds( Snowflake vs Redshift vs EMR Hive )Cloud dw benchmark using tpd-ds( Snowflake vs Redshift vs EMR Hive )
Cloud dw benchmark using tpd-ds( Snowflake vs Redshift vs EMR Hive )
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to Redis
 
nexus helm 설치, docker/helm repo 설정과 예제
nexus helm 설치, docker/helm repo 설정과 예제nexus helm 설치, docker/helm repo 설정과 예제
nexus helm 설치, docker/helm repo 설정과 예제
 
Fluent Bit: Log Forwarding at Scale
Fluent Bit: Log Forwarding at ScaleFluent Bit: Log Forwarding at Scale
Fluent Bit: Log Forwarding at Scale
 
VMware vSphere vsan EN.pptx
VMware vSphere vsan EN.pptxVMware vSphere vsan EN.pptx
VMware vSphere vsan EN.pptx
 
Ceph on arm64 upload
Ceph on arm64   uploadCeph on arm64   upload
Ceph on arm64 upload
 
Monitoring in CloudStack
Monitoring in CloudStackMonitoring in CloudStack
Monitoring in CloudStack
 
Implementing Microservices with NATS
Implementing Microservices with NATSImplementing Microservices with NATS
Implementing Microservices with NATS
 
Introduction to Storm
Introduction to Storm Introduction to Storm
Introduction to Storm
 
Secrets of Performance Tuning Java on Kubernetes
Secrets of Performance Tuning Java on KubernetesSecrets of Performance Tuning Java on Kubernetes
Secrets of Performance Tuning Java on Kubernetes
 
VNF Integration and Support in CloudStack - Wei Zhou - ShapeBlue
VNF Integration and Support in CloudStack - Wei Zhou - ShapeBlueVNF Integration and Support in CloudStack - Wei Zhou - ShapeBlue
VNF Integration and Support in CloudStack - Wei Zhou - ShapeBlue
 
vSAN architecture components
vSAN architecture componentsvSAN architecture components
vSAN architecture components
 
Apache kafka performance(latency)_benchmark_v0.3
Apache kafka performance(latency)_benchmark_v0.3Apache kafka performance(latency)_benchmark_v0.3
Apache kafka performance(latency)_benchmark_v0.3
 
Quarkus - a next-generation Kubernetes Native Java framework
Quarkus - a next-generation Kubernetes Native Java frameworkQuarkus - a next-generation Kubernetes Native Java framework
Quarkus - a next-generation Kubernetes Native Java framework
 
Virtual SAN 6.2, hyper-converged infrastructure software
Virtual SAN 6.2, hyper-converged infrastructure softwareVirtual SAN 6.2, hyper-converged infrastructure software
Virtual SAN 6.2, hyper-converged infrastructure software
 
Using Spark Streaming and NiFi for the Next Generation of ETL in the Enterprise
Using Spark Streaming and NiFi for the Next Generation of ETL in the EnterpriseUsing Spark Streaming and NiFi for the Next Generation of ETL in the Enterprise
Using Spark Streaming and NiFi for the Next Generation of ETL in the Enterprise
 
Redis vs Infinispan | DevNation Tech Talk
Redis vs Infinispan | DevNation Tech TalkRedis vs Infinispan | DevNation Tech Talk
Redis vs Infinispan | DevNation Tech Talk
 
Hive spark-s3acommitter-hbase-nfs
Hive spark-s3acommitter-hbase-nfsHive spark-s3acommitter-hbase-nfs
Hive spark-s3acommitter-hbase-nfs
 

Viewers also liked

MapReduce Debates and Schema-Free
MapReduce Debates and Schema-FreeMapReduce Debates and Schema-Free
MapReduce Debates and Schema-Freehybrid cloud
 
Optimizing your java applications for multi core hardware
Optimizing your java applications for multi core hardwareOptimizing your java applications for multi core hardware
Optimizing your java applications for multi core hardwareIndicThreads
 
Operations-Driven Web Services at Rent the Runway
Operations-Driven Web Services at Rent the RunwayOperations-Driven Web Services at Rent the Runway
Operations-Driven Web Services at Rent the RunwayCamille Fournier
 
Securing Java EE Web Apps
Securing Java EE Web AppsSecuring Java EE Web Apps
Securing Java EE Web AppsFrank Kim
 
Web20expo Scalable Web Arch
Web20expo Scalable Web ArchWeb20expo Scalable Web Arch
Web20expo Scalable Web Archmclee
 
Cuestionario internet Hernandez Michel
Cuestionario internet Hernandez MichelCuestionario internet Hernandez Michel
Cuestionario internet Hernandez Micheljhonzmichelle
 
Scalable Application Development on AWS
Scalable Application Development on AWSScalable Application Development on AWS
Scalable Application Development on AWSMikalai Alimenkou
 
Building a Scalable XML-based Dynamic Delivery Architecture: Standards and Be...
Building a Scalable XML-based Dynamic Delivery Architecture: Standards and Be...Building a Scalable XML-based Dynamic Delivery Architecture: Standards and Be...
Building a Scalable XML-based Dynamic Delivery Architecture: Standards and Be...Jerry SILVER
 
Highly Scalable Java Programming for Multi-Core System
Highly Scalable Java Programming for Multi-Core SystemHighly Scalable Java Programming for Multi-Core System
Highly Scalable Java Programming for Multi-Core SystemJames Gan
 
Scalable Applications with Scala
Scalable Applications with ScalaScalable Applications with Scala
Scalable Applications with ScalaNimrod Argov
 
Intro to Erlang
Intro to ErlangIntro to Erlang
Intro to ErlangKen Pratt
 
Building Highly Scalable Java Applications on Windows Azure - JavaOne S313978
Building Highly Scalable Java Applications on Windows Azure - JavaOne S313978Building Highly Scalable Java Applications on Windows Azure - JavaOne S313978
Building Highly Scalable Java Applications on Windows Azure - JavaOne S313978David Chou
 
Diary of a Scalable Java Application
Diary of a Scalable Java ApplicationDiary of a Scalable Java Application
Diary of a Scalable Java ApplicationMartin Jackson
 
Java scalability considerations yogesh deshpande
Java scalability considerations   yogesh deshpandeJava scalability considerations   yogesh deshpande
Java scalability considerations yogesh deshpandeIndicThreads
 
Scalable Java Application Development on AWS
Scalable Java Application Development on AWSScalable Java Application Development on AWS
Scalable Java Application Development on AWSMikalai Alimenkou
 
Apache Cassandra Lesson: Data Modelling and CQL3
Apache Cassandra Lesson: Data Modelling and CQL3Apache Cassandra Lesson: Data Modelling and CQL3
Apache Cassandra Lesson: Data Modelling and CQL3Markus Klems
 

Viewers also liked (20)

MapReduce Debates and Schema-Free
MapReduce Debates and Schema-FreeMapReduce Debates and Schema-Free
MapReduce Debates and Schema-Free
 
Java tips
Java tipsJava tips
Java tips
 
Optimizing your java applications for multi core hardware
Optimizing your java applications for multi core hardwareOptimizing your java applications for multi core hardware
Optimizing your java applications for multi core hardware
 
Os
OsOs
Os
 
Operations-Driven Web Services at Rent the Runway
Operations-Driven Web Services at Rent the RunwayOperations-Driven Web Services at Rent the Runway
Operations-Driven Web Services at Rent the Runway
 
Securing Java EE Web Apps
Securing Java EE Web AppsSecuring Java EE Web Apps
Securing Java EE Web Apps
 
Web20expo Scalable Web Arch
Web20expo Scalable Web ArchWeb20expo Scalable Web Arch
Web20expo Scalable Web Arch
 
Cuestionario internet Hernandez Michel
Cuestionario internet Hernandez MichelCuestionario internet Hernandez Michel
Cuestionario internet Hernandez Michel
 
Scalable Application Development on AWS
Scalable Application Development on AWSScalable Application Development on AWS
Scalable Application Development on AWS
 
Building a Scalable XML-based Dynamic Delivery Architecture: Standards and Be...
Building a Scalable XML-based Dynamic Delivery Architecture: Standards and Be...Building a Scalable XML-based Dynamic Delivery Architecture: Standards and Be...
Building a Scalable XML-based Dynamic Delivery Architecture: Standards and Be...
 
Highly Scalable Java Programming for Multi-Core System
Highly Scalable Java Programming for Multi-Core SystemHighly Scalable Java Programming for Multi-Core System
Highly Scalable Java Programming for Multi-Core System
 
Scalable Applications with Scala
Scalable Applications with ScalaScalable Applications with Scala
Scalable Applications with Scala
 
Intro to Erlang
Intro to ErlangIntro to Erlang
Intro to Erlang
 
Building Highly Scalable Java Applications on Windows Azure - JavaOne S313978
Building Highly Scalable Java Applications on Windows Azure - JavaOne S313978Building Highly Scalable Java Applications on Windows Azure - JavaOne S313978
Building Highly Scalable Java Applications on Windows Azure - JavaOne S313978
 
Diary of a Scalable Java Application
Diary of a Scalable Java ApplicationDiary of a Scalable Java Application
Diary of a Scalable Java Application
 
Java scalability considerations yogesh deshpande
Java scalability considerations   yogesh deshpandeJava scalability considerations   yogesh deshpande
Java scalability considerations yogesh deshpande
 
Scalable web architecture
Scalable web architectureScalable web architecture
Scalable web architecture
 
Scalable Java Application Development on AWS
Scalable Java Application Development on AWSScalable Java Application Development on AWS
Scalable Java Application Development on AWS
 
Apache Cassandra Lesson: Data Modelling and CQL3
Apache Cassandra Lesson: Data Modelling and CQL3Apache Cassandra Lesson: Data Modelling and CQL3
Apache Cassandra Lesson: Data Modelling and CQL3
 
The Java memory model made easy
The Java memory model made easyThe Java memory model made easy
The Java memory model made easy
 

Similar to Writing Scalable Software in Java from Multi-core to Grid Computing

Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedInJay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedInLinkedIn
 
Scalability, Availability & Stability Patterns
Scalability, Availability & Stability PatternsScalability, Availability & Stability Patterns
Scalability, Availability & Stability PatternsJonas Bonér
 
ApacheCon2010: Cache & Concurrency Considerations in Cassandra (& limits of JVM)
ApacheCon2010: Cache & Concurrency Considerations in Cassandra (& limits of JVM)ApacheCon2010: Cache & Concurrency Considerations in Cassandra (& limits of JVM)
ApacheCon2010: Cache & Concurrency Considerations in Cassandra (& limits of JVM)srisatish ambati
 
Big Data (NJ SQL Server User Group)
Big Data (NJ SQL Server User Group)Big Data (NJ SQL Server User Group)
Big Data (NJ SQL Server User Group)Don Demcsak
 
Mtc learnings from isv & enterprise interaction
Mtc learnings from isv & enterprise  interactionMtc learnings from isv & enterprise  interaction
Mtc learnings from isv & enterprise interactionGovind Kanshi
 
Mtc learnings from isv & enterprise (dated - Dec -2014)
Mtc learnings from isv & enterprise (dated - Dec -2014)Mtc learnings from isv & enterprise (dated - Dec -2014)
Mtc learnings from isv & enterprise (dated - Dec -2014)Govind Kanshi
 
Hardware Provisioning
Hardware ProvisioningHardware Provisioning
Hardware ProvisioningMongoDB
 
Cloud infrastructure. Google File System and MapReduce - Andrii Vozniuk
Cloud infrastructure. Google File System and MapReduce - Andrii VozniukCloud infrastructure. Google File System and MapReduce - Andrii Vozniuk
Cloud infrastructure. Google File System and MapReduce - Andrii VozniukAndrii Vozniuk
 
Intro to Big Data and NoSQL
Intro to Big Data and NoSQLIntro to Big Data and NoSQL
Intro to Big Data and NoSQLDon Demcsak
 
Eventual Consistency @WalmartLabs with Kafka, Avro, SolrCloud and Hadoop
Eventual Consistency @WalmartLabs with Kafka, Avro, SolrCloud and HadoopEventual Consistency @WalmartLabs with Kafka, Avro, SolrCloud and Hadoop
Eventual Consistency @WalmartLabs with Kafka, Avro, SolrCloud and HadoopAyon Sinha
 
DrupalCampLA 2014 - Drupal backend performance and scalability
DrupalCampLA 2014 - Drupal backend performance and scalabilityDrupalCampLA 2014 - Drupal backend performance and scalability
DrupalCampLA 2014 - Drupal backend performance and scalabilitycherryhillco
 
Building Big Data Streaming Architectures
Building Big Data Streaming ArchitecturesBuilding Big Data Streaming Architectures
Building Big Data Streaming ArchitecturesDavid Martínez Rego
 
August 2013 HUG: Removing the NameNode's memory limitation
August 2013 HUG: Removing the NameNode's memory limitation August 2013 HUG: Removing the NameNode's memory limitation
August 2013 HUG: Removing the NameNode's memory limitation Yahoo Developer Network
 
Development of concurrent services using In-Memory Data Grids
Development of concurrent services using In-Memory Data GridsDevelopment of concurrent services using In-Memory Data Grids
Development of concurrent services using In-Memory Data Gridsjlorenzocima
 
Dr and ha solutions with sql server azure
Dr and ha solutions with sql server azureDr and ha solutions with sql server azure
Dr and ha solutions with sql server azureMSDEVMTL
 
Distributed Computing with Apache Hadoop: Technology Overview
Distributed Computing with Apache Hadoop: Technology OverviewDistributed Computing with Apache Hadoop: Technology Overview
Distributed Computing with Apache Hadoop: Technology OverviewKonstantin V. Shvachko
 
Graphene – Microsoft SCOPE on Tez
Graphene – Microsoft SCOPE on Tez Graphene – Microsoft SCOPE on Tez
Graphene – Microsoft SCOPE on Tez DataWorks Summit
 

Similar to Writing Scalable Software in Java from Multi-core to Grid Computing (20)

Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedInJay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
 
Drupal performance
Drupal performanceDrupal performance
Drupal performance
 
Scalability, Availability & Stability Patterns
Scalability, Availability & Stability PatternsScalability, Availability & Stability Patterns
Scalability, Availability & Stability Patterns
 
ApacheCon2010: Cache & Concurrency Considerations in Cassandra (& limits of JVM)
ApacheCon2010: Cache & Concurrency Considerations in Cassandra (& limits of JVM)ApacheCon2010: Cache & Concurrency Considerations in Cassandra (& limits of JVM)
ApacheCon2010: Cache & Concurrency Considerations in Cassandra (& limits of JVM)
 
Big Data (NJ SQL Server User Group)
Big Data (NJ SQL Server User Group)Big Data (NJ SQL Server User Group)
Big Data (NJ SQL Server User Group)
 
Mtc learnings from isv & enterprise interaction
Mtc learnings from isv & enterprise  interactionMtc learnings from isv & enterprise  interaction
Mtc learnings from isv & enterprise interaction
 
Mtc learnings from isv & enterprise (dated - Dec -2014)
Mtc learnings from isv & enterprise (dated - Dec -2014)Mtc learnings from isv & enterprise (dated - Dec -2014)
Mtc learnings from isv & enterprise (dated - Dec -2014)
 
Hardware Provisioning
Hardware ProvisioningHardware Provisioning
Hardware Provisioning
 
Cloud infrastructure. Google File System and MapReduce - Andrii Vozniuk
Cloud infrastructure. Google File System and MapReduce - Andrii VozniukCloud infrastructure. Google File System and MapReduce - Andrii Vozniuk
Cloud infrastructure. Google File System and MapReduce - Andrii Vozniuk
 
Intro to Big Data and NoSQL
Intro to Big Data and NoSQLIntro to Big Data and NoSQL
Intro to Big Data and NoSQL
 
try
trytry
try
 
Eventual Consistency @WalmartLabs with Kafka, Avro, SolrCloud and Hadoop
Eventual Consistency @WalmartLabs with Kafka, Avro, SolrCloud and HadoopEventual Consistency @WalmartLabs with Kafka, Avro, SolrCloud and Hadoop
Eventual Consistency @WalmartLabs with Kafka, Avro, SolrCloud and Hadoop
 
Hadoop
HadoopHadoop
Hadoop
 
DrupalCampLA 2014 - Drupal backend performance and scalability
DrupalCampLA 2014 - Drupal backend performance and scalabilityDrupalCampLA 2014 - Drupal backend performance and scalability
DrupalCampLA 2014 - Drupal backend performance and scalability
 
Building Big Data Streaming Architectures
Building Big Data Streaming ArchitecturesBuilding Big Data Streaming Architectures
Building Big Data Streaming Architectures
 
August 2013 HUG: Removing the NameNode's memory limitation
August 2013 HUG: Removing the NameNode's memory limitation August 2013 HUG: Removing the NameNode's memory limitation
August 2013 HUG: Removing the NameNode's memory limitation
 
Development of concurrent services using In-Memory Data Grids
Development of concurrent services using In-Memory Data GridsDevelopment of concurrent services using In-Memory Data Grids
Development of concurrent services using In-Memory Data Grids
 
Dr and ha solutions with sql server azure
Dr and ha solutions with sql server azureDr and ha solutions with sql server azure
Dr and ha solutions with sql server azure
 
Distributed Computing with Apache Hadoop: Technology Overview
Distributed Computing with Apache Hadoop: Technology OverviewDistributed Computing with Apache Hadoop: Technology Overview
Distributed Computing with Apache Hadoop: Technology Overview
 
Graphene – Microsoft SCOPE on Tez
Graphene – Microsoft SCOPE on Tez Graphene – Microsoft SCOPE on Tez
Graphene – Microsoft SCOPE on Tez
 

Recently uploaded

Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...panagenda
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditSkynet Technologies
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 

Recently uploaded (20)

Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance Audit
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 

Writing Scalable Software in Java from Multi-core to Grid Computing

  • 1. Writing Scalable Software in Java From multi-core to grid-computing
  • 2. Me • Ruben Badaró • Dev Expert at Changingworlds/Amdocs • PT.JUG Leader • http://www.zonaj.org
  • 3. What this talk is not about • Sales pitch • Cloud Computing • Service Oriented Architectures • Java EE • How to write multi-threaded code
  • 4. Summary • Define Performance and Scalability • Vertical Scalability - scaling up • Horizontal Scalability - scaling out • Q&A
  • 6. Performance Amount of useful work accomplished by a computer system compared to the time and resources used
  • 7. Scalability Capability of a system to increase the amount of useful work as resources and load are added to the system
  • 8. Scalability • A system that performs fast with 10 users might not do so with 1000 - it doesn’t scale • Designing for scalability always decreases performance
  • 12. Scalability is about parallelizing • Parallel decomposition allows division of work • Parallelizing might mean more work • There’s almost always a part of serial computation
  • 14. Vertical Scalability Somewhat hard
  • 15. Vertical Scalability Scale Up • Bigger, meaner machines - More cores (and more powerful) - More memory - Faster local storage • Limited - Technical constraints - Cost - big machines get exponentially expensive
  • 16. Shared State • Need to use those cores • Java - shared-state concurrency - Mutable state protected with locks - Hard to get right - Most developers don’t have experience writing multithreaded code
  • 17. This is how they look like public static synchronized SomeObject getInstance() { return instance; } public SomeObject doConcurrentThingy() { synchronized(this) { //... } return ..; }
  • 18. Single vs Multi-threaded • Single-threaded - No scheduling cost - No synchronization cost • Multi-threaded - Context Switching (high cost) - Memory Synchronization (memory barriers) - Blocking
  • 19. Lock Contention Little’s Law The average number of customers in a stable system is equal to their average arrival rate multiplied by their average time in the system
  • 20. Reducing Contention • Reduce lock duration • Reduce frequency with which locks are requested (stripping) • Replace exclusive locks with other mechanisms - Concurrent Collections - ReadWriteLocks - Atomic Variables - Immutable Objects
  • 21. Concurrent Collections • Use lock stripping • Includes putIfAbsent() and replace() methods • ConcurrentHashMap has 16 separate locks by default • Don’t reinvent the wheel
  • 22. ReadWriteLocks • Pair of locks • Read lock can be held by multiple threads if there are no writers • Write lock is exclusive • Good improvements if object as fewer writers
  • 23. Atomic Variables • Allow to make check-update type of operations atomically • Without locks - use low-level CPU instructions • It’s volatile on steroids (visibility + atomicity)
  • 24. Immutable Objects • Immutability makes concurrency simple - thread- safety guaranteed • An immutable object is: - final - fields are final and private - Constructor constructs the object completely - No state changing methods - Copy internal mutable objects when receiving or returning
  • 25. JVM issues • Caching is useful - storing stuff in memory • Larger JVM heap size means longer garbage collection times • Not acceptable to have long pauses • Solutions - Maximum size for heap 2GB/4GB - Multiple JVMs per machine - Better garbage collectors: G1 might help
  • 26. Scaling Up: Other Approaches • Change the paradigm - Actors (Erlang and Scala) - Dataflow programming (GParallelizer) - Software Transactional Memory (Pastrami) - Functional languages, such as Clojure
  • 27. Scaling Up: Other Approaches • Dedicated JVM-friendly hardware - Azul Systems is amazing - Hundreds of cores - Enormous heap sizes with negligible gc pauses - HTM included - Built-in lock elision mechanism
  • 29. Horizontal Scalability The hard part
  • 30. Horizontal Scalability Scale Out • Big machines are expensive - 1 x 32 core normally much more expensive than 4 x 8 core • Increase throughput by adding more machines • Distributed Systems research revisited - not new
  • 33. ... # of users increases
  • 35. ... too much load
  • 36. ... and we loose availability
  • 37. ... so we add servers
  • 38. ... and a load balancer
  • 39. ... and another one rides the bus
  • 40. ... we create a DB cluster
  • 41. ... and we cache wherever we can Cache Cache
  • 42. Challenges • How do we route requests to servers? • How do distribute data between servers? • How do we handle failures? • How do we keep our cache consistent? • How do we handle load peaks?
  • 43. Technique #1: Partitioning A F K P U ... ... ... ... ... E J O T Z Users
  • 44. Technique #1: Partitioning • Each server handles a subset of data • Improves scalability by parallelizing • Requires predictable routing • Introduces problems with locality • Move work to where the data is!
  • 46. Technique #2: Replication • Keep copies of data/state in multiple servers • Used for fail-over - increases availability • Requires more cold hardware • Overhead of replicating might reduce performance
  • 48. Technique #3: Messaging • Use message passing, queues and pub/sub models - JMS • Improves reliability easily • Helps deal with peaks - The queue keeps filling - If it gets too big, extra requests are rejected
  • 49. Solution #1: De- normalize DB • Faster queries • Additional work to generate tables • Less space efficiency • Harder to maintain consistency
  • 50. Solution #2: Non-SQL Database • Why not remove the relational part altogether • Bad for complex queries • Berkeley DB is a prime example
  • 51. Solution #3: Distributed Key/Value Stores • Highly scalable - used in the largest websites in the world, based on Amazon’s Dynamo and Google’s BigTable • Mostly open source • Partitioned • Replicated • Versioned • No SPOF • Voldemort (LinkedIn), Cassandra (Facebook) and HBase are written in Java
  • 54. Solution #4: MapReduce Divide Work Map...
  • 55. Solution #4: MapReduce Divide Work Map...
  • 56. Solution #4: MapReduce Divide Work Map...
  • 58. Solution #4: MapReduce Compute Map...
  • 59. Solution #4: MapReduce Return and aggregate Reduce...
  • 60. Solution #4: MapReduce Return and aggregate Reduce...
  • 61. Solution #4: MapReduce Return and aggregate Reduce...
  • 62. Solution #4: MapReduce • Google’s algorithm to split work, process it and reduce to an answer • Used for offline processing of large amounts of data • Hadoop is used everywhere! Other options such as GridGain exist
  • 63. Solution #5: Data Grid • Data (and computations) • In-memory - low response times • Database back-end (SQL or not) • Partitioned - operations on data executed in specific partition • Replicated - handles failover automatically • Transactional
  • 64. Solution #5: Data Grid • It’s a distributed cache + computational engine • Can be used as a cache with JPA and the like • Oracle Coherence is very good. • Terracotta, Gridgain, Gemfire, Gigaspaces, Velocity (Microsoft) and Websphere extreme scale (IBM)
  • 65. Retrospective • You need to scale up and out • Write code thinking of hundreds of cores • Relational might not be the way to go • Cache whenever you can • Be aware of data locality
  • 66. Q &A Thanks for listening! Ruben Badaró http://www.zonaj.org