Architectures, Frameworks and
Infrastructure
Harendra Pathak
Areas of Focus
[ Web ]
[ Services ]
[ Monitoring and Alerting ]
[ Logging and Metrics ]
[ Analytics ]
[ Datastores ]
[ Caching ]
[ Services ]
Java 8
• Lambda expressions and Method References
persons.stream().map(p -> p.email());
persons.stream().mapToInt(Person::getAge).sum();
Java 7:
Collections.sort(personList, new Comparator<Person>() {
    public int compare(Person p1, Person p2) {
        return p1.firstName.compareTo(p2.firstName);
    }
});
Java 8:
Collections.sort(personList, (Person p1, Person p2) ->
    p1.firstName.compareTo(p2.firstName));
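• Annotations on types (JSR 308)
• Enable pluggable type checkers such as the Checker Framework; the annotations below come from such checkers, not from the JDK itself.
• @NonNull – Compile-time null checks.
• @ReadOnly – Compile-time error on any attempt to change the object.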
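The lambda and method-reference one-liners above can be tried end to end with a small sketch. The `Person` class, its fields and getters, and the sample data are hypothetical stand-ins, since the slides do not show them; `Comparator.comparing` is used as a Java 8 alternative to the explicit comparator lambda.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class LambdaDemo {
    // Lambda expression: map each person to an email address.
    static List<String> emails(List<Person> persons) {
        return persons.stream().map(p -> p.getEmail()).collect(Collectors.toList());
    }

    // Method reference: sum the ages without an explicit loop.
    static int totalAge(List<Person> persons) {
        return persons.stream().mapToInt(Person::getAge).sum();
    }

    // Comparator.comparing replaces the anonymous Comparator class from Java 7.
    static List<String> firstNamesSorted(List<Person> persons) {
        List<Person> copy = new ArrayList<>(persons);
        copy.sort(Comparator.comparing(Person::getFirstName));
        return copy.stream().map(Person::getFirstName).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Person> persons = Arrays.asList(
            new Person("Meera", "meera@example.com", 31),
            new Person("Arun", "arun@example.com", 27));
        System.out.println(emails(persons));
        System.out.println(totalAge(persons));
        System.out.println(firstNamesSorted(persons));
    }
}

// Hypothetical Person type; the slides only hint at its members.
class Person {
    final String firstName;
    final String email;
    final int age;

    Person(String firstName, String email, int age) {
        this.firstName = firstName;
        this.email = email;
        this.age = age;
    }

    String getFirstName() { return firstName; }
    String getEmail() { return email; }
    int getAge() { return age; }
}
```

Note that a stream pipeline does nothing until a terminal operation such as `collect` or `sum` runs, which is why the sketch ends each pipeline with one.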
Java 8
• Default methods (virtual extension methods)
• Methods with a body that you can add to your interfaces without breaking existing implementations. Example: Iterable.forEach(lambda expression)
public interface Iterable<T> {
    Iterator<T> iterator();
    default void forEach(Consumer<? super T> action) {
        Objects.requireNonNull(action);
        for (T t : this) {
            action.accept(t);
        }
    }
}
• Other changes
• Parallel array sorting.
• Improved I/O API.
• Better date and time API.
• Base64 encoding and decoding.
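A minimal sketch of the backward-compatibility point made above for default methods. The `Greeter` interface and both implementations are hypothetical names invented for illustration; the idea is that `LegacyGreeter` was written before `greet()` existed and still compiles unchanged.

```java
import java.util.ArrayList;
import java.util.List;

class DefaultMethodDemo {
    // Iterable.forEach is itself a default method added in Java 8.
    static List<String> copyOf(Iterable<String> items) {
        List<String> out = new ArrayList<>();
        items.forEach(out::add);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(new LegacyGreeter().greet());
        System.out.println(new PoliteGreeter().greet());
    }
}

// Hypothetical interface; greet() is assumed to have been added after
// implementations of name() already existed.
interface Greeter {
    String name();

    // Default method: existing implementations inherit this body.
    default String greet() {
        return "Hello, " + name();
    }
}

// Written against the old interface; implements only name().
class LegacyGreeter implements Greeter {
    public String name() { return "legacy"; }
}

// A newer implementation may still override the default.
class PoliteGreeter implements Greeter {
    public String name() { return "polite"; }

    @Override
    public String greet() { return "Good day, " + name(); }
}
```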
Microservices
• Small problem domain
• Typically fewer than 500-1000 lines of code.
• Spans about 5 domain objects in Java.
• Can be built, deployed and run independently.
• Owns its own data storage.
SpringBoot - Basics
• Basics
• Use Gradle plugin for runnable jar/war.
• Run a project in-place with bootRun task.
• Spring-Loaded - Reload Java classes without restarting the
container.
• Unlike 'hot code replace' which only allows simple changes
once a JVM is running (e.g. changes to method bodies),
Spring Loaded allows you to add/modify/delete
methods/fields/constructors.
• Datastores
[ If you are using auto-configuration, repositories will be searched from the package
containing your main configuration class (the one annotated with @EnableAutoConfiguration or
@SpringBootApplication) down.]
• JPA
• NoSQL
• Couchbase
• MongoDB
SpringBoot - Features
• Externalized Configuration
server:
    address: 192.168.2.192
---
spring:
    profiles: development
server:
    address: 127.0.0.1
---
spring:
    profiles: staging
server:
    address: 192.168.22.184
• Profile specific configuration values
• Using YAML instead of Properties
• Profile specific application-[profile].yml files
• Multi-profile YAML documents
• Automatic property expansion using Gradle
• Adding active profiles
• -Dspring.profiles.active=production
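The override rule behind the multi-profile YAML above can be modelled in a few lines of plain Java. This is only an illustration of the lookup order, not Spring Boot's implementation: `ProfileConfig`, `resolve` and `fromSlide` are hypothetical names, and the values are the ones from the slide.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative model only: a default document plus per-profile overrides.
class ProfileConfig {
    private final Map<String, String> defaults = new HashMap<>();
    private final Map<String, Map<String, String>> byProfile = new HashMap<>();

    void setDefault(String key, String value) {
        defaults.put(key, value);
    }

    void setForProfile(String profile, String key, String value) {
        byProfile.computeIfAbsent(profile, p -> new HashMap<>()).put(key, value);
    }

    // A value from the active profile wins; otherwise fall back to the default.
    String resolve(String key, String activeProfile) {
        Map<String, String> overrides = byProfile.get(activeProfile);
        if (overrides != null && overrides.containsKey(key)) {
            return overrides.get(key);
        }
        return defaults.get(key);
    }

    // The three YAML documents from the slide, expressed as flat keys.
    static ProfileConfig fromSlide() {
        ProfileConfig config = new ProfileConfig();
        config.setDefault("server.address", "192.168.2.192");
        config.setForProfile("development", "server.address", "127.0.0.1");
        config.setForProfile("staging", "server.address", "192.168.22.184");
        return config;
    }

    public static void main(String[] args) {
        ProfileConfig config = fromSlide();
        System.out.println(config.resolve("server.address", "development"));
        System.out.println(config.resolve("server.address", "production"));
    }
}
```

With `-Dspring.profiles.active=production` there is no matching document in the slide's YAML, so the address from the first, profile-less document applies.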
SpringBoot - Features
• Production Services
• Customize endpoints
• Sensitivity
• Disabling
• Writing custom HealthIndicators
• Metrics
• System metrics
• Tomcat session metrics
• Recording your own metrics
• Metric repositories
SpringBoot - Features
• Tests
• Unit Tests
• Integration Tests
• EnvironmentTestUtils
• OutputCapture
• TestRestTemplate
SpringBoot - Features
• Customizing embedded servlet containers
• Configure Tomcat
• Enable Multiple Connectors with Tomcat
• Configure SSL
• Use Tomcat behind a front-end proxy server
• Enable HTTPS when running behind a proxy
server
• Switch off the Spring MVC DispatcherServlet
• Switch off the Default MVC configuration
• Customize ViewResolvers
SpringBoot - Features
• Auditing
• Tracing
• Deployment
• Unix/Linux services
• Converting Existing Applications to Spring Boot
• Servlet 3.0+ applications with no web.xml.
• Applications with a web.xml.
• Applications with a context hierarchy.
• Applications without a context hierarchy.
Gradle
• Build automation system for polyglot environments. LinkedIn uses it to
build projects in 60 programming languages.
• Plugins and integrations with almost every tool in the DevOps pipeline.
• Manage dependencies across repository types like Maven and Ivy.
• Concise and scriptable. Right balance of declarative and imperative.
• Incremental builds, build caching and parallelisation of builds.
• Build analytics and reporting to see problems and areas of optimisation.
Gradle vs Maven - Maven Build
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-checkstyle-plugin</artifactId>
    <version>2.12.1</version>
    <executions>
        <execution>
            <configuration>
                <configLocation>config/checkstyle/checkstyle.xml</configLocation>
                <consoleOutput>true</consoleOutput>
                <failsOnError>true</failsOnError>
            </configuration>
            <goals>
                <goal>check</goal>
            </goals>
        </execution>
    </executions>
</plugin>
<plugin>
    <groupId>org.codehaus.mojo</groupId>
    <artifactId>findbugs-maven-plugin</artifactId>
    <version>2.5.4</version>
    <executions>
        <execution>
            <goals>
                <goal>check</goal>
            </goals>
        </execution>
    </executions>
</plugin>
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-pmd-plugin</artifactId>
    <version>3.1</version>
    <executions>
        <execution>
            <goals>
                <goal>check</goal>
            </goals>
        </execution>
    </executions>
</plugin>
Gradle vs Maven - Gradle Build
apply plugin: 'checkstyle'
apply plugin: 'findbugs'
apply plugin: 'pmd'
version = '1.0'
repositories {
    mavenCentral()
}
dependencies {
    testCompile group: 'junit', name: 'junit', version: '4.11'
    testCompile group: 'org.hamcrest', name: 'hamcrest-all', version: '1.3'
}
Swagger - What
• Documentation via Java annotations.
• API editor
• Client SDK generator
• Available annotations: https://github.com/swagger-api/swagger-core/wiki/Annotations
• https://github.com/springfox/springfox
Swagger - Screenshot
[ Web ]
AngularJS
• Adds special markup to HTML to make it more expressive and keeps it in sync with JS: put the logic in JS and see the HTML update.
• Well suited for SPAs and mobile sites, as it reduces the amount of content transferred while navigating across your apps.
• Client-side MVC / MVVM; closer to MVVM, as AngularJS has 2-way binding.
Gulp
• Build system for websites
• Compile SCSS to CSS
• Spriting
• Minify JS / CSS files
• Combine JS / CSS
• Fingerprinting files for aggressive caching
• GZip files etc.
• Plugins for everything you need.
• Unlike the alternative Grunt, it uses streams. Grunt takes files, runs a
single task on them and saves them to new files, repeating the entire
process for every task. The many file reads and writes make Grunt
slower than Gulp.
Jasmine and Karma
• Jasmine
• Unit testing framework
• Suites, Specs, Matchers, Spies
• Supports async tests with runs/waitsFor (Jasmine 1.x; Jasmine 2.x uses a done callback instead)
• Karma
• Test runner for unit tests.
• Run inside or outside a browser
• Run in IDE or command line
Nginx - What
• Designed from the ground up to use event-driven (asynchronous)
connection handling.
• Spawns worker processes, each of which can handle thousands of
connections.
• Fast event loop that checks for and processes events.
• Decouples actual work from connections.
• When the connection closes, work is removed from the loop.
• Allows Nginx to scale incredibly high.
• Consistently low memory and CPU usage even under heavy
load.
• Module selection at compile time
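The event-driven model described above can be sketched with Java NIO: a single thread blocks on a selector and services whichever connections have events, instead of dedicating a thread to each connection. This is a toy echo server for illustration and says nothing about how Nginx is implemented internally; `EventLoopEchoServer` and `roundTrip` are hypothetical names.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

// Toy echo server: one thread and one selector serve every connection.
class EventLoopEchoServer implements Runnable {
    private final Selector selector;
    private final ServerSocketChannel server;
    private volatile boolean running = true;

    EventLoopEchoServer() throws IOException {
        selector = Selector.open();
        server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress("127.0.0.1", 0)); // port 0: the OS picks a free port
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);
    }

    int port() { return server.socket().getLocalPort(); }

    void stop() {
        running = false;
        selector.wakeup(); // unblock select() so the loop sees the flag
    }

    @Override
    public void run() {
        ByteBuffer buffer = ByteBuffer.allocate(1024);
        try {
            while (running) {
                selector.select(); // block until a channel has an event or stop() is called
                Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
                while (keys.hasNext()) {
                    SelectionKey key = keys.next();
                    keys.remove();
                    if (key.isAcceptable()) {
                        SocketChannel client = server.accept();
                        if (client != null) {
                            client.configureBlocking(false);
                            client.register(selector, SelectionKey.OP_READ);
                        }
                    } else if (key.isReadable()) {
                        echo((SocketChannel) key.channel(), buffer);
                    }
                }
            }
            selector.close();
            server.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static void echo(SocketChannel client, ByteBuffer buffer) {
        try {
            buffer.clear();
            if (client.read(buffer) < 0) {
                client.close(); // peer closed: the connection leaves the loop
                return;
            }
            buffer.flip();
            while (buffer.hasRemaining()) {
                client.write(buffer); // fine for a toy; real servers wait for OP_WRITE
            }
        } catch (IOException e) {
            try { client.close(); } catch (IOException ignored) { }
        }
    }

    // Demo helper: start a loop, send one message, return what comes back.
    static String roundTrip(String message) {
        try {
            EventLoopEchoServer loop = new EventLoopEchoServer();
            Thread thread = new Thread(loop);
            thread.setDaemon(true);
            thread.start();
            byte[] sent = message.getBytes(StandardCharsets.UTF_8);
            byte[] received = new byte[sent.length];
            try (Socket socket = new Socket("127.0.0.1", loop.port())) {
                socket.getOutputStream().write(sent);
                new DataInputStream(socket.getInputStream()).readFully(received);
            } finally {
                loop.stop();
            }
            return new String(received, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling `EventLoopEchoServer.roundTrip("ping")` starts the loop on a free local port, sends the bytes over an ordinary blocking socket and returns the echoed text. An idle connection costs the loop only a registered key rather than a parked thread.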
Nginx - Better than Apache httpd
• Apache httpd connection handling
• mpm_prefork: one process -> one thread -> one connection
• mpm_worker: one process -> multiple threads -> one connection per thread
• mpm_event: optimised for keep-alive connections by having a pool of dedicated threads for keep-alive connections and handing new requests to other threads.
Amazon CloudFront
•Amazon’s CDN.
•Use S3 or another system as origin server.
•Two edge servers in India
•Chennai
•Mumbai
[ Datastores ]
Couchbase
• Strong consistency within same data centre.
• Peer to peer architecture.
• Elastic scalability. Add / remove nodes.
• Ease of Administration
• Integrated admin console and scripting API with cluster-wide monitoring to manage large
deployments.
• General purpose
• A distributed cache, key/value store, and document database for enterprise web, mobile, and IoT
applications.
• Consistent high-performance
• Integrated cache for low latency reads.
• Fine-grained locking for high write throughput. No single point of failure.
• XA
• Automatic failover.
• Data replication within and between data centers ensures zero downtime.
• Java Library - spring-data-couchbase and Couchbase provided couchbase-java-client for low-level
access.
[ Analytics ]
Lambda Architecture - What
• Batch layer - Historical archive of all data collected by the
system. Results are typically minutes to hours old.
• Speed layer - Compute analytics on data as it enters the
system with a sub-second latency. Dataset used for
analysis is zero to an hour old. Combine results from this
layer with those of batch layer for better decisions.
• Serving layer - Cache results from batch layer and
periodically refresh them.
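How the layers described above combine at query time can be sketched as follows. This is a simplified in-memory model under stated assumptions: `PageViewQuery` and its methods are hypothetical, the metric is a page-view count, and each batch run is assumed to cover every event that arrived before the new view is installed.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model: answer = precomputed batch view + recent real-time delta.
class PageViewQuery {
    private final Map<String, Long> batchView = new HashMap<>(); // refreshed by the batch layer
    private final Map<String, Long> speedView = new HashMap<>(); // events since the last batch run

    // Speed layer: count an event as it enters the system.
    void recordRealtime(String page) {
        speedView.merge(page, 1L, Long::sum);
    }

    // Serving layer refresh: install the recomputed view and drop the
    // real-time counts it is assumed to cover.
    void installBatchView(Map<String, Long> recomputed) {
        batchView.clear();
        batchView.putAll(recomputed);
        speedView.clear();
    }

    // Query: combine both layers so the answer is complete and fresh.
    long views(String page) {
        return batchView.getOrDefault(page, 0L) + speedView.getOrDefault(page, 0L);
    }

    public static void main(String[] args) {
        PageViewQuery query = new PageViewQuery();
        Map<String, Long> batch = new HashMap<>();
        batch.put("home", 100L);
        query.installBatchView(batch);
        query.recordRealtime("home");
        System.out.println(query.views("home"));
    }
}
```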
Lambda Architecture - How
•Kafka in the outer core to ingest data and fan it out to
batch and speed layers.
•Batch layer - Batch query systems like Spark with
HDFS are a good fit here.
•Speed layer - This layer typically has queuing,
streaming and processing subsystems. Storm,
Cassandra.
•Serving layer - Redis would be a great fit here.
[ Logging and Metrics ]
Collection and Storage - Log
• Fluentd - Log Collection
• The vanilla instance runs on 20-30MB of memory and can
process 13,000 events/second/core.
• 2000+ data driven companies already using it.
• Collect
• Resource and custom metrics
• Application / system logs for analysis
• Logs for archival
• Java API for logging
• Tail other logs and forward them using td-agent
• ElasticSearch - Log Storage
• Later different types of storage like S3, DB etc.
Collection and Storage - Metrics
• Collectd
• Collect system performance information - CPU utilization,
memory usage, disk usage etc.
• Use graphite plugin to send this data to Graphite server
• Statsd
• Capture different types of metrics: gauges, counters,
timing summary statistics, and sets.
• Client libraries send stats via UDP to the StatsD daemon.
• StatsD daemon listens to the UDP traffic from all
application libraries, aggregates data over time and
flushes it at the desired interval to designated storage.
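The client side of the StatsD flow above is small enough to sketch with the JDK alone. Each metric travels as one text line of the form `name:value|type` in a UDP datagram. `StatsdClient` is a hypothetical class name and the metric names are made up; port 8125 is StatsD's conventional default and is an assumption about the deployment.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

// Minimal fire-and-forget StatsD client sketch.
class StatsdClient {
    private final InetAddress host;
    private final int port;

    StatsdClient(InetAddress host, int port) {
        this.host = host;
        this.port = port;
    }

    // Wire format helpers: c = counter, g = gauge, ms = timing.
    static String counter(String name, long delta) { return name + ":" + delta + "|c"; }
    static String gauge(String name, long value) { return name + ":" + value + "|g"; }
    static String timing(String name, long millis) { return name + ":" + millis + "|ms"; }

    // UDP: no connection and no acknowledgement, so a slow or absent
    // daemon does not block the application.
    void send(String line) throws IOException {
        byte[] payload = line.getBytes(StandardCharsets.UTF_8);
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.send(new DatagramPacket(payload, payload.length, host, port));
        }
    }

    public static void main(String[] args) {
        // Prints the lines a client would send; nothing goes on the network here.
        System.out.println(counter("app.logins", 1));
        System.out.println(gauge("app.queue.depth", 42));
        System.out.println(timing("app.request", 320));
    }
}
```

To actually emit a metric, one would construct `new StatsdClient(InetAddress.getByName("localhost"), 8125)` and pass a formatted line to `send`.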
Visualization
• Graphite + Grafana
• Use graphite with Grafana as the graphing tool for
metrics and stats. One can save dashboards in
Grafana and load them to/from ElasticSearch.
• Kibana
• Kibana can be used to visualise and analyse any kind
of structured / unstructured data if it has been indexed
into ElasticSearch. Lots of ways of visualising and
slicing / dicing data.
[ Monitoring and Alerting ]
Evaluation Criteria
• The idea is to introduce monitoring so that the system can be supported and
monitored 24x7, with the aim of minimal downtime.
• Scalable
• Can present a large number of easy-to-understand checks.
• Can handle a large number of hosts and a large number of checks.
• UI
• Access to historical alerts.
• Ability to switch off alerts with comments.
• Easy to extend/change.
• Support custom checks.
• Ease of configuring alert thresholds.
• Good support for check dependencies.
• Cause and effect separation by having alert dependencies.
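The cause-and-effect criterion above can be illustrated with a short sketch: when a check fails, alert on it only if none of the checks it depends on is also failing. `AlertDependencies` and the check names are hypothetical and not tied to any of the tools evaluated here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch of alert suppression through check dependencies.
class AlertDependencies {
    private final Map<String, String> parentOf = new HashMap<>();

    // Declares that `check` can only pass if `parent` passes.
    void dependsOn(String check, String parent) {
        parentOf.put(check, parent);
    }

    // Failing checks worth alerting on: those with no failing ancestor.
    List<String> rootCauses(Set<String> failing) {
        List<String> roots = new ArrayList<>();
        for (String check : failing) {
            if (!hasFailingAncestor(check, failing)) {
                roots.add(check);
            }
        }
        Collections.sort(roots); // stable output for display
        return roots;
    }

    private boolean hasFailingAncestor(String check, Set<String> failing) {
        Set<String> seen = new HashSet<>(); // guards against dependency cycles
        String parent = parentOf.get(check);
        while (parent != null && seen.add(parent)) {
            if (failing.contains(parent)) {
                return true;
            }
            parent = parentOf.get(parent);
        }
        return false;
    }

    public static void main(String[] args) {
        AlertDependencies deps = new AlertDependencies();
        deps.dependsOn("web01-http", "web01-ping");
        Set<String> failing = new HashSet<>();
        failing.add("web01-http");
        failing.add("web01-ping");
        System.out.println(deps.rootCauses(failing));
    }
}
```

With both checks failing, only the ping check is reported, since the HTTP failure is an effect of the host being unreachable.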
Options
• Sensu
• Local agents which push information to an AMQP broker. Various servers can now
ingest information from this broker. Weaker coupling and horizontal scaling. Central broker
is a SPOF and poses scaling challenges though.
• Icinga2
• Icinga 1 was a Nagios fork with a better UI but ran into some trademark/copyright
violations.
• Icinga2 is a complete rewrite. It can work in both modes: agents and central servers pulling
data.
• Nagios
• Nagios uses a group of central servers that are configured to perform checks on remote
hosts. This design makes it difficult to scale Nagios, as large fleets quickly reach the limit of
vertical scaling, and Nagios does not easily scale horizontally. Nagios is also notoriously
difficult to use with modern DevOps and configuration management tools, as local
configurations must be updated when remote servers are added or removed. Runs in a
loop and can use only one core.
• CloudWatch
• Built-in monitoring covers AWS resources only. Custom metrics must be sent by you; once sent,
they are treated the same as built-in metrics for stats, graphs and alarms.
[ Caching ]
Redis
• In-memory data structure store.
• Can persist data, but keeps all data in-memory.
• Traditional data types.
• Stats
• Easily serves 100K’s of ops/sec.
• ~2 MB footprint.
• Mostly ACID
• Single-threaded, hence every operation is
• Atomic
• Consistent
• Isolated
• Watch / Multi / Discard / Exec allows multi-statement operations as a single unit but without rollbacks.
• Durability is configurable and is a tradeoff between safety and efficiency.
• Redis Cluster
• Redis 3.0 released on 5th May, 2015 (last week) introduces Redis Cluster.
• With automatic data sharding, fault tolerance and performance improvements.
• Alternatives: Memcached
Thank You !!

Weitere ähnliche Inhalte

Was ist angesagt?

Unified Batch & Stream Processing with Apache Samza
Unified Batch & Stream Processing with Apache SamzaUnified Batch & Stream Processing with Apache Samza
Unified Batch & Stream Processing with Apache SamzaDataWorks Summit
 
Apache Big Data EU 2016: Building Streaming Applications with Apache Apex
Apache Big Data EU 2016: Building Streaming Applications with Apache ApexApache Big Data EU 2016: Building Streaming Applications with Apache Apex
Apache Big Data EU 2016: Building Streaming Applications with Apache ApexApache Apex
 
HBaseCon 2015 General Session: Zen - A Graph Data Model on HBase
HBaseCon 2015 General Session: Zen - A Graph Data Model on HBaseHBaseCon 2015 General Session: Zen - A Graph Data Model on HBase
HBaseCon 2015 General Session: Zen - A Graph Data Model on HBaseHBaseCon
 
Querying Data Pipeline with AWS Athena
Querying Data Pipeline with AWS AthenaQuerying Data Pipeline with AWS Athena
Querying Data Pipeline with AWS AthenaYaroslav Tkachenko
 
Compare DynamoDB vs. MongoDB
Compare DynamoDB vs. MongoDBCompare DynamoDB vs. MongoDB
Compare DynamoDB vs. MongoDBAmar Das
 
DataStax - Analytics on Apache Cassandra - Paris Tech Talks meetup
DataStax - Analytics on Apache Cassandra - Paris Tech Talks meetupDataStax - Analytics on Apache Cassandra - Paris Tech Talks meetup
DataStax - Analytics on Apache Cassandra - Paris Tech Talks meetupVictor Coustenoble
 
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...Anton Kirillov
 
Data Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby UsageData Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby UsageSATOSHI TAGOMORI
 
An introduction into Spark ML plus how to go beyond when you get stuck
An introduction into Spark ML plus how to go beyond when you get stuckAn introduction into Spark ML plus how to go beyond when you get stuck
An introduction into Spark ML plus how to go beyond when you get stuckData Con LA
 
Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data...
Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data...Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data...
Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data...DataStax Academy
 
Building a High-Performance Database with Scala, Akka, and Spark
Building a High-Performance Database with Scala, Akka, and SparkBuilding a High-Performance Database with Scala, Akka, and Spark
Building a High-Performance Database with Scala, Akka, and SparkEvan Chan
 
Solr cloud the 'search first' nosql database extended deep dive
Solr cloud the 'search first' nosql database   extended deep diveSolr cloud the 'search first' nosql database   extended deep dive
Solr cloud the 'search first' nosql database extended deep divelucenerevolution
 
October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ...
October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ...October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ...
October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ...Yahoo Developer Network
 
Spark streaming with apache kafka
Spark streaming with apache kafkaSpark streaming with apache kafka
Spark streaming with apache kafkapunesparkmeetup
 
Making (Almost) Any Database Faster and Cheaper with Caching
Making (Almost) Any Database Faster and Cheaper with CachingMaking (Almost) Any Database Faster and Cheaper with Caching
Making (Almost) Any Database Faster and Cheaper with CachingAmazon Web Services
 
High Availability of SAP ASCS in Microsoft Azure
High Availability of SAP ASCS in Microsoft AzureHigh Availability of SAP ASCS in Microsoft Azure
High Availability of SAP ASCS in Microsoft AzureGary Jackson MBCS
 
Querying and Analyzing Data in Amazon S3
Querying and Analyzing Data in Amazon S3Querying and Analyzing Data in Amazon S3
Querying and Analyzing Data in Amazon S3Amazon Web Services
 

Was ist angesagt? (20)

Cassandra in e-commerce
Cassandra in e-commerceCassandra in e-commerce
Cassandra in e-commerce
 
Unified Batch & Stream Processing with Apache Samza
Unified Batch & Stream Processing with Apache SamzaUnified Batch & Stream Processing with Apache Samza
Unified Batch & Stream Processing with Apache Samza
 
Apache Big Data EU 2016: Building Streaming Applications with Apache Apex
Apache Big Data EU 2016: Building Streaming Applications with Apache ApexApache Big Data EU 2016: Building Streaming Applications with Apache Apex
Apache Big Data EU 2016: Building Streaming Applications with Apache Apex
 
HBaseCon 2015 General Session: Zen - A Graph Data Model on HBase
HBaseCon 2015 General Session: Zen - A Graph Data Model on HBaseHBaseCon 2015 General Session: Zen - A Graph Data Model on HBase
HBaseCon 2015 General Session: Zen - A Graph Data Model on HBase
 
Querying Data Pipeline with AWS Athena
Querying Data Pipeline with AWS AthenaQuerying Data Pipeline with AWS Athena
Querying Data Pipeline with AWS Athena
 
Sun Web Server Brief
Sun Web Server BriefSun Web Server Brief
Sun Web Server Brief
 
Compare DynamoDB vs. MongoDB
Compare DynamoDB vs. MongoDBCompare DynamoDB vs. MongoDB
Compare DynamoDB vs. MongoDB
 
DataStax - Analytics on Apache Cassandra - Paris Tech Talks meetup
DataStax - Analytics on Apache Cassandra - Paris Tech Talks meetupDataStax - Analytics on Apache Cassandra - Paris Tech Talks meetup
DataStax - Analytics on Apache Cassandra - Paris Tech Talks meetup
 
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...
 
Data Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby UsageData Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby Usage
 
An introduction into Spark ML plus how to go beyond when you get stuck
An introduction into Spark ML plus how to go beyond when you get stuckAn introduction into Spark ML plus how to go beyond when you get stuck
An introduction into Spark ML plus how to go beyond when you get stuck
 
Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data...
Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data...Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data...
Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data...
 
Building a High-Performance Database with Scala, Akka, and Spark
Building a High-Performance Database with Scala, Akka, and SparkBuilding a High-Performance Database with Scala, Akka, and Spark
Building a High-Performance Database with Scala, Akka, and Spark
 
Solr cloud the 'search first' nosql database extended deep dive
Solr cloud the 'search first' nosql database   extended deep diveSolr cloud the 'search first' nosql database   extended deep dive
Solr cloud the 'search first' nosql database extended deep dive
 
October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ...
October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ...October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ...
October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ...
 
Spark streaming with apache kafka
Spark streaming with apache kafkaSpark streaming with apache kafka
Spark streaming with apache kafka
 
Making (Almost) Any Database Faster and Cheaper with Caching
Making (Almost) Any Database Faster and Cheaper with CachingMaking (Almost) Any Database Faster and Cheaper with Caching
Making (Almost) Any Database Faster and Cheaper with Caching
 
High Availability of SAP ASCS in Microsoft Azure
High Availability of SAP ASCS in Microsoft AzureHigh Availability of SAP ASCS in Microsoft Azure
High Availability of SAP ASCS in Microsoft Azure
 
Querying and Analyzing Data in Amazon S3
Querying and Analyzing Data in Amazon S3Querying and Analyzing Data in Amazon S3
Querying and Analyzing Data in Amazon S3
 
NoSQL_Night
NoSQL_NightNoSQL_Night
NoSQL_Night
 

Ähnlich wie Architectures, Frameworks and Infrastructure Overview

Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...Cloudera, Inc.
 
Running Airflow Workflows as ETL Processes on Hadoop
Running Airflow Workflows as ETL Processes on HadoopRunning Airflow Workflows as ETL Processes on Hadoop
Running Airflow Workflows as ETL Processes on Hadoopclairvoyantllc
 
Webinar: What's new in CDAP 3.5?
Webinar: What's new in CDAP 3.5?Webinar: What's new in CDAP 3.5?
Webinar: What's new in CDAP 3.5?Cask Data
 
AWS re:Invent 2016: JustGiving: Serverless Data Pipelines, Event-Driven ETL, ...
AWS re:Invent 2016: JustGiving: Serverless Data Pipelines, Event-Driven ETL, ...AWS re:Invent 2016: JustGiving: Serverless Data Pipelines, Event-Driven ETL, ...
AWS re:Invent 2016: JustGiving: Serverless Data Pipelines, Event-Driven ETL, ...Amazon Web Services
 
Tooling for Machine Learning: AWS Products, Open Source Tools, and DevOps Pra...
Tooling for Machine Learning: AWS Products, Open Source Tools, and DevOps Pra...Tooling for Machine Learning: AWS Products, Open Source Tools, and DevOps Pra...
Tooling for Machine Learning: AWS Products, Open Source Tools, and DevOps Pra...SQUADEX
 
Play Framework and Activator
Play Framework and ActivatorPlay Framework and Activator
Play Framework and ActivatorKevin Webber
 
2015 zData Inc. - Apache Ambari Overview
2015 zData Inc. - Apache Ambari Overview2015 zData Inc. - Apache Ambari Overview
2015 zData Inc. - Apache Ambari OverviewzData Inc.
 
SQL Explore 2012: P&T Part 1
SQL Explore 2012: P&T Part 1SQL Explore 2012: P&T Part 1
SQL Explore 2012: P&T Part 1sqlserver.co.il
 
Drupal performance
Drupal performanceDrupal performance
Drupal performanceGabi Lee
 
More Cache for Less Cash (DevLink 2014)
More Cache for Less Cash (DevLink 2014)More Cache for Less Cash (DevLink 2014)
More Cache for Less Cash (DevLink 2014)Michael Collier
 
Stay productive_while_slicing_up_the_monolith
Stay productive_while_slicing_up_the_monolithStay productive_while_slicing_up_the_monolith
Stay productive_while_slicing_up_the_monolithMarkus Eisele
 
Apache Spark - A High Level overview
Apache Spark - A High Level overviewApache Spark - A High Level overview
Apache Spark - A High Level overviewKaran Alang
 
Google Cloud Platform, Compute Engine, and App Engine
Google Cloud Platform, Compute Engine, and App EngineGoogle Cloud Platform, Compute Engine, and App Engine
Google Cloud Platform, Compute Engine, and App EngineCsaba Toth
 
Bquery Reporting & Analytics Architecture
Bquery Reporting & Analytics ArchitectureBquery Reporting & Analytics Architecture
Bquery Reporting & Analytics ArchitectureCarst Vaartjes
 
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3Databricks
 
JustGiving – Serverless Data Pipelines, API, Messaging and Stream Processing
JustGiving – Serverless Data Pipelines,  API, Messaging and Stream ProcessingJustGiving – Serverless Data Pipelines,  API, Messaging and Stream Processing
JustGiving – Serverless Data Pipelines, API, Messaging and Stream ProcessingLuis Gonzalez
 
JustGiving | Serverless Data Pipelines, API, Messaging and Stream Processing
JustGiving | Serverless Data Pipelines, API, Messaging and Stream ProcessingJustGiving | Serverless Data Pipelines, API, Messaging and Stream Processing
JustGiving | Serverless Data Pipelines, API, Messaging and Stream ProcessingBEEVA_es
 
Tokyo Azure Meetup #7 - Introduction to Serverless Architectures with Azure F...
Tokyo Azure Meetup #7 - Introduction to Serverless Architectures with Azure F...Tokyo Azure Meetup #7 - Introduction to Serverless Architectures with Azure F...
Tokyo Azure Meetup #7 - Introduction to Serverless Architectures with Azure F...Tokyo Azure Meetup
 

Ähnlich wie Architectures, Frameworks and Infrastructure Overview (20)

Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
 
Running Airflow Workflows as ETL Processes on Hadoop
Running Airflow Workflows as ETL Processes on HadoopRunning Airflow Workflows as ETL Processes on Hadoop
Running Airflow Workflows as ETL Processes on Hadoop
 
Webinar: What's new in CDAP 3.5?
Webinar: What's new in CDAP 3.5?Webinar: What's new in CDAP 3.5?
Webinar: What's new in CDAP 3.5?
 
AWS re:Invent 2016: JustGiving: Serverless Data Pipelines, Event-Driven ETL, ...
AWS re:Invent 2016: JustGiving: Serverless Data Pipelines, Event-Driven ETL, ...AWS re:Invent 2016: JustGiving: Serverless Data Pipelines, Event-Driven ETL, ...
AWS re:Invent 2016: JustGiving: Serverless Data Pipelines, Event-Driven ETL, ...
 
Tooling for Machine Learning: AWS Products, Open Source Tools, and DevOps Pra...
Tooling for Machine Learning: AWS Products, Open Source Tools, and DevOps Pra...Tooling for Machine Learning: AWS Products, Open Source Tools, and DevOps Pra...
Tooling for Machine Learning: AWS Products, Open Source Tools, and DevOps Pra...
 
Play Framework and Activator
Play Framework and ActivatorPlay Framework and Activator
Play Framework and Activator
 
2015 zData Inc. - Apache Ambari Overview
2015 zData Inc. - Apache Ambari Overview2015 zData Inc. - Apache Ambari Overview
2015 zData Inc. - Apache Ambari Overview
 
SQL Explore 2012: P&T Part 1
SQL Explore 2012: P&T Part 1SQL Explore 2012: P&T Part 1
SQL Explore 2012: P&T Part 1
 
Drupal performance
Drupal performanceDrupal performance
Drupal performance
 
More Cache for Less Cash (DevLink 2014)
More Cache for Less Cash (DevLink 2014)More Cache for Less Cash (DevLink 2014)
More Cache for Less Cash (DevLink 2014)
 
Stay productive_while_slicing_up_the_monolith
Stay productive_while_slicing_up_the_monolithStay productive_while_slicing_up_the_monolith
Stay productive_while_slicing_up_the_monolith
 
Apache Spark - A High Level overview
Apache Spark - A High Level overviewApache Spark - A High Level overview
Apache Spark - A High Level overview
 
Google Cloud Platform, Compute Engine, and App Engine
Google Cloud Platform, Compute Engine, and App EngineGoogle Cloud Platform, Compute Engine, and App Engine
Google Cloud Platform, Compute Engine, and App Engine
 
Bquery Reporting & Analytics Architecture
Bquery Reporting & Analytics ArchitectureBquery Reporting & Analytics Architecture
Bquery Reporting & Analytics Architecture
 
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
 
JustGiving – Serverless Data Pipelines, API, Messaging and Stream Processing
JustGiving – Serverless Data Pipelines,  API, Messaging and Stream ProcessingJustGiving – Serverless Data Pipelines,  API, Messaging and Stream Processing
JustGiving – Serverless Data Pipelines, API, Messaging and Stream Processing
 
JustGiving | Serverless Data Pipelines, API, Messaging and Stream Processing
JustGiving | Serverless Data Pipelines, API, Messaging and Stream ProcessingJustGiving | Serverless Data Pipelines, API, Messaging and Stream Processing
JustGiving | Serverless Data Pipelines, API, Messaging and Stream Processing
 
Performance on a budget
Performance on a budgetPerformance on a budget
Performance on a budget
 
Tokyo Azure Meetup #7 - Introduction to Serverless Architectures with Azure F...
Tokyo Azure Meetup #7 - Introduction to Serverless Architectures with Azure F...Tokyo Azure Meetup #7 - Introduction to Serverless Architectures with Azure F...
Tokyo Azure Meetup #7 - Introduction to Serverless Architectures with Azure F...
 
DW on AWS
DW on AWSDW on AWS
DW on AWS
 

Architectures, Frameworks and Infrastructure Overview

  • 2. Areas of Focus [ Web ] [ Services ] [ Monitoring and Alerting ] [ Logging and Metrics ] [ Analytics ] [ Datastores ] [ Caching ]
  • 4. Java 8 • Lambda expressions and Method References persons.stream().map(p -> p.email()); persons.stream().mapToInt(Person::getAge).sum(); Java 7 - Collections.sort(personList, new Comparator<Person>(){ public int compare(Person p1, Person p2){ return p1.firstName.compareTo(p2.firstName); } }); Java 8 - Collections.sort(personList, (Person p1, Person p2) -> p1.firstName.compareTo(p2.firstName)); • Annotations on types • @NonNull – Compile-time null checks. • @ReadOnly – Compile-time error on any attempt to change the object.
  • 5. Java 8 • Extension Methods • Default methods that you can add to your interfaces without breaking backward compatibility. Example: forEach(..lambda expression) public interface Iterable<T> { Iterator<T> iterator(); default void forEach(Consumer<? super T> action) { Objects.requireNonNull(action); for (T t : this) { action.accept(t); } } } • Other changes • Parallel array sorting. • Improved I/O API. • Better date and time API. • Base64 encoding and decoding.
  • 6. Microservices •Small problem domain •Less than 500-1000 lines of code. •Across 5 or so domain objects in Java. •Can be built, deployed and run independently. •Owns its own data storage.
  • 7. SpringBoot - Basics • Basics • Use Gradle plugin for runnable jar/war. • Run a project in-place with bootRun task. • Spring-Loaded - Reload Java classes without restarting the container. • Unlike 'hot code replace' which only allows simple changes once a JVM is running (e.g. changes to method bodies), Spring Loaded allows you to add/modify/delete methods/fields/constructors. • Datastores [ If you are using auto-configuration, repositories will be searched from the package containing your main configuration class (the one annotated with @EnableAutoConfiguration or @SpringBootApplication) down.] • JPA • NoSQL • Couchbase • MongoDB
  • 8. SpringBoot - Features • Externalized Configuration server: address: 192.168.2.192 --- spring: profiles: development server: address: 127.0.0.1 --- spring: profiles: staging server: address: 192.168.22.184 • Profile specific configuration values • Using YAML instead of Properties • Profile specific application-[profile].yml files • Multi-profile YAML documents • Automatic property expansion using Gradle • Adding active profiles • -Dspring.profiles.active=production
  • 9. SpringBoot - Features • Production Services • Customize endpoints • Sensitivity • Disabling • Writing custom HealthIndicators • Metrics • System metrics • Tomcat session metrics • Recording your own metrics • Metric repositories
  • 10. SpringBoot - Features • Tests • Unit Tests • Integration Tests • EnvironmentTestUtils • OutputCapture • TestRestTemplate
  • 11. SpringBoot - Features • Customizing embedded servlet containers • Configure Tomcat • Enable Multiple Connectors with Tomcat • Configure SSL • Use Tomcat behind a front-end proxy server • Enable HTTPS when running behind a proxy server • Switch off the Spring MVC DispatcherServlet • Switch off the Default MVC configuration • Customize ViewResolvers
  • 12. SpringBoot - Features • Auditing • Tracing • Deployment • Unix/Linux services • Converting Existing Applications to Spring Boot • Servlet 3.0+ applications with no web.xml. • Applications with a web.xml. • Applications with a context hierarchy. • Applications without a context hierarchy.
  • 13. Gradle • Build automation system for polyglot environments. LinkedIn uses it to build code in 60 programming languages. • Plugins and integrations with almost every tool in the DevOps pipeline. • Manage dependencies across repository types like Maven and Ivy. • Concise and scriptable. Right balance of declarative and imperative. • Incremental builds, build caching and parallelisation of builds. • Build analytics and reporting to see problems and areas of optimisation.
  • 14. Gradle vs Maven - Maven Build
      <plugin>
          <groupId>org.apache.maven.plugins</groupId>
          <artifactId>maven-checkstyle-plugin</artifactId>
          <version>2.12.1</version>
          <executions>
              <execution>
                  <configuration>
                      <configLocation>config/checkstyle/checkstyle.xml</configLocation>
                      <consoleOutput>true</consoleOutput>
                      <failsOnError>true</failsOnError>
                  </configuration>
                  <goals>
                      <goal>check</goal>
                  </goals>
              </execution>
          </executions>
      </plugin>
      <plugin>
          <groupId>org.codehaus.mojo</groupId>
          <artifactId>findbugs-maven-plugin</artifactId>
          <version>2.5.4</version>
          <executions>
              <execution>
                  <goals>
                      <goal>check</goal>
                  </goals>
              </execution>
          </executions>
      </plugin>
      <plugin>
          <groupId>org.apache.maven.plugins</groupId>
          <artifactId>maven-pmd-plugin</artifactId>
          <version>3.1</version>
          <executions>
              <execution>
                  <goals>
                      <goal>check</goal>
                  </goals>
              </execution>
          </executions>
      </plugin>
  • 15. Gradle vs Maven - Gradle Build
      apply plugin: 'checkstyle'
      apply plugin: 'findbugs'
      apply plugin: 'pmd'
      version = '1.0'
      repositories {
          mavenCentral()
      }
      dependencies {
          testCompile group: 'junit', name: 'junit', version: '4.11'
          testCompile group: 'org.hamcrest', name: 'hamcrest-all', version: '1.3'
      }
  • 16. Swagger - What • Documentation via Java annotations. • API editor • Client SDK generator • Available annotations https://github.com/swagger-api/swagger-core/wiki/Annotations • https://github.com/springfox/springfox
  • 19. AngularJS •Adds special markup to HTML to make it more expressive and keep it in sync with JS: the logic lives in JS and the HTML updates to match. •Well suited for SPAs and mobile sites as it reduces the amount of content transferred while navigating across your apps. •Client-side MVC / MVVM. Closer to MVVM, as AngularJS has 2-way binding.
  • 20. Gulp • Build system for websites • Compile SCSS to CSS • Spriting • Minify JS / CSS files • Combine JS / CSS • Fingerprinting files for aggressive caching • GZip files etc. • Plugins for everything you need. • In contrast to another alternative Grunt, it uses streams. Grunt takes files, runs a single task on them and saves them to new files, repeating the entire process for every task. Lots of file hits make Grunt slower than Gulp.
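The streaming idea can be illustrated outside JavaScript. The Java sketch below is an analogy, not Gulp's API: content is read once, passed through a chain of in-memory transforms, and would be written once at the end, instead of being saved to intermediate files after every task. The two transform steps are deliberately simplistic stand-ins for real plugins.

```java
import java.util.function.Function;

class PipelineSketch {
    // Chains the steps so each one consumes the previous step's output in memory.
    @SafeVarargs
    static Function<String, String> pipe(Function<String, String>... steps) {
        Function<String, String> result = Function.identity();
        for (Function<String, String> step : steps) {
            result = result.andThen(step);
        }
        return result;
    }

    // Hypothetical step: removes /* ... */ comments, including multi-line ones.
    static String stripComments(String css) {
        return css.replaceAll("(?s)/\\*.*?\\*/", "");
    }

    // Hypothetical step: collapses runs of whitespace into single spaces.
    static String collapseWhitespace(String css) {
        return css.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        Function<String, String> build =
                pipe(PipelineSketch::stripComments, PipelineSketch::collapseWhitespace);
        System.out.println(build.apply("a { color: red; } /* note */\n\nb { margin: 0; }"));
    }
}
```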
  • 21. Jasmine and Karma • Jasmine • Unit testing framework • Suites, Specs, Matchers, Spies • Supports async tests with runs/waitsFor • Karma • Test runner for unit tests. • Run inside or outside a browser • Run in IDE or command line
  • 22. Nginx - What • Designed from the ground up to use event-driven (asynchronous) connection handling. • Spawns worker processes, each of which can handle thousands of connections. • Fast event loop that checks for and processes events. • Decouples actual work from connections. • When the connection closes, work is removed from the loop. • This lets Nginx scale to very large numbers of concurrent connections. • Consistently low memory and CPU usage even under heavy load. • Module selection at compile time
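A toy model can make the event-loop idea concrete. The sketch below is not how Nginx is implemented (a real worker waits on OS readiness notifications); it only shows, with hypothetical class names, one thread draining a queue of events for many connections and dropping a connection's state when it closes.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

class EventLoopSketch {
    enum Type { DATA, CLOSE }

    // One event belongs to one connection; the loop never blocks on a single connection.
    static final class Event {
        final int connectionId;
        final Type type;
        final String payload;

        Event(int connectionId, Type type, String payload) {
            this.connectionId = connectionId;
            this.type = type;
            this.payload = payload;
        }
    }

    private final Queue<Event> events = new ArrayDeque<>();
    private final Map<Integer, StringBuilder> open = new HashMap<>();
    private int handled = 0;

    void submit(Event event) {
        events.add(event);
    }

    // Drains the queue on the calling thread: a single loop serves every connection.
    void run() {
        for (Event e = events.poll(); e != null; e = events.poll()) {
            handled++;
            if (e.type == Type.CLOSE) {
                open.remove(e.connectionId);   // closed connection: its state leaves the loop
            } else {
                open.computeIfAbsent(e.connectionId, id -> new StringBuilder()).append(e.payload);
            }
        }
    }

    int openConnections() { return open.size(); }

    int handledEvents() { return handled; }

    public static void main(String[] args) {
        EventLoopSketch loop = new EventLoopSketch();
        for (int id = 0; id < 1000; id++) {
            loop.submit(new Event(id, Type.DATA, "GET /"));
        }
        loop.submit(new Event(7, Type.CLOSE, ""));
        loop.run();
        System.out.println(loop.openConnections() + " connections still open");
    }
}
```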
  • 23. Nginx - Better than Apache httpd • Apache httpd connection handling •mpm_prefork: one process -> one thread -> one connection •mpm_worker: one process -> multiple threads -> one connection per thread •mpm_event: optimised for keep-alive connections; dedicated threads manage idle keep-alive connections and hand new requests to other threads.
  • 24. Amazon CloudFront •Amazon’s CDN. •Use S3 or another system as origin server. •Two edge servers in India •Chennai •Mumbai
  • 26. Couchbase • Strong consistency within same data centre. • Peer to peer architecture. • Elastic scalability. Add / remove nodes. • Ease of Administration • Integrated admin console and scripting API with cluster-wide monitoring to manage large deployments. • General purpose • A distributed cache, key/value store, and document database for enterprise web, mobile, and IoT applications. • Consistent high-performance • Integrated cache for low latency reads. • Fine-grained locking for high write throughput. No single point of failure. • XA • Automatic failover. • Data replication within and between data centers ensures zero downtime. • Java Library - spring-data-couchbase and Couchbase provided couchbase-java-client for low-level access.
  • 29. Lambda Architecture - What • Batch layer - Historical archive of all data collected by the system. Results are typically minutes to hours old. • Speed layer - Compute analytics on data as it enters the system with a sub-second latency. Dataset used for analysis is zero to an hour old. Combine results from this layer with those of batch layer for better decisions. • Serving layer - Cache results from batch layer and periodically refresh them.
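How a query combines the layers can be sketched with two in-memory maps. This is a simplified, hypothetical model: it assumes the metric is a per-key event count, and that each newly loaded batch view already covers every event the speed layer has seen, so the real-time counts can be reset at that point.

```java
import java.util.HashMap;
import java.util.Map;

class LambdaArchitectureSketch {
    // Serving layer: cached results of the last batch computation.
    private final Map<String, Long> batchView = new HashMap<>();
    // Speed layer: counts for events that arrived after that batch run.
    private final Map<String, Long> speedView = new HashMap<>();

    // Speed layer update, applied as each event enters the system.
    void onEvent(String key) {
        speedView.merge(key, 1L, Long::sum);
    }

    // Periodic refresh: swap in the recomputed batch results and reset the
    // real-time counts, which the new batch view is assumed to include.
    void loadBatchView(Map<String, Long> recomputed) {
        batchView.clear();
        batchView.putAll(recomputed);
        speedView.clear();
    }

    // A query merges the historical result with the recent increments.
    long query(String key) {
        return batchView.getOrDefault(key, 0L) + speedView.getOrDefault(key, 0L);
    }

    public static void main(String[] args) {
        LambdaArchitectureSketch views = new LambdaArchitectureSketch();
        Map<String, Long> batch = new HashMap<>();
        batch.put("home", 100L);
        views.loadBatchView(batch);
        views.onEvent("home");
        System.out.println(views.query("home"));
    }
}
```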
  • 30. Lambda Architecture - How •Kafka in the outer core to ingest data and fan it out to batch and speed layers. •Batch layer - Batch query systems like Spark with HDFS are a good fit here. •Speed layer - This layer typically has queuing, streaming and processing subsystems. Storm, Cassandra. •Serving layer - Redis would be a great fit here.
  • 31. [ Logging and Metrics ]
  • 32. Collection and Storage - Log • Fluentd - Log Collection • The vanilla instance runs on 20-30MB of memory and can process 13,000 events/second/core. • 2000+ data driven companies already using it. • Collect • Resource and custom metrics • Application / system logs for analysis • Logs for archival • Java API for logging • Tail other logs and forward them using td-agent • ElasticSearch - Log Storage • Later different types of storage like S3, DB etc.
  • 33. Collection and Storage - Metrics • Collectd • Collect system performance information - CPU utilization, memory usage, disk usage etc. • Use graphite plugin to send this data to Graphite server • StatsD • Capture different types of metrics: gauges, counters, timing summary statistics, and sets. • Client library sends stats via UDP to the StatsD daemon. • StatsD daemon listens to the UDP traffic from all application libraries, aggregates data over time and flushes it at the desired interval to designated storage.
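The client side of StatsD is small enough to sketch by hand. The example below is a minimal illustration rather than a replacement for a real client library: it formats the four metric types in the StatsD text format (`name:value|type`) and sends one line as a UDP datagram. The host, the metric names and port 8125 (the usual StatsD default) are assumptions.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

class StatsdClientSketch {
    // Counter: the daemon sums the deltas received in each flush interval.
    static String counter(String name, long delta) { return name + ":" + delta + "|c"; }

    // Gauge: the daemon keeps the last value it received.
    static String gauge(String name, long value) { return name + ":" + value + "|g"; }

    // Timing: the daemon computes summary statistics over the interval.
    static String timing(String name, long millis) { return name + ":" + millis + "|ms"; }

    // Set: the daemon counts the unique members seen in the interval.
    static String set(String name, String member) { return name + ":" + member + "|s"; }

    // Fire-and-forget: UDP needs no connection and the sender gets no acknowledgement.
    static void send(String host, int port, String line) throws IOException {
        byte[] data = line.getBytes(StandardCharsets.UTF_8);
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.send(new DatagramPacket(data, data.length, InetAddress.getByName(host), port));
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed daemon location; adjust to your environment.
        send("localhost", 8125, counter("checkout.orders", 1));
        System.out.println(timing("checkout.latency", 42));
    }
}
```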
  • 34. Visualization • Graphite + Grafana • Use Graphite with Grafana as the graphing tool for metrics and stats. Grafana dashboards can be saved to ElasticSearch and loaded back from it. • Kibana • Kibana can be used to visualise and analyse any kind of structured / unstructured data if it has been indexed into ElasticSearch. Lots of ways of visualising and slicing / dicing data.
  • 35. [ Monitoring and Alerting ]
  • 36. Evaluation Criteria • Idea is to introduce monitoring so that the system can be supported and monitored 24x7, with the aim of minimal downtime. • Scalable • Can present a large number of easy to understand checks. • Can handle a large no. of hosts and a large no. of checks. • UI • Access to historical alerts. • Ability to switch off alerts with comments. • Easy to extend/change. • Support custom checks. • Ease of configuring alert thresholds. • Good support for check dependencies. • Cause and effect separation by having alert dependencies.
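The "cause and effect separation" criterion can be sketched as a small dependency walk. This is a hypothetical model, not the behaviour of any particular monitoring tool: each check has at most one parent, and a failing check is only alerted on if none of its ancestors is failing too. The check names are invented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class AlertDependencySketch {
    // check name -> the check it depends on (e.g. an HTTP check depends on host ping)
    private final Map<String, String> parentOf = new HashMap<>();

    void dependsOn(String check, String parent) {
        parentOf.put(check, parent);
    }

    // Keeps only failing checks with no failing ancestor: the likely causes.
    List<String> alertsToSend(Set<String> failing) {
        List<String> alerts = new ArrayList<>();
        for (String check : failing) {
            if (!hasFailingAncestor(check, failing)) {
                alerts.add(check);
            }
        }
        Collections.sort(alerts);
        return alerts;
    }

    private boolean hasFailingAncestor(String check, Set<String> failing) {
        Set<String> seen = new HashSet<>();   // guards against accidental dependency cycles
        String parent = parentOf.get(check);
        while (parent != null && seen.add(parent)) {
            if (failing.contains(parent)) {
                return true;
            }
            parent = parentOf.get(parent);
        }
        return false;
    }

    public static void main(String[] args) {
        AlertDependencySketch deps = new AlertDependencySketch();
        deps.dependsOn("web-http", "host-ping");
        deps.dependsOn("db-port", "host-ping");
        deps.dependsOn("db-query", "db-port");
        Set<String> failing = new HashSet<>(Arrays.asList("host-ping", "web-http", "db-query"));
        System.out.println(deps.alertsToSend(failing));
    }
}
```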
  • 37. Options • Sensu • Local agents which push information to an AMQP broker. Various servers can now ingest information from this broker. Weaker coupling and horizontal scaling. Central broker is a SPOF and poses scaling challenges though. • Icinga2 • Icinga 1 was a Nagios fork with a better UI but ran into some trademark/copyright violations. • Icinga2 is a complete rewrite. It can work in both modes: agents and central servers pulling data. • Nagios • Nagios uses a group of central servers that are configured to perform checks on remote hosts. This design makes it difficult to scale Nagios, as large fleets quickly reach the limit of vertical scaling, and Nagios does not easily scale horizontally. Nagios is also notoriously difficult to use with modern DevOps and configuration management tools, as local configurations must be updated when remote servers are added or removed. Runs in a loop and can use only one core. • CloudWatch • Built-in features monitor AWS resources only. Custom metrics must be sent by you; once sent, they are treated the same as built-in metrics for stats, graphs and alarms.
  • 39. Redis • In-memory data structure store. • Can persist data, but keeps all data in-memory. • Traditional data types. • Stats • Easily serves 100K’s of ops/sec. • ~2 MB footprint. • Mostly ACID • Single-threaded, hence every operation is • Atomic • Consistent • Isolated • Watch / Multi / Discard / Exec allow multi-statement operations as a single unit but without rollbacks. • Durability is configurable and is a tradeoff between safety and efficiency. • Redis Cluster • Redis 3.0 released on 5th May, 2015 (last week) introduces Redis Cluster. • With automatic data sharding, fault tolerance and performance improvements. • Alternatives: Memcached
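The Watch / Multi / Exec pattern is a form of optimistic locking, which can be modelled without a Redis client. The sketch below is an in-memory analogy with hypothetical class names, not Redis itself: watching a key records its version, queued writes are applied together on `exec`, and `exec` applies nothing if a watched key has changed in the meantime.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class OptimisticTxSketch {
    private final Map<String, String> data = new HashMap<>();
    private final Map<String, Long> versions = new HashMap<>();

    synchronized String get(String key) {
        return data.get(key);
    }

    // Every write bumps the key's version so watchers can detect the change.
    synchronized void set(String key, String value) {
        data.put(key, value);
        versions.merge(key, 1L, Long::sum);
    }

    Tx begin() {
        return new Tx();
    }

    final class Tx {
        private final Map<String, Long> watched = new HashMap<>();
        private final List<String[]> queued = new ArrayList<>();

        // Like WATCH: remember the version seen now.
        Tx watch(String key) {
            synchronized (OptimisticTxSketch.this) {
                watched.put(key, versions.getOrDefault(key, 0L));
            }
            return this;
        }

        // Like issuing a command after MULTI: queued, not executed yet.
        Tx queueSet(String key, String value) {
            queued.add(new String[] {key, value});
            return this;
        }

        // Like EXEC: apply all queued writes as one unit, or none if a watched key changed.
        boolean exec() {
            synchronized (OptimisticTxSketch.this) {
                for (Map.Entry<String, Long> w : watched.entrySet()) {
                    if (!w.getValue().equals(versions.getOrDefault(w.getKey(), 0L))) {
                        return false;
                    }
                }
                for (String[] kv : queued) {
                    set(kv[0], kv[1]);
                }
                return true;
            }
        }
    }

    public static void main(String[] args) {
        OptimisticTxSketch store = new OptimisticTxSketch();
        store.set("stock", "5");
        Tx tx = store.begin().watch("stock");
        store.set("stock", "4");               // a competing writer gets in first
        System.out.println(tx.queueSet("stock", "3").exec());
        System.out.println(store.get("stock"));
    }
}
```

A caller would typically retry the whole read-modify-write when `exec` reports failure.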