SlideShare a Scribd company logo
1 of 20
TRANSFORM & ANALYZE
TIME SERIES DATA VIA
APACHE SPARK
DEMI BEN-ARI
SR. SOFTWARE ENGINEER
WINDWARD
17.05.2015
ABOUT ME
DEMI BEN-ARI
SENIOR SOFTWARE ENGINEER AT WINDWARD LTD.
BS’C COMPUTER SCIENCE – ACADEMIC COLLEGE TEL-AVIV YAFFO
IN THE PAST:
SOFTWARE TEAM LEADER & SENIOR JAVA SOFTWARE ENGINEER,
MISSILE DEFENSE AND ALERT SYSTEM - “OFEK” UNIT - IAF
WHAT DOES WINDWARD DO?
Windward is a maritime data and analytics
company, bringing unprecedented visibility to
the maritime domain. Windward has built the
world's first maritime data platform, the
Windward Mind,
which analyzes and organizes the world's
maritime data
WHERE DOES THE DATA COME FROM?
Other
Sources
Maritime
Databases
AIS
Automatic
Identificati
on
System
Port Agent
Reports
WINDWARD MIND
VESSEL + AREA STORIES
SPECIAL IN WINDWARD’S DOMAIN
Maritim
e Mind
Data Mining Scope
Market
Trends
Anomaly
Detection
• Single Data
point scope
• Going in Detail
• Fraud detection
• Sample / Total
Data scope
• Trends
• Data Sampling
problems
MISSING PARTS IN TIME SERIES DATA
• DATA ARRIVING FROM THE SATELLITES
• MIGHT CAUSE DELAYS BECAUSE OF BAD TRANSMISSION
• DATA VENDORS DELAYING THE DATA STREAM
• CALCULATION IN LAYERS MAY CAUSE HOLES IN THE DATA
THE PROBLEM - RECEIVING DATA
T = 0
Level 3 Entity
Level 2 Entity
Level 1 Entity
Beginning state, no data, and the time line begins
THE PROBLEM - RECEIVING DATA
T = 10
Level 3 Entity
Level 2 Entity
Level 1 Entity
Computation sliding window size
Level 1 entities data
arrives and gets stored
THE PROBLEM - RECEIVING DATA
T = 10
Level 3 Entity
Level 2 Entity
Level 1 Entity
Computation sliding window size
Level 2 entities are
created on top of Level
1’s Data
(Decreased amount of
data)
Level 3 entities are
created on top of Level
2’s Data
(Decreased amount of
data)
THE PROBLEM - RECEIVING DATA
T = 20
Level 3 Entity
Level 2 Entity
Level 1 Entity
Computation sliding window size
Level 1 entity's data
arriving late
Because of the sliding window’s
back size, level 2 and 3 entities
would not be created properly and
there would be “Holes” in the Data
SOLUTION TO THE PROBLEM
• CREATING DEPENDENT MICRO SERVICES FORMING A DATA PIPELINE
• OUR MICRO SERVICES ARE MAINLY APACHE SPARK APPLICATIONS
• SERVICES ARE ONLY DEPENDENT ON THE DATA - NOT THE PREVIOUS SERVICE’S RUN
• FORMING A STRUCTURE AND SCHEDULING OF “BACK SLIDING WINDOW”
• KNOW YOUR DATA AND IT’S RELEVANCE TROUGH TIME
• DON’T TRY TO FORESEE THE FUTURE – IT MIGHT BIAS THE RESULTS
WHY CHOOSING APACHE SPARK?
• IN MEMORY COMPUTATION (NOT ONLY)
• FULLY DISTRIBUTED FRAMEWORK – LINEAR SCALE OUT
• FAULT TOLERANT FRAMEWORK
• MULTIPLE LANGUAGE API
• HIGHER LEVEL ABSTRACTIONS (SPARKSQL, MLLIB, GRAPHX, SPARK STREAMING)
• FUNCTIONAL PROGRAMMING PARADIGM
• EASY TO USE AND MAINTAIN
THINGS TO TAKE IN CONSIDERATION
• AFTER WRITING THE SERVICE – HOW DO YOU BOOTSTRAP YOUR DATA?
• DO SO WITHOUT “KNOWING THE FUTURE”
• SEPARATE YOUR DATA -> SEPARATE YOUR SERVICES BY THE DATA TYPES
• AUTOMATE AS MUCH AS YOU CAN – DEPLOYMENT, MAINTENANCE
• MONITORING
• DATA AVAILABILITY
• PERFORMANCE
• RUNTIME
WHAT OTHER KIND OF PROBLEMS DO WE
HANDLE?
• FRAUD DETECTION
• CUSTOMIZE USER DOMAIN REQUESTS
• DATA FILTERING (MALFORMED / TAMPERED DATA)
• CREATING VESSEL’S PATTERN OF LIFE
• RELATIONSHIP BETWEEN VESSELS
• PROTOCOL RESEARCH AND ANALYSIS (AIS)
WINDWARD DATA FLOW
Extern
al
Data
Source
s
Analytics Layers Data Output
SUMMARY – WINDWARD’S STACK
Cluster
Dev Testing
Live
Staging
ProductionEnv
ELK
QUESTIONS?
THANKS,
RESOURCES AND CONTACT
• DEMI BEN-ARI
• LINKEDIN
• TWITTER: @DEMIBENARI
• BLOG: HTTP://PROGEXC.BLOGSPOT.COM/
• EMAIL: DEMI.BENARI@GMAIL.COM
• WINDWARD LTD.
• BIG THINGS ARE HAPPENING HERE –
FACEBOOK GROUP
• MEETUP – BIG THINGS
jobs@windward.e
u

More Related Content

What's hot

2015 01-17 Lambda Architecture with Apache Spark, NextML Conference
2015 01-17 Lambda Architecture with Apache Spark, NextML Conference2015 01-17 Lambda Architecture with Apache Spark, NextML Conference
2015 01-17 Lambda Architecture with Apache Spark, NextML Conference
DB Tsai
 
Distributed Erlang Systems In Operation
Distributed Erlang Systems In OperationDistributed Erlang Systems In Operation
Distributed Erlang Systems In Operation
Andy Gross
 

What's hot (20)

2015 01-17 Lambda Architecture with Apache Spark, NextML Conference
2015 01-17 Lambda Architecture with Apache Spark, NextML Conference2015 01-17 Lambda Architecture with Apache Spark, NextML Conference
2015 01-17 Lambda Architecture with Apache Spark, NextML Conference
 
Making Scala Faster: 3 Expert Tips For Busy Development Teams
Making Scala Faster: 3 Expert Tips For Busy Development TeamsMaking Scala Faster: 3 Expert Tips For Busy Development Teams
Making Scala Faster: 3 Expert Tips For Busy Development Teams
 
Distributed Erlang Systems In Operation
Distributed Erlang Systems In OperationDistributed Erlang Systems In Operation
Distributed Erlang Systems In Operation
 
Lessons Learned: Using Spark and Microservices
Lessons Learned: Using Spark and MicroservicesLessons Learned: Using Spark and Microservices
Lessons Learned: Using Spark and Microservices
 
Sparking up Data Engineering: Spark Summit East talk by Rohan Sharma
Sparking up Data Engineering: Spark Summit East talk by Rohan SharmaSparking up Data Engineering: Spark Summit East talk by Rohan Sharma
Sparking up Data Engineering: Spark Summit East talk by Rohan Sharma
 
A Journey to Reactive Function Programming
A Journey to Reactive Function ProgrammingA Journey to Reactive Function Programming
A Journey to Reactive Function Programming
 
Spark Summit EU talk by Oscar Castaneda
Spark Summit EU talk by Oscar CastanedaSpark Summit EU talk by Oscar Castaneda
Spark Summit EU talk by Oscar Castaneda
 
Building Realtime Data Pipelines with Kafka Connect and Spark Streaming
Building Realtime Data Pipelines with Kafka Connect and Spark StreamingBuilding Realtime Data Pipelines with Kafka Connect and Spark Streaming
Building Realtime Data Pipelines with Kafka Connect and Spark Streaming
 
Spark as a Platform to Support Multi-Tenancy and Many Kinds of Data Applicati...
Spark as a Platform to Support Multi-Tenancy and Many Kinds of Data Applicati...Spark as a Platform to Support Multi-Tenancy and Many Kinds of Data Applicati...
Spark as a Platform to Support Multi-Tenancy and Many Kinds of Data Applicati...
 
Simplifying Big Data Applications with Apache Spark 2.0
Simplifying Big Data Applications with Apache Spark 2.0Simplifying Big Data Applications with Apache Spark 2.0
Simplifying Big Data Applications with Apache Spark 2.0
 
Spark Uber Development Kit
Spark Uber Development KitSpark Uber Development Kit
Spark Uber Development Kit
 
Spark Summit EU talk by Dean Wampler
Spark Summit EU talk by Dean WamplerSpark Summit EU talk by Dean Wampler
Spark Summit EU talk by Dean Wampler
 
Spark Summit EU talk by Ruben Pulido Behar Veliqi
Spark Summit EU talk by Ruben Pulido Behar VeliqiSpark Summit EU talk by Ruben Pulido Behar Veliqi
Spark Summit EU talk by Ruben Pulido Behar Veliqi
 
Getting Started with Spark Scala
Getting Started with Spark ScalaGetting Started with Spark Scala
Getting Started with Spark Scala
 
Spark Summit EU talk by Yiannis Gkoufas
Spark Summit EU talk by Yiannis GkoufasSpark Summit EU talk by Yiannis Gkoufas
Spark Summit EU talk by Yiannis Gkoufas
 
Data Science meets Software Development
Data Science meets Software DevelopmentData Science meets Software Development
Data Science meets Software Development
 
Spark Job Server and Spark as a Query Engine (Spark Meetup 5/14)
Spark Job Server and Spark as a Query Engine (Spark Meetup 5/14)Spark Job Server and Spark as a Query Engine (Spark Meetup 5/14)
Spark Job Server and Spark as a Query Engine (Spark Meetup 5/14)
 
Huawei Advanced Data Science With Spark Streaming
Huawei Advanced Data Science With Spark StreamingHuawei Advanced Data Science With Spark Streaming
Huawei Advanced Data Science With Spark Streaming
 
Spark Summit EU talk by Kaarthik Sivashanmugam
Spark Summit EU talk by Kaarthik SivashanmugamSpark Summit EU talk by Kaarthik Sivashanmugam
Spark Summit EU talk by Kaarthik Sivashanmugam
 
Mining public datasets using opensource tools: Zeppelin, Spark and Juju
Mining public datasets using opensource tools: Zeppelin, Spark and JujuMining public datasets using opensource tools: Zeppelin, Spark and Juju
Mining public datasets using opensource tools: Zeppelin, Spark and Juju
 

Viewers also liked

Apache cassandra & apache spark for time series data
Apache cassandra & apache spark for time series dataApache cassandra & apache spark for time series data
Apache cassandra & apache spark for time series data
Patrick McFadin
 

Viewers also liked (17)

Lightning fast analytics with Spark and Cassandra
Lightning fast analytics with Spark and CassandraLightning fast analytics with Spark and Cassandra
Lightning fast analytics with Spark and Cassandra
 
Spark Summit EU talk by Miha Pelko and Til Piffl
Spark Summit EU talk by Miha Pelko and Til PifflSpark Summit EU talk by Miha Pelko and Til Piffl
Spark Summit EU talk by Miha Pelko and Til Piffl
 
Time Series Analysis with Spark
Time Series Analysis with SparkTime Series Analysis with Spark
Time Series Analysis with Spark
 
Time Series Processing with Apache Spark
Time Series Processing with Apache SparkTime Series Processing with Apache Spark
Time Series Processing with Apache Spark
 
Time Series Processing with Apache Spark
Time Series Processing with Apache SparkTime Series Processing with Apache Spark
Time Series Processing with Apache Spark
 
Spark & Zeppelin을 활용한 머신러닝 실전 적용기
Spark & Zeppelin을 활용한 머신러닝 실전 적용기Spark & Zeppelin을 활용한 머신러닝 실전 적용기
Spark & Zeppelin을 활용한 머신러닝 실전 적용기
 
Time Series Analysis with Spark by Sandy Ryza
Time Series Analysis with Spark by Sandy RyzaTime Series Analysis with Spark by Sandy Ryza
Time Series Analysis with Spark by Sandy Ryza
 
[D2 COMMUNITY] Spark User Group - 머신러닝 인공지능 기법
[D2 COMMUNITY] Spark User Group - 머신러닝 인공지능 기법[D2 COMMUNITY] Spark User Group - 머신러닝 인공지능 기법
[D2 COMMUNITY] Spark User Group - 머신러닝 인공지능 기법
 
Analyzing Time Series Data with Apache Spark and Cassandra
Analyzing Time Series Data with Apache Spark and CassandraAnalyzing Time Series Data with Apache Spark and Cassandra
Analyzing Time Series Data with Apache Spark and Cassandra
 
Apache Spark 입문에서 머신러닝까지
Apache Spark 입문에서 머신러닝까지Apache Spark 입문에서 머신러닝까지
Apache Spark 입문에서 머신러닝까지
 
Spark overview 이상훈(SK C&C)_스파크 사용자 모임_20141106
Spark overview 이상훈(SK C&C)_스파크 사용자 모임_20141106Spark overview 이상훈(SK C&C)_스파크 사용자 모임_20141106
Spark overview 이상훈(SK C&C)_스파크 사용자 모임_20141106
 
[D2 COMMUNITY] Spark User Group - 스파크를 통한 딥러닝 이론과 실제
[D2 COMMUNITY] Spark User Group - 스파크를 통한 딥러닝 이론과 실제[D2 COMMUNITY] Spark User Group - 스파크를 통한 딥러닝 이론과 실제
[D2 COMMUNITY] Spark User Group - 스파크를 통한 딥러닝 이론과 실제
 
텐서플로 걸음마 (TensorFlow Tutorial)
텐서플로 걸음마 (TensorFlow Tutorial)텐서플로 걸음마 (TensorFlow Tutorial)
텐서플로 걸음마 (TensorFlow Tutorial)
 
Spark 의 핵심은 무엇인가? RDD! (RDD paper review)
Spark 의 핵심은 무엇인가? RDD! (RDD paper review)Spark 의 핵심은 무엇인가? RDD! (RDD paper review)
Spark 의 핵심은 무엇인가? RDD! (RDD paper review)
 
Time Series Analytics with Spark: Spark Summit East talk by Simon Ouellette
Time Series Analytics with Spark: Spark Summit East talk by Simon OuelletteTime Series Analytics with Spark: Spark Summit East talk by Simon Ouellette
Time Series Analytics with Spark: Spark Summit East talk by Simon Ouellette
 
New Directions in pySpark for Time Series Analysis: Spark Summit East talk by...
New Directions in pySpark for Time Series Analysis: Spark Summit East talk by...New Directions in pySpark for Time Series Analysis: Spark Summit East talk by...
New Directions in pySpark for Time Series Analysis: Spark Summit East talk by...
 
Apache cassandra & apache spark for time series data
Apache cassandra & apache spark for time series dataApache cassandra & apache spark for time series data
Apache cassandra & apache spark for time series data
 

Similar to Transform & Analyze Time Series Data via Apache Spark @Windward

Webinar: Detecting Deadlocks in Electronic Systems using Time-based Simulation
Webinar: Detecting Deadlocks in Electronic Systems using Time-based SimulationWebinar: Detecting Deadlocks in Electronic Systems using Time-based Simulation
Webinar: Detecting Deadlocks in Electronic Systems using Time-based Simulation
Deepak Shankar
 
Webinar on Functional Safety Analysis using Model-based System Analysis
Webinar on Functional Safety Analysis using Model-based System AnalysisWebinar on Functional Safety Analysis using Model-based System Analysis
Webinar on Functional Safety Analysis using Model-based System Analysis
Deepak Shankar
 
Rapid Assessment of Web Resources (RAWR) - DerbyCon 3.0
Rapid Assessment of Web Resources (RAWR) - DerbyCon 3.0Rapid Assessment of Web Resources (RAWR) - DerbyCon 3.0
Rapid Assessment of Web Resources (RAWR) - DerbyCon 3.0
Tom Moore
 
SolarWinds Federal Webinar - Maximizing Your Deployment with Appstack (Jan2016)
SolarWinds Federal Webinar - Maximizing Your Deployment with Appstack (Jan2016)SolarWinds Federal Webinar - Maximizing Your Deployment with Appstack (Jan2016)
SolarWinds Federal Webinar - Maximizing Your Deployment with Appstack (Jan2016)
hayesct
 
SolarWinds Federal Webinar - Maximizing Your Deployment with Appstack (Jan2016)
SolarWinds Federal Webinar - Maximizing Your Deployment with Appstack (Jan2016)SolarWinds Federal Webinar - Maximizing Your Deployment with Appstack (Jan2016)
SolarWinds Federal Webinar - Maximizing Your Deployment with Appstack (Jan2016)
SolarWinds
 

Similar to Transform & Analyze Time Series Data via Apache Spark @Windward (20)

'Malware Analysis' by PP Singh
'Malware Analysis' by PP Singh'Malware Analysis' by PP Singh
'Malware Analysis' by PP Singh
 
Malware Analysis -an overview by PP Singh
Malware Analysis -an overview by PP SinghMalware Analysis -an overview by PP Singh
Malware Analysis -an overview by PP Singh
 
ELITE.BCS-Cloud-and-Mobile-Risk-Assessments
ELITE.BCS-Cloud-and-Mobile-Risk-AssessmentsELITE.BCS-Cloud-and-Mobile-Risk-Assessments
ELITE.BCS-Cloud-and-Mobile-Risk-Assessments
 
Webinar: Detecting Deadlocks in Electronic Systems using Time-based Simulation
Webinar: Detecting Deadlocks in Electronic Systems using Time-based SimulationWebinar: Detecting Deadlocks in Electronic Systems using Time-based Simulation
Webinar: Detecting Deadlocks in Electronic Systems using Time-based Simulation
 
Mobile Database and Service Oriented Architecture
Mobile Database and Service Oriented ArchitectureMobile Database and Service Oriented Architecture
Mobile Database and Service Oriented Architecture
 
Webinar on Functional Safety Analysis using Model-based System Analysis
Webinar on Functional Safety Analysis using Model-based System AnalysisWebinar on Functional Safety Analysis using Model-based System Analysis
Webinar on Functional Safety Analysis using Model-based System Analysis
 
Modern DevOps across Technologies on premises and clouds with Oracle Manageme...
Modern DevOps across Technologies on premises and clouds with Oracle Manageme...Modern DevOps across Technologies on premises and clouds with Oracle Manageme...
Modern DevOps across Technologies on premises and clouds with Oracle Manageme...
 
2020 FRSecure CISSP Mentor Program - Class 7
2020 FRSecure CISSP Mentor Program - Class 72020 FRSecure CISSP Mentor Program - Class 7
2020 FRSecure CISSP Mentor Program - Class 7
 
Big Data LDN 2018: USING FAST-DATA TO MAKE SEMICONDUCTORS
Big Data LDN 2018: USING FAST-DATA TO MAKE SEMICONDUCTORSBig Data LDN 2018: USING FAST-DATA TO MAKE SEMICONDUCTORS
Big Data LDN 2018: USING FAST-DATA TO MAKE SEMICONDUCTORS
 
Rapid Assessment of Web Resources (RAWR) - DerbyCon 3.0
Rapid Assessment of Web Resources (RAWR) - DerbyCon 3.0Rapid Assessment of Web Resources (RAWR) - DerbyCon 3.0
Rapid Assessment of Web Resources (RAWR) - DerbyCon 3.0
 
Security data deluge
Security data delugeSecurity data deluge
Security data deluge
 
SolarWinds Federal & Government SE Webinar: Technical Update & Demo of New Fe...
SolarWinds Federal & Government SE Webinar: Technical Update & Demo of New Fe...SolarWinds Federal & Government SE Webinar: Technical Update & Demo of New Fe...
SolarWinds Federal & Government SE Webinar: Technical Update & Demo of New Fe...
 
SolarWinds Federal Webinar - Maximizing Your Deployment with Appstack (Jan2016)
SolarWinds Federal Webinar - Maximizing Your Deployment with Appstack (Jan2016)SolarWinds Federal Webinar - Maximizing Your Deployment with Appstack (Jan2016)
SolarWinds Federal Webinar - Maximizing Your Deployment with Appstack (Jan2016)
 
SolarWinds Federal Webinar - Maximizing Your Deployment with Appstack (Jan2016)
SolarWinds Federal Webinar - Maximizing Your Deployment with Appstack (Jan2016)SolarWinds Federal Webinar - Maximizing Your Deployment with Appstack (Jan2016)
SolarWinds Federal Webinar - Maximizing Your Deployment with Appstack (Jan2016)
 
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...
 
Hadoop Application Architectures - Fraud Detection
Hadoop Application Architectures - Fraud  DetectionHadoop Application Architectures - Fraud  Detection
Hadoop Application Architectures - Fraud Detection
 
FME for Disaster Response
FME for Disaster ResponseFME for Disaster Response
FME for Disaster Response
 
Federal Tools Webinar: Leveraging Affordable Tools to Enhance Your Orion Impl...
Federal Tools Webinar: Leveraging Affordable Tools to Enhance Your Orion Impl...Federal Tools Webinar: Leveraging Affordable Tools to Enhance Your Orion Impl...
Federal Tools Webinar: Leveraging Affordable Tools to Enhance Your Orion Impl...
 
Hadoop application architectures - Fraud detection tutorial
Hadoop application architectures - Fraud detection tutorialHadoop application architectures - Fraud detection tutorial
Hadoop application architectures - Fraud detection tutorial
 
Get Your Head in the Cloud: A Practical Model for Enterprise Cloud Security
Get Your Head in the Cloud: A Practical Model for Enterprise Cloud SecurityGet Your Head in the Cloud: A Practical Model for Enterprise Cloud Security
Get Your Head in the Cloud: A Practical Model for Enterprise Cloud Security
 

More from Demi Ben-Ari

More from Demi Ben-Ari (20)

Thinking DevOps in the Era of the Cloud - Demi Ben-Ari
Thinking DevOps in the Era of the Cloud - Demi Ben-AriThinking DevOps in the Era of the Cloud - Demi Ben-Ari
Thinking DevOps in the Era of the Cloud - Demi Ben-Ari
 
CTO Management Tool Box - Demi Ben-Ari at Panorays
CTO Management Tool Box - Demi Ben-Ari at PanoraysCTO Management Tool Box - Demi Ben-Ari at Panorays
CTO Management Tool Box - Demi Ben-Ari at Panorays
 
Kubernetes, Toolbox to fail or succeed for beginners - Demi Ben-Ari, VP R&D @...
Kubernetes, Toolbox to fail or succeed for beginners - Demi Ben-Ari, VP R&D @...Kubernetes, Toolbox to fail or succeed for beginners - Demi Ben-Ari, VP R&D @...
Kubernetes, Toolbox to fail or succeed for beginners - Demi Ben-Ari, VP R&D @...
 
Hacker vs company, Cloud Cyber Security Automated with Kubernetes - Demi Ben-...
Hacker vs company, Cloud Cyber Security Automated with Kubernetes - Demi Ben-...Hacker vs company, Cloud Cyber Security Automated with Kubernetes - Demi Ben-...
Hacker vs company, Cloud Cyber Security Automated with Kubernetes - Demi Ben-...
 
CTO Management ToolBox - Demi Ben-Ari -- Panorays
CTO Management ToolBox - Demi Ben-Ari -- PanoraysCTO Management ToolBox - Demi Ben-Ari -- Panorays
CTO Management ToolBox - Demi Ben-Ari -- Panorays
 
All I Wanted Is to Found a Startup - Demi Ben-Ari - Panorays
All I Wanted Is to Found a Startup - Demi Ben-Ari - PanoraysAll I Wanted Is to Found a Startup - Demi Ben-Ari - Panorays
All I Wanted Is to Found a Startup - Demi Ben-Ari - Panorays
 
Hacking for fun & profit - The Kubernetes Way - Demi Ben-Ari - Panorays
Hacking for fun & profit - The Kubernetes Way - Demi Ben-Ari - PanoraysHacking for fun & profit - The Kubernetes Way - Demi Ben-Ari - Panorays
Hacking for fun & profit - The Kubernetes Way - Demi Ben-Ari - Panorays
 
Community, Unifying the Geeks to Create Value - Demi Ben-Ari
Community, Unifying the Geeks to Create Value - Demi Ben-AriCommunity, Unifying the Geeks to Create Value - Demi Ben-Ari
Community, Unifying the Geeks to Create Value - Demi Ben-Ari
 
Apache Spark 101 - Demi Ben-Ari - Panorays
Apache Spark 101 - Demi Ben-Ari - PanoraysApache Spark 101 - Demi Ben-Ari - Panorays
Apache Spark 101 - Demi Ben-Ari - Panorays
 
Know the Startup World - Demi Ben-Ari - Ofek Alumni
Know the Startup World - Demi Ben-Ari - Ofek AlumniKnow the Startup World - Demi Ben-Ari - Ofek Alumni
Know the Startup World - Demi Ben-Ari - Ofek Alumni
 
Big Data made easy in the era of the Cloud - Demi Ben-Ari
Big Data made easy in the era of the Cloud - Demi Ben-AriBig Data made easy in the era of the Cloud - Demi Ben-Ari
Big Data made easy in the era of the Cloud - Demi Ben-Ari
 
Know the Startup World - Demi Ben Ari - Ofek Alumni
Know the Startup World - Demi Ben Ari - Ofek AlumniKnow the Startup World - Demi Ben Ari - Ofek Alumni
Know the Startup World - Demi Ben Ari - Ofek Alumni
 
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Milan 2017 - D...
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Milan 2017 - D...Monitoring Big Data Systems Done "The Simple Way" - Codemotion Milan 2017 - D...
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Milan 2017 - D...
 
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Berlin 2017
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Berlin 2017Monitoring Big Data Systems Done "The Simple Way" - Codemotion Berlin 2017
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Berlin 2017
 
Monitoring Big Data Systems "Done the simple way" - Demi Ben-Ari - Codemotion...
Monitoring Big Data Systems "Done the simple way" - Demi Ben-Ari - Codemotion...Monitoring Big Data Systems "Done the simple way" - Demi Ben-Ari - Codemotion...
Monitoring Big Data Systems "Done the simple way" - Demi Ben-Ari - Codemotion...
 
Thinking DevOps in the era of the Cloud - Demi Ben-Ari
Thinking DevOps in the era of the Cloud - Demi Ben-AriThinking DevOps in the era of the Cloud - Demi Ben-Ari
Thinking DevOps in the era of the Cloud - Demi Ben-Ari
 
Bootstrapping a Tech Community - Demi Ben-Ari
Bootstrapping a Tech Community - Demi Ben-AriBootstrapping a Tech Community - Demi Ben-Ari
Bootstrapping a Tech Community - Demi Ben-Ari
 
Apache Spark 101 - Demi Ben-Ari
Apache Spark 101 - Demi Ben-AriApache Spark 101 - Demi Ben-Ari
Apache Spark 101 - Demi Ben-Ari
 
Quick dive into the big data pool without drowning - Demi Ben-Ari @ Panorays
Quick dive into the big data pool without drowning - Demi Ben-Ari @ PanoraysQuick dive into the big data pool without drowning - Demi Ben-Ari @ Panorays
Quick dive into the big data pool without drowning - Demi Ben-Ari @ Panorays
 
Monitoring Big Data Systems - "The Simple Way"
Monitoring Big Data Systems - "The Simple Way"Monitoring Big Data Systems - "The Simple Way"
Monitoring Big Data Systems - "The Simple Way"
 

Recently uploaded

TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
mohitmore19
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
masabamasaba
 
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
VishalKumarJha10
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is inside
shinachiaurasa2
 

Recently uploaded (20)

%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learn
 
%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Harare%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Harare
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
 
Generic or specific? Making sensible software design decisions
Generic or specific? Making sensible software design decisionsGeneric or specific? Making sensible software design decisions
Generic or specific? Making sensible software design decisions
 
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
 
%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand
 
SHRMPro HRMS Software Solutions Presentation
SHRMPro HRMS Software Solutions PresentationSHRMPro HRMS Software Solutions Presentation
SHRMPro HRMS Software Solutions Presentation
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
Announcing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareAnnouncing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK Software
 
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park %in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is inside
 

Transform & Analyze Time Series Data via Apache Spark @Windward

  • 1. TRANSFORM & ANALYZE TIME SERIES DATA VIA APACHE SPARK DEMI BEN-ARI SR. SOFTWARE ENGINEER WINDWARD 17.05.2015
  • 2. ABOUT ME DEMI BEN-ARI SENIOR SOFTWARE ENGINEER AT WINDWARD LTD. BS’C COMPUTER SCIENCE – ACADEMIC COLLEGE TEL-AVIV YAFFO IN THE PAST: SOFTWARE TEAM LEADER & SENIOR JAVA SOFTWARE ENGINEER, MISSILE DEFENSE AND ALERT SYSTEM - “OFEK” UNIT - IAF
  • 3. WHAT DOES WINDWARD DO? Windward is a maritime data and analytics company, bringing unprecedented visibility to the maritime domain. Windward has built the world's first maritime data platform, the Windward Mind, which analyzes and organizes the world's maritime data
  • 4. WHERE DOES THE DATA COME FROM? Other Sources Maritime Databases AIS Automatic Identificati on System Port Agent Reports
  • 5. WINDWARD MIND VESSEL + AREA STORIES
  • 6.
  • 7. SPECIAL IN WINDWARD’S DOMAIN Maritim e Mind Data Mining Scope Market Trends Anomaly Detection • Single Data point scope • Going in Detail • Fraud detection • Sample / Total Data scope • Trends • Data Sampling problems
  • 8. MISSING PARTS IN TIME SERIES DATA • DATA ARRIVING FROM THE SATELLITES • MIGHT CAUSE DELAYS BECAUSE OF BAD TRANSMISSION • DATA VENDORS DELAYING THE DATA STREAM • CALCULATION IN LAYERS MAY CAUSE HOLES IN THE DATA
  • 9. THE PROBLEM - RECEIVING DATA T = 0 Level 3 Entity Level 2 Entity Level 1 Entity Beginning state, no data, and the time line begins
  • 10. THE PROBLEM - RECEIVING DATA T = 10 Level 3 Entity Level 2 Entity Level 1 Entity Computation sliding window size Level 1 entities data arrives and gets stored
  • 11. THE PROBLEM - RECEIVING DATA T = 10 Level 3 Entity Level 2 Entity Level 1 Entity Computation sliding window size Level 2 entities are created on top of Level 1’s Data (Decreased amount of data) Level 3 entities are created on top of Level 2’s Data (Decreased amount of data)
  • 12. THE PROBLEM - RECEIVING DATA T = 20 Level 3 Entity Level 2 Entity Level 1 Entity Computation sliding window size Level 1 entity's data arriving late Because of the sliding window’s back size, level 2 and 3 entities would not be created properly and there would be “Holes” in the Data
  • 13. SOLUTION TO THE PROBLEM • CREATING DEPENDENT MICRO SERVICES FORMING A DATA PIPELINE • OUR MICRO SERVICES ARE MAINLY APACHE SPARK APPLICATIONS • SERVICES ARE ONLY DEPENDENT ON THE DATA - NOT THE PREVIOUS SERVICE’S RUN • FORMING A STRUCTURE AND SCHEDULING OF “BACK SLIDING WINDOW” • KNOW YOUR DATA AND IT’S RELEVANCE TROUGH TIME • DON’T TRY TO FORESEE THE FUTURE – IT MIGHT BIAS THE RESULTS
  • 14. WHY CHOOSING APACHE SPARK? • IN MEMORY COMPUTATION (NOT ONLY) • FULLY DISTRIBUTED FRAMEWORK – LINEAR SCALE OUT • FAULT TOLERANT FRAMEWORK • MULTIPLE LANGUAGE API • HIGHER LEVEL ABSTRACTIONS (SPARKSQL, MLLIB, GRAPHX, SPARK STREAMING) • FUNCTIONAL PROGRAMMING PARADIGM • EASY TO USE AND MAINTAIN
  • 15. THINGS TO TAKE IN CONSIDERATION • AFTER WRITING THE SERVICE – HOW DO YOU BOOTSTRAP YOUR DATA? • DO SO WITHOUT “KNOWING THE FUTURE” • SEPARATE YOUR DATA -> SEPARATE YOUR SERVICES BY THE DATA TYPES • AUTOMATE AS MUCH AS YOU CAN – DEPLOYMENT, MAINTENANCE • MONITORING • DATA AVAILABILITY • PERFORMANCE • RUNTIME
  • 16. WHAT OTHER KIND OF PROBLEMS DO WE HANDLE? • FRAUD DETECTION • CUSTOMIZE USER DOMAIN REQUESTS • DATA FILTERING (MALFORMED / TAMPERED DATA) • CREATING VESSEL’S PATTERN OF LIFE • RELATIONSHIP BETWEEN VESSELS • PROTOCOL RESEARCH AND ANALYSIS (AIS)
  • 18. SUMMARY – WINDWARD’S STACK Cluster Dev Testing Live Staging ProductionEnv ELK
  • 20. THANKS, RESOURCES AND CONTACT • DEMI BEN-ARI • LINKEDIN • TWITTER: @DEMIBENARI • BLOG: HTTP://PROGEXC.BLOGSPOT.COM/ • EMAIL: DEMI.BENARI@GMAIL.COM • WINDWARD LTD. • BIG THINGS ARE HAPPENING HERE – FACEBOOK GROUP • MEETUP – BIG THINGS jobs@windward.e u