Anything data
revisited
Big, Streaming, NoSQL, Cloud, Science
… a sloppy travel guide
whoami (linkedin.com/in/ahmetakyol)
whoami - Dilbert already did it
Who are these people or who are you ?
Why a travel guide ?
“... Martin is an excellent map reader even in the most hectic Italian traffic … And after Martin and Cindy left us, we did better because we had learned from what they had showed us … When there’s no guide available, it helps to have someone who understands how to read the maps, tracks, signs, and indications. When we’re on our own, it helps to learn how to do those things ourselves”
“Software projects are always traveling in areas they don’t know”
Ron Jeffries (from his foreword for PoAPA book)
Why a ‘sloppy’ travel guide - (Big Data Landscape 2012)
Why a ‘sloppy’ travel guide - (Big Data Landscape 2017)
Why a ‘sloppy’ travel guide - (many others: AI, IoT ...)
Why a ‘sloppy’ travel guide - (the ‘n’ V’s of Big Data)
Chasing Cool Technologies - Big Data Envy
“We continue to see organizations chasing ‘cool’ technologies, taking on unnecessary complexity and risk when a simpler choice would be better.”
“While we've long understood the value of Big Data to better understand how people interact with us, we've noticed an alarming trend of Big Data envy: organizations using complex tools to handle ‘not-really-that-big’ Data.”
“The Apache Cassandra database promises massive scalability on commodity hardware, but we have seen teams overwhelmed by its architectural and operational complexity. Unless you have data volumes that require a 100+ node cluster, we recommend against using Cassandra.”
https://www.thoughtworks.com/radar/techniques/big-data-envy
Big Data Envy - architectural complexity (expectation)
From a ‘10,000-foot view’, big data systems may seem like ‘good old n-tier’ systems.
Big Data Envy - architectural complexity (example)
A dataflow diagram from a good reference application (but still only a reference application). Real-life examples are usually more complex!
Big Data Envy - architectural complexity (aws example)
Big Data Architectural Patterns and Best Practices on AWS : https://www.youtube.com/watch?v=RNrsIlweCno
Big Data Envy - architectural complexity (blueprints)
Big Data Envy - operational complexity (devops)
http://www.slideshare.net/jcmia1/apache-spark-20-tuning-guide
● Tuning the JVM, the OS and each (big) data system
● Choosing the right hardware for each ‘right solution’
● Orchestrating / monitoring / debugging many small applications running on and/or interacting with such distributed systems
OOM troubleshooting example for Apache Spark
Know thyself - reaching the cliff of confusion
https://www.vikingcodeschool.com/posts/why-learning-to-code-is-so-damn-hard
What is your learning style ?
“What’s a better learning strategy: covering a subject in full detail from top-to-bottom, or progressively sharpening a quick overview?”
How about an expanding/evolving learning style ?
Lifelong learning is the "ongoing, voluntary, and self-motivated" pursuit of knowledge for either personal or professional reasons. Therefore, it not only enhances social inclusion, active citizenship, and personal development, but also self-sustainability, as well as competitiveness and employability.
The Unknown Unknowns - the iceberg of ignorance
In his acclaimed study “The Iceberg of Ignorance”, consultant Sidney Yoshida concluded: “Only 4% of an organization’s front line problems are known by top management, 9% are known by middle management, 74% by supervisors and 100% by employees…”
Guidelines - the very first principle (business value)
“DDD isn’t first and foremost about technology. In its most central principles, DDD is about discussion, listening, understanding, discovery, and business value, all in an effort to centralize knowledge. If you are capable of understanding the business in which your company works, you can at a minimum participate in the software model discovery process to produce a Ubiquitous Language.”
“Our highest priority is to satisfy the customer through early and continuous delivery of valuable software”
(the very first principle of the Agile Manifesto)
Guidelines - science before technology (business value)
Guidelines - garbage dump or compulsive hoarding (business value)
Guidelines - making simple but not simpler
● “Make things as simple as possible, but not simpler.” (Albert Einstein)
● As simple as possible: no over-engineering; search for the simplest feasible solution
○ feasible ‘ready’ solution
○ fully managed solutions
○ manageable packaged solutions with support
○ solutions known for stability, manageability
● Not simpler: no under-engineering
○ right task, right tool
○ right usage: design patterns, best practices
Guidelines - right task right tool isn’t enough
Guidelines - right task right tool right usage
DynamoDB Design Patterns and Best Practices : https://www.youtube.com/watch?v=PDQ3jbDyTQ4
Guidelines - don’t let API fool you (cassandra)
CQL Under The Hood : https://www.youtube.com/watch?v=CY5-bWpqAVA
Guidelines - learn data paths and structures ( C* )
Learning the “write path”, the “read path” and the main internal data structures gives critical hints about “do’s and don’ts”, especially anti-patterns:
● Queue-like designs
● Intensive updates
● Deletes
http://www.slideshare.net/doanduyhai/cassandra-nice-use-cases-and-worst-anti-patterns
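Why queue-like designs and heavy deletes hurt can be sketched with a toy model. This is a drastic simplification and not Cassandra's actual storage engine (which involves memtables, SSTables and compaction): the class `ToyPartition` and its methods are hypothetical names, and the only behaviour borrowed from Cassandra is that a delete is recorded as a tombstone marker rather than removing data in place, so a later read has to step over those markers.

```python
TOMBSTONE = object()  # marker written instead of physically removing a cell


class ToyPartition:
    """Hypothetical append-only partition: every write or delete appends a cell."""

    def __init__(self):
        self.cells = []  # list of (key, value) in write order

    def write(self, key, value):
        self.cells.append((key, value))

    def delete(self, key):
        # A delete is just another write: it appends a tombstone.
        self.cells.append((key, TOMBSTONE))

    def read_head(self):
        """Return (key, value, tombstones_scanned) for the first live key."""
        latest = {}
        for key, value in self.cells:  # later cells win (last write wins)
            latest[key] = value
        tombstones_scanned = 0
        for key in sorted(latest):
            if latest[key] is TOMBSTONE:
                tombstones_scanned += 1  # wasted work on every read
                continue
            return key, latest[key], tombstones_scanned
        return None, None, tombstones_scanned


if __name__ == "__main__":
    queue = ToyPartition()
    for i in range(100):
        queue.write(i, f"job-{i}")
    for i in range(99):          # consume (delete) the first 99 jobs
        queue.delete(i)
    print(queue.read_head())     # the read steps over 99 tombstones first
```

In this toy, reading the head of a mostly consumed queue costs one step per deleted entry, which is the intuition behind treating queues and delete-heavy workloads as anti-patterns.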
Guidelines - loading data, layouts and file formats (hdfs)
● Data distribution, the small files problem
● Row vs. columnar formats
● I/O advantage, read only what you need:
○ Vertical: projection
○ Horizontal: predicate pushdown
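The two I/O savings in the list above can be illustrated with a small in-memory toy. It is only a sketch of the idea, not how any particular file format is implemented: the functions `to_row_groups` and `scan`, the sample columns, and the per-group min/max statistics layout are all assumptions made for this example.

```python
def to_row_groups(rows, group_size):
    """Split rows into groups, store each column separately, keep min/max stats."""
    groups = []
    for i in range(0, len(rows), group_size):
        chunk = rows[i:i + group_size]
        columns = {name: [r[name] for r in chunk] for name in chunk[0]}
        stats = {name: (min(vals), max(vals)) for name, vals in columns.items()}
        groups.append({"columns": columns, "stats": stats})
    return groups


def scan(groups, select, column, min_value):
    """Return `select` columns of rows with column >= min_value, plus values read."""
    values_read = 0
    out = []
    needed = set(select) | {column}
    for group in groups:
        # Horizontal saving (predicate pushdown): skip the whole group
        # when its max statistic shows that no row can match.
        if group["stats"][column][1] < min_value:
            continue
        # Vertical saving (projection): touch only the needed columns.
        cols = {name: group["columns"][name] for name in needed}
        n = len(cols[column])
        values_read += n * len(needed)
        for i in range(n):
            if cols[column][i] >= min_value:
                out.append({name: cols[name][i] for name in select})
    return out, values_read


if __name__ == "__main__":
    rows = [
        {"id": i, "country": "TR" if i % 2 else "DE", "amount": i * 10}
        for i in range(8)
    ]
    groups = to_row_groups(rows, group_size=4)
    result, values_read = scan(groups, select=["id"], column="amount", min_value=50)
    print(result, values_read, "of", len(rows) * 3, "stored values")
```

With the sample data, the first row group is skipped through its statistics and the `country` column is never touched, so the scan reads 8 of the 24 stored values.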
Guidelines - SQL or not (Spark as a Compiler)
https://databricks.com/blog/2016/05/23/apache-spark-as-a-compiler-joining-a-billion-rows-per-second-on-a-laptop.html
Guidelines - SQL or not (Beam Combine vs GroupBy)
https://issues.apache.org/jira/browse/BEAM-2477
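As a general illustration of why combining per key can be cheaper than grouping first, here is a plain-Python sketch. It does not use the Beam API and is not taken from the linked issue; the function names and the idea of counting "shuffled" pairs as a stand-in for network traffic are assumptions of this example.

```python
from collections import defaultdict


def group_by_then_sum(partitions):
    """Ship every (key, value) pair to the reducer, then sum per key."""
    shuffled = [pair for part in partitions for pair in part]
    totals = defaultdict(int)
    for key, value in shuffled:
        totals[key] += value
    return dict(totals), len(shuffled)


def combine_per_key(partitions):
    """Pre-aggregate inside each partition, ship only the partial sums."""
    shuffled = []
    for part in partitions:
        local = defaultdict(int)
        for key, value in part:
            local[key] += value
        shuffled.extend(local.items())  # one pair per key per partition
    totals = defaultdict(int)
    for key, value in shuffled:
        totals[key] += value
    return dict(totals), len(shuffled)


if __name__ == "__main__":
    partitions = [
        [("a", 1), ("a", 2), ("b", 3)],
        [("a", 4), ("b", 5), ("b", 6)],
    ]
    print(group_by_then_sum(partitions))
    print(combine_per_key(partitions))
```

Both functions produce the same totals, but the second one moves fewer pairs between partitions. That only works because summing is associative and commutative, which is the property a combiner relies on.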
Guidelines - SQL or not (Spark RDD vs Spark DF and SQL)
https://databricks.com/session/a-tale-of-three-apache-spark-apis-rdds-dataframes-and-datasets
Guidelines - learning from costs (google)
Guidelines - learning from costs (bigquery)
Guidelines - learning from costs (kinesis)
“Pricing is based on volume of data ingested into Amazon Kinesis Firehose, which is calculated as the number of data records you send to the service, times the size of each record rounded up to the nearest 5KB. For example, if your data records are 42KB each, Amazon Kinesis Firehose will count each record as 45 KB of data ingested.”
“A record is the data that your data producer adds to your Amazon Kinesis Stream. A PUT Payload Unit is counted in 25KB payload ‘chunks’ that comprise a record. For example, a 5KB record contains one PUT Payload Unit, a 45KB record contains two PUT Payload Units, and a 1MB record contains 40 PUT Payload Units. PUT Payload Unit is charged with a per million PUT Payload Units rate.”
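The rounding rules in the two quotes above can be turned into a small calculator, which makes the design hint visible: tiny records are billed as if they were much larger. This is a sketch of the quoted arithmetic only; the function names are hypothetical, no prices are included, and 1MB is treated as 1000KB so that the result matches the quoted example.

```python
import math

FIREHOSE_ROUNDING_KB = 5   # quoted rule: record size rounded up to the nearest 5KB
PUT_PAYLOAD_UNIT_KB = 25   # quoted rule: records counted in 25KB chunks


def firehose_billed_kb(record_kb):
    """Billed size of one Firehose record under the quoted 5KB rounding rule."""
    return math.ceil(record_kb / FIREHOSE_ROUNDING_KB) * FIREHOSE_ROUNDING_KB


def put_payload_units(record_kb):
    """Number of 25KB PUT Payload Units in one Kinesis Stream record."""
    return max(1, math.ceil(record_kb / PUT_PAYLOAD_UNIT_KB))


if __name__ == "__main__":
    print(firehose_billed_kb(42))    # the quoted 42KB example
    print(put_payload_units(45))     # the quoted 45KB example
    # A 1KB record is billed as 5KB by the first rule: five times its size.
    print(firehose_billed_kb(1) / 1)
```

Under these rules, batching many small events into one record before sending is the obvious way to avoid paying for rounding.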
Cloud computing - simple example
“a system, which tracks price changes for my desirable products in online stores (which I trust to buy from) and notifies me over the email when price drops.”
http://www.bebetterdeveloper.com/coding/architecture/serverless-system-architecture-using-aws.html
Cloud computing - simple “serverless” example
http://www.bebetterdeveloper.com/coding/architecture/serverless-system-architecture-using-aws.html
Cloud computing - serverless real world example
Guidelines - learn windows of opportunity (streaming)
SELECT sensorid,
       COUNT(*) AS count
FROM sensorreadings TIMESTAMP BY time
GROUP BY sensorid,
         TumblingWindow(second, 10)
Guidelines - learn windows of opportunity (streaming)
SELECT sensorid,
       COUNT(*) AS count
FROM sensorreadings TIMESTAMP BY time
GROUP BY sensorid,
         HoppingWindow(second, 10, 5)
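The difference between the two queries above is how many windows an event falls into. The sketch below assigns a timestamp (in seconds) to its windows. It assumes half-open intervals `[start, end)` aligned at zero, which may differ from the boundary convention of a given streaming engine, and the function names are hypothetical.

```python
from typing import List, Tuple

Window = Tuple[int, int]  # half-open interval [start, end) in seconds


def tumbling_windows(ts: int, size: int) -> List[Window]:
    """Tumbling windows do not overlap: each event belongs to exactly one."""
    start = (ts // size) * size
    return [(start, start + size)]


def hopping_windows(ts: int, size: int, hop: int) -> List[Window]:
    """Hopping windows start every `hop` seconds and may overlap."""
    windows = []
    start = (ts // hop) * hop      # latest hop-aligned start at or before ts
    while start > ts - size:       # [start, start + size) still covers ts
        windows.append((start, start + size))
        start -= hop
    return sorted(windows)


if __name__ == "__main__":
    print(tumbling_windows(12, size=10))
    print(hopping_windows(12, size=10, hop=5))
```

With a 10-second size and a 5-second hop, every event is counted in two overlapping windows, so the hopping query emits a result every 5 seconds that covers the last 10 seconds.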
Guidelines - learn windows of opportunity (streaming)
The Evolution of Massive-Scale Data Processing : https://goo.gl/f31iXP
Guidelines - data processing evolution (history)
The Evolution of Massive-Scale Data Processing : https://goo.gl/f31iXP
Guidelines - data processing evolution (unified/continuous)
https://databricks.com/blog/2016/07/28/continuous-applications-evolving-streaming-in-apache-spark-2-0.html
Guidelines (bonus) - know thy theorem ( CAP )
Guidelines (bonus) - know thy theorem ( PACELC )