SlideShare ist ein Scribd-Unternehmen logo
1 von 99
Downloaden Sie, um offline zu lesen
Avoiding big data anti-patterns
whoami
• Alex Holmes
• Software engineer
• @grep_alex
• grepalex.com
Why should I care about anti-patterns?
big data
^
Agenda
• Walking through anti-patterns
• Looking at why they should
be avoided
• Consider some mitigations
• It’s big data!
• A single tool for the job
• Polyglot data integration
• Full scans FTW!
• Tombstones
• Counting with Java built-in
collections
• It’s open
I’ll cover: by …
Meet your protagonists
Alex Jade
(the amateur) (the pro)
It’s big data!
i want to calculate some
statistics on some static user
data . . .
how big is the data?
it’s big data,
so huge,
20GB!!!
i need to order a hadoop cluster!
What’s the problem?
you think you have big
data ...
but you don’t!
Poll: how much RAM can a single server support?
A. 256 GB
B. 512 GB
C. 1TB
http://yourdatafitsinram.com
keep it simple . . .
use MYSQL or POSTGRES
or R/python/matlab
Summary
• Simplify your analytics toolchain when working with
small data (especially if you don’t already have an
investment in big data tooling)
• Old(er) tools such as OLTP/OLAP/R/Python still
have their place for this type of work
A single tool for the job
that looks
like a nail!!!
nosql
PROBLEM
What’s the problem?
Big data tools are usually designed to do one thing well
(maybe two)
^
Types of workloads
• Low-latency data lookups
• Near real-time processing
• Interactive analytics
• Joins
• Full scans
• Search
• Data movement and integration
• ETL
The old world was simple
OLTP/OLAP
The new world … not so much
You need to research and find the
best-in-class for your function
Best-in-class big data tools (in my opinion)
If you want … Consider …
Low-latency lookups Cassandra, memcached
Near real-time processing Storm
Interactive analytics Vertica, Teradata
Full scans, system of record data,
ETL, batch processing
HDFS, MapReduce, Hive,
Pig
Data movement and integration Kafka
Summary
• There is no single big data tool that does it all
• We live in a polyglot world where new tools are
announced every day - don’t believe the hype!
• Test the claims on your own hardware and data;
start small and fail fast
Polyglot data integration
i need to move
clickstream data
from my application
to hadoop
Application
Hadoop Loader
Hadoop
shoot, i need to
use that same
data in streaming
Application
Hadoop Loader
Hadoop
JMS
JMS
What’s the problem?
OLTP
OLAP /
EDW
HBase
Cassan
dra
Voldem
ort
Hadoop
SecurityAnalytics
Rec.
Engine
Search Monitoring
Social
Graph
that way new consumers can
be added without any work!
we need a central data
repository and pipeline to isolate
consumers from the source
let’s use kafka!
OLTP
OLAP /
EDW
HBase
Cassan
dra
Voldem
ort
Hadoop
SecurityAnalytics
Rec.
Engine
Search Monitoring
Social
Graph
kafka
Background
• Apache project
• Originated from LinkedIn
• Open-sourced in 2011
• Written in Scala and Java
• Borrows concepts in messaging systems and logs
• Foundational data movement and integration
technology
What’s the big whoop about Kafka?
Throughput
http://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
Producer
Consumer Consumer Consumer
Kafka Server Kafka Server Kafka Server
Producer Producer
2,024,032 TPS
2,615,968 TPS
2ms
O.S. page cache is leveraged
OS page cache
...11109876543210
ProducerConsumer A Consumer B
Disk
writesreadsreads
Things to look out for
• Leverages ZooKeeper, which is tricky to configure
• Reads can become slow when the page cache is
missed and disk needs to be hit
• Lack of security
Summary
• Don’t write your own data integration
• Use Kafka for light-weight, fast and scalable data
integration
Full scans FTW!
i heard that hadoop was
designed to work with huge
data volumes!
i’m going to stick my
data on hadoop . . .
and run some joins
SELECT * FROM huge_table
JOIN ON other_huge_table
…
What’s the problem?
yes, hadoop is very efficient
at batch workloads
files are split into large blocks and
distributed throughout the cluster
“data locality” is a first-class concept,
where compute is pushed to storage Scheduler
but hadoop doesn’t negate
all these optimizations we
learned when working on
relational databases
so partition your data
according to how you
will most commonly
access it
hdfs:/data/tweets/date=20140929/
hdfs:/data/tweets/date=20140930/
hdfs:/data/tweets/date=20140931/
disk io is slow
and then make sure to
include a filter in your
queries so that only
those partitions are read
...
WHERE DATE=20151027
include projections to
reduce data that needs
to be read from disk or
pushed over the network
SELECT id, name
FROM...
hash joins require
network io which is slow
65.23VRSN
39.54MSFT
526.62GOOGL
PriceSymbol
RestonVRSN
RedmondMSFT
Mtn ViewGOOGL
HeadquartersSymbol
merge joins are way
more efficient
Records in all datasets
sorted by join key
Headquarters
Redmond
Mtn View
Reston65.23VRSN
39.54MSFT
526.62GOOGL
PriceSymbol
The merge algorithm
streams and performs an
inline merge of the
datasets
you’ll have to bucket
and sort your data
and tell your query
engine to use a sort-
merge-bucket (SMB) join
— Hive properties to enable a SMB join
set hive.auto.convert.sortmerge.join=true;
set hive.optimize.bucketmapjoin = true;
set hive.optimize.bucketmapjoin.sortedmerge = true;
and look at using a
columnar data
format like parquet
Column Strorage
Column 1 (Symbol)
526.62
MSFT
05-10-2014
05-10-2014
39.54
GOOGL
Column 2 (Date)
Column 3 (Price)
Summary
• Partition, filter and project your data (same as you
used to do with relational databases)
• Look at bucketing and sorting your data to support
advanced join techniques such as sort-merge-
bucket
• Consider storing your data in columnar form
Tombstones
i need to store data in a Highly
Available persistent queue
. . .
and we already have Cassandra
deployed
. . .
bingo!!!
What is Cassandra?
• Low-latency distributed database
• Apache project modeled after Dynamo and
BigTable
• Data replication for fault tolerance and scale
• Multi-datacenter support
• CAP
Node
Node
Node
Node
Node
Node
East West
What’s the problem?
V VVV V VVVVV V
KKKKKKKKKKK
V VVV V VVVVV V
KKKKKKKKKKK
tombstone markers indicate that the
column has been deleted
deletes in Cassandra are soft;
deleted columns are marked
with tombstones
these tombstoned columns
slow-down reads
by default tombstones stay
around for 10 days
IF you want to know why, read
up on
gc_grace_secs,
and reappearing deletes
don’t use Cassandra, use
kafka
design your schema and read
patterns to avoid tombstones
getting in the way of your reads
keep track of consumer offsets,
and add a time or bucket semantic
to rows. only delete rows after
some time has elapsed, or once all
consumers have consumed them.
ID bucket
1 2
2 1
msg
81583
723804
offset
723803
81582
msg
81582
723803
msg
81581
723802
bucket
bucket
consumer
consumer
. . .
. . .
ID
1
2
Summary
• Try to avoid use cases that require high volume
deletes and slice queries that scan over tombstone
columns
• Design your schema and delete/read patterns with
tombstone avoidance in mind
Counting with Java’s built-in collections
I’m going to COUNT
THE DISTINCT NUMBER
OF USERS THAT
viewed a tweet
What’s the problem?
Poll: what does HashSet<K> use under the covers?
A. K[]
B. Entry<K>[]
C. HashMap<K,V>
D. TreeMap<K,V>
Memory consumption
HashSet = 32 * SIZE + 4 * CAPACITY
String = 8 * (int) ((((no chars) * 2) + 45) / 8)
Average user is 6 characters long = 64 bytes
number of elements in set set capacity (array length)
For 10,000,000 users this is at least 1GiB
USE HYPERLOGLOG TO WORK
WITH approximate DISTINCt
COUNTS @SCALE
HyperLogLog
• Cardinality estimation algorithm
• Uses (a lot) less space than sets
• Doesn’t provide exact distinct counts (being
“close” is probably good enough)
• Cardinality Estimation for Big Data: http://druid.io/
blog/2012/05/04/fast-cheap-and-98-right-
cardinality-estimation-for-big-data.html
1 billion distinct elements = 1.5kb memory
standard error = 2%
https://www.flickr.com/photos/redwoodphotography/4356518997
Hashes
10110100100101010101100111001011
Good hash functions
should result in each bit
having a 50% probability
of occurring
h(entity):
Bit pattern observations
1xxxxxxxxx..x
01xxxxxxxx..x
001xxxxxxx..x
0001xxxxxx..x
50% of hashed values will look like:
25% of hashed values will look like:
12.5% of hashed values will look like:
6.25% of hashed values will look like:
0 0
0
0 0
0
0
0
0 00
0
0
0
00
000
0 0
00 0
register
0 0
0
0 0
0
0
0
0 00
0
0
0
00
000
0 0
00 0
h(entity):
register index:
4
register value:
1
register
harmonic_mean
(
HLL
estimated
cardinality =
(= 1
01010 01 0 0 01
1
HLL Java library
• https://github.com/aggregateknowledge/java-hll
• Neat implementation - it automatically promotes
internal data structure to HLL once it grows beyond
a certain size
Approximate count algorithms
• HyperLogLog (distinct counts)
• CountMinSketch (frequencies of members)
• Bloom Filter (set membership)
Summary
• Data skew is a reality when working at Internet
scale
• Java’s builtin collections have a large memory
footprint don’t scale
• For high-cardinality data use approximate
estimation algorithms
stepping away . . .
MATH
It’s open
prototyping/VIABILITY - DONE
coding - done
testing - done
performance & scalability testing -
done
monitoring - done
i’m ready to ship!
What’s the problem?
https://www.flickr.com/photos/arjentoet/8428179166
https://www.flickr.com/photos/gowestphoto/3922495716
https://www.flickr.com/photos/joybot/6026542856
How the old world worked
OLTP/OLAP
Authentication
Authorization
DBA
security’s not my job!!!
we disagree
infosec
Important questions to ask
• Is my data encrypted when it’s in motion?
• Is my data encrypted on disk?
• Are there ACL’s defining who has access to what?
• Are these checks enabled by default?
How do tools stack up?
ACL’s
At-rest
encryption
In-motion
encryption
Enabled by
default
Ease of use
Oracle
Hadoop
Cassandra
ZooKeeper
Kafka
Summary
• Enable security for your tools!
• Include security as part of evaluating a tool
• Ask vendors and project owners to step up to the
plate
We’re done!
Conclusions
• Don’t assume that a particular big data technology will
work for your use case - verify it for yourself on your
own hardware and data early on in the evaluation of a
tool
• Be wary of the “new hotness” and vendor claims - they
may burn you
• Make sure that load/scale testing is a required part of
your go-to-production plan
Thanks for your time!

Weitere ähnliche Inhalte

Was ist angesagt?

Hadoop & cloud storage object store integration in production (final)
Hadoop & cloud storage  object store integration in production (final)Hadoop & cloud storage  object store integration in production (final)
Hadoop & cloud storage object store integration in production (final)Chris Nauroth
 
Getting Started with Apache Cassandra and Apache Zeppelin (DuyHai DOAN, DataS...
Getting Started with Apache Cassandra and Apache Zeppelin (DuyHai DOAN, DataS...Getting Started with Apache Cassandra and Apache Zeppelin (DuyHai DOAN, DataS...
Getting Started with Apache Cassandra and Apache Zeppelin (DuyHai DOAN, DataS...DataStax
 
Breakthrough OLAP performance with Cassandra and Spark
Breakthrough OLAP performance with Cassandra and SparkBreakthrough OLAP performance with Cassandra and Spark
Breakthrough OLAP performance with Cassandra and SparkEvan Chan
 
Application architectures with Hadoop – Big Data TechCon 2014
Application architectures with Hadoop – Big Data TechCon 2014Application architectures with Hadoop – Big Data TechCon 2014
Application architectures with Hadoop – Big Data TechCon 2014hadooparchbook
 
Scaling ETL with Hadoop - Avoiding Failure
Scaling ETL with Hadoop - Avoiding FailureScaling ETL with Hadoop - Avoiding Failure
Scaling ETL with Hadoop - Avoiding FailureGwen (Chen) Shapira
 
Next Generation Hadoop Operations
Next Generation Hadoop OperationsNext Generation Hadoop Operations
Next Generation Hadoop OperationsOwen O'Malley
 
Cassandra Day 2014: Interactive Analytics with Cassandra and Spark
Cassandra Day 2014: Interactive Analytics with Cassandra and SparkCassandra Day 2014: Interactive Analytics with Cassandra and Spark
Cassandra Day 2014: Interactive Analytics with Cassandra and SparkEvan Chan
 
Using Spark with Tachyon by Gene Pang
Using Spark with Tachyon by Gene PangUsing Spark with Tachyon by Gene Pang
Using Spark with Tachyon by Gene PangSpark Summit
 
Keep your hadoop cluster at its best! v4
Keep your hadoop cluster at its best! v4Keep your hadoop cluster at its best! v4
Keep your hadoop cluster at its best! v4Chris Nauroth
 
Kafka and Hadoop at LinkedIn Meetup
Kafka and Hadoop at LinkedIn MeetupKafka and Hadoop at LinkedIn Meetup
Kafka and Hadoop at LinkedIn MeetupGwen (Chen) Shapira
 
Event Detection Pipelines with Apache Kafka
Event Detection Pipelines with Apache KafkaEvent Detection Pipelines with Apache Kafka
Event Detection Pipelines with Apache KafkaDataWorks Summit
 
Streaming Analytics with Spark, Kafka, Cassandra and Akka by Helena Edelson
Streaming Analytics with Spark, Kafka, Cassandra and Akka by Helena EdelsonStreaming Analytics with Spark, Kafka, Cassandra and Akka by Helena Edelson
Streaming Analytics with Spark, Kafka, Cassandra and Akka by Helena EdelsonSpark Summit
 
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016alanfgates
 
Performance Comparison of Streaming Big Data Platforms
Performance Comparison of Streaming Big Data PlatformsPerformance Comparison of Streaming Big Data Platforms
Performance Comparison of Streaming Big Data PlatformsDataWorks Summit/Hadoop Summit
 
What no one tells you about writing a streaming app
What no one tells you about writing a streaming appWhat no one tells you about writing a streaming app
What no one tells you about writing a streaming apphadooparchbook
 
Emerging technologies /frameworks in Big Data
Emerging technologies /frameworks in Big DataEmerging technologies /frameworks in Big Data
Emerging technologies /frameworks in Big DataRahul Jain
 
Productionizing Spark and the Spark Job Server
Productionizing Spark and the Spark Job ServerProductionizing Spark and the Spark Job Server
Productionizing Spark and the Spark Job ServerEvan Chan
 
Tachyon and Apache Spark
Tachyon and Apache SparkTachyon and Apache Spark
Tachyon and Apache Sparkrhatr
 
Architecting applications with Hadoop - Fraud Detection
Architecting applications with Hadoop - Fraud DetectionArchitecting applications with Hadoop - Fraud Detection
Architecting applications with Hadoop - Fraud Detectionhadooparchbook
 

Was ist angesagt? (20)

Hadoop & cloud storage object store integration in production (final)
Hadoop & cloud storage  object store integration in production (final)Hadoop & cloud storage  object store integration in production (final)
Hadoop & cloud storage object store integration in production (final)
 
Getting Started with Apache Cassandra and Apache Zeppelin (DuyHai DOAN, DataS...
Getting Started with Apache Cassandra and Apache Zeppelin (DuyHai DOAN, DataS...Getting Started with Apache Cassandra and Apache Zeppelin (DuyHai DOAN, DataS...
Getting Started with Apache Cassandra and Apache Zeppelin (DuyHai DOAN, DataS...
 
Breakthrough OLAP performance with Cassandra and Spark
Breakthrough OLAP performance with Cassandra and SparkBreakthrough OLAP performance with Cassandra and Spark
Breakthrough OLAP performance with Cassandra and Spark
 
Application architectures with Hadoop – Big Data TechCon 2014
Application architectures with Hadoop – Big Data TechCon 2014Application architectures with Hadoop – Big Data TechCon 2014
Application architectures with Hadoop – Big Data TechCon 2014
 
Scaling ETL with Hadoop - Avoiding Failure
Scaling ETL with Hadoop - Avoiding FailureScaling ETL with Hadoop - Avoiding Failure
Scaling ETL with Hadoop - Avoiding Failure
 
Next Generation Hadoop Operations
Next Generation Hadoop OperationsNext Generation Hadoop Operations
Next Generation Hadoop Operations
 
Twitter with hadoop for oow
Twitter with hadoop for oowTwitter with hadoop for oow
Twitter with hadoop for oow
 
Cassandra Day 2014: Interactive Analytics with Cassandra and Spark
Cassandra Day 2014: Interactive Analytics with Cassandra and SparkCassandra Day 2014: Interactive Analytics with Cassandra and Spark
Cassandra Day 2014: Interactive Analytics with Cassandra and Spark
 
Using Spark with Tachyon by Gene Pang
Using Spark with Tachyon by Gene PangUsing Spark with Tachyon by Gene Pang
Using Spark with Tachyon by Gene Pang
 
Keep your hadoop cluster at its best! v4
Keep your hadoop cluster at its best! v4Keep your hadoop cluster at its best! v4
Keep your hadoop cluster at its best! v4
 
Kafka and Hadoop at LinkedIn Meetup
Kafka and Hadoop at LinkedIn MeetupKafka and Hadoop at LinkedIn Meetup
Kafka and Hadoop at LinkedIn Meetup
 
Event Detection Pipelines with Apache Kafka
Event Detection Pipelines with Apache KafkaEvent Detection Pipelines with Apache Kafka
Event Detection Pipelines with Apache Kafka
 
Streaming Analytics with Spark, Kafka, Cassandra and Akka by Helena Edelson
Streaming Analytics with Spark, Kafka, Cassandra and Akka by Helena EdelsonStreaming Analytics with Spark, Kafka, Cassandra and Akka by Helena Edelson
Streaming Analytics with Spark, Kafka, Cassandra and Akka by Helena Edelson
 
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
 
Performance Comparison of Streaming Big Data Platforms
Performance Comparison of Streaming Big Data PlatformsPerformance Comparison of Streaming Big Data Platforms
Performance Comparison of Streaming Big Data Platforms
 
What no one tells you about writing a streaming app
What no one tells you about writing a streaming appWhat no one tells you about writing a streaming app
What no one tells you about writing a streaming app
 
Emerging technologies /frameworks in Big Data
Emerging technologies /frameworks in Big DataEmerging technologies /frameworks in Big Data
Emerging technologies /frameworks in Big Data
 
Productionizing Spark and the Spark Job Server
Productionizing Spark and the Spark Job ServerProductionizing Spark and the Spark Job Server
Productionizing Spark and the Spark Job Server
 
Tachyon and Apache Spark
Tachyon and Apache SparkTachyon and Apache Spark
Tachyon and Apache Spark
 
Architecting applications with Hadoop - Fraud Detection
Architecting applications with Hadoop - Fraud DetectionArchitecting applications with Hadoop - Fraud Detection
Architecting applications with Hadoop - Fraud Detection
 

Andere mochten auch

Searching and reporting with splunk 6.x e learning
Searching and reporting with splunk 6.x   e learningSearching and reporting with splunk 6.x   e learning
Searching and reporting with splunk 6.x e learningJacky Lai
 
Data Mining with Splunk
Data Mining with SplunkData Mining with Splunk
Data Mining with SplunkDavid Carasso
 
Modern Distributed Messaging and RPC
Modern Distributed Messaging and RPCModern Distributed Messaging and RPC
Modern Distributed Messaging and RPCMax Alexejev
 
Parquet and AVRO
Parquet and AVROParquet and AVRO
Parquet and AVROairisData
 
La Gouvernance des Données
La Gouvernance des DonnéesLa Gouvernance des Données
La Gouvernance des DonnéesSoft Computing
 
Thrift vs Protocol Buffers vs Avro - Biased Comparison
Thrift vs Protocol Buffers vs Avro - Biased ComparisonThrift vs Protocol Buffers vs Avro - Biased Comparison
Thrift vs Protocol Buffers vs Avro - Biased ComparisonIgor Anishchenko
 
Reducing Microservice Complexity with Kafka and Reactive Streams
Reducing Microservice Complexity with Kafka and Reactive StreamsReducing Microservice Complexity with Kafka and Reactive Streams
Reducing Microservice Complexity with Kafka and Reactive Streamsjimriecken
 
Microservices in the Apache Kafka Ecosystem
Microservices in the Apache Kafka EcosystemMicroservices in the Apache Kafka Ecosystem
Microservices in the Apache Kafka Ecosystemconfluent
 
Impact Maps and Story Maps: delivering what really matters
Impact Maps and Story Maps: delivering what really mattersImpact Maps and Story Maps: delivering what really matters
Impact Maps and Story Maps: delivering what really mattersChristian Hassa
 

Andere mochten auch (9)

Searching and reporting with splunk 6.x e learning
Searching and reporting with splunk 6.x   e learningSearching and reporting with splunk 6.x   e learning
Searching and reporting with splunk 6.x e learning
 
Data Mining with Splunk
Data Mining with SplunkData Mining with Splunk
Data Mining with Splunk
 
Modern Distributed Messaging and RPC
Modern Distributed Messaging and RPCModern Distributed Messaging and RPC
Modern Distributed Messaging and RPC
 
Parquet and AVRO
Parquet and AVROParquet and AVRO
Parquet and AVRO
 
La Gouvernance des Données
La Gouvernance des DonnéesLa Gouvernance des Données
La Gouvernance des Données
 
Thrift vs Protocol Buffers vs Avro - Biased Comparison
Thrift vs Protocol Buffers vs Avro - Biased ComparisonThrift vs Protocol Buffers vs Avro - Biased Comparison
Thrift vs Protocol Buffers vs Avro - Biased Comparison
 
Reducing Microservice Complexity with Kafka and Reactive Streams
Reducing Microservice Complexity with Kafka and Reactive StreamsReducing Microservice Complexity with Kafka and Reactive Streams
Reducing Microservice Complexity with Kafka and Reactive Streams
 
Microservices in the Apache Kafka Ecosystem
Microservices in the Apache Kafka EcosystemMicroservices in the Apache Kafka Ecosystem
Microservices in the Apache Kafka Ecosystem
 
Impact Maps and Story Maps: delivering what really matters
Impact Maps and Story Maps: delivering what really mattersImpact Maps and Story Maps: delivering what really matters
Impact Maps and Story Maps: delivering what really matters
 

Ähnlich wie Avoiding big data antipatterns

Bigdata antipatterns
Bigdata antipatternsBigdata antipatterns
Bigdata antipatternsAnurag S
 
First Hive Meetup London 2012-07-10 - Tomas Cervenka - VisualDNA
First Hive Meetup London 2012-07-10 - Tomas Cervenka - VisualDNAFirst Hive Meetup London 2012-07-10 - Tomas Cervenka - VisualDNA
First Hive Meetup London 2012-07-10 - Tomas Cervenka - VisualDNATomas Cervenka
 
Need for Time series Database
Need for Time series DatabaseNeed for Time series Database
Need for Time series DatabasePramit Choudhary
 
Building a Database for the End of the World
Building a Database for the End of the WorldBuilding a Database for the End of the World
Building a Database for the End of the Worldjhugg
 
Fixing twitter
Fixing twitterFixing twitter
Fixing twitterRoger Xia
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...smallerror
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...xlight
 
Databases benoitg 2009-03-10
Databases benoitg 2009-03-10Databases benoitg 2009-03-10
Databases benoitg 2009-03-10benoitg
 
Tugdual Grall - Real World Use Cases: Hadoop and NoSQL in Production
Tugdual Grall - Real World Use Cases: Hadoop and NoSQL in ProductionTugdual Grall - Real World Use Cases: Hadoop and NoSQL in Production
Tugdual Grall - Real World Use Cases: Hadoop and NoSQL in ProductionCodemotion
 
Low-Latency Analytics with NoSQL – Introduction to Storm and Cassandra
Low-Latency Analytics with NoSQL – Introduction to Storm and CassandraLow-Latency Analytics with NoSQL – Introduction to Storm and Cassandra
Low-Latency Analytics with NoSQL – Introduction to Storm and CassandraCaserta
 
Managing your black friday logs Voxxed Luxembourg
Managing your black friday logs Voxxed LuxembourgManaging your black friday logs Voxxed Luxembourg
Managing your black friday logs Voxxed LuxembourgDavid Pilato
 
Modern Big Data Analytics Tools: An Overview
Modern Big Data Analytics Tools: An OverviewModern Big Data Analytics Tools: An Overview
Modern Big Data Analytics Tools: An OverviewGreat Wide Open
 
Scaling graphite for application metrics
Scaling graphite for application metricsScaling graphite for application metrics
Scaling graphite for application metricsJim Plush
 
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_SummaryHiram Fleitas León
 
Bhupeshbansal bigdata
Bhupeshbansal bigdata Bhupeshbansal bigdata
Bhupeshbansal bigdata Bhupesh Bansal
 
Above the cloud: Big Data and BI
Above the cloud: Big Data and BIAbove the cloud: Big Data and BI
Above the cloud: Big Data and BIDenny Lee
 
DSD-INT 2017 The use of big data for dredging - De Boer
DSD-INT 2017 The use of big data for dredging - De BoerDSD-INT 2017 The use of big data for dredging - De Boer
DSD-INT 2017 The use of big data for dredging - De BoerDeltares
 

Ähnlich wie Avoiding big data antipatterns (20)

Bigdata antipatterns
Bigdata antipatternsBigdata antipatterns
Bigdata antipatterns
 
First Hive Meetup London 2012-07-10 - Tomas Cervenka - VisualDNA
First Hive Meetup London 2012-07-10 - Tomas Cervenka - VisualDNAFirst Hive Meetup London 2012-07-10 - Tomas Cervenka - VisualDNA
First Hive Meetup London 2012-07-10 - Tomas Cervenka - VisualDNA
 
Need for Time series Database
Need for Time series DatabaseNeed for Time series Database
Need for Time series Database
 
Building a Database for the End of the World
Building a Database for the End of the WorldBuilding a Database for the End of the World
Building a Database for the End of the World
 
Fixing twitter
Fixing twitterFixing twitter
Fixing twitter
 
Fixing_Twitter
Fixing_TwitterFixing_Twitter
Fixing_Twitter
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
 
Databases benoitg 2009-03-10
Databases benoitg 2009-03-10Databases benoitg 2009-03-10
Databases benoitg 2009-03-10
 
Tugdual Grall - Real World Use Cases: Hadoop and NoSQL in Production
Tugdual Grall - Real World Use Cases: Hadoop and NoSQL in ProductionTugdual Grall - Real World Use Cases: Hadoop and NoSQL in Production
Tugdual Grall - Real World Use Cases: Hadoop and NoSQL in Production
 
Low-Latency Analytics with NoSQL – Introduction to Storm and Cassandra
Low-Latency Analytics with NoSQL – Introduction to Storm and CassandraLow-Latency Analytics with NoSQL – Introduction to Storm and Cassandra
Low-Latency Analytics with NoSQL – Introduction to Storm and Cassandra
 
Managing your black friday logs Voxxed Luxembourg
Managing your black friday logs Voxxed LuxembourgManaging your black friday logs Voxxed Luxembourg
Managing your black friday logs Voxxed Luxembourg
 
Modern Big Data Analytics Tools: An Overview
Modern Big Data Analytics Tools: An OverviewModern Big Data Analytics Tools: An Overview
Modern Big Data Analytics Tools: An Overview
 
Architecting Your First Big Data Implementation
Architecting Your First Big Data ImplementationArchitecting Your First Big Data Implementation
Architecting Your First Big Data Implementation
 
Scaling graphite for application metrics
Scaling graphite for application metricsScaling graphite for application metrics
Scaling graphite for application metrics
 
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary
 
Bhupeshbansal bigdata
Bhupeshbansal bigdata Bhupeshbansal bigdata
Bhupeshbansal bigdata
 
Above the cloud: Big Data and BI
Above the cloud: Big Data and BIAbove the cloud: Big Data and BI
Above the cloud: Big Data and BI
 
DSD-INT 2017 The use of big data for dredging - De Boer
DSD-INT 2017 The use of big data for dredging - De BoerDSD-INT 2017 The use of big data for dredging - De Boer
DSD-INT 2017 The use of big data for dredging - De Boer
 
Apache Eagle - Monitor Hadoop in Real Time
Apache Eagle - Monitor Hadoop in Real TimeApache Eagle - Monitor Hadoop in Real Time
Apache Eagle - Monitor Hadoop in Real Time
 

Kürzlich hochgeladen

Correctly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleCorrectly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleAlluxio, Inc.
 
Biology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxBiology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxDeepakSakkari2
 
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor CatchersTechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catcherssdickerson1
 
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerAnamika Sarkar
 
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETEINFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETEroselinkalist12
 
Introduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHIntroduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHC Sai Kiran
 
Artificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxArtificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxbritheesh05
 
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)Dr SOUNDIRARAJ N
 
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfCCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfAsst.prof M.Gokilavani
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionDr.Costas Sachpazis
 
Arduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptArduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptSAURABHKUMAR892774
 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AIabhishek36461
 
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxwendy cai
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...srsj9000
 
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfAsst.prof M.Gokilavani
 
Comparative Analysis of Text Summarization Techniques
Comparative Analysis of Text Summarization TechniquesComparative Analysis of Text Summarization Techniques
Comparative Analysis of Text Summarization Techniquesugginaramesh
 

Kürzlich hochgeladen (20)

Correctly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleCorrectly Loading Incremental Data at Scale
Correctly Loading Incremental Data at Scale
 
Biology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxBiology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptx
 
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor CatchersTechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
 
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
 
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETEINFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
 
Introduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHIntroduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECH
 
Artificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxArtificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptx
 
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
 
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
 
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptxExploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
 
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfCCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
 
Arduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptArduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.ppt
 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AI
 
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptx
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
 
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
 
young call girls in Green Park🔝 9953056974 🔝 escort Service
young call girls in Green Park🔝 9953056974 🔝 escort Serviceyoung call girls in Green Park🔝 9953056974 🔝 escort Service
young call girls in Green Park🔝 9953056974 🔝 escort Service
 
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCRCall Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
 
Comparative Analysis of Text Summarization Techniques
Comparative Analysis of Text Summarization TechniquesComparative Analysis of Text Summarization Techniques
Comparative Analysis of Text Summarization Techniques
 

Avoiding big data antipatterns