So Happy Together
 BigTable + Dynamo
 Semi-structured data model
 Decentralized – no special roles
 Ridiculously fast writes, fast reads
 Tunably consistent
 Cross-DC capable
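
"Tunably consistent" can be made concrete with replica-overlap arithmetic. With N replicas per key, a read that contacts R replicas is guaranteed to overlap a write acknowledged by W replicas whenever R + W > N. The sketch below only checks that inequality; the function names are illustrative assumptions and not part of any Cassandra API.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of the replicas for one key."""
    return replication_factor // 2 + 1


def reads_overlap_writes(n: int, r: int, w: int) -> bool:
    """True when any R read replicas must share a node with any W write replicas."""
    return r + w > n


n = 3  # hypothetical replication factor
print(quorum(n))
# Reading and writing at a single replica trades consistency for speed.
print(reads_overlap_writes(n, r=1, w=1))
# Quorum reads plus quorum writes always overlap.
print(reads_overlap_writes(n, r=quorum(n), w=quorum(n)))
```

Lower R and W favor latency. Raising them until the inequality holds favors reading the latest write.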
 You design your data model based on your query model
 Real-time ad-hoc queries aren’t viable
 Secondary indexes help (0.7)
 What about analytics?
 Hadoop has analytics
 MapReduce
 Pig/Hive and other tools built above MapReduce
 Configurable data sources/destinations
 Many already familiar with it
 Active community
 Always able to output to Cassandra directly
 0.6
 ColumnFamilyInputFormat
 Pig support – Cassandra LoadFunc
 0.7
 ColumnFamilyOutputFormat
 Hadoop Streaming Output
 Streamlined configuration
 Recipe
 Overlay Hadoop on top of Cassandra
 Separate server for name node and job tracker
 Co-locate task trackers with Cassandra nodes
 Add data nodes to taste
 Voilà
 Data locality
 Analytics engine scales with data
 Example
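
The data-locality benefit of co-locating task trackers with Cassandra nodes can be sketched as a placement rule: prefer an idle tracker on a host that already holds a replica of the split, and fall back to a remote read otherwise. This is a simplified illustration, not Hadoop's scheduler; the host names and the function are hypothetical.

```python
def pick_task_tracker(split_locations, idle_trackers):
    """Return (host, is_data_local) for one input split.

    split_locations: hosts holding a replica of the split's data.
    idle_trackers:   set of hosts with a free task slot.
    """
    for host in split_locations:
        if host in idle_trackers:
            return host, True  # task runs next to its data
    if idle_trackers:
        # No replica host is free: run elsewhere and read over the network.
        return sorted(idle_trackers)[0], False
    return None, False


print(pick_task_tracker(["cass2", "cass3"], {"cass1", "cass3"}))
print(pick_task_tracker(["cass2"], {"cass1"}))
```

Because every Cassandra node also runs a task tracker in this recipe, adding nodes adds both storage and local compute slots.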
 Cassandra specific InputFormat
 Configuration – ConfigHelper, Hadoop variables
 InputSplits over the data – tunable
 Example usage in contrib/word_count
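
The idea of tunable input splits can be illustrated with a toy ring. The sketch assumes integer tokens spread evenly across each range, which is a simplification, and it is not the code in contrib/word_count. The `Split` class, `make_splits` and the host names are hypothetical. Each split carries the list of replica hosts so a scheduler can choose among them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Split:
    start_token: int
    end_token: int
    locations: tuple  # replica hosts that can serve this range


def make_splits(ring, split_size):
    """Cut each (start, end, hosts) range into pieces of at most split_size tokens."""
    if split_size < 1:
        raise ValueError("split_size must be positive")
    splits = []
    for start, end, hosts in ring:
        lo = start
        while lo < end:
            hi = min(lo + split_size, end)
            splits.append(Split(lo, hi, tuple(hosts)))
            lo = hi
    return splits


ring = [(0, 100, ["cass1", "cass2"]), (100, 250, ["cass2", "cass3"])]
splits = make_splits(ring, split_size=64)
for s in splits:
    print(s)
```

A smaller `split_size` yields more, shorter map tasks. A larger one yields fewer, longer tasks.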
 OutputFormat
 Configuration – ConfigHelper, Hadoop variables
 Batches output – tunable
 Don’t have to use Cassandra api
 Some optimizations (e.g. ConsistencyLevel.ONE)
 Example usage in contrib/word_count
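
Batched output can be sketched as a small buffering writer: rows accumulate in memory and are sent as one batch once a tunable size is reached, with a final flush on close. This is a minimal illustration under assumed names (`BatchingWriter`, `send_batch`), not the ColumnFamilyOutputFormat implementation. The `send_batch` callable stands in for whatever actually writes to the cluster.

```python
class BatchingWriter:
    """Buffer (key, columns) pairs and hand them off in fixed-size batches."""

    def __init__(self, send_batch, batch_size=32):
        if batch_size < 1:
            raise ValueError("batch_size must be positive")
        self._send_batch = send_batch
        self._batch_size = batch_size
        self._pending = []

    def write(self, key, columns):
        self._pending.append((key, columns))
        if len(self._pending) >= self._batch_size:
            self.flush()

    def flush(self):
        if self._pending:
            self._send_batch(list(self._pending))
            self._pending.clear()

    def close(self):
        # Send whatever is left when the task finishes.
        self.flush()


sent = []  # stand-in for the cluster: records each batch that was sent
writer = BatchingWriter(sent.append, batch_size=2)
for word, count in [("ufo", 3), ("pyramid", 1), ("flash", 2)]:
    writer.write(word, {"count": count})
writer.close()
print([len(batch) for batch in sent])
```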
 60,000+ Documented UFO Sightings
 Data set from http://infochimps.com
sighted_at | reported_at | location | shape | duration | description
19951009 | 19951009 | Iowa City, IA | | | Man repts. Witnessing “flash, followed by a classic UFO, w/ a tailfin at back.” …
19940801 | 19950220 | Renton, WA | | | Man repts. seeing 2x large ships hovering in night sky while using Russian-made night binoculars.
19970111 | 19970111 | St. Cloud, MN | pyramid | 2 min. | Summary : Right when me and my friend left my house we saw a bright green glowing object that looked like a 4 sided pyramid then after about 2 min it took off straight into the sky leaving a yellow trail behind it…
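
As a small worked example on these records, the sketch below parses one sighting and computes the delay between sighting and report. It assumes the file is tab-separated with empty strings for missing fields, which the slide does not state. `parse_sighting` and `reporting_delay_days` are hypothetical helper names, and the description is shortened.

```python
from datetime import datetime

FIELDS = ["sighted_at", "reported_at", "location", "shape", "duration", "description"]


def parse_sighting(line):
    """Split one tab-separated line into a dict keyed by the column names."""
    values = line.rstrip("\n").split("\t")
    return dict(zip(FIELDS, values))


def reporting_delay_days(record):
    """Days between the sighting date and the report date (YYYYMMDD strings)."""
    sighted = datetime.strptime(record["sighted_at"], "%Y%m%d")
    reported = datetime.strptime(record["reported_at"], "%Y%m%d")
    return (reported - sighted).days


line = "19940801\t19950220\tRenton, WA\t\t\tMan repts. seeing 2x large ships hovering in night sky"
record = parse_sighting(line)
print(record["location"], reporting_delay_days(record))
```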
 What about languages outside of Java?
 Build on what Hadoop uses - Streaming
 Output streaming in 0.7.0
 Example in contrib/hadoop_streaming_output
 Input streaming in progress, likely 0.7.1
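
Hadoop Streaming lets a mapper and reducer be written in any language that reads lines on stdin and writes tab-separated key/value lines on stdout. The sketch below counts sightings by shape using that generic convention. It takes iterables instead of stdin so it can run standalone, it assumes tab-separated input records, and it does not reproduce the Cassandra-specific output handling in contrib/hadoop_streaming_output.

```python
from itertools import groupby


def mapper(lines):
    """Emit 'shape<TAB>1' for every sighting; a blank shape becomes 'unknown'."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        shape = fields[3].strip() if len(fields) > 3 else ""
        yield f"{shape or 'unknown'}\t1"


def reducer(sorted_lines):
    """Sum counts per key; input must be sorted by key, as Hadoop delivers it."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for shape, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{shape}\t{sum(int(count) for _, count in group)}"


records = [
    "19951009\t19951009\tIowa City, IA\t\t\tMan repts. witnessing flash",
    "19940801\t19950220\tRenton, WA\t\t\tMan repts. seeing 2x large ships",
    "19970111\t19970111\tSt. Cloud, MN\tpyramid\t2 min.\tBright green glowing object",
]
mapped = sorted(mapper(records))  # stands in for Hadoop's shuffle and sort
result = list(reducer(mapped))
print(result)
```

In an actual streaming job the two functions would live in separate scripts that iterate over `sys.stdin` and print each emitted line.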
 Developed at Yahoo!
 PigLatin/Grunt shell
 Powerful scripting language for analytics
 Example usage in contrib/pig
 Configuration – Hadoop/Env variables
 Raptr.com
 Home grown solution -> Cassandra + Hadoop
 Query time: hours -> minutes
 Pig obviated their need for multi-lingual MR
 Speed and ease are enabling
 Imagini/Visual DNA
 US Government (Digital Reasoning)
 See http://github.com/digitalreasoning/PyStratus
 Hive support in progress (HIVE-1434)
 Hadoop Input Streaming (likely 0.7.1)
 Performance improvements
 Hadoop analytics for Cassandra
 Data locality for processing
 Scales with the cluster
 More information
 http://cassandra.apache.org
 http://wiki.apache.org/cassandra/HadoopSupport
 Cassandra: The Definitive Guide
 About me:
 jeremy.hanna@rackspace.com
 @jeromatron on Twitter
 jeromatron on IRC in #cassandra

Fast Analytics for Cassandra with Hadoop


Editor's Notes

  1. Talk a little about background of the theme – hippies, The Turtles, readability.
  2. Mention Jeff Hodges, Johan, Stu, and Todd Lipcon.
  3. Mention how InputSplit works and how it can choose among replicas – array of locations returned.
  4. Highlight how this is the same extension point that is used with HDFS, HBase and any other data source/destination for MapReduce.
  5. In other words, are people using this stuff in the real world? In production? Put some notes in here about Raptr’s and Imagini’s use cases.