SlideShare ist ein Scribd-Unternehmen logo
1 von 32
HBase for Dealing with
Large Matrices
Who am I?
Leads data team at Dilisim
Researcher at Anadolu University
Machine Learning
Some big problems
Classifying huge text collections
Recommending to millions of users
Predicting links in a social network
Recommender Systems
Recommenders input large sparse
matrices
How would you input a millions X
millions matrix?
Recommender Systems
m users
3.00 0.00 2.00 0.00 4.00 2.00 3.00 2.00 1.00 3.00 0.00 3.00 0.00 2.00 0.00 4.00 2.00 3.00 2.00 1.00 3.00 0.00 …
2.00 3.00 2.00 1.00 1.00 4.00 2.00 3.00 0.00 2.00 3.00 2.00 3.00 2.00 1.00 1.00 4.00 2.00 3.00 0.00 2.00 3.00…
3.00 4.00 3.00 2.00 3.00 4.00 1.00 1.00 1.00 1.00 3.00 3.00 4.00 3.00 2.00 3.00 4.00 1.00 1.00 1.00 1.00 3.00 …
0.00 0.00 0.00 0.00 4.00 1.00 3.00 4.00 2.00 1.00 0.00 0.00 0.00 0.00 0.00 4.00 1.00 3.00 4.00 2.00 1.00 0.00 …
0.00 2.00 3.00 2.00 2.00 4.00 3.00 0.00 3.00 4.00 2.00 0.00 2.00 3.00 2.00 2.00 4.00 3.00 0.00 3.00 4.00 2.00 …
4.00 2.00 1.00 4.00 2.00 3.00 3.00 0.00 3.00 1.00 2.00 4.00 2.00 1.00 4.00 2.00 3.00 3.00 0.00 3.00 1.00 2.00…
4.00 0.00 4.00 2.00 2.00 3.00 3.00 3.00 2.00 0.00 0.00 4.00 0.00 4.00 2.00 2.00 3.00 3.00 3.00 2.00 0.00 0.00…
1.00 3.00 2.00 4.00 2.00 3.00 0.00 0.00 3.00 0.00 3.00 1.00 3.00 2.00 4.00 2.00 3.00 0.00 0.00 3.00 0.00 3.00 …
1.00 4.00 1.00 1.00 3.00 4.00 2.00 2.00 0.00 1.00 4.00 1.00 4.00 1.00 1.00 3.00 4.00 2.00 2.00 0.00 1.00 4.00…
3.00 3.00 2.00 3.00 4.00 1.00 4.00 0.00 1.00 0.00 1.00 3.00 3.00 2.00 3.00 4.00 1.00 4.00 0.00 1.00 0.00 1.00…
0.00 1.00 2.00 4.00 2.00 2.00 3.00 4.00 4.00 4.00 1.00 0.00 1.00 2.00 4.00 2.00 2.00 3.00 4.00 4.00 4.00 1.00…
4.00 0.00 4.00 2.00 2.00 3.00 3.00 3.00 2.00 0.00 0.00 4.00 0.00 4.00 2.00 2.00 3.00 3.00 3.00 2.00 0.00 0.00…
1.00 3.00 2.00 4.00 2.00 3.00 0.00 0.00 3.00 0.00 3.00 1.00 3.00 2.00 4.00 2.00 3.00 0.00 0.00 3.00 0.00 3.00 …
3.00 4.00 3.00 2.00 3.00 4.00 1.00 1.00 1.00 1.00 3.00 3.00 4.00 3.00 2.00 3.00 4.00 1.00 1.00 1.00 1.00 3.00 …
0.00 0.00 0.00 0.00 4.00 1.00 3.00 4.00 2.00 1.00 0.00 0.00 0.00 0.00 0.00 4.00 1.00 3.00 4.00 2.00 1.00 0.00 …
…………………………………………………………………………………………………………………… .
…………………………………………………………………………………………………………………… .
………………………………………………………………………………………………………………… … .
n items
Input
Recommender Systems
State-of-the-art recommender systems learn
large models
One factor vector per each user and item
One parameter vector (on side info) per
each user and item
Recommender Systems
m users
3.00 0.00 2.00 0.00 4.00 2.00 3.00 2.00 1.00 3.00 0.00 …
2.00 3.00 2.00 1.00 1.00 4.00 2.00 3.00 0.00 2.00 3.00 …
3.00 4.00 3.00 2.00 3.00 4.00 1.00 1.00 1.00 1.00 3.00 …
0.00 0.00 0.00 0.00 4.00 1.00 3.00 4.00 2.00 1.00 0.00 …
0.00 2.00 3.00 2.00 2.00 4.00 3.00 0.00 3.00 4.00 2.00 …
4.00 2.00 1.00 4.00 2.00 3.00 3.00 0.00 3.00 1.00 2.00 …
4.00 0.00 4.00 2.00 2.00 3.00 3.00 3.00 2.00 0.00 0.00 …
1.00 3.00 2.00 4.00 2.00 3.00 0.00 0.00 3.00 0.00 3.00 …
1.00 4.00 1.00 1.00 3.00 4.00 2.00 2.00 0.00 1.00 4.00 …
3.00 3.00 2.00 3.00 4.00 1.00 4.00 0.00 1.00 0.00 1.00 …
0.00 1.00 2.00 4.00 2.00 2.00 3.00 4.00 4.00 4.00 1.00 …
………………………………………………………… .
………………………………………………………… .
………………………………………………………… .
n items
Input
User Model
Item Model
m x k
n x k
0.54 0.48 0.83 0.75 0.28 …
0.02 0.29 0.99 0.85 0.68 …
0.05 0.53 0.60 0.98 0.19 …
0.52 0.47 0.50 0.12 0.98 …
0.26 0.39 0.29 0.91 0.50 …
0.15 0.43 0.66 0.07 0.51 …
0.52 0.36 0.01 0.87 0.53 …
…………………………. .
………………………….. .
…………………………... .
0.93 0.78 0.56 0.77 0.75 …
0.21 0.44 0.99 0.01 0.00 …
0.04 0.42 0.36 0.72 0.19 …
0.77 0.07 0.24 0.67 0.87 …
0.42 0.79 0.62 0.80 0.79 …
0.42 0.32 0.26 0.50 0.85 …
0.94 0.76 0.93 0.34 0.46 …
…………………………. .
………………………….. .
…………………………... .
Learning Process
What does a machine learning algorithm
require to do with that matrix?
Machine Learning - Techniques
Batch Learning
All parameters are updated once per
iteration
Machine Learning - Techniques
Batch Learning
Updates can be calculated in parallel
using MapReduce
(SequenceFile might be enough)
Machine Learning - Techniques
Batch Learning
Output model should provide random
access to rows
Machine Learning - Techniques
Online Learning
Parameters are updated per training
example
Machine Learning - Techniques
Online Learning
Each update results in updates in
a row
Needs random access while learning
Machine Learning - Techniques
Online Learning
Output model should provide random
access to rows
Deployment Process
How do you decide to deploy a machine
learning model in production?
Machine Learning - Deployment
Usual process
Works
good?
Deploy in
production
Experiment
on prototype
Y
N
Machine Learning - Deployment
How would you turn your prototype into
production easily?
Common matrix interface for in-
memory and persistent versions
HBase Backed Matrix
Implements Mahout matrix
Dense or sparse
HBase Backed Matrix
Random access to cells
Random access to rows
Iteration over rows
Lazy loading while iterating
HBase Backed Matrix
Common interface for prototype and
product
Easy to deploy (Model already persisted)
HBase Backed Matrix
Matrix operations with existing mahout-
math library
Logical Schema
Composite row keys:
12_0:
12_9:
12_22000:
data:value:0.41
data:value:0.41
data:value:0.41
Logical Schema
Composite row keys:
Row access by scan
Cell access by get
Atomic row update should be
handled in application
Logical Schema
Row indices as row keys
12:
data:0:0.41 data:22000:0.41data:9:0.41
Logical Schema
Row indices as row keys
Atomic updates are handled
automatically
Speed – Cell access/write
GET SET
row index as row key
composite row key
Speed – Row access/write
GET SET
row index as row key
composite row key
Code
github.com/gcapan/mahout/tree/hbase-matrix
Future Work
MatrixInputFormat
Might replace SequenceFile based
MapReduce inputs
Future Work – A little digression
Recommender Systems
Calculating score for a user-item
pair is easy with HBaseMatrix
Future Work – A little digression
Recommender Systems
top-N recommendation?
All candidate items for a user in
the user row as a nested entity
(See Ian Varley's HBase Schema
Design)
Thank you!

Weitere ähnliche Inhalte

Andere mochten auch

Application architectures with Hadoop and Sessionization in MR
Application architectures with Hadoop and Sessionization in MRApplication architectures with Hadoop and Sessionization in MR
Application architectures with Hadoop and Sessionization in MRmarkgrover
 
Hadoop World 2011: Advanced HBase Schema Design
Hadoop World 2011: Advanced HBase Schema DesignHadoop World 2011: Advanced HBase Schema Design
Hadoop World 2011: Advanced HBase Schema DesignCloudera, Inc.
 
HBaseCon 2013: Apache HBase, Meet Ops. Ops, Meet Apache HBase.
HBaseCon 2013: Apache HBase, Meet Ops. Ops, Meet Apache HBase.HBaseCon 2013: Apache HBase, Meet Ops. Ops, Meet Apache HBase.
HBaseCon 2013: Apache HBase, Meet Ops. Ops, Meet Apache HBase.Cloudera, Inc.
 
HBaseCon 2012 | Living Data: Applying Adaptable Schemas to HBase - Aaron Kimb...
HBaseCon 2012 | Living Data: Applying Adaptable Schemas to HBase - Aaron Kimb...HBaseCon 2012 | Living Data: Applying Adaptable Schemas to HBase - Aaron Kimb...
HBaseCon 2012 | Living Data: Applying Adaptable Schemas to HBase - Aaron Kimb...Cloudera, Inc.
 
HBaseCon 2012 | Leveraging HBase for the World’s Largest Curated Genomic Data...
HBaseCon 2012 | Leveraging HBase for the World’s Largest Curated Genomic Data...HBaseCon 2012 | Leveraging HBase for the World’s Largest Curated Genomic Data...
HBaseCon 2012 | Leveraging HBase for the World’s Largest Curated Genomic Data...Cloudera, Inc.
 
HBaseCon 2012 | Unique Sets on HBase and Hadoop - Elliot Clark, StumbleUpon
HBaseCon 2012 | Unique Sets on HBase and Hadoop - Elliot Clark, StumbleUponHBaseCon 2012 | Unique Sets on HBase and Hadoop - Elliot Clark, StumbleUpon
HBaseCon 2012 | Unique Sets on HBase and Hadoop - Elliot Clark, StumbleUponCloudera, Inc.
 
HBaseCon 2015: DeathStar - Easy, Dynamic, Multi-tenant HBase via YARN
HBaseCon 2015: DeathStar - Easy, Dynamic,  Multi-tenant HBase via YARNHBaseCon 2015: DeathStar - Easy, Dynamic,  Multi-tenant HBase via YARN
HBaseCon 2015: DeathStar - Easy, Dynamic, Multi-tenant HBase via YARNHBaseCon
 
HBaseCon 2012 | HBase for the Worlds Libraries - OCLC
HBaseCon 2012 | HBase for the Worlds Libraries - OCLCHBaseCon 2012 | HBase for the Worlds Libraries - OCLC
HBaseCon 2012 | HBase for the Worlds Libraries - OCLCCloudera, Inc.
 
HBaseCon 2015: Trafodion - Integrating Operational SQL into HBase
HBaseCon 2015: Trafodion - Integrating Operational SQL into HBaseHBaseCon 2015: Trafodion - Integrating Operational SQL into HBase
HBaseCon 2015: Trafodion - Integrating Operational SQL into HBaseHBaseCon
 
HBaseCon 2013: Evolving a First-Generation Apache HBase Deployment to Second...
HBaseCon 2013:  Evolving a First-Generation Apache HBase Deployment to Second...HBaseCon 2013:  Evolving a First-Generation Apache HBase Deployment to Second...
HBaseCon 2013: Evolving a First-Generation Apache HBase Deployment to Second...Cloudera, Inc.
 
Tales from the Cloudera Field
Tales from the Cloudera FieldTales from the Cloudera Field
Tales from the Cloudera FieldHBaseCon
 
HBase Read High Availability Using Timeline-Consistent Region Replicas
HBase Read High Availability Using Timeline-Consistent Region ReplicasHBase Read High Availability Using Timeline-Consistent Region Replicas
HBase Read High Availability Using Timeline-Consistent Region ReplicasHBaseCon
 
HBaseCon 2012 | Scaling GIS In Three Acts
HBaseCon 2012 | Scaling GIS In Three ActsHBaseCon 2012 | Scaling GIS In Three Acts
HBaseCon 2012 | Scaling GIS In Three ActsCloudera, Inc.
 
HBaseCon 2012 | Relaxed Transactions for HBase - Francis Liu, Yahoo!
HBaseCon 2012 | Relaxed Transactions for HBase - Francis Liu, Yahoo!HBaseCon 2012 | Relaxed Transactions for HBase - Francis Liu, Yahoo!
HBaseCon 2012 | Relaxed Transactions for HBase - Francis Liu, Yahoo!Cloudera, Inc.
 
HBaseCon 2013: Apache Hadoop and Apache HBase for Real-Time Video Analytics
HBaseCon 2013: Apache Hadoop and Apache HBase for Real-Time Video Analytics HBaseCon 2013: Apache Hadoop and Apache HBase for Real-Time Video Analytics
HBaseCon 2013: Apache Hadoop and Apache HBase for Real-Time Video Analytics Cloudera, Inc.
 
HBaseCon 2013: Apache HBase on Flash
HBaseCon 2013: Apache HBase on FlashHBaseCon 2013: Apache HBase on Flash
HBaseCon 2013: Apache HBase on FlashCloudera, Inc.
 
Cross-Site BigTable using HBase
Cross-Site BigTable using HBaseCross-Site BigTable using HBase
Cross-Site BigTable using HBaseHBaseCon
 
HBaseCon 2012 | Content Addressable Storages for Fun and Profit - Berk Demir,...
HBaseCon 2012 | Content Addressable Storages for Fun and Profit - Berk Demir,...HBaseCon 2012 | Content Addressable Storages for Fun and Profit - Berk Demir,...
HBaseCon 2012 | Content Addressable Storages for Fun and Profit - Berk Demir,...Cloudera, Inc.
 
HBaseCon 2013: Rebuilding for Scale on Apache HBase
HBaseCon 2013: Rebuilding for Scale on Apache HBaseHBaseCon 2013: Rebuilding for Scale on Apache HBase
HBaseCon 2013: Rebuilding for Scale on Apache HBaseCloudera, Inc.
 

Andere mochten auch (20)

Application architectures with Hadoop and Sessionization in MR
Application architectures with Hadoop and Sessionization in MRApplication architectures with Hadoop and Sessionization in MR
Application architectures with Hadoop and Sessionization in MR
 
HDFS Analysis for Small Files
HDFS Analysis for Small FilesHDFS Analysis for Small Files
HDFS Analysis for Small Files
 
Hadoop World 2011: Advanced HBase Schema Design
Hadoop World 2011: Advanced HBase Schema DesignHadoop World 2011: Advanced HBase Schema Design
Hadoop World 2011: Advanced HBase Schema Design
 
HBaseCon 2013: Apache HBase, Meet Ops. Ops, Meet Apache HBase.
HBaseCon 2013: Apache HBase, Meet Ops. Ops, Meet Apache HBase.HBaseCon 2013: Apache HBase, Meet Ops. Ops, Meet Apache HBase.
HBaseCon 2013: Apache HBase, Meet Ops. Ops, Meet Apache HBase.
 
HBaseCon 2012 | Living Data: Applying Adaptable Schemas to HBase - Aaron Kimb...
HBaseCon 2012 | Living Data: Applying Adaptable Schemas to HBase - Aaron Kimb...HBaseCon 2012 | Living Data: Applying Adaptable Schemas to HBase - Aaron Kimb...
HBaseCon 2012 | Living Data: Applying Adaptable Schemas to HBase - Aaron Kimb...
 
HBaseCon 2012 | Leveraging HBase for the World’s Largest Curated Genomic Data...
HBaseCon 2012 | Leveraging HBase for the World’s Largest Curated Genomic Data...HBaseCon 2012 | Leveraging HBase for the World’s Largest Curated Genomic Data...
HBaseCon 2012 | Leveraging HBase for the World’s Largest Curated Genomic Data...
 
HBaseCon 2012 | Unique Sets on HBase and Hadoop - Elliot Clark, StumbleUpon
HBaseCon 2012 | Unique Sets on HBase and Hadoop - Elliot Clark, StumbleUponHBaseCon 2012 | Unique Sets on HBase and Hadoop - Elliot Clark, StumbleUpon
HBaseCon 2012 | Unique Sets on HBase and Hadoop - Elliot Clark, StumbleUpon
 
HBaseCon 2015: DeathStar - Easy, Dynamic, Multi-tenant HBase via YARN
HBaseCon 2015: DeathStar - Easy, Dynamic,  Multi-tenant HBase via YARNHBaseCon 2015: DeathStar - Easy, Dynamic,  Multi-tenant HBase via YARN
HBaseCon 2015: DeathStar - Easy, Dynamic, Multi-tenant HBase via YARN
 
HBaseCon 2012 | HBase for the Worlds Libraries - OCLC
HBaseCon 2012 | HBase for the Worlds Libraries - OCLCHBaseCon 2012 | HBase for the Worlds Libraries - OCLC
HBaseCon 2012 | HBase for the Worlds Libraries - OCLC
 
HBaseCon 2015: Trafodion - Integrating Operational SQL into HBase
HBaseCon 2015: Trafodion - Integrating Operational SQL into HBaseHBaseCon 2015: Trafodion - Integrating Operational SQL into HBase
HBaseCon 2015: Trafodion - Integrating Operational SQL into HBase
 
HBaseCon 2013: Evolving a First-Generation Apache HBase Deployment to Second...
HBaseCon 2013:  Evolving a First-Generation Apache HBase Deployment to Second...HBaseCon 2013:  Evolving a First-Generation Apache HBase Deployment to Second...
HBaseCon 2013: Evolving a First-Generation Apache HBase Deployment to Second...
 
Tales from the Cloudera Field
Tales from the Cloudera FieldTales from the Cloudera Field
Tales from the Cloudera Field
 
HBase Read High Availability Using Timeline-Consistent Region Replicas
HBase Read High Availability Using Timeline-Consistent Region ReplicasHBase Read High Availability Using Timeline-Consistent Region Replicas
HBase Read High Availability Using Timeline-Consistent Region Replicas
 
HBaseCon 2012 | Scaling GIS In Three Acts
HBaseCon 2012 | Scaling GIS In Three ActsHBaseCon 2012 | Scaling GIS In Three Acts
HBaseCon 2012 | Scaling GIS In Three Acts
 
HBaseCon 2012 | Relaxed Transactions for HBase - Francis Liu, Yahoo!
HBaseCon 2012 | Relaxed Transactions for HBase - Francis Liu, Yahoo!HBaseCon 2012 | Relaxed Transactions for HBase - Francis Liu, Yahoo!
HBaseCon 2012 | Relaxed Transactions for HBase - Francis Liu, Yahoo!
 
HBaseCon 2013: Apache Hadoop and Apache HBase for Real-Time Video Analytics
HBaseCon 2013: Apache Hadoop and Apache HBase for Real-Time Video Analytics HBaseCon 2013: Apache Hadoop and Apache HBase for Real-Time Video Analytics
HBaseCon 2013: Apache Hadoop and Apache HBase for Real-Time Video Analytics
 
HBaseCon 2013: Apache HBase on Flash
HBaseCon 2013: Apache HBase on FlashHBaseCon 2013: Apache HBase on Flash
HBaseCon 2013: Apache HBase on Flash
 
Cross-Site BigTable using HBase
Cross-Site BigTable using HBaseCross-Site BigTable using HBase
Cross-Site BigTable using HBase
 
HBaseCon 2012 | Content Addressable Storages for Fun and Profit - Berk Demir,...
HBaseCon 2012 | Content Addressable Storages for Fun and Profit - Berk Demir,...HBaseCon 2012 | Content Addressable Storages for Fun and Profit - Berk Demir,...
HBaseCon 2012 | Content Addressable Storages for Fun and Profit - Berk Demir,...
 
HBaseCon 2013: Rebuilding for Scale on Apache HBase
HBaseCon 2013: Rebuilding for Scale on Apache HBaseHBaseCon 2013: Rebuilding for Scale on Apache HBase
HBaseCon 2013: Rebuilding for Scale on Apache HBase
 

Ähnlich wie HBaseCon 2013: Using Apache HBase for Large Matrices

IoT Analytics Workshop (IOT314-R1) - AWS re:Invent 2018
IoT Analytics Workshop (IOT314-R1) - AWS re:Invent 2018IoT Analytics Workshop (IOT314-R1) - AWS re:Invent 2018
IoT Analytics Workshop (IOT314-R1) - AWS re:Invent 2018Amazon Web Services
 
Big data technologies : A survey
Big data technologies : A survey Big data technologies : A survey
Big data technologies : A survey fatimabenjelloun1
 
Assessing the consistency, quality, and completeness of the Reviewed Event Bu...
Assessing the consistency, quality, and completeness of the Reviewed Event Bu...Assessing the consistency, quality, and completeness of the Reviewed Event Bu...
Assessing the consistency, quality, and completeness of the Reviewed Event Bu...Ivan Kitov
 
Is observability good for your brain?
Is observability good for your brain?Is observability good for your brain?
Is observability good for your brain?Sematext Group, Inc.
 
Advanced online search through the web
Advanced online search through the webAdvanced online search through the web
Advanced online search through the webnetknowlogy
 
03 GlobalAIBootcamp2020Lisboa-Rock, Paper, Scissors.pptx
03 GlobalAIBootcamp2020Lisboa-Rock, Paper, Scissors.pptx03 GlobalAIBootcamp2020Lisboa-Rock, Paper, Scissors.pptx
03 GlobalAIBootcamp2020Lisboa-Rock, Paper, Scissors.pptxLuis Beltran
 
Crab: A Python Framework for Building Recommender Systems
Crab: A Python Framework for Building Recommender Systems Crab: A Python Framework for Building Recommender Systems
Crab: A Python Framework for Building Recommender Systems Marcel Caraciolo
 
An overview of text mining and sentiment analysis for Decision Support System
An overview of text mining and sentiment analysis for Decision Support SystemAn overview of text mining and sentiment analysis for Decision Support System
An overview of text mining and sentiment analysis for Decision Support SystemGan Keng Hoon
 
Lazard network correlation_architecture
Lazard network correlation_architectureLazard network correlation_architecture
Lazard network correlation_architectureJean Meilhoc Ricaume
 
Chatbot mohinh sinh
Chatbot mohinh sinhChatbot mohinh sinh
Chatbot mohinh sinhpqtrung5th1
 
CS8075 - Data Warehousing and Data Mining (Ripped from Amazon Kindle eBooks b...
CS8075 - Data Warehousing and Data Mining (Ripped from Amazon Kindle eBooks b...CS8075 - Data Warehousing and Data Mining (Ripped from Amazon Kindle eBooks b...
CS8075 - Data Warehousing and Data Mining (Ripped from Amazon Kindle eBooks b...vinoth raja
 
Recommendation Subsystem - Museum Radar
Recommendation Subsystem - Museum RadarRecommendation Subsystem - Museum Radar
Recommendation Subsystem - Museum RadarPanos Gemos
 
Cloud Lunch and Learn ML.NET MACHINE LEARNING (AND DEEP LEARNING) FOR THE CSh...
Cloud Lunch and Learn ML.NET MACHINE LEARNING (AND DEEP LEARNING) FOR THE CSh...Cloud Lunch and Learn ML.NET MACHINE LEARNING (AND DEEP LEARNING) FOR THE CSh...
Cloud Lunch and Learn ML.NET MACHINE LEARNING (AND DEEP LEARNING) FOR THE CSh...Luis Beltran
 
There and Back Again - A Tale of Programming Languages
There and Back Again - A Tale of Programming LanguagesThere and Back Again - A Tale of Programming Languages
There and Back Again - A Tale of Programming LanguagesBADR
 
report.doc
report.docreport.doc
report.docbutest
 
Rock Paper Scissors with MLNET.pptx
Rock Paper Scissors with MLNET.pptxRock Paper Scissors with MLNET.pptx
Rock Paper Scissors with MLNET.pptxLuis Beltran
 
Machine Learning for Designers - UX Camp Switzerland
Machine Learning for Designers - UX Camp SwitzerlandMachine Learning for Designers - UX Camp Switzerland
Machine Learning for Designers - UX Camp SwitzerlandMemi Beltrame
 
Search Engine Risk Dependency by Ronan Chardennau
Search Engine Risk Dependency by Ronan ChardennauSearch Engine Risk Dependency by Ronan Chardennau
Search Engine Risk Dependency by Ronan ChardennauPozzolini
 
Anti-MOOCs: The design of MACROSIMs
Anti-MOOCs: The design of MACROSIMsAnti-MOOCs: The design of MACROSIMs
Anti-MOOCs: The design of MACROSIMsdws1d
 
Stacy Fileccia TW Portfolio 2016
Stacy Fileccia TW Portfolio 2016Stacy Fileccia TW Portfolio 2016
Stacy Fileccia TW Portfolio 2016Stacy Fileccia
 

Ähnlich wie HBaseCon 2013: Using Apache HBase for Large Matrices (20)

IoT Analytics Workshop (IOT314-R1) - AWS re:Invent 2018
IoT Analytics Workshop (IOT314-R1) - AWS re:Invent 2018IoT Analytics Workshop (IOT314-R1) - AWS re:Invent 2018
IoT Analytics Workshop (IOT314-R1) - AWS re:Invent 2018
 
Big data technologies : A survey
Big data technologies : A survey Big data technologies : A survey
Big data technologies : A survey
 
Assessing the consistency, quality, and completeness of the Reviewed Event Bu...
Assessing the consistency, quality, and completeness of the Reviewed Event Bu...Assessing the consistency, quality, and completeness of the Reviewed Event Bu...
Assessing the consistency, quality, and completeness of the Reviewed Event Bu...
 
Is observability good for your brain?
Is observability good for your brain?Is observability good for your brain?
Is observability good for your brain?
 
Advanced online search through the web
Advanced online search through the webAdvanced online search through the web
Advanced online search through the web
 
03 GlobalAIBootcamp2020Lisboa-Rock, Paper, Scissors.pptx
03 GlobalAIBootcamp2020Lisboa-Rock, Paper, Scissors.pptx03 GlobalAIBootcamp2020Lisboa-Rock, Paper, Scissors.pptx
03 GlobalAIBootcamp2020Lisboa-Rock, Paper, Scissors.pptx
 
Crab: A Python Framework for Building Recommender Systems
Crab: A Python Framework for Building Recommender Systems Crab: A Python Framework for Building Recommender Systems
Crab: A Python Framework for Building Recommender Systems
 
An overview of text mining and sentiment analysis for Decision Support System
An overview of text mining and sentiment analysis for Decision Support SystemAn overview of text mining and sentiment analysis for Decision Support System
An overview of text mining and sentiment analysis for Decision Support System
 
Lazard network correlation_architecture
Lazard network correlation_architectureLazard network correlation_architecture
Lazard network correlation_architecture
 
Chatbot mohinh sinh
Chatbot mohinh sinhChatbot mohinh sinh
Chatbot mohinh sinh
 
CS8075 - Data Warehousing and Data Mining (Ripped from Amazon Kindle eBooks b...
CS8075 - Data Warehousing and Data Mining (Ripped from Amazon Kindle eBooks b...CS8075 - Data Warehousing and Data Mining (Ripped from Amazon Kindle eBooks b...
CS8075 - Data Warehousing and Data Mining (Ripped from Amazon Kindle eBooks b...
 
Recommendation Subsystem - Museum Radar
Recommendation Subsystem - Museum RadarRecommendation Subsystem - Museum Radar
Recommendation Subsystem - Museum Radar
 
Cloud Lunch and Learn ML.NET MACHINE LEARNING (AND DEEP LEARNING) FOR THE CSh...
Cloud Lunch and Learn ML.NET MACHINE LEARNING (AND DEEP LEARNING) FOR THE CSh...Cloud Lunch and Learn ML.NET MACHINE LEARNING (AND DEEP LEARNING) FOR THE CSh...
Cloud Lunch and Learn ML.NET MACHINE LEARNING (AND DEEP LEARNING) FOR THE CSh...
 
There and Back Again - A Tale of Programming Languages
There and Back Again - A Tale of Programming LanguagesThere and Back Again - A Tale of Programming Languages
There and Back Again - A Tale of Programming Languages
 
report.doc
report.docreport.doc
report.doc
 
Rock Paper Scissors with MLNET.pptx
Rock Paper Scissors with MLNET.pptxRock Paper Scissors with MLNET.pptx
Rock Paper Scissors with MLNET.pptx
 
Machine Learning for Designers - UX Camp Switzerland
Machine Learning for Designers - UX Camp SwitzerlandMachine Learning for Designers - UX Camp Switzerland
Machine Learning for Designers - UX Camp Switzerland
 
Search Engine Risk Dependency by Ronan Chardennau
Search Engine Risk Dependency by Ronan ChardennauSearch Engine Risk Dependency by Ronan Chardennau
Search Engine Risk Dependency by Ronan Chardennau
 
Anti-MOOCs: The design of MACROSIMs
Anti-MOOCs: The design of MACROSIMsAnti-MOOCs: The design of MACROSIMs
Anti-MOOCs: The design of MACROSIMs
 
Stacy Fileccia TW Portfolio 2016
Stacy Fileccia TW Portfolio 2016Stacy Fileccia TW Portfolio 2016
Stacy Fileccia TW Portfolio 2016
 

Mehr von Cloudera, Inc.

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxCloudera, Inc.
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera, Inc.
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards FinalistsCloudera, Inc.
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Cloudera, Inc.
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Cloudera, Inc.
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Cloudera, Inc.
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Cloudera, Inc.
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Cloudera, Inc.
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Cloudera, Inc.
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Cloudera, Inc.
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Cloudera, Inc.
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Cloudera, Inc.
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformCloudera, Inc.
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Cloudera, Inc.
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Cloudera, Inc.
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Cloudera, Inc.
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Cloudera, Inc.
 

Mehr von Cloudera, Inc. (20)

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptx
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the Platform
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18
 

Kürzlich hochgeladen

Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 

Kürzlich hochgeladen (20)

Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 

HBaseCon 2013: Using Apache HBase for Large Matrices

  • 1. HBase for Dealing with Large Matrices
  • 2. Who am I? Leads data team at Dilisim Researcher at Anadolu University
  • 3. Machine Learning Some big problems Classifying huge text collections Recommending to millions of users Predicting links in a social network
  • 4. Recommender Systems Recommenders input large sparse matrices How would you input a millions X millions matrix?
  • 5. Recommender Systems m users 3.00 0.00 2.00 0.00 4.00 2.00 3.00 2.00 1.00 3.00 0.00 3.00 0.00 2.00 0.00 4.00 2.00 3.00 2.00 1.00 3.00 0.00 … 2.00 3.00 2.00 1.00 1.00 4.00 2.00 3.00 0.00 2.00 3.00 2.00 3.00 2.00 1.00 1.00 4.00 2.00 3.00 0.00 2.00 3.00… 3.00 4.00 3.00 2.00 3.00 4.00 1.00 1.00 1.00 1.00 3.00 3.00 4.00 3.00 2.00 3.00 4.00 1.00 1.00 1.00 1.00 3.00 … 0.00 0.00 0.00 0.00 4.00 1.00 3.00 4.00 2.00 1.00 0.00 0.00 0.00 0.00 0.00 4.00 1.00 3.00 4.00 2.00 1.00 0.00 … 0.00 2.00 3.00 2.00 2.00 4.00 3.00 0.00 3.00 4.00 2.00 0.00 2.00 3.00 2.00 2.00 4.00 3.00 0.00 3.00 4.00 2.00 … 4.00 2.00 1.00 4.00 2.00 3.00 3.00 0.00 3.00 1.00 2.00 4.00 2.00 1.00 4.00 2.00 3.00 3.00 0.00 3.00 1.00 2.00… 4.00 0.00 4.00 2.00 2.00 3.00 3.00 3.00 2.00 0.00 0.00 4.00 0.00 4.00 2.00 2.00 3.00 3.00 3.00 2.00 0.00 0.00… 1.00 3.00 2.00 4.00 2.00 3.00 0.00 0.00 3.00 0.00 3.00 1.00 3.00 2.00 4.00 2.00 3.00 0.00 0.00 3.00 0.00 3.00 … 1.00 4.00 1.00 1.00 3.00 4.00 2.00 2.00 0.00 1.00 4.00 1.00 4.00 1.00 1.00 3.00 4.00 2.00 2.00 0.00 1.00 4.00… 3.00 3.00 2.00 3.00 4.00 1.00 4.00 0.00 1.00 0.00 1.00 3.00 3.00 2.00 3.00 4.00 1.00 4.00 0.00 1.00 0.00 1.00… 0.00 1.00 2.00 4.00 2.00 2.00 3.00 4.00 4.00 4.00 1.00 0.00 1.00 2.00 4.00 2.00 2.00 3.00 4.00 4.00 4.00 1.00… 4.00 0.00 4.00 2.00 2.00 3.00 3.00 3.00 2.00 0.00 0.00 4.00 0.00 4.00 2.00 2.00 3.00 3.00 3.00 2.00 0.00 0.00… 1.00 3.00 2.00 4.00 2.00 3.00 0.00 0.00 3.00 0.00 3.00 1.00 3.00 2.00 4.00 2.00 3.00 0.00 0.00 3.00 0.00 3.00 … 3.00 4.00 3.00 2.00 3.00 4.00 1.00 1.00 1.00 1.00 3.00 3.00 4.00 3.00 2.00 3.00 4.00 1.00 1.00 1.00 1.00 3.00 … 0.00 0.00 0.00 0.00 4.00 1.00 3.00 4.00 2.00 1.00 0.00 0.00 0.00 0.00 0.00 4.00 1.00 3.00 4.00 2.00 1.00 0.00 … …………………………………………………………………………………………………………………… . …………………………………………………………………………………………………………………… . ………………………………………………………………………………………………………………… … . n items Input
  • 6. Recommender Systems State-of-the-art recommender systems learn large models One factor vector per each user and item One parameter vector (on side info) per each user and item
  • 7. Recommender Systems m users 3.00 0.00 2.00 0.00 4.00 2.00 3.00 2.00 1.00 3.00 0.00 … 2.00 3.00 2.00 1.00 1.00 4.00 2.00 3.00 0.00 2.00 3.00 … 3.00 4.00 3.00 2.00 3.00 4.00 1.00 1.00 1.00 1.00 3.00 … 0.00 0.00 0.00 0.00 4.00 1.00 3.00 4.00 2.00 1.00 0.00 … 0.00 2.00 3.00 2.00 2.00 4.00 3.00 0.00 3.00 4.00 2.00 … 4.00 2.00 1.00 4.00 2.00 3.00 3.00 0.00 3.00 1.00 2.00 … 4.00 0.00 4.00 2.00 2.00 3.00 3.00 3.00 2.00 0.00 0.00 … 1.00 3.00 2.00 4.00 2.00 3.00 0.00 0.00 3.00 0.00 3.00 … 1.00 4.00 1.00 1.00 3.00 4.00 2.00 2.00 0.00 1.00 4.00 … 3.00 3.00 2.00 3.00 4.00 1.00 4.00 0.00 1.00 0.00 1.00 … 0.00 1.00 2.00 4.00 2.00 2.00 3.00 4.00 4.00 4.00 1.00 … ………………………………………………………… . ………………………………………………………… . ………………………………………………………… . n items Input User Model Item Model m x k n x k 0.54 0.48 0.83 0.75 0.28 … 0.02 0.29 0.99 0.85 0.68 … 0.05 0.53 0.60 0.98 0.19 … 0.52 0.47 0.50 0.12 0.98 … 0.26 0.39 0.29 0.91 0.50 … 0.15 0.43 0.66 0.07 0.51 … 0.52 0.36 0.01 0.87 0.53 … …………………………. . ………………………….. . …………………………... . 0.93 0.78 0.56 0.77 0.75 … 0.21 0.44 0.99 0.01 0.00 … 0.04 0.42 0.36 0.72 0.19 … 0.77 0.07 0.24 0.67 0.87 … 0.42 0.79 0.62 0.80 0.79 … 0.42 0.32 0.26 0.50 0.85 … 0.94 0.76 0.93 0.34 0.46 … …………………………. . ………………………….. . …………………………... .
  • 8. Learning Process What does a machine learning algorithm require to do with that matrix?
  • 9. Machine Learning - Techniques Batch Learning All parameters are updated once per iteration
  • 10. Machine Learning - Techniques Batch Learning Updates can be calculated in parallel using MapReduce (SequenceFile might be enough)
  • 11. Machine Learning - Techniques Batch Learning Output model should provide random access to rows
  • 12. Machine Learning - Techniques Online Learning Parameters are updated per training example
  • 13. Machine Learning - Techniques Online Learning Each update results in updates in a row Needs random access while learning
  • 14. Machine Learning - Techniques Online Learning Output model should provide random access to rows
  • 15. Deployment Process How do you decide to deploy a machine learning model in production?
  • 16. Machine Learning - Deployment Usual process Works good? Deploy in production Experiment on prototype Y N
  • 17. Machine Learning - Deployment How would you turn your prototype into production easily? Common matrix interface for in- memory and persistent versions
  • 18. HBase Backed Matrix Implements Mahout matrix Dense or sparse
  • 19. HBase Backed Matrix Random access to cells Random access to rows Iteration over rows Lazy loading while iterating
  • 20. HBase Backed Matrix Common interface for prototype and product Easy to deploy (Model already persisted)
  • 21. HBase Backed Matrix Matrix operations with existing mahout- math library
  • 22. Logical Schema Composite row keys: 12_0: 12_9: 12_22000: data:value:0.41 data:value:0.41 data:value:0.41
  • 23. Logical Schema Composite row keys: Row access by scan Cell access by get Atomic row update should be handled in application
  • 24. Logical Schema Row indices as row keys 12: data:0:0.41 data:22000:0.41data:9:0.41
  • 25. Logical Schema Row indices as row keys Atomic updates are handled automatically
  • 26. Speed – Cell access/write GET SET row index as row key composite row key
  • 27. Speed – Row access/write GET SET row index as row key composite row key
  • 29. Future Work MatrixInputFormat Might replace SequenceFile based MapReduce inputs
  • 30. Future Work – A little digression Recommender Systems Calculating score for a user-item pair is easy with HBaseMatrix
  • 31. Future Work – A little digression Recommender Systems top-N recommendation? All candidate items for a user in the user row as a nested entity (See Ian Varley's HBase Schema Design)