Sponsored By:
Big Data Warehousing Meetup
Today’s Topic: Introduction to
NoSQL with 10Gen
WELCOME!
Joe Caserta
Founder & President, Caserta Concepts
Agenda
7:00 Networking
Grab a slice of pizza and a drink...
7:15 Welcome: About the Meetup and about Caserta Concepts
Joe Caserta, President, Caserta Concepts; Author, Data Warehouse ETL Toolkit
7:30 Intro to NoSQL
Elliott Cordo, Principal Consultant, Caserta Concepts
7:50 MongoDB
Mike O’Brian, 10Gen
8:10 - 9:00 More Networking
Tell us what you’re up to…
About BDW Meetup
• Big Data is a complex, rapidly changing
landscape
• We want to share our stories and hear
about yours
• Great networking opportunity for like-minded
data nerds
• Opportunities to collaborate on exciting
projects
• Next BDW Meetup: June 10.
• Topic: TBD (What would you like to see?)
Send ideas to joe@casertaconcepts.com
About Caserta Concepts
Founded in 2001
• President: Joe Caserta, industry thought leader,
consultant, educator and co-author, The Data
Warehouse ETL Toolkit (Wiley, 2004)
Industries Served
• Financial Services
• Healthcare / Insurance
• Retail / eCommerce
• Digital Media / Marketing
• K-12 / Higher Education
Focused Expertise
• Big Data Analytics
• Data Warehousing
• Business Intelligence
• Strategic Data Ecosystems
Client Portfolio
• Finance & Insurance
• Retail/eCommerce & Manufacturing
• Education & Services
Expertise & Offerings
• Strategic Roadmap/Assessment/Consulting
• Database
• BI/Visualization/Analytics
• Master Data Management
• Big Data Analytics
Storm
Opportunities
Does this word cloud excite you?
Speak with us about our open positions: jobs@casertaconcepts.com
Contacts
Joe Caserta
President & Founder, Caserta Concepts
P: (855) 755-2246 x227
E: joe@casertaconcepts.com
Dana Canavan
Director, Sales & Marketing
P: (855) 755-2246 x226
E: dana@casertaconcepts.com
Elliott Cordo
Principal Consultant, Caserta Concepts
P: (855) 755-2246 x267
E: elliott@casertaconcepts.com
info@casertaconcepts.com
1(855) 755-2246
www.casertaconcepts.com
ANALYZING DATA: INTRO TO NOSQL
Elliott Cordo
Principal Consultant, Caserta Concepts
So... No More SQL?
• Relational databases still have their place
  • Flexible/General Purpose
  • Rich Query Syntax
  • Familiar
• However, there are some interesting alternatives for
analytic databases
  • Columnar/Key Value
  • Document
  • Graph
• P.S. Many NoSQL databases have SQL-like interfaces
Think Not Only SQL!
Why are we doing this?
Not all data is efficiently stored in a relational DB.
• Sparse Data
• Data with a lot of variation
• Relationships -> funny how relational databases are not
great at relations
Scale and Performance
Performance:
• Relational databases carry a lot of features and overhead that we
don’t need in many cases, although we will miss some of them…
Scaling:
• Most relational databases scale vertically, which limits
how large they can get. Federation and sharding are
awkward, manual processes.
• Most NoSQL databases scale horizontally on commodity hardware.
Note: Graph database architecture lends itself to a single graph
existing on one server. Several vendors have overcome this:
Titan, InfiniteGraph.
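To make "scale horizontally" concrete, here is a minimal sketch of hash-based sharding, where each record key is routed to one of several servers. The node names and the simple modulo scheme are assumptions for illustration only; real systems typically use more elaborate schemes such as consistent hashing so that adding a node does not reshuffle most keys.

```python
import hashlib
from typing import Dict, Sequence


def shard_for(key: str, nodes: Sequence[str]) -> str:
    """Pick the node that owns `key` by hashing the key (illustrative scheme)."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]


# Hypothetical commodity servers; adding capacity means adding entries here.
nodes = ["node-a", "node-b", "node-c"]

# Every client computes the same placement without asking a coordinator.
placement: Dict[str, str] = {
    key: shard_for(key, nodes) for key in ["Bobby", "Susie", "Frank", "Jillian"]
}
```

Because placement depends only on the key and the node list, reads and writes for different keys can go to different machines in parallel.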
Object Impedance Mismatch
Relational databases rarely look the way our applications want
them to. So much time is spent assembling and disassembling
relational data.
GetSale
SELECT * FROM Sales_Header JOIN Sales_Detail JOIN
Sales_Tender JOIN User JOIN Order_Type JOIN
Tender_Type JOIN Product JOIN Channel JOIN
User_Account, etc.
CreateSale
INSERT INTO Sales_Header
INSERT INTO Sales_Detail
INSERT/UPDATE User_Account
INSERT INTO Sales_Tender
etc.
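A small runnable sketch of the "assembling" half of this mismatch, using Python's built-in sqlite3 module. The two-table schema, column names, and sample rows are simplified assumptions, not the schema from the talk; the point is only that the application has to rebuild one nested object from flat joined rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales_header (sale_id INTEGER PRIMARY KEY, user_name TEXT, channel TEXT);
    CREATE TABLE sales_detail (sale_id INTEGER, product TEXT, quantity INTEGER);
    INSERT INTO sales_header VALUES (1, 'Bobby', 'Web');
    INSERT INTO sales_detail VALUES (1, 'Chainsaw', 1), (1, 'Hair bows', 2);
""")


def get_sale(conn, sale_id):
    """Reassemble one nested sale object from joined relational rows."""
    rows = conn.execute(
        """SELECT h.sale_id, h.user_name, h.channel, d.product, d.quantity
           FROM sales_header h JOIN sales_detail d ON d.sale_id = h.sale_id
           WHERE h.sale_id = ? ORDER BY d.product""",
        (sale_id,),
    ).fetchall()
    if not rows:
        return None
    # Header columns repeat on every joined row; take them from the first one.
    sale = {"sale_id": rows[0][0], "user": rows[0][1], "channel": rows[0][2], "items": []}
    for _, _, _, product, quantity in rows:
        sale["items"].append({"product": product, "quantity": quantity})
    return sale


sale = get_sale(conn, 1)
```

In a document store, the nested `sale` dictionary is roughly what would be written and read back directly, with no join or reassembly step.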
But what will we sacrifice?
• NoSQL databases have fairly simple query languages, with limited
support for the following:
  • Joins
  • Aggregation
  • Secondary indexes
Why? NoSQL databases were born to be high
performance.
• Data is stored as it is to be used (tuned to a query) rather
than modeled around entities, so a sophisticated query
language is not needed.
So what about NoSQL as the Data
Warehouse?
• NoSQL databases are generally not as flexible as relational
databases for ad-hoc questions.
• Secondary indexes provide some flexibility, but the lack of joins
requires denormalization.
• Materialized views: joins and aggregates can be implemented
via MapReduce, even using our animal friends, Pig and Hive.
• However, materializing the world has its drawbacks!
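The idea of materializing an aggregate with MapReduce can be sketched in plain Python. This is a toy, single-process illustration of the map, shuffle, and reduce phases, not Hadoop code; the sample records and field names are assumptions.

```python
from collections import defaultdict

sales = [  # hypothetical sample records
    {"user": "Bobby", "channel": "Web", "amount": 20.0},
    {"user": "Susie", "channel": "In-Store", "amount": 35.0},
    {"user": "Bobby", "channel": "Web", "amount": 5.0},
]


def map_sale(record):
    """Map step: emit one (key, value) pair per input record."""
    yield record["channel"], record["amount"]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_sales(key, values):
    """Reduce step: collapse all values for one key into a single aggregate."""
    return key, sum(values)


pairs = (pair for record in sales for pair in map_sale(record))
# The resulting dict plays the role of the materialized view: sales total per channel.
totals_by_channel = dict(reduce_sales(k, v) for k, v in shuffle(pairs).items())
```

The drawback noted above shows up quickly: every new question (totals by user, by state, by day) needs its own job and its own stored result.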
NoSQL can be a good fit for certain
analytic applications
• High volumes/Low Latency analytic
environments
• Queries are largely known and can be
precomputed in-stream (via the application itself or
Storm) or in batch using MapReduce
• Cassandra also has counter functions which
can be helpful in pre-computing aggregates.
• Sweet spot is very high volumes with relatively
static analytic requirements.
[Chart: RDBMS vs. NoSQL, with axes Volume and Query Flexibility]
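When the queries are known in advance, the aggregate can be updated as each event arrives instead of being computed at read time. The sketch below uses an in-memory `Counter` as a stand-in for a counter table; the class name, event shape, and per-minute bucketing are assumptions for illustration and do not use any Cassandra API.

```python
from collections import Counter


class AggregateStore:
    """In-memory stand-in for a counter table keyed by (sensor, minute)."""

    def __init__(self):
        self.counters = Counter()

    def increment(self, sensor_id, minute, by=1):
        self.counters[(sensor_id, minute)] += by

    def read(self, sensor_id, minute):
        return self.counters[(sensor_id, minute)]


def process(events, store):
    """Update the known aggregate as each event streams past."""
    for event in events:
        minute = event["timestamp"][:16]  # 'YYYY-MM-DDTHH:MM:SS' -> 'YYYY-MM-DDTHH:MM'
        store.increment(event["sensor_id"], minute)


events = [  # hypothetical sample events
    {"sensor_id": "s1", "timestamp": "2013-02-14T10:00:05"},
    {"sensor_id": "s1", "timestamp": "2013-02-14T10:00:40"},
    {"sensor_id": "s2", "timestamp": "2013-02-14T10:01:10"},
]
store = AggregateStore()
process(events, store)
```

Reading "events per sensor per minute" is then a single key lookup, which is why this pattern suits high volumes with relatively static analytic requirements.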
Columnar
• Platforms: Cassandra, HBase
• Column families are the equivalent of a table in an RDBMS
• The primary unit of storage is a column; columns are stored
contiguously
Skinny Rows: Most like a relational database, except that
columns are optional and not stored if omitted.
Wide Rows: Rows can be billions of columns wide; used
for time series, relationships, and secondary indexes.
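A rough way to picture skinny and wide rows is a dictionary of rows, each holding only the columns that were actually written. The sketch below is a toy model in plain Python; the row keys, column names, and sensor readings are assumptions, and it ignores on-disk layout entirely.

```python
column_family = {}  # row key -> {column name: value}


def put(row_key, column, value):
    """Write one column; columns that are never written take no space."""
    column_family.setdefault(row_key, {})[column] = value


def get_slice(row_key, start, end):
    """Return the columns of one row whose names fall in [start, end], sorted by name."""
    row = column_family.get(row_key, {})
    return [(name, row[name]) for name in sorted(row) if start <= name <= end]


# Skinny rows: a handful of named columns per row, each one optional.
put("user:bobby", "email", "bobby@db-lover.com")
put("user:bobby", "state", "NJ")
put("user:susie", "email", "Susie@sql-enthusiast.com")  # no "state" column is stored

# Wide row: one row per sensor, one column per reading timestamp (time series).
put("sensor:s1", "2013-02-14T10:00", 71.2)
put("sensor:s1", "2013-02-14T10:01", 71.5)
put("sensor:s1", "2013-02-14T10:02", 72.0)

recent = get_slice("sensor:s1", "2013-02-14T10:01", "2013-02-14T10:02")
```

Because column names within a row are kept in sorted order, a time-range query over a wide row becomes a contiguous slice rather than a scan.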
Document
• Platforms: MongoDB, CouchDB
• Collections are the equivalent of a table in an RDBMS
• The primary unit of storage is a document
{ "User": "Bobby",
  "Email": "bobby@db-lover.com",
  "Channel": "Web",
  "State": "NJ" }
{ "User": "Susie",
  "Email": "Susie@sql-enthusiast.com",
  "PreferredCategories": [
    { "Category": "Fashion",
      "CategoryAdded": "2012-01-01" },
    { "Category": "Outdoor Equipment",
      "CategoryAdded": "2013-01-01" } ],
  "Channel": "In-Store" }
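The two documents above can be modelled as plain Python dictionaries to show how queries cope with fields that exist in one document but not the other. The `find` and `users_preferring` helpers are hypothetical illustrations, not the MongoDB or CouchDB query API.

```python
users = [  # a "collection" of documents, modelled as plain dicts
    {"User": "Bobby", "Email": "bobby@db-lover.com", "Channel": "Web", "State": "NJ"},
    {"User": "Susie", "Email": "Susie@sql-enthusiast.com",
     "PreferredCategories": [
         {"Category": "Fashion", "CategoryAdded": "2012-01-01"},
         {"Category": "Outdoor Equipment", "CategoryAdded": "2013-01-01"},
     ],
     "Channel": "In-Store"},
]


def find(collection, **criteria):
    """Return documents whose top-level fields equal every given criterion."""
    return [doc for doc in collection
            if all(doc.get(field) == value for field, value in criteria.items())]


def users_preferring(collection, category):
    """Return user names that list `category` in the optional PreferredCategories array."""
    return [doc["User"] for doc in collection
            if any(p["Category"] == category for p in doc.get("PreferredCategories", []))]
```

For example, `find(users, State="NJ")` matches only Bobby, and Susie's missing `State` field simply fails the match instead of requiring a NULL column.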
Graph
• Platforms: Neo4j, Titan
• Relationships are front and center! Relationships can have properties
of their own.
[Figure: example graph. Nodes: Bobby, Jillian, Frank, Susie, Hair bows, Chainsaw. Relationship labels and properties as they appear in the figure: Friends; Likes; Purchased (Date: 2013-02-14, Channel: In-Store); Friends; Purchased (Date: 2013-01-31); Friends; Likes Profile (Date: 2013-01-01). Recommendation: Maybe Jillian wants a Chainsaw too!]
Gremlin query language:
• Find all of Frank’s outgoing relationships
• Find all products related to Jillian
• Find the shortest path from Frank to Susie
• Cool collaborative filtering functions too!
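As a rough illustration of what a "shortest path from Frank to Susie" query does, here is a breadth-first search over an adjacency list in plain Python. This is not Gremlin, and the specific friendship edges are an assumption made for the example, since the figure's exact connections are not recoverable from the text.

```python
from collections import deque

# Hypothetical undirected "Friends" edges between the people named on the slide.
friends = {
    "Frank": ["Bobby"],
    "Bobby": ["Frank", "Jillian"],
    "Jillian": ["Bobby", "Susie"],
    "Susie": ["Jillian"],
}


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the nodes on a shortest path, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbour in graph.get(node, []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None


path = shortest_path(friends, "Frank", "Susie")
```

A graph database keeps these adjacency links as first-class stored structures, so traversals like this do not need a chain of joins.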
Our Use Case: High Volume Sensor
Analytics
• Ingestion and analytics of Sensor Data
• 6 to 12 BILLION records being ingested daily (average
140k records per second at peak load)!
• Ingested data must be stored to disk and highly available
• Pre-defined aggregates and event monitors must be near
real-time
• Ad-hoc query capabilities required on historical data
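The quoted ingest rate can be checked with simple arithmetic: dividing the daily record count by the number of seconds in a day. The sketch below assumes records arrive evenly across the day, which is a simplification.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def records_per_second(records_per_day):
    """Average ingest rate implied by a daily record count."""
    return records_per_day / SECONDS_PER_DAY


low = records_per_second(6_000_000_000)    # about 69,000 records per second
high = records_per_second(12_000_000_000)  # about 139,000 records per second
```

The upper figure is consistent with the roughly 140k records per second quoted for the 12 billion records-per-day case.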
How do we hope to accomplish this?
[Architecture diagram. Components: Sensor Data, Kafka, Storm Cluster, Cassandra Cluster, Hadoop Cluster, Low Latency Analytics, d3.js Analytics. Data outputs: Atomic data, Aggregates, Event Monitors.]
• The Kafka messaging system is used for ingestion
• Storm is used for real-time ETL and outputs atomic data
and derived data needed for analytics
• Real-time analytics are produced from the aggregated
data.
• Higher latency ad-hoc analytics are done in Hadoop
using Pig and Hive
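The flow described in these bullets can be mimicked in a few lines of plain Python: a queue for ingestion, a processing step that fans each reading out to atomic storage, a running aggregate, and an event monitor. Everything here is a stand-in; no Kafka, Storm, or Cassandra API is used, and the alert threshold and record shape are assumptions.

```python
from collections import defaultdict, deque

queue = deque()                  # stand-in for the message queue used for ingestion
atomic_data = []                 # stand-in for the atomic-data store
aggregates = defaultdict(float)  # stand-in for pre-computed aggregates
alerts = []                      # stand-in for event monitors

THRESHOLD = 100.0  # hypothetical alert threshold


def ingest(reading):
    """Producer side: append a raw sensor reading to the queue."""
    queue.append(reading)


def run_realtime_etl():
    """Drain the queue, writing each reading to all three outputs."""
    while queue:
        reading = queue.popleft()
        atomic_data.append(reading)                           # keep the raw record
        aggregates[reading["sensor_id"]] += reading["value"]  # update running total
        if reading["value"] > THRESHOLD:                      # event monitor check
            alerts.append(reading)


for r in [{"sensor_id": "s1", "value": 40.0},
          {"sensor_id": "s1", "value": 120.0},
          {"sensor_id": "s2", "value": 10.0}]:
    ingest(r)
run_realtime_etl()
```

The atomic records are what a batch system would later re-read for ad-hoc questions, while the aggregates and alerts serve the low-latency side.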
Parting Thought
Polyglot Persistence – “where any decent sized
enterprise will have a variety of different data storage
technologies for different kinds of data. There will still
be large amounts of it managed in relational stores,
but increasingly we'll be first asking how we want to
manipulate the data and only then figuring out what
technology is the best bet for it.”
-- Martin Fowler

Big Data Warehousing Meetup: Intro to NoSQL databases

  • 1. Sponsored By: Big Data Warehousing Meetup Today’s Topic: Introduction to NoSQL with 10Gen
  • 2. WELCOME! Joe Caserta Founder & President, Caserta Concepts
  • 3. 7:00 Networking Grab a slice of pizza and a drink... 7:15 Joe Caserta President, Caserta Concepts Author, Data Warehouse ETL Toolkit Welcome About the Meetup and about Caserta Concepts 7:30 Elliott Cordo Principal Consultant, Caserta Concepts Intro to NoSQL 7:50 Mike O’Brian 10Gen MongoDB 8:10 - 9:00 More Networking Tell us what you’re up to… Agenda
  • 4. About BDW Meetup • Big Data is a complex, rapidly changing landscape • We want to share our stories and hear about yours • Great networking opportunity for like minded data nerds • Opportunities to collaborate on exciting projects • Next BDW Meetup: June 10. • Topic: TBD (What would you like to see?) Send ideas to joe@casertaconcepts.com
  • 5. About Caserta Concepts • Financial Services • Healthcare / Insurance • Retail / eCommerce • Digital Media / Marketing • K-12 / Higher Education Industries Served • President: Joe Caserta, industry thought leader, consultant, educator and co-author, The Data Warehouse ETL Toolkit (Wiley, 2004) Founded in 2001 • Big Data Analytics • Data Warehousing • Business Intelligence • Strategic Data Ecosystems Focused Expertise
  • 6. Client Portfolio Finance & Insurance Retail/eCommerce & Manufacturing Education & Services
  • 7. Expertise & Offerings Strategic Roadmap/ Assessment/Consulting Database BI/Visualization/ Analytics Master Data Management Big Data Analytics Storm
  • 8. Opportunities Does this word cloud excite you? Speak with us about our open positions: jobs@casertaconcepts.com
  • 9. Contacts Joe Caserta President & Founder, Caserta Concepts P: (855) 755-2246 x227 E: joe@casertaconcepts.com Dana Canavan Director, Sales & Marketing P: (855) 755-2246 x226 E: dana@casertaconcepts.com Elliott Cordo Principal Consultant, Caserta Concepts P: (855) 755-2246 x267 E: elliott@casertaconcepts.com info@casertaconcepts.com 1(855) 755-2246 www.casertaconcepts.com
  • 10. ANALYZING DATA: INTRO TO NOSQL Elliott Cordo Principal Consultant, Caserta Concepts
  • 11. So... No More SQL? • Relational databases still have their place • Flexible/General Purpose • Rich Query Syntax • Familiar • However, there are some interesting alternatives for analytic databases • Columnar/Key Value • Document • Graph • PS. Many NoSQL databases have SQL-like interfaces. Think Not Only SQL!
  • 12. Why are we doing this? Not all data is efficiently stored in a relational DB. • Sparse Data • Data with a lot of variation • Relationships -> funny how relational databases are not great at relations
  • 13. Scale and Performance Performance: • Relational databases have a lot of features and overhead that we don’t need in many cases. Although we will miss some… Scaling: • Most relational databases scale vertically, which limits how large they can get. Federation and sharding are awkward manual processes. • Most NoSQL databases scale horizontally on commodity hardware. Note: Graph database architecture lends itself to a single graph existing on one server. Several vendors have overcome this: Titan, InfiniteGraph.
  • 14. Object Impedance Mismatch Relational databases rarely look the way our applications want them to. So much time is spent assembling and disassembling relational data. GetSale: Select * Sales_Header Join Sales_Detail Join Sales_Tender Join User Join Order Type Join Tender Type Join Product Join Channel Join User_Account etc, etc CreateSale: Insert into Sales Header Insert into Sales Detail Insert/Update User_Account Insert into Sales Tender etc, etc
  • 15. But what will we sacrifice? • NoSQL DBs have fairly simple query languages, with limited support for the following: • Joins • Aggregation • Secondary indexes Why? NoSQL databases were born to be high performance • Data is stored as it is to be used (tuned to a query) rather than modeled around entities, so a sophisticated query language is not needed.
  • 16. So what about NoSQL as the Data Warehouse? • NoSQL databases are generally not as flexible as relational databases for ad-hoc questions. • Secondary indexes provide some flexibility, but the lack of joins requires denormalization • Materialized views: Joins and aggregates can be implemented via Map Reduce. Even using our animal friends: • However, materializing the world has its drawbacks!
  • 17. NoSQL can be a good fit for certain analytic applications • High volume/low latency analytic environments • Queries are largely known and can be precomputed in-stream (via the application itself or Storm) or in batch using Map Reduce • Cassandra also has counter functions, which can be helpful in pre-computing aggregates. • Sweet spot is very high volumes with relatively static analytic requirements. [Chart: RDBMS and NoSQL plotted by Volume and Query Flexibility]
  • 18. Columnar • Platforms: Cassandra, HBase • Column families are the equivalent of a table in an RDBMS • Primary unit of storage is a column; columns are stored contiguously Skinny Rows: Most like a relational database, except columns are optional and not stored if omitted. Wide Rows: Rows can be billions of columns wide; used for time series, relationships, secondary indexes.
  • 19. Document • Platforms: MongoDB, CouchDB • Collections are the equivalent of a table in an RDBMS • Primary unit of storage is a document { "User": "Bobby", "Email": "bobby@db-lover.com", "Channel": "Web", "State": "NJ" } { "User": "Susie", "Email": "Susie@sql-enthusiast.com", "PreferredCategories": [ { "Category": "Fashion", "CategoryAdded": "2012-01-01" }, { "Category": "Outdoor Equipment", "CategoryAdded": "2013-01-01" } ], "Channel": "In-Store" }
  • 20. Graph • Platforms: Neo4j, Titan • Relationships are front and center! Relationships can have properties of their own. [Diagram: people (Bobby, Jillian, Frank, Susie) and products (Hair bows, Chainsaw) connected by Friends, Likes, and Purchased relationships. Relationship properties shown: Purchased Date: 2013-02-14, Channel: In-Store; Purchased Date: 2013-01-31; Likes Profile Date: 2013-01-01. Recommendation: Maybe Jillian wants a Chainsaw too!] Gremlin query language: • Find all Frank’s outgoing relationships • Find all Products related to Jillian • Find shortest path from Frank to Susie • Cool collaborative filtering functions too!
  • 21. Our Use Case: High Volume Sensor Analytics • Ingestion and analytics of Sensor Data • 6 to 12 BILLION records being ingested daily (average 140k records per second at peak load)! • Ingested data must be stored to disk and highly available • Pre-defined aggregates and event monitors must be near real-time • Ad-hoc query capabilities required on historical data
  • 22. How do we hope to accomplish this? [Architecture diagram labels: Sensor Data, Kafka, Storm Cluster, Cassandra Cluster, Hadoop Cluster, Atomic data, Aggregates, Event Monitors, Low Latency Analytics, d3.js Analytics] • The Kafka messaging system is used for ingestion • Storm is used for real-time ETL and outputs atomic data and derived data needed for analytics • Real-time analytics are produced from the aggregated data. • Higher latency ad-hoc analytics are done in Hadoop using Pig and Hive
  • 23. Parting Thought Polyglot Persistence – “where any decent sized enterprise will have a variety of different data storage technologies for different kinds of data. There will still be large amounts of it managed in relational stores, but increasingly we'll be first asking how we want to manipulate the data and only then figuring out what technology is the best bet for it.” -- Martin Fowler
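
Slide 14 contrasts assembling a sale from many joined tables with what the application actually wants. As a rough sketch of the document-style alternative, the whole sale can be written and read as one nested record. Everything here is an assumption for illustration: the field names, the `create_sale`/`get_sale` helpers, and the in-memory dict standing in for a document store are hypothetical and not taken from the deck.

```python
import json

# Hypothetical sale document: header, detail lines, and tenders nested together.
sale = {
    "sale_id": 1001,
    "user": "Susie",
    "channel": "In-Store",
    "details": [{"product": "Chainsaw", "qty": 1, "price": 199.0}],
    "tenders": [{"type": "Credit", "amount": 199.0}],
}


def create_sale(store, doc):
    """Write the whole sale in one operation (no per-table inserts)."""
    store[doc["sale_id"]] = json.dumps(doc)


def get_sale(store, sale_id):
    """Read the whole sale back in one lookup (no joins)."""
    return json.loads(store[sale_id])


store = {}  # stand-in for a document collection
create_sale(store, sale)
loaded = get_sale(store, 1001)
```

The trade-off noted on slide 15 still applies: the document is tuned to this one access pattern, so questions that cut across sales need denormalized copies or batch jobs.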
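
Slide 17 says known queries can be precomputed in-stream. A minimal sketch of that idea, assuming hypothetical event fields (`sensor`, `ts`) and an hourly bucket, is to bump a counter per (sensor, hour) as each event arrives, so the aggregate is already materialized at query time. A plain Python `Counter` stands in for a counter store here; this does not use Storm or Cassandra APIs.

```python
from collections import Counter
from datetime import datetime


def hour_bucket(ts):
    """Truncate an ISO-8601 timestamp string to its hour."""
    return datetime.fromisoformat(ts).strftime("%Y-%m-%d %H:00")


def aggregate(events):
    """Increment one counter per (sensor, hour) as events stream in."""
    counts = Counter()
    for event in events:
        counts[(event["sensor"], hour_bucket(event["ts"]))] += 1
    return counts


# Made-up sample events, for illustration only.
events = [
    {"sensor": "s1", "ts": "2013-05-01T10:15:00"},
    {"sensor": "s1", "ts": "2013-05-01T10:45:00"},
    {"sensor": "s2", "ts": "2013-05-01T11:05:00"},
]
hourly = aggregate(events)
```

Reading an aggregate is then a single key lookup, such as `hourly[("s1", "2013-05-01 10:00")]`, which is why this fits relatively static analytic requirements better than ad-hoc questions.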
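
Slide 20 lists "find shortest path from Frank to Susie" as a typical graph query. The sketch below shows what such a query does underneath, using a breadth-first search over an adjacency list. It is plain Python, not Gremlin, and the Friends edges are assumed for illustration because the slide's diagram did not survive extraction.

```python
from collections import deque

# Assumed friendship edges (undirected); not recovered from the slide diagram.
FRIENDS = {
    "Frank": ["Jillian"],
    "Jillian": ["Frank", "Bobby"],
    "Bobby": ["Jillian", "Susie"],
    "Susie": ["Bobby"],
}


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the first (shortest) path found, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


path = shortest_path(FRIENDS, "Frank", "Susie")
```

A graph database keeps these adjacency lookups cheap by storing relationships directly, which is the point of "relationships are front and center".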
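
The throughput figure on slide 21 can be sanity-checked with simple arithmetic. This sketch assumes ingestion is spread evenly over a 24-hour day, which the deck does not state; under that assumption the upper bound of 12 billion records per day works out to roughly the 140k records per second quoted.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def records_per_second(records_per_day):
    """Average ingest rate, assuming records arrive evenly over the day."""
    return records_per_day / SECONDS_PER_DAY


low = records_per_second(6_000_000_000)    # lower bound from the slide
high = records_per_second(12_000_000_000)  # upper bound from the slide
```

Real sensor traffic is rarely uniform, so true peaks could exceed this daily average.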