May 2012 HUG: The Changing Big Data Landscape
Yahoo Developer Network
1.
Just-In-Time Analytics
Surfing the new big data landscape
Self-Service Big Data Analytics for Hadoop
Matt Schumpert
© 2012 Datameer, Inc. All rights reserved.
2.
Agenda
• Backdrop
• Observations
• Solution
• Demo
3.
Big Data Landscape
Challenge
• Dramatic data growth
• Structured and unstructured data
• Scale economically
• Maintain agility
Enablers
• Low cost storage and CPUs
• Disruptive new technologies
• Availability of cloud infrastructure
Needs
• Democratize data access
• Crowd-source insights
• Just-In-Time Delivery
Source: Forrester
4.
Hadoop - A Disruptive Response
Advantages
• Economics
• Flexibility
• Scalability
Challenges
• Raw technology, complexity
• Requires significant resources
• No packaged applications
Rapid Adoption
• Led by Yahoo, Facebook, etc.
• Data-driven companies followed
• Fortune 500 rapidly deploying
Goal
• Make Big Data analytics accessible to business users
• Shorten time-to-insight
• Seamless integration to all data types
• Low cost of ownership
• Demystify
5.
Current State
Volume problem was solved with MPP DBs
• TCO sometimes lower than Hadoop
Variety problem is tractable with Hadoop
People still struggle with velocity
Time-to-insight is too high
Will business agility decline?
6.
What We’re Surfing on...
7.
Observations
“The Wild Wild West”
• We’re in a lawless era for data formats
“Amateur Night”
• People re-invent crooked wheels all over their pipelines
“Open Mic Night”
• Dealing with data that talks too much
“Social Data Gold Rush”
• A rush to judgement on social media data leads to silos
8.
1. “The Wild Wild West” ...
JSON (Twitter, Facebook, MongoDB, etc.)
• Not always well-formed
• Difficult to split raw (backtrack to what, ‘{’ ?)
Sequence Files
• Metadata is completely open-ended
• Triple-packed content (Flume JSON w/ compressed files, etc.)
Raw
• “the delimiter of the week”
‣ \u0001 (Hive)
‣ Þ (DoubleClick)
• Various text encoding schemes
‣ ISO-8859 vs. UTF-8
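The format problems listed on this slide can be handled defensively at read time. The following is a minimal Python sketch, not the approach used by Datameer or any specific tool: it assumes line-oriented input, tries UTF-8 before falling back to ISO-8859-1, splits on whichever of the two delimiters named on the slide is present, and sets malformed JSON aside instead of failing. The function names are hypothetical.

```python
import json

def decode_line(raw: bytes) -> str:
    """Try UTF-8 first; fall back to ISO-8859-1, which accepts any byte."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("iso-8859-1")

def split_fields(line: str, delimiters=("\u0001", "\u00de")) -> list[str]:
    """Split on the first candidate delimiter found in the line.

    \u0001 is Hive's default field separator; \u00de is the thorn
    character (Þ) the slide attributes to DoubleClick logs.
    """
    for delimiter in delimiters:
        if delimiter in line:
            return line.split(delimiter)
    return [line]

def parse_json_lines(lines):
    """Parse one JSON document per line, keeping malformed lines aside."""
    good, bad = [], []
    for line in lines:
        try:
            good.append(json.loads(line))
        except json.JSONDecodeError:
            bad.append(line)
    return good, bad
```

Keeping the rejected lines, rather than dropping them, makes it possible to measure how much of a feed is not well-formed before deciding how to treat it.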
9.
2. “Amateur Night...”
Naive collection strategies
• e.g. 1 file per record (Facebook user)
• rudimentary use of batch requests / store-and-forward
Naive ingestion strategies
• e.g. per minute log ingestion with no compaction --> millions of small files
• Partitioning for ease-of-ingestion, not analytics
‣ e.g. create files/keys/partitions by the server of origin
Naive storage strategies
• Uncompressed, all-String storage of mostly numerical fields
• Shimming compressed SEQ onto big compressed files --> not splittable
• Mixing compression codecs with data formats (e.g. LzoTextInputFormat)
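One common remedy for the "millions of small files" problem is a periodic compaction pass that merges small files into larger ones. The sketch below only plans the batches; it is an illustration under stated assumptions, not something shown in the talk. The file names, sizes, and the target size (standing in for something like a storage block size) are all hypothetical.

```python
def plan_compaction(file_sizes: dict[str, int], target_bytes: int) -> list[list[str]]:
    """Group small files into batches of roughly target_bytes each.

    Files are taken in name order, so per-minute logs with sortable
    names stay in time order inside each merged output.
    """
    batches: list[list[str]] = []
    current: list[str] = []
    current_size = 0
    for name, size in sorted(file_sizes.items()):
        # Close the current batch if adding this file would overshoot.
        if current and current_size + size > target_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        batches.append(current)
    return batches

# Hypothetical per-minute log files and sizes in bytes.
sizes = {"log-0001": 40, "log-0002": 40, "log-0003": 40, "log-0004": 10}
batches = plan_compaction(sizes, target_bytes=100)
```

Each batch would then be concatenated into a single output file in a splittable format, which reduces both the file count and the number of map tasks needed to read the data.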
10.
3. “Open Mic Night” ...
Data can be verbose
• e.g. repeating key/value pairs
Semi-structured is the norm
Deep hierarchies that explode unexpectedly
• Even beyond task JVM memory (too many friends/fans!)
Low signal-to-noise ratio
Content in various languages
• Makes sentiment analysis tricky
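To illustrate how a deep hierarchy can be kept from exploding, the following Python sketch flattens a nested record lazily and caps how many elements of any one list are expanded. This is an assumed mitigation, not a technique described on the slide; the `max_items` parameter, the `__truncated__` marker, and the sample record are hypothetical.

```python
from itertools import islice

def flatten(obj, prefix="", max_items=1000):
    """Yield (path, value) pairs for a nested dict/list structure.

    Working as a generator avoids materializing the whole flattened
    record, and max_items bounds the expansion of any single list.
    """
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else key
            yield from flatten(value, path, max_items)
    elif isinstance(obj, list):
        for index, value in enumerate(islice(obj, max_items)):
            yield from flatten(value, f"{prefix}[{index}]", max_items)
        if len(obj) > max_items:
            # Record how many elements were skipped instead of expanding them.
            yield f"{prefix}.__truncated__", len(obj) - max_items
    else:
        yield prefix, obj

# Hypothetical record with more friends than the cap allows.
record = {"id": "1", "friends": [{"name": "a"}, {"name": "b"}, {"name": "c"}]}
pairs = list(flatten(record, max_items=2))
```

The truncation marker keeps the loss visible downstream, so an analysis can tell a user with two friends from one whose list was cut off.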
11.
Example: FB Profile
{"id":"10011666","name":"Test user","first_name":"Test","last_name":"user","link":"http://www.facebook.com/test.user","username":"test.user","birthday":"09/19/1983","hometown":{"id":"103102203064024","name":"West Chester, Pennsylvania"},"location":{"id":"","name":null},"bio":"I'm an honorary Sean Connery, born '83\r\nThere's only one of me\r\nSingle-handedly raising the economy\r\nAin't no chance of the record company dropping me\r\nPress be asking do I care for sodomy\r\nI don't know, yeah, probably\r\nI've been looking for serial monogamy\r\nNot some bird that looks like Billy Connolly\r\nBut for now I'm down for ornithology\r\nGrab your binoculars, come follow me","quotes":"Normal is getting dressed in clothes that you buy for work and driving through traffic in a car that you are still paying for - in order to get to the job you need to pay for the clothes and the car, and the house you leave vacant all day so you can afford to live in it. -Ellen Goodman\r\n\r\nThe entire economy of the Western world is built on things that cause cancer.-From the movie \"Bliss\"\r\n\r\nNever give a party if you will be the most interesting person there. -Mickey Friedman\r\n\r\nAhhh. A man with a sharp wit. Someone ought to take it away from him before he cuts himself. -Peter da Silva\r\n\r\nNow it seems the music Industry's working on marketing ploys. I remember back when it wasn't about looks or color but about the voice. -Jay Sean\r\n\r\nWhy are you trying so hard to fit in, when you were born to stand out? -Random\r\n\r\nI think if you're ready to go out with Johnny. Now's the time to tell him about your one month limit. He wont mind he'll apreciate your fresh look on dating. And once you've dated someone else you can date him again. I'm sure he'll like it. Everyone will appreciate it. You so novel what a good idea. You can keep your time to your self. You don't need date insurance.You can go out with whoever you want to. Every boy, every boy, in the whole world could be yours. If you'll just listen to my plan\r\nTHE TEENAGE GUIDE TO POPULARITY -Nada Surf\r\n\r\nThe difference between now and the future is simply greater destruction and more universal chaos_-Stephen Hawking \r\n\r\nIn archaeology you uncover the unknown. In diplomacy you cover the known. -Thomas Pickering\r\n\r\nYou know the disease u get when u get married..Onegina -Russel Peters\r\n\r\nI saw you standing in my headlights. (Blink, blink, blink.)\r\nI thought I'd run you down for the weight you left on me.\r\nInstead I pushed rewind, reversed and drove away.\r\nAnd seeing you disappear in my rearview brought to me the word\r\n'Reciprocity!' -Incubus\r\n\r\nFew people are capable of expressing with equanimity opinions which differ from the prejudices of their social environment. Most people are even incapable of forming such opinions. -Albert Einstein\r\n\r\nNinety-eight percent of the adults in this country are decent, hard-working, honest Americans. It's the other lousy two percent that get all the publicity. But then--we elected them. -Lily Tomlin\r\n\r\nWhen You Are Not Practicing, Remember: Someone Somewhere Is Practicing And When You Meet Him- He Will Win\r\n\r\nIf not I, who? If not here, where? If not now, when?\r\n\r\nAll that is necessary for evil to triumph is for good people to stand by and do nothing -Unknown\r\n\r\nWe are the people our parents warned us about. -Jimmy Buffett\r\n\r\nNever explain--your friends do not need it and your enemies will not believe you anyway. -Elbert Hubbard\r\n\r\nMy definition of a free society is a society where it is safe to be unpopular. -Adlai E. Stevenson Jr.\r\n\r\nToo many have dispensed with generosity in order to practice charity. -Albert Camus","work":[{"employer":{"id":"6185812851","name":"American Express"},"location":{"id":"105540216147364","name":"Phoenix, Arizona"},"position":{"id":"133619273341785","name":"Lead Programmer Analyst"},"start_date":"2012-01"},{"employer":{"id":"190876464341724","name":"Cardiac group"},"position":{"id":"105630109469647","name":"Executive Producer"},"description":"We create music for Artist Placement and TV/Film.","start_date":"2002-01"},{"employer":{"id":"6185812851","name":"American Express"},"location":{"id":"105540216147364","name":"Phoenix, Arizona"},"position":{"id":"116439401740213","name":"Senior Database Administrator"},"start_date":"2007-10","end_date":"2012-01"},{"employer":{"id":"110067355684846","name":"Saint Joseph Hospital"},"location":{"id":"105540216147364","name":"Phoenix, Arizona"},"position":{"id":"202489236428627","name":"Pharmacy IT Coordinator"},"start_date":"2005-10","end_date":"2007-10"},{"employer":{"id":"110067355684846","name":"Saint Joseph Hospital"},"location":{"id":"105540216147364","name":"Phoenix, Arizona"},"position":{"id":"144703015548786","name":"Pharmacy Tech"},"start_date":"2001-02","end_date":"2005-10"}],"sports":[{"id":"108606435830479","name":"Karate"}],"favorite_teams":[{"id":"87169796810","name":"Philadelphia Flyers"},{"id":"93625750491","name":"Philadelphia Phillies"},{"id":"45898408995","name":"Phoenix Suns"},{"id":"120163518021430","name":"Philadelphia Eagles"}],"favorite_athletes":[{"id":"77922840249","name":"Steve Nash"},{"id":"105590659475179","name":"Wayne Gretzky"},{"id":"62975399193","name":"Michael Jordan"}],"inspirational_people":[{"id":"106676942701904","name":"Gandhi"}],"education":[{"school":{"id":"109324275761313","name":"Corona del Sol High School"},"type":"High School"},{"school":{"id":"23680344606","name":"Arizona State University"},"type":"College"}],"gender":"male","interested_in":["female"],"relationship_status":"Single","religion":"Hinduism (One with all things)","political":"Liberal (Left of Center)","email":"app+22c90gj.9hh9d.f7304b58ac646e08b5f0f10a73547e34\u0040proxymail.facebook.com","website":"www.slashdot.org\r\nwww.gizmodo.com","timezone":-7,"locale":"en_US","languages":[{"id":"106059522759137","name":"English"},{"id":"112969428713061","name":"Hindi"}],"verified":true,"updated_time":"2012-03-22T17:24:25+0000"}
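Most of a profile like this is free text, while an analysis typically needs only a few fields. As a rough sketch of pulling the signal out, the Python below projects the `work` array into flat rows. The embedded sample is a shortened excerpt of the profile above (two of its five jobs, with most fields dropped); the `work_rows` helper and the output column names are hypothetical.

```python
import json

# Shortened excerpt of the slide's profile: two jobs, other fields omitted.
profile_json = """
{"id": "10011666", "name": "Test user",
 "work": [
   {"employer": {"id": "6185812851", "name": "American Express"},
    "position": {"id": "133619273341785", "name": "Lead Programmer Analyst"},
    "start_date": "2012-01"},
   {"employer": {"id": "6185812851", "name": "American Express"},
    "position": {"id": "116439401740213", "name": "Senior Database Administrator"},
    "start_date": "2007-10", "end_date": "2012-01"}
 ]}
"""

def work_rows(profile: dict):
    """Yield one flat row per job, tolerating missing optional keys."""
    for job in profile.get("work", []):
        yield {
            "user_id": profile.get("id"),
            "employer": job.get("employer", {}).get("name"),
            "position": job.get("position", {}).get("name"),
            "start_date": job.get("start_date"),
            "end_date": job.get("end_date"),  # absent for a current job
        }

rows = list(work_rows(json.loads(profile_json)))
```

Using `.get` with defaults matters here because the source data is semi-structured: the current job has no `end_date`, and one of the jobs in the full profile has no `location`.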
12.
Example: Email (MBOX)
From common-user-return-16923-apmail-hadoop-common-user-archive=hadoop.apache.org@hadoop.apache.org Thu Aug 20 14:02:59 2009
Return-Path: <common-user-return-16923-apmail-hadoop-common-user-archive=hadoop.apache.org@hadoop.apache.org>
Delivered-To: apmail-hadoop-common-user-archive@www.apache.org
Received: (qmail 83137 invoked from network); 20 Aug 2009 14:02:58 -0000
Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.3) by minotaur.apache.org with SMTP; 20 Aug 2009 14:02:58 -0000
Received: (qmail 23328 invoked by uid 500); 20 Aug 2009 14:03:14 -0000
Delivered-To: apmail-hadoop-common-user-archive@hadoop.apache.org
Received: (qmail 23266 invoked by uid 500); 20 Aug 2009 14:03:14 -0000
Mailing-List: contact common-user-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
List-Help: <mailto:common-user-help@hadoop.apache.org>
List-Unsubscribe: <mailto:common-user-unsubscribe@hadoop.apache.org>
List-Post: <mailto:common-user@hadoop.apache.org>
List-Id: <common-user.hadoop.apache.org>
Reply-To: common-user@hadoop.apache.org
Delivered-To: mailing list common-user@hadoop.apache.org
Received: (qmail 23254 invoked by uid 99); 20 Aug 2009 14:03:14 -0000
Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 20 Aug 2009 14:03:14 +0000
X-ASF-Spam-Status: No, hits=-0.0 required=10.0 tests=SPF_PASS
X-Spam-Check-By: apache.org
Received-SPF: pass (nike.apache.org: local policy)
Received: from [209.85.219.209] (HELO mail-ew0-f209.google.com) (209.85.219.209) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 20 Aug 2009 14:03:05 +0000
Received: by ewy5 with SMTP id 5so181532ewy.36 for <common-user@hadoop.apache.org>; Thu, 20 Aug 2009 07:02:45 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.216.39.85 with SMTP id c63mr1821542web.103.1250776964866; Thu, 20 Aug 2009 07:02:44 -0700 (PDT)
In-Reply-To: <597eea000908200259o8e3bd78l385059f2b5d31555@mail.gmail.com>
References: <597eea000908191855v579b9c4r8baeb638630cfb27@mail.gmail.com> <e01b80590908192249s5302cd26m7984a32816c0d58c@mail.gmail.com> <597eea000908200209o176aefacjca2a45369301c296@mail.gmail.com> <e01b80590908200230x608ad35en5f372a9fd5aba325@mail.gmail.com> <597eea000908200259o8e3bd78l385059f2b5d31555@mail.gmail.com>
Date: Thu, 20 Aug 2009 15:02:44 +0100
Message-ID: <ac79ea400908200702u309a4fcey9ab1a7b358f313ce@mail.gmail.com>
Subject: Re: File Chunk to Map Thread Association
From: Tom White <tom@cloudera.com>
To: common-user@hadoop.apache.org
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
X-Virus-Checked: Checked by ClamAV on apache.org

Hi Roman,

Have a look at CombineFileInputFormat - it might be related to what you are trying to do.

Cheers,
Tom
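The message above carries dozens of header lines for a three-line body. As an illustration of extracting the useful parts, the Python sketch below parses a shortened copy of the message with the standard library `email` package. The excerpt keeps only a handful of the original headers and omits the mbox envelope line, and the `summarize` helper is hypothetical.

```python
from email import message_from_string

# Shortened copy of the slide's message: most headers omitted.
raw_message = """\
Received: (qmail 23254 invoked by uid 99); 20 Aug 2009 14:03:14 -0000
Received: by ewy5 with SMTP id 5so181532ewy.36 for <common-user@hadoop.apache.org>; Thu, 20 Aug 2009 07:02:45 -0700 (PDT)
Date: Thu, 20 Aug 2009 15:02:44 +0100
Subject: Re: File Chunk to Map Thread Association
From: Tom White <tom@cloudera.com>
To: common-user@hadoop.apache.org
Content-Type: text/plain; charset=ISO-8859-1

Hi Roman,

Have a look at CombineFileInputFormat - it might be related to what you are trying to do.

Cheers,
Tom
"""

def summarize(raw: str) -> dict:
    """Reduce one message to the fields an analysis is likely to need."""
    msg = message_from_string(raw)
    return {
        "subject": msg["Subject"],
        "from": msg["From"],
        "date": msg["Date"],
        "hops": len(msg.get_all("Received", [])),  # repeated header
        "body": msg.get_payload().strip(),
    }

summary = summarize(raw_message)
```

For a whole archive, the standard library `mailbox.mbox` class can iterate over the messages in a file, and each one can be reduced the same way.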
13.
What do we need?
14.
Just-In-Time Supply Chain
Slow, Expensive, Expertise
ETL, Data Warehouse, Business Intelligence
15.
Just-In-Time Supply Chain
Slow, Expensive, Expertise
ETL, Data Warehouse, Business Intelligence
Fast, Economical, Self Service
Spreadsheets + drag ’n drop, Raw Load, Hadoop “schema on read”
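The "raw load" plus "schema on read" idea can be sketched in a few lines: data is stored exactly as it arrived, and column names and types are applied only when someone reads it. The Python below is an illustration of that idea under stated assumptions, not Datameer's or Hadoop's implementation; the sample log lines, the schema, and `read_with_schema` are all hypothetical.

```python
import csv
import io

# Hypothetical raw data, stored untouched at load time.
RAW = "2012-05-16,web01,200,1532\n2012-05-16,web02,500,87\n"

# The schema lives with the reader, not with the stored data,
# so it can change without reloading anything.
SCHEMA = {"date": str, "server": str, "status": int, "bytes": int}

def read_with_schema(raw_text: str, schema: dict):
    """Apply column names and type conversions while reading."""
    names = list(schema)
    for row in csv.reader(io.StringIO(raw_text)):
        yield {name: schema[name](value) for name, value in zip(names, row)}

rows = list(read_with_schema(RAW, SCHEMA))
errors = [row for row in rows if row["status"] >= 500]
```

The contrast with the ETL and data warehouse path is that no transformation has to be designed before loading; a different question later only needs a different schema over the same raw data.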
16.
A “One Stop Shop”
Compressing “Time-To-Insight”
Fast, Self Service, Economical
Raw Load, Spreadsheet, Drag and Drop, Visualization, Hadoop
17.
What We Do:
18.
Datameer Capabilities
Seamless Data Integration
• Wizard-based integration
• Structured, semi- and unstructured
• No complex mappings/schemas
• Pluggable data integration API
Powerful Analytics
• Interactive spreadsheet UI
• Cleansing, transformation, analysis
• Over 200 built-in functions
• Pluggable function API
Self-Service Dashboards
• Drag and drop
• Powerful visualizations
• Mash-up anything
• Integrate into existing portals
19.
Data-Center Ready
20.
Demo...
21.
Q/A
22.
Please Download Our Trial Edition!
www.datameer.com