CloudCon, Tuesday, October 2nd, 11am
Every Second – in Thousands of Categories
Value > Cost
                         $’s per year in incremental revenue




www.wallpapertimes.com
DATA
    Volume:   incremental storage
    Variety:  structured, semi-structured, un-structured
    Velocity: processing, change
Analyze & Report                                                   Discover & Explore

Structured                   Semi-Structured                       Unstructured
SQL                          SQL++                                 Java/C++/Pig/Hive
Production Data Warehousing  Contextual-Complex Analytics          Structure the Unstructured
Large Concurrent User-base   Deep, Seasonal, Consumable Data Sets  Detect Patterns
Data Warehouse               Data Warehouse + Behavioral           Hadoop
Enterprise-class System      Low End Enterprise-class System       Commodity Hardware System
8+PB                         60+PB                                 40+PB
Data Growing Faster
Data


         questions later
         structure later



              (<$0.04/GB, <$80/2TB)

single HDFS instances >50PB
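
As a rough check on the price point quoted on this slide, the sketch below works out the raw disk cost of a 50 PB store at $80 per 2 TB drive. The function name and the replication parameter are illustrative assumptions, not part of the talk; 3 is the stock HDFS default replication factor, and the slide does not say what setting was used.

```python
# Back-of-the-envelope disk cost at the slide's price point.
# Everything except the $80 / 2 TB figure is an assumption for illustration.
PRICE_PER_GB = 80 / 2000      # $80 per 2 TB drive, using 1 TB = 1000 GB
GB_PER_PB = 1_000_000         # decimal units


def raw_disk_cost(petabytes: float, replication: int = 1) -> float:
    """Dollar cost of the drives alone, with the data stored `replication` times."""
    return petabytes * GB_PER_PB * PRICE_PER_GB * replication


print(f"${PRICE_PER_GB:.2f} per GB")
print(f"${raw_disk_cost(50):,.0f} for one copy of 50 PB")
print(f"${raw_disk_cost(50, replication=3):,.0f} with three replicas")
```

One copy of 50 PB comes to about $2 million in drives, or about $6 million with three replicas. Servers, power, and operations are left out.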




Value > Cost
Designing for the Unknown
>85% of analytical workload is NEW & Unknown

The metrics you know are cheap

The metrics you don’t know are expensive – but high in potential ROI

Exploration & Testing are core pillars of an analytics-driven
  organization
•   Impact
Site   Key               Expansion       Top Query   Note
US     diaries           diary
US     baggies           baggy
US     cranberries       cranberry
US     jogging           jog
US     fishing sticker   fish stickers
UK     panels            panelling
UK     protection        protecter
UK     lining            lined
UK     animation         animated
UK     trucks            trucking
UK     edging            edges
UK     nets              netting
Site   Key               Expansion       Top Query                      Note
US     diaries           diary           vampire diaries
US     baggies           baggy           patagonia baggies              good for patagonia baggy, not good alone
US     cranberries       cranberry       the cranberries
US     jogging           jog             jogging stroller
US     fishing sticker   fish stickers   fishing sticker                sports vs. kids rooms
UK     panels            panelling       fence panels
UK     protection        protecter       mcafee total protection 2012   screen protecter is top US query
UK     lining            lined           pink lining changing bag
UK     animation         animated        animation cel
UK     trucks            trucking        corgi trucks
UK     edging            edges           garden edging
UK     nets              netting         purse nets
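
The table suggests that an expansion which looks harmless in isolation can change the meaning of the top query it appears in. One way to test this by counting is sketched below: rewrite the top query with the expansion and compare which items users clicked for each form. The click data, the `expansion_is_safe` helper, and the 0.5 threshold are all hypothetical; the slides do not say how the expansions were actually evaluated.

```python
from typing import Dict, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap of two sets, from 0.0 (disjoint) to 1.0 (identical)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def expansion_is_safe(top_query: str, key: str, expansion: str,
                      clicks: Dict[str, Set[str]], threshold: float = 0.5) -> bool:
    """Accept the expansion only if both query forms lead to similar clicked items."""
    rewritten = top_query.replace(key, expansion)  # plain substring swap, sketch only
    overlap = jaccard(clicks.get(top_query, set()), clicks.get(rewritten, set()))
    return overlap >= threshold


# Invented click sets: query -> ids of items clicked after that query.
clicks = {
    "vampire diaries": {"dvd1", "dvd2", "book1"},
    "vampire diary": {"notebook1"},
    "garden edging": {"e1", "e2", "e3"},
    "garden edges": {"e1", "e2", "e4"},
}

print(expansion_is_safe("vampire diaries", "diaries", "diary", clicks))
print(expansion_is_safe("garden edging", "edging", "edges", clicks))
```

With this toy data the first expansion is rejected, because the two query forms share no clicked items, and the second is accepted.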
Value > Cost
                         $’s per year in incremental revenue




www.wallpapertimes.com
Toys and Hobbies
ATC   >   Artist trading card   in ART
ATC   >   Automatic Tool Change in Business and Industrial
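
The ATC example implies that an acronym should be expanded per category rather than globally. A minimal sketch of that idea is a lookup keyed on both the term and the category; the table contents come from the slide, while the `expand` function and its fallback behaviour are assumptions.

```python
from typing import Dict, Tuple

# (term, category) -> expansion; entries taken from the slide.
EXPANSIONS: Dict[Tuple[str, str], str] = {
    ("atc", "Art"): "artist trading card",
    ("atc", "Business and Industrial"): "automatic tool change",
}


def expand(term: str, category: str) -> str:
    """Return the category-specific expansion, or the term unchanged if none is known."""
    return EXPANSIONS.get((term.lower(), category), term)


print(expand("ATC", "Art"))
print(expand("ATC", "Business and Industrial"))
print(expand("ATC", "Toys and Hobbies"))  # no entry, so the term is left alone
```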
German Compound Words
 •   German compound words can be arbitrarily created and extremely long
         Adidastrainingsanzug (Adidas track suit)
         Rindfleischetikettierungsüberwachungsaufgabenübertragungsgesetz
                  (beef labeling regulation & delegation of supervision law)
 •   Syntactically, words can be combined and split in many ways.
 •   Some words shouldn’t be de-compounded.
         beiden (both) – bei(at) den(the)
 •   Too many candidates for
         Granitpflastersteine (granite paving stones)
         Granit(granite) pflastersteine(cobblestones)
         Granit(granite) pflaster(paving/band-aid) steine(stones)
 •   Binding characters
     Hochzeitsschuhe (grammatically correct, 593 hits on ebay.de)
     Hochzeitschuhe (129 hits on ebay.de).
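
The points above can be illustrated with a small dictionary-based splitter. This is a sketch under stated assumptions, not the method used in the talk: the tiny lexicon, the no-split list, and the choice to allow only "s" as a binding character are all made up for the example.

```python
from typing import List

# Toy lexicon and stop list; a real system would need far larger resources.
LEXICON = {"granit", "pflaster", "steine", "pflastersteine",
           "hochzeit", "schuhe", "bei", "den"}
NO_SPLIT = {"beiden"}  # words that must never be de-compounded


def splits(word: str) -> List[List[str]]:
    """Enumerate every way to cut `word` into lexicon entries,
    optionally skipping a binding "s" between two parts."""
    word = word.lower()
    if word in NO_SPLIT:
        return [[word]]
    results: List[List[str]] = []

    def walk(rest: str, parts: List[str]) -> None:
        if not rest:
            results.append(parts)
            return
        for end in range(1, len(rest) + 1):
            head, tail = rest[:end], rest[end:]
            if head not in LEXICON:
                continue
            walk(tail, parts + [head])
            # Also try treating a leading "s" of the tail as a binding character.
            if tail.startswith("s") and len(tail) > 1:
                walk(tail[1:], parts + [head])

    walk(word, [])
    return results


for w in ("Granitpflastersteine", "Hochzeitsschuhe", "Hochzeitschuhe", "beiden"):
    print(w, splits(w))
```

Even with this toy lexicon, Granitpflastersteine yields two candidate splits, both spellings of the wedding-shoes compound reduce to the same parts, and "beiden" is protected by the no-split list even though "bei" and "den" are both in the lexicon.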
Synonyms derived from top queries in item query clusters
texas instruments ba ii plus     ti ba ii plus
brighton handbag                 brighton purse
lenovo x200                      thinkpad x200
king bedspread                   king coverlet
rockabilly dress                 swing dress
1963 ford falcon                 63 falcon
jessica simpson hair extensions  jessica simpson hairdo
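
One plausible reading of "top queries in item query clusters" is sketched below: group the queries that led to each item, take the two most frequent queries per item, and count how often the same pair shows up across items. The log format, the `synonym_candidates` function, and the `min_count` cut-off are assumptions for illustration; the slide does not spell out the procedure.

```python
from collections import Counter, defaultdict
from typing import Iterable, Tuple


def synonym_candidates(log: Iterable[Tuple[str, str]], min_count: int = 2) -> Counter:
    """log holds (query, item_id) events; returns candidate pairs with item support."""
    per_item = defaultdict(Counter)
    for query, item in log:
        per_item[item][query] += 1          # the query cluster of each item

    pairs: Counter = Counter()
    for counts in per_item.values():
        top = [q for q, n in counts.most_common(2) if n >= min_count]
        if len(top) == 2:
            pairs[tuple(sorted(top))] += 1  # one vote per item
    return pairs


# Invented events: two laptops reached through both query forms.
log = ([("lenovo x200", "item1")] * 3 + [("thinkpad x200", "item1")] * 2
       + [("laptop", "item1")]
       + [("lenovo x200", "item2")] * 2 + [("thinkpad x200", "item2")] * 2)

print(synonym_candidates(log))
```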

Abbreviations/acronyms derived from query transitions
stanford ky                      stanford kentucky
dc sub                           dc subwoofer
snowboard helmet l               snowboard helmet large
motorcycle cam                   motorcycle camera
diamond amp                      diamond amplifier
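
Abbreviation pairs like these can be mined by counting query transitions, meaning a user rewrites one query into the next. The sketch below assumes a transition is a (before, after) pair from the same session and uses a simple hypothetical test: exactly one token differs, and the shorter token starts with the same letter and is a subsequence of the longer one. Neither the data nor the rule comes from the talk.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def is_abbreviation(short: str, long: str) -> bool:
    """True if `short` starts like `long` and its letters appear in order in `long`."""
    if len(short) >= len(long) or short[0] != long[0]:
        return False
    remaining = iter(long)
    return all(ch in remaining for ch in short)  # in-order subsequence test


def abbreviation_candidates(transitions: Iterable[Tuple[str, str]],
                            min_count: int = 2) -> Dict[Tuple[str, str], int]:
    """Count transitions where a single token is expanded; keep the frequent ones."""
    counts: Counter = Counter()
    for before, after in transitions:
        a, b = before.split(), after.split()
        if len(a) != len(b):
            continue
        diffs = [(x, y) for x, y in zip(a, b) if x != y]
        if len(diffs) == 1 and is_abbreviation(*diffs[0]):
            counts[diffs[0]] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}


# Invented transitions for illustration.
transitions = [
    ("dc sub", "dc subwoofer"), ("dc sub", "dc subwoofer"),
    ("stanford ky", "stanford kentucky"), ("stanford ky", "stanford kentucky"),
    ("motorcycle cam", "motorcycle camera"),       # seen once, below min_count
    ("snowboard helmet l", "snowboard boots l"),   # a different product, not an abbreviation
]

print(abbreviation_candidates(transitions))
```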
The New Alchemy Turning Data into Gold

Editor's notes

  1. I work at eBay, every second… BLANK SLIDE. Grocery store: 2 cans of soup. Point of No Return: people haven’t changed, motivation is still the same, everyone loves free, don’t waste hard-earned resources, make intelligent decisions. Technology HAS changed, and it is still accelerating; behavior is easier to capture and analyze. Skip - Costco, Netflix, Blu-Ray, players I considered, players I DIDN’T consider. Person asked, data collected - WHY? Mobile phone: 3-5x speed of home network, blend online/offline, just commerce.
  2. You are in business to make money. How do you know if the changes you make, make money? You HAVE to test. You can’t manage what you don’t measure. Testing is crucial. Image: http://www.wallpapertimes.com/files/q/Yf/4j/qYf4jp9q86379020_800x600.jpg
  3. Is my data BIG enough? Who cares. I don’t really care about defining how big, big is. Big is whatever you need for detail-level (not aggregate) analysis. Image: http://www.skimountaineer.com/ROF/OcAnt/BigBen/BigBenHeardIsland.jpg
  4. Beyond aggregate data. Session-level detail. Item impression data: logging the items people DON’T click. We always knew what items people clicked on (view item page log). What about the items people did NOT click on? That needs impression logging; they’re just as informative. Let’s bring this closer to home for you. Market basket data: buy this, buy that. Cowboy hats: detailed data.
  5. Before we talk about the systems we have in place, let’s take a look at what happens in the industry and describe the buzzword of the year: Big Data. A big data warehouse is a data warehouse that is an order of magnitude bigger than the one you have, so data volume alone is not what Big Data is about. The key change is the form of the data and its processing requirements. Since 2003, more data is processed in 2 days than mankind produced in the previous 40,000 years. The rise of the machines! Classical data warehousing stores data attributes in columns, nicely separated by the source application or the ETL process. That data is usually generated by direct user interactions and clearly defined transactions. The big boost in data volume comes from new data types like free-form text, audio, video, pictures, and graphs that do not easily fit into the structures of a database, or that pose quite a challenge to process. The third key characteristic of Big Data is velocity, both in speed of processing and in speed of change. Initial use cases of Hadoop, like spam filtering, imply real-time processing combined with tremendous amounts of data. With this in mind, let’s look at what analytics systems we have in place today.
  6. What do we have at eBay? DW for analysts comfortable with SQL & reporting. Hadoop for developers. You don’t have to do everything all at once; start and evolve.
  7. Data is growing. Land it ONCE. Add Moore’s law graphic. Get back-up data for data rate change. Jeff H slides? In a talk given in August 2009, "The Physics of Data," Google VP Marissa Mayer noted that there have been three big changes to Internet data in recent times: Speed (real-time data); Scale ("unprecedented processing power"); Sensors ("new kinds of data"). Mayer went on to say that there were 5 exabytes of data online in 2002, which had risen to 281 exabytes in 2009. That's growth by a factor of 56 over seven years. Partly, she said, this has been the result of people uploading more data. Mayer said that the average person uploaded 15 times more data in 2009 than they did in 2006.
     Sources:
     http://blog.appro.com/the-big-data-challenge-for-data-intensive-computing-applications/
     http://www.enterpriseirregulars.com/40616/the-enterprise-opportunity-of-big-data-closing-the-clue-gap/
     http://www.ameinfo.com/231603.html
     http://www.f5.com/images/news-press-events/data-growth-monster.png
     http://www.veecom.co.uk/2010/the-difficulties-of-streaming-video-over-3g/
     http://www.kurzweilai.net/the-law-of-accelerating-returns
     http://techcrunch.com/2010/03/16/big-data-freedom/
  9. Let me summarize before moving on to the search behavioral data I work with, to show you how you can use these principles to analyze your data.
  10. Would you throw away money? Collect data; what seems big and expensive today will be cheap and valuable tomorrow. Don’t throw good data away.
  11. Embed analytics in your business. Make it easy. Agile analytics is the ability to support analytical requirements in a TIMELY manner, irrespective of their complexity. Enable business agility vs. development agility. Agile analytics enables the business to make decisions quickly and accurately. Image from http://jonmell.co.uk/enterprise-20-enables-business-agility/
  12. Documents are not enough anymore. Need behavioral data. Yandex is beating Google in Russia. Why? They have users: refrigerators in Moscow vs. an isolated small town.
  13. 1 week > 6 months, 50 GB > 100 TB, related search collaborative > collaborative + success + NLP + overlap/partition + …
  14. Get started
  15. You are in business to make money. How do you know if the changes you make, make money? You HAVE to test. You can’t manage what you don’t measure. Testing is crucial. Image: http://www.wallpapertimes.com/files/q/Yf/4j/qYf4jp9q86379020_800x600.jpg
  16. How do we do this? Simple counting. That’s it, you “just” have to count. Image: http://www.csie.ntnu.edu.tw/~u91029/Matching.html
  17. Detail matters. Context is important.
  18. “beef labeling regulation & delegation of supervision law”: a long word
  19. Synonyms, for example… Wordle: http://www.wordle.net/show/wrdl/4067504/big
      large, big; ample, sizeable; astronomic, astronomical, galactic; bear-sized; blown-up; broad, spacious, wide; bulky; capacious; colossal, prodigious, stupendous; deep; double; enormous, tremendous; cosmic; elephantine, gargantuan, giant, jumbo; epic, heroic; extensive, extended; gigantic, mammoth; great; grand; huge, immense, vast, Brobdingnagian; hulking, humongous, banging, thumping, whopping, walloping; king-size; large-scale; life-size; macroscopic; macro; massive, monolithic, monumental; massive; monstrous; mountainous; outsize, outsized, oversize; super; titanic; voluminous; whacking
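
The growth figures quoted in note 7 (5 exabytes online in 2002, 281 exabytes in 2009) can be turned into a compound annual rate. The sketch below assumes smooth exponential growth over the seven years, which the notes do not claim; the function names are illustrative.

```python
def growth_factor(start: float, end: float) -> float:
    """Total growth multiple between two measurements."""
    return end / start


def annual_rate(start: float, end: float, years: int) -> float:
    """Compound annual growth rate, assuming the same rate every year."""
    return growth_factor(start, end) ** (1 / years) - 1


factor = growth_factor(5, 281)         # exabytes in 2002 and 2009
rate = annual_rate(5, 281, years=7)

print(f"total growth: {factor:.1f}x")
print(f"compound annual growth: {rate:.1%}")
```

The total works out to roughly 56x, matching the notes, which corresponds to about 78% growth per year under the constant-rate assumption.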