Exploring Big and not so Big Data:
  Opportunities and Challenges

                       Juliana Freire
                  juliana.freire@nyu.edu
      Visualization and Data Analysis (ViDA) Center
                  http://bigdata.poly.edu
                         NYU Poly
Big Data: What is the Big deal?




         http://www.google.com/trends/explore#q=%22big%20data%22!


ViDA Center                                            Juliana Freire   2
Big Data: What is the Big deal?

     Many success stories
       –  Google: many billions of pages indexed, products,
          structured data
       –  Facebook: 1.1 billion users using the site each month
       –  Twitter: 517 million accounts, 250 million tweets/day
     This is changing society!




Big Data: What is the Big deal?

    Smart Cities: 50% of the world population lives in
     cities
     –  Census, crime, emergency visits, cabs, public transportation,
        real estate, noise, energy, …
     –  Make cities more efficient and sustainable, and improve the
        lives of their citizens
     http://www.nyu.edu/about/university-initiatives/center-for-urban-science-progress.html

    Enable scientific discoveries: science is now data rich
     –  Petabytes of data generated each day, e.g., Australian radio
        telescopes, Large Hadron Collider
     –  Social data, e.g., Facebook, Twitter (2,380,000 and 2,880,000
        results in Google Scholar!)
    Data is currency

Big Data: What is the Big deal?

     [Figure: plot from Howe and Halperin, DEB 2012. Axis labels: "data volumes, % IT investment" and "rank". Fields shown: Astronomy, Physics, Medicine, Geosciences, Microbiology, Chemistry, Social Sciences. Years marked: 2010, 2020.]

Big Data: What is the Big deal?

     Big data is not new: financial transactions, call
      detail records, astronomy, …
     What is new is that there are many more data
      enthusiasts
     More data are widely available, e.g., Web, data.gov,
      scientific data, social and urban data
     Computing is cheap and easy to access
       –  Server with 64 cores, 512GB RAM ~$11k
       –  Cluster with 1000 cores ~$150k
       –  Pay as you go: Amazon EC2




Big Data: What is hard?
     Scalability is not the problem…
     Usability is the Big issue


     [Figure: terms shown between "data" and "knowledge": algorithms, data visual encodings, technology, user interfaces, statistics, provenance, interaction modes, math, machine learning, data management]

     Exploring data is hard, regardless of whether the data is big or small

Case Study: Studying Cab Trips in NYC

 Prepare data for analysis

     Raw data for 2011: 63 GB
       –  24 CSV files, 2 for each month: one for trip data
          and another for fare data
       –  ~170M trips
     Cleaning
       –  ~60,000 fare records do not have trip records
       –  ~200 duplicates per month
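
The two cleaning steps above can be sketched as follows. This is a minimal illustration, not the pipeline used in the talk: the key columns (`medallion`, `pickup_time`) and the sample rows are assumptions, since the slides do not give the real schema.

```python
import csv
import io

# Hypothetical key columns; the slides do not describe the real schema.
KEY = ("medallion", "pickup_time")


def read_rows(text):
    """Parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def key_of(row):
    """Build the join key shared by trip and fare records."""
    return tuple(row[c] for c in KEY)


def clean(trips, fares):
    """Drop duplicate trips and split fares into matched and orphaned."""
    seen = set()
    unique_trips = []
    for row in trips:
        k = key_of(row)
        if k not in seen:  # keep only the first copy of each trip
            seen.add(k)
            unique_trips.append(row)
    matched = [f for f in fares if key_of(f) in seen]
    orphans = [f for f in fares if key_of(f) not in seen]
    return unique_trips, matched, orphans


# Made-up sample data: one duplicated trip, one fare with no trip.
TRIPS = """medallion,pickup_time,distance
A1,2011-01-01 00:05:00,1.2
A1,2011-01-01 00:05:00,1.2
B2,2011-01-01 00:07:00,3.4
"""
FARES = """medallion,pickup_time,fare
A1,2011-01-01 00:05:00,6.5
B2,2011-01-01 00:07:00,12.0
C3,2011-01-01 00:09:00,8.0
"""

trips, matched, orphans = clean(read_rows(TRIPS), read_rows(FARES))
print(len(trips), len(matched), len(orphans))
```

At the scale quoted on the slide (~170M trips) the key set may not fit comfortably in memory, so a real pipeline would likely process one month at a time or sort both files on the key first.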




Storage Solutions: Temporal Queries

     SQLite
       –  20 GB of storage (index on pickup_time)
       –  Ordered queries: 9.39s
       –  Reverse ordered queries: 9.41s
       –  Shuffled queries: 9.37s
     Custom storage
       –  12 GB of storage (in-memory binary search instead of index)
       –  Ordered queries: 0.6s
       –  Reverse ordered queries: 1.4s
       –  Shuffled queries: 1.2s
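
The "in-memory binary search instead of index" idea can be sketched with Python's standard `bisect` module. This is an illustration under stated assumptions, not the custom storage from the talk: pickup times are assumed to be kept as a sorted list of integer timestamps, and the function name is hypothetical.

```python
from bisect import bisect_left, bisect_right


def trips_in_window(pickup_times, start, end):
    """Return the index range [lo, hi) of trips with start <= time <= end.

    pickup_times must be sorted in ascending order; each lookup is then
    two binary searches, with no separate index structure to maintain.
    """
    lo = bisect_left(pickup_times, start)
    hi = bisect_right(pickup_times, end)
    return lo, hi


# Made-up timestamps (seconds since some epoch), already sorted.
pickup_times = [100, 160, 160, 220, 400, 950]
lo, hi = trips_in_window(pickup_times, 160, 400)
print(pickup_times[lo:hi])
```

Because the records are stored in time order, the matching trips form one contiguous slice, which is what makes a temporal range query cheap regardless of the order in which queries arrive.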



Storage Solutions: Spatial-Temporal

      All trips for a week in a given region
      All trips in a week for a given taxi
      All trips in a week for a given taxi in a
       given region

                 Needs a complex indexing scheme that
              combines spatial, temporal, and taxi id searches




Storage Solutions: Spatial-Temporal

     SQLite
       –  20+10 GB of storage (index on time and id, r-tree for coordinates)
       –  Creating indexes: 52hrs
       –  Range queries: 2.1s
       –  Combined queries: 15.3s
       –  Cross-table queries: 57s
     Custom storage (ours)
       –  12+4 GB of storage (using (4d) kd-tree on time, id and coordinates)
       –  Building kd-tree: 8 mins
       –  Range queries: 0.2s
       –  Combined queries: 0.2s
       –  Cross-table queries: 2s
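
A minimal sketch of a 4-d kd-tree over (time, taxi id, x, y) is given below, to show how one structure can answer the three spatial-temporal query types as box queries. It is a textbook kd-tree, not the implementation from the talk; the point layout, the sample trips, and all names are assumptions.

```python
from collections import namedtuple

Node = namedtuple("Node", "point left right")
DIMS = 4  # assumed layout: (time, taxi_id, x, y)
INF = float("inf")


def build(points, depth=0):
    """Build a kd-tree by splitting on the median, cycling through axes."""
    if not points:
        return None
    axis = depth % DIMS
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return Node(points[mid],
                build(points[:mid], depth + 1),
                build(points[mid + 1:], depth + 1))


def range_query(node, lo, hi, depth=0, out=None):
    """Collect every point p with lo[i] <= p[i] <= hi[i] on all axes."""
    if out is None:
        out = []
    if node is None:
        return out
    axis = depth % DIMS
    p = node.point
    if all(lo[i] <= p[i] <= hi[i] for i in range(DIMS)):
        out.append(p)
    if lo[axis] <= p[axis]:   # left subtree may still overlap the box
        range_query(node.left, lo, hi, depth + 1, out)
    if p[axis] <= hi[axis]:   # right subtree may still overlap the box
        range_query(node.right, lo, hi, depth + 1, out)
    return out


# Made-up trips: (time, taxi_id, x, y).
trips = [
    (10, 1, 0.20, 0.30), (20, 2, 0.80, 0.90), (30, 1, 0.70, 0.80),
    (40, 3, 0.25, 0.35), (50, 1, 0.21, 0.31), (900, 1, 0.20, 0.30),
]
tree = build(trips)

# "Week" is the time span 0..100; "region" is x in [0.1, 0.4], y in [0.2, 0.4].
in_region = range_query(tree, (0, -INF, 0.1, 0.2), (100, INF, 0.4, 0.4))
for_taxi = range_query(tree, (0, 1, -INF, -INF), (100, 1, INF, INF))
combined = range_query(tree, (0, 1, 0.1, 0.2), (100, 1, 0.4, 0.4))
print(len(in_region), len(for_taxi), len(combined))
```

Leaving a dimension unbounded (from `-INF` to `INF`) turns it into a wildcard, so the region-only, taxi-only, and combined queries all go through the same code path.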

Summary Statistics

        Analysis/Modeling

        13,237 Medallion Cabs
        42,000 Taxi Drivers
        Average Number of Rides: 485k/day
        Average Number of Passengers: 660k/day

        [Figure: Rides in 2011, Jan to Dec; vertical scale from 29k to 590k; annotated dates: Apr 2, Apr 3, Aug 28 (Irene), Dec 25]
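
Per-day statistics of this kind can be derived from pickup timestamps alone. The sketch below is only an illustration: the timestamp format and the sample values are assumptions, not the actual TLC data.

```python
from collections import Counter
from datetime import date, datetime

# Assumed timestamp format; the slides do not show the real one.
FORMAT = "%Y-%m-%d %H:%M:%S"


def rides_per_day(pickup_times):
    """Count rides per calendar day from pickup timestamp strings."""
    counts = Counter()
    for ts in pickup_times:
        counts[datetime.strptime(ts, FORMAT).date()] += 1
    return counts


# Made-up sample: three rides, then one, then two.
sample = [
    "2011-08-27 09:00:00", "2011-08-27 13:30:00", "2011-08-27 22:15:00",
    "2011-08-28 10:00:00",
    "2011-08-29 08:45:00", "2011-08-29 18:20:00",
]
counts = rides_per_day(sample)
busiest = max(counts, key=counts.get)
quietest = min(counts, key=counts.get)
average = sum(counts.values()) / len(counts)
print(busiest, quietest, average)
```

Plotting `counts` over the year and looking at the extreme days is one way to surface unusual dates such as those annotated in the figure.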
Weekly Patterns

   Analysis/Modeling

   [Figure: Rides per Hour, June 2011; between 5k and 35k rides/hour; 0h ticks along the time axis; annotations: "Night Life!", "Rides at Midnight"]

TLCVis




Drop-offs vs. Pickups

     Most of the drop-offs occur on the avenues, while most of the pickups occur on the streets

     [Figure legend: Drop-off, Pickup]

Studying Anomalies

     [Figure: three panels for Sunday, May 1st 2011: 4:00AM-4:30AM, 6:00AM-6:30AM, 8:00AM-8:30AM]

Studying Anomalies: Interpretation

     [Figure: two panels for Sunday, May 1st 2011: 8:00AM-8:30AM and 9:30AM-10:00AM]

     Interpretation: Five Borough Bike Tour

Studying Anomalies

     [Figures: Sunday, May 1st 2011, in three time windows: 07:00AM-08:00AM, 08:00AM-10:00AM, 10:00AM-11:00AM]

Studying Patterns


     May 1st – May 7th 2011: 3.6 Million Trips
     Compare movement in the airports against the large train stations

Studying Patterns




     [Figure: Train Stations vs. Airports, May 1st – May 7th 2011, 3.6 Million Trips]

Data exploration reveals bad data…




Uses of Clean Data: FindMeACab App




Take Away

     Data exploration is challenging for both small and big data
     It is hard to prepare data for exploration
     For many tasks, existing tools are too cumbersome, not scalable, etc.
     Need better, usable tools
       –  Tools for data enthusiasts who are not computer scientists!
     Visualization is essential for exploring large volumes of data: "A picture is worth a thousand words"
     Pictures help us think [Tamara Munzner]
       –  Substitute perception for cognition
       –  Free up limited cognitive/memory resources for higher-level problems
Masters in Big Data

     New degree at NYU Poly – Spring 2014
     Courses:
       –  Machine learning
       –  Massive data analysis
       –  Visualization
       –  Visual Analytics
       –  Database Systems
       –  Algorithms
       –  …




Thanks
