Data Science: An Emerging
Field for Future Jobs
Jian Qin
School of Information Studies
Syracuse University

A presentation for the Graduate School, Syracuse University
February 22, 2013
DS	

   Talk points
        ›  Data science (DS) and data scientists in the context of
           research data
        ›  Implications and expectations of future research workforce
        ›  Preparing for the challenges and opportunities




                                         GRADUATION SCHOOL PRESENTATION 2013-2-22   2

   Feeling the pressure of data deluge in the digital information world …

        http://readwrite.com/2011/11/17/infographic-data-deluge---8-ze


       …in science research

             http://www.sciencemag.org/content/331/6018.cover-expansion


…in our health care





        http://ars.els-cdn.com/content/image/1-s2.0-S1053811905002508-gr4.jpg

        …in our neighborhood

        http://www.redfin.com/homes-for-sale#!market=boston&region_id=112&region_type=1&v=8





Shift in Science Paradigms

        ›  Thousand years ago: Science was empirical, describing natural phenomena
        ›  A few hundred years ago: Theoretical branch using models, generalizations
        ›  A few decades ago: A computational approach simulating complex phenomena
        ›  Today: Data exploration (eScience), unify theory, experiment, and simulation
           –  Data captured by instruments or generated by simulator
           –  Processed by software
           –  Information/Knowledge stored in computer
           –  Scientist analyzes database/files using data management and statistics

        Gray, J. & Szalay, A. (2007). eScience – A transformed scientific method.
        http://research.microsoft.com/en-us/um/people/gray/talks/NRC-CSTB_eScience.ppt

        Research data collections

        Size                       Metadata standards          Management
        Larger, discipline-based   Multiple, comprehensive     Organized, institutionalized
        Smaller, team-based        None or random              Heroic individual inside the team

Emerging concepts that are going to stay and matter to your career

What is data science?

                     “An emerging area of work
                   concerned with the collection,
                presentation, analysis, visualization,
                 management, and preservation of
                  large collections of information.”

                    Stanton, J. (2012). Introduction to Data Science.
                    http://ischool.syr.edu/media/documents/2012/3/DataScienceBook1_1.pdf

   Data science and scientific research

        ›  Management domain: Plan, design, consult for, implement, and evaluate data management projects and services
        ›  Technical domain: Ingest, store, organize, merge, filter, and transform data and create analysis-ready data
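
The technical-domain steps named on this slide (ingest, merge, filter, transform, ending in analysis-ready data) can be illustrated with a minimal Python sketch. This is an illustration under stated assumptions, not part of the original talk: the two in-memory CSV tables, their column names, and every function name below are hypothetical.

```python
import csv
import io

# Hypothetical raw inputs: two small CSV "files" held in memory for this sketch.
SAMPLES_CSV = """sample_id,site,temperature_c
s1,A,21.5
s2,B,
s3,A,19.0
"""

SITES_CSV = """site,region
A,north
B,south
"""


def ingest(text):
    """Ingest: parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def merge(samples, sites):
    """Merge: attach each sample's region by joining on the 'site' column."""
    region_by_site = {row["site"]: row["region"] for row in sites}
    return [{**s, "region": region_by_site.get(s["site"])} for s in samples]


def filter_complete(rows):
    """Filter: drop rows whose temperature reading is missing."""
    return [r for r in rows if r["temperature_c"] != ""]


def transform(rows):
    """Transform: convert the temperature from text to a number."""
    return [{**r, "temperature_c": float(r["temperature_c"])} for r in rows]


def make_analysis_ready(samples_text, sites_text):
    """Run the steps in order and return analysis-ready rows."""
    merged = merge(ingest(samples_text), ingest(sites_text))
    return transform(filter_complete(merged))


analysis_ready = make_analysis_ready(SAMPLES_CSV, SITES_CSV)
for row in analysis_ready:
    print(row)
```

Each step is a separate function so that a real project could swap in its own storage or cleaning rules; the "store" and "organize" steps from the slide are left out here to keep the sketch short.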




Data management is essential

Scientific Data Management Specialist
•  Design, develop, implement, and manage high-throughput automatic data processing infrastructure for large databases in a mature system
•  Develop and improve the infrastructure supporting this system
•  Interface with multiple data providers to design, build, and maintain their customized databases
•  Clarify requirements, feature requests and bug reports for software developers and assist in testing code.
http://www.bioinformatics.org/forums/forum.php?forum_id=9670

Laboratory Data Management Specialist
•  Administer operational database
•  Assure the quality of data database content
•  Interact closely with researchers, lab managers, and platform coordinators
•  Track deliverables against budget and prepare data reports
•  Collaborate closely with IT and bioinformatics colleagues
•  Assist IT in gathering workflow requirements
•  Test changes and updates in IT systems
•  Create and maintain app documentation

Data Modeling/Management Specialist
•  Work closely with the high performance computing and the IT manager
•  Develop a data model for complex multi-scale rocks
•  Design and organize a database and complex queries
•  Integrate and manage multi-scale rocks subjected to large-scale scientific computing applications
http://www.ingrainrocks.com/data-management-specialist/


         “We’re increasingly finding data in
        the wild, and data scientists are
        involved with gathering data,
        massaging it into a tractable form,
        making it tell its story, and presenting
        that story to others.”
          Loukides, M. (2011). What is data science? Sebastopol, CA: O’Reilly.




   Emerging job market: Data scientists
        ›  Data scientists are more likely to be involved across the
           data lifecycle:
           –  Acquiring new data sets: 33%
           –  Parsing data sets: 29%
           –  Filtering and organizing data: 40%
           –  Mining data for patterns: 30%
           –  Advanced algorithms to solve analytical problems: 29%
           –  Representing data visually: 38%
           –  Telling a story with data: 34%
           –  Interacting with data dynamically: 37%
           –  Making business decisions based on data: 40%
    http://mashable.com/2012/01/13/career-of-the-future-data-scientist-infographic/
Are you ready for the data challenges and opportunities?

What is expected of data scientists?

        ›  Ability to use a wide variety of tools for documentation, analysis, and reporting of data
        ›  Knowledge of a subject domain
        ›  Data modeling, database and query design
        ›  OS, programming languages
        ›  Encoding languages
        ›  Content and repository systems
        ›  Collaboration, communication, and co-ordination

Analytical skills: domain modeling

        ›  Requirement analysis
        ›  Workflow analysis
        ›  Data modeling
        ›  Data transformation needs analysis
        ›  Data provenance needs analysis

        ›  Interview skills, analysis and generalization skills
        ›  Ability to capture components and sequences in workflows
        ›  Ability to translate domain analysis into data models
        ›  Ability to envision the data model within the larger system architecture

Analytical skills: from data sources to patterns, relationships, and trends

        [Diagram labels: Analytical tools, “Hacking”, Knowledge, Data products]

Data management skills: data lifecycle and infrastructural services

        Metadata standards, Encoding language, Semantic control, Identify management

        Processed, transformed, derived, calculated, … data
        ›  Common data format
        ›  Image formats
        ›  Matrix formats
        ›  Microarray file formats
        ›  Communication protocols

        Infrastructural services
        •  Data source discovery
        •  Data curation
        •  Data preservation
        •  Data integration and mashup
        •  Data citation, publication, and distribution
        •  Data linking and interoperability
        •  …
Technology skills with excellent communication skills

        TECHNOLOGY SKILLS            COMMUNICATION SKILLS
        ›  Operating systems         ›  Interviews
        ›  Repository systems        ›  “Ice breaking”
        ›  Database systems          ›  Community building
        ›  Programming languages     ›  Institutionalization
        ›  Encoding languages        ›  Stakeholder buy-in
        ›  Specialized programming


   Four tracks: choose what you are good at

        Data Science core course: Applied data science, Databases

        ›  Data analytics
        ›  Data storage and management
        ›  General system management
        ›  Data visualization

        http://ischool.syr.edu/future/cas/datascience.aspx
The iSchool’s version of data science education

        Eventually the iSchool data science program will build the foundation for super data scientists…

        Data scientists:
        ›  Ability to use a wide variety of tools for documentation, analysis, and reporting of data
        ›  Knowledge of a subject domain
        ›  Data modeling, database and query design
        ›  OS, programming languages
        ›  Encoding languages
        ›  Content and repository systems
        ›  Collaboration, communication, and co-ordination





        Thank You!

        Questions?
Data Science: An Emerging Field for Future Jobs

  • 1. Data Science: An Emerging Field for Future Jobs Jian Qin School of Information Studies Syracuse University A presentation for the Graduate School, Syracuse University February 22, 2013
  • 2. DS Talk points ›  Data science (DS) and data scientists in the context of research data ›  Implications and expectations of future research workforce ›  Preparing for the challenges and opportunities GRADUATION SCHOOL PRESENTATION 2013-2-22 2
  • 3. DS Feeling the pressure of data deluge in the digital information world … http://readwrite.com/2011/11/17/ infographic-data-deluge---8-ze GRADUATION SCHOOL PRESENTATION 2013-2-22 3
  • 4. DS …in science research http://www.sciencemag.org/content/ 331/6018.cover-expansion GRADUATION SCHOOL PRESENTATION 2013-2-22 4
  • 5. …in our health care DS http://ars.els-cdn.com/content/image/1-s2.0-S1053811905002508-gr4.jpg GRADUATION SCHOOL PRESENTATION 2013-2-22 5
  • 6. http://www.redfin.com/homes-for-sale#! market=boston&region_id=112&region_ …in our neighborhood type=1&v=8 DS GRADUATION SCHOOL PRESENTATION 2013-2-22 6
  • 7. Shift in Science Paradigms ›  Thousand years ago: Science was empirical, describing natural phenomena ›  A few hundred years ago: Theoretical branch using models, generalizations ›  A few decades ago: A computational approach simulating complex phenomena ›  Today: Data exploration (eScience), unify theory, experiment, and simulation -- Data captured by instruments or generated by simulator -- Processed by software -- Information/Knowledge stored in computer -- Scientist analyzes database/files using data management and statistics Gray, J. & Szalay, A. (2007). eScience – A transformed scientific method. http://research.microsoft.com/en-us/um/people/gray/talks/NRC-CSTB_eScience.ppt
  • 8. Research data collections ›  Larger, discipline-based collections: Metadata standards: multiple, comprehensive; Management: organized, institutionalized ›  Smaller, team-based collections: Metadata standards: none or random; Management: heroic individual inside the team
  • 9. Emerging concepts that are going to stay and matter to your career
  • 10. What is data science? “An emerging area of work concerned with the collection, presentation, analysis, visualization, management, and preservation of large collections of information.” Stanton, J. (2012). Introduction to Data Science. http://ischool.syr.edu/media/documents/2012/3/DataScienceBook1_1.pdf
  • 11. Data science and scientific research ›  Management domain: Plan, design, consult for, implement, and evaluate data management projects and services ›  Technical domain: Ingest, store, organize, merge, filter, and transform data and create analysis-ready data
  • 12. Data management is essential ›  Laboratory Data Management Specialist •  Administer operational database •  Assure the quality of data and database content •  Interact closely with researchers, lab managers, and platform coordinators •  Track deliverables against budget and prepare data reports •  Collaborate closely with IT and bioinformatics colleagues •  Assist IT in gathering workflow requirements •  Test changes and updates in IT systems •  Create and maintain app documentation http://www.bioinformatics.org/forums/forum.php?forum_id=9670 ›  Data Modeling/Management Specialist •  Work closely with the high performance computing and the IT manager •  Develop a data model for complex multi-scale rocks •  Design and organize a database and complex queries •  Integrate and manage multi-scale rocks subjected to large-scale scientific computing applications http://www.ingrainrocks.com/data-management-specialist/ ›  Scientific Data Management Specialist •  Design, develop, implement, and manage high-throughput automatic data processing infrastructure for large databases in a mature system •  Develop and improve the infrastructure supporting this system •  Interface with multiple data providers to design, build, and maintain their customized databases •  Clarify requirements, feature requests and bug reports for software developers and assist in testing code.
  • 13. “We’re increasingly finding data in the wild, and data scientists are involved with gathering data, massaging it into a tractable form, making it tell its story, and presenting that story to others.” Loukides, M. (2011). What is data science? Sebastopol, CA: O’Reilly.
  • 14. Emerging job market: Data scientists ›  Data scientists are more likely to be involved across the data lifecycle: –  Acquiring new data sets: 33% –  Parsing data sets: 29% –  Filtering and organizing data: 40% –  Mining data for patterns: 30% –  Advanced algorithms to solve analytical problems: 29% –  Representing data visually: 38% –  Telling a story with data: 34% –  Interacting with data dynamically: 37% –  Making business decisions based on data: 40% http://mashable.com/2012/01/13/career-of-the-future-data-scientist-infographic/
  • 15. Are you ready for the data challenges and opportunities?
  • 16. What is expected of data scientists? ›  Ability to use a wide variety of tools for documentation, analysis, and report of data ›  Knowledge of a subject domain ›  Data modeling, database and query design ›  OS, programming languages ›  Collaboration, communication, and co-ordination ›  Content and repository systems ›  Encoding languages
  • 17. Analytical skills: domain modeling ›  Analysis tasks: Requirement analysis; Workflow analysis; Data modeling; Data transformation needs analysis; Data provenance needs analysis ›  Skills: Interview skills, analysis and generalization skills; Ability to capture components and sequences in workflows; Ability to translate domain analysis into data models; Ability to envision the data model within the larger system architecture
  • 18. Analytical skills: from data sources to patterns, relationships, and trends ›  Analytical tools ›  “Hacking” ›  Knowledge ›  Data products
  • 19. Data management skills: data lifecycle and infrastructural services ›  Metadata standards ›  Encoding language ›  Semantic control ›  Identify management ›  Infrastructural services •  Data source discovery •  Data curation •  Data preservation •  Data integration and mashup •  Data citation, publication, and distribution •  Data linking and interoperability •  … ›  Processed, transformed, derived, calculated, … data ›  Common data format; Image formats; Matrix formats; Microarray file formats; Communication protocols
  • 20. Technology skills with excellent communication skills TECHNOLOGY SKILLS ›  Operating systems ›  Repository systems ›  Database systems ›  Programming languages ›  Encoding languages ›  Specialized programming COMMUNICATION SKILLS ›  Interviews ›  “Ice breaking” ›  Community building ›  Institutionalization ›  Stakeholder buy-in
  • 22. Four tracks: choose what you are good at ›  Data Science core course: Applied data science; Databases ›  Tracks: Data analytics; Data storage and management; General system management; Data visualization http://ischool.syr.edu/future/cas/datascience.aspx
  • 23. The iSchool’s version of data science education ›  Ability to use a wide variety of tools for documentation, analysis, and report of data ›  Knowledge of a subject domain ›  Data modeling, database and query design ›  OS, programming languages ›  Collaboration, communication, and co-ordination ›  Content and repository systems ›  Encoding languages Eventually the iSchool data science program will build the foundation for super data scientists…
  • 24. Thank You! Questions?