SlideShare ist ein Scribd-Unternehmen logo
1 von 33
Big Data and You
Preparing Current & Future Information Specialists

Sands Fish
Data Scientist / MIT Libraries
@sandsfish
sands@mit.edu
Knowing in the Age of
Networked Knowledge
Nothing is static.
Everything is connected.
Knowledge representation
is now complex
Scholarly Primitives
- Discovering
- Annotating
- Comparing
- Referring
- Sampling
- Illustrating
- Representing
John Unsworth, 2000.
http://people.brandeis.edu/~unsworth/Kings.5-00/primitives.html
Complex Knowledge
Objects
- Have multiple representations & ways of being consumed
- Can be a link in a chain, node in a graph, or ecosystem of
knowledge.

- Allow different perspectives or ways to ask questions of.

(none of these are true of physical books)
Complex Knowledge
Objects
Data Examples:
•
•
•
•
•

JSON, XML, etc. esp. from a URL that allows it to be updated
Visualizations, sonifications, etc. (mind-maps, interactives)
Geospatial data, layered, constrained by area
APIs
Linked Data, integrated with many other resources
Complex Knowledge
Objects
• Tool / Platform Examples:
•
•
•
•
•

Integrated Data Platforms
Courseware
MOOCs
Interactive Visualizations
Commons-based peer production
(wikis, reviews, software, etc.)
• Tweets
• Data analysis tools
• Data Enclaves (limited access processing endpoints)
Methods of Exploration
In this diverse ecosystem, there is no one way of exploring a
topic.
- Manual Browsing

- Automated Spidering (e.g. Berkman / Media Cloud)
- Collection / Trawling (e.g. Browser Plugins)
- Conventional Big Data (e.g. Hadoop, Map/Reduce)
- Using Linked Data to branch out through related concepts
- Algorithmic Data Processing (e.g. Topic Modeling)
Problems of Completeness
- When do you know that you have enough information?

- What kind of compromises are made when information is
more massive than anyone can consume?
Problems of Integration
When data comes from many different silos, in many
different structures and formats, how do you bring all of this
knowledge together?

- One solution is RDF, which provides a common data
generic data model. Collaborative ontology development
can allow communities to work together.
- Open standards.
- Build tools and services that provide easy access to the
underlying data.
How To Get A Grip
- Keep abreast of W3C developments and other standards
bodies.
- Don’t focus too much on single technologies. They will
shift quickly.
- Learn at least one data visualization technology.
- Remember to frame questions of data in more than one
way.
- Ask your own questions of the data yourself. Understand
it from the point of the user.
Big Data and You: Preparing for the Future of Information

Weitere ähnliche Inhalte

Was ist angesagt?

Rise of the Databrarian - Jeroen Rombouts
Rise of the Databrarian - Jeroen RomboutsRise of the Databrarian - Jeroen Rombouts
Rise of the Databrarian - Jeroen RomboutsLibrary_Connect
 
Library Connect Webinar - Data Sharing
Library Connect Webinar - Data Sharing Library Connect Webinar - Data Sharing
Library Connect Webinar - Data Sharing Library_Connect
 
Freedman Center for Digital Scholarship Colloquium - 14_1106
Freedman Center for Digital Scholarship Colloquium - 14_1106Freedman Center for Digital Scholarship Colloquium - 14_1106
Freedman Center for Digital Scholarship Colloquium - 14_1106jeffreylancaster
 
Towards a digital library for York
Towards a digital library for YorkTowards a digital library for York
Towards a digital library for YorkJulie Allinson
 
Linked Data for Knowledge Discovery: Introduction
Linked Data for Knowledge Discovery: IntroductionLinked Data for Knowledge Discovery: Introduction
Linked Data for Knowledge Discovery: IntroductionMathieu d'Aquin
 
Why should semantic technologies pay more attention to privacy... and vice-ve...
Why should semantic technologies pay more attention to privacy... and vice-ve...Why should semantic technologies pay more attention to privacy... and vice-ve...
Why should semantic technologies pay more attention to privacy... and vice-ve...Mathieu d'Aquin
 
Networked Science, And Integrating with Dataverse
Networked Science, And Integrating with DataverseNetworked Science, And Integrating with Dataverse
Networked Science, And Integrating with DataverseAnita de Waard
 
Incremental idcc 08_12_10_slideshare
Incremental idcc 08_12_10_slideshareIncremental idcc 08_12_10_slideshare
Incremental idcc 08_12_10_slideshareIncremental Project
 
Your digital humanities are in my library! No, your library is in my digital ...
Your digital humanities are in my library! No, your library is in my digital ...Your digital humanities are in my library! No, your library is in my digital ...
Your digital humanities are in my library! No, your library is in my digital ...Rebekah Cummings
 
Introduction to databases and metadata
Introduction to databases and metadataIntroduction to databases and metadata
Introduction to databases and metadatalibrarianrafia
 
Hiberlink: Prototypes of pro-active approaches to support the archiving of we...
Hiberlink: Prototypes of pro-active approaches to support the archiving of we...Hiberlink: Prototypes of pro-active approaches to support the archiving of we...
Hiberlink: Prototypes of pro-active approaches to support the archiving of we...EDINA, University of Edinburgh
 
The liaison librarian: connecting with the qualitative research lifecycle
The liaison librarian: connecting with the qualitative research lifecycleThe liaison librarian: connecting with the qualitative research lifecycle
The liaison librarian: connecting with the qualitative research lifecycleCelia Emmelhainz
 

Was ist angesagt? (20)

Rise of the Databrarian - Jeroen Rombouts
Rise of the Databrarian - Jeroen RomboutsRise of the Databrarian - Jeroen Rombouts
Rise of the Databrarian - Jeroen Rombouts
 
Library Connect Webinar - Data Sharing
Library Connect Webinar - Data Sharing Library Connect Webinar - Data Sharing
Library Connect Webinar - Data Sharing
 
Freedman Center for Digital Scholarship Colloquium - 14_1106
Freedman Center for Digital Scholarship Colloquium - 14_1106Freedman Center for Digital Scholarship Colloquium - 14_1106
Freedman Center for Digital Scholarship Colloquium - 14_1106
 
Towards a digital library for York
Towards a digital library for YorkTowards a digital library for York
Towards a digital library for York
 
Linked Data for Knowledge Discovery: Introduction
Linked Data for Knowledge Discovery: IntroductionLinked Data for Knowledge Discovery: Introduction
Linked Data for Knowledge Discovery: Introduction
 
Organising and Documenting Data
Organising and Documenting DataOrganising and Documenting Data
Organising and Documenting Data
 
Why should semantic technologies pay more attention to privacy... and vice-ve...
Why should semantic technologies pay more attention to privacy... and vice-ve...Why should semantic technologies pay more attention to privacy... and vice-ve...
Why should semantic technologies pay more attention to privacy... and vice-ve...
 
The Blossoming of the Semantic Web
The Blossoming of the Semantic WebThe Blossoming of the Semantic Web
The Blossoming of the Semantic Web
 
DH2012_Bellamy
DH2012_BellamyDH2012_Bellamy
DH2012_Bellamy
 
Networked Science, And Integrating with Dataverse
Networked Science, And Integrating with DataverseNetworked Science, And Integrating with Dataverse
Networked Science, And Integrating with Dataverse
 
Presentation to KILT
Presentation to KILTPresentation to KILT
Presentation to KILT
 
Incremental idcc 08_12_10_slideshare
Incremental idcc 08_12_10_slideshareIncremental idcc 08_12_10_slideshare
Incremental idcc 08_12_10_slideshare
 
Your digital humanities are in my library! No, your library is in my digital ...
Your digital humanities are in my library! No, your library is in my digital ...Your digital humanities are in my library! No, your library is in my digital ...
Your digital humanities are in my library! No, your library is in my digital ...
 
Introduction to databases and metadata
Introduction to databases and metadataIntroduction to databases and metadata
Introduction to databases and metadata
 
MANTRA & Open Educational Resources
MANTRA & Open Educational ResourcesMANTRA & Open Educational Resources
MANTRA & Open Educational Resources
 
Engaging the Researcher in RDM
Engaging the Researcher in RDMEngaging the Researcher in RDM
Engaging the Researcher in RDM
 
Is Linked Open Data the way forward?
Is Linked Open Data the way forward?Is Linked Open Data the way forward?
Is Linked Open Data the way forward?
 
Hiberlink: Prototypes of pro-active approaches to support the archiving of we...
Hiberlink: Prototypes of pro-active approaches to support the archiving of we...Hiberlink: Prototypes of pro-active approaches to support the archiving of we...
Hiberlink: Prototypes of pro-active approaches to support the archiving of we...
 
The liaison librarian: connecting with the qualitative research lifecycle
The liaison librarian: connecting with the qualitative research lifecycleThe liaison librarian: connecting with the qualitative research lifecycle
The liaison librarian: connecting with the qualitative research lifecycle
 
Benoit Visual Only Retrieval
Benoit Visual Only RetrievalBenoit Visual Only Retrieval
Benoit Visual Only Retrieval
 

Andere mochten auch

Evaluation of preliminary task
Evaluation of preliminary task Evaluation of preliminary task
Evaluation of preliminary task emh3196
 
Plan for music magazine
Plan for music magazinePlan for music magazine
Plan for music magazineemh3196
 
Pushing MODIS to the edge: high-resolution applications of moderate-resolutio...
Pushing MODIS to the edge: high-resolution applications of moderate-resolutio...Pushing MODIS to the edge: high-resolution applications of moderate-resolutio...
Pushing MODIS to the edge: high-resolution applications of moderate-resolutio...Dániel Kristóf
 
Roman, britanian roman, peninggalan bangsa roman
Roman, britanian roman, peninggalan bangsa romanRoman, britanian roman, peninggalan bangsa roman
Roman, britanian roman, peninggalan bangsa romanApep Wahyudin
 
A Life Changing Opportunity
A Life Changing OpportunityA Life Changing Opportunity
A Life Changing OpportunityAnil Kumar
 
Evaluation question 3
Evaluation question 3Evaluation question 3
Evaluation question 3emh3196
 
Huge UX Design School Exercise
Huge UX Design School ExerciseHuge UX Design School Exercise
Huge UX Design School ExerciseElliott Romano
 
Handsketches for Huge Design Exercise
Handsketches for Huge Design ExerciseHandsketches for Huge Design Exercise
Handsketches for Huge Design ExerciseElliott Romano
 
Dampak IPTEK Terhadap kehidupan Sosial
Dampak IPTEK Terhadap kehidupan SosialDampak IPTEK Terhadap kehidupan Sosial
Dampak IPTEK Terhadap kehidupan SosialApep Wahyudin
 
Makalah ISBD(manusia dan lingkungan)
Makalah ISBD(manusia dan lingkungan)Makalah ISBD(manusia dan lingkungan)
Makalah ISBD(manusia dan lingkungan)Apep Wahyudin
 

Andere mochten auch (14)

Evaluation of preliminary task
Evaluation of preliminary task Evaluation of preliminary task
Evaluation of preliminary task
 
Plan for music magazine
Plan for music magazinePlan for music magazine
Plan for music magazine
 
Marcela ocampo
Marcela ocampoMarcela ocampo
Marcela ocampo
 
Pushing MODIS to the edge: high-resolution applications of moderate-resolutio...
Pushing MODIS to the edge: high-resolution applications of moderate-resolutio...Pushing MODIS to the edge: high-resolution applications of moderate-resolutio...
Pushing MODIS to the edge: high-resolution applications of moderate-resolutio...
 
Roman, britanian roman, peninggalan bangsa roman
Roman, britanian roman, peninggalan bangsa romanRoman, britanian roman, peninggalan bangsa roman
Roman, britanian roman, peninggalan bangsa roman
 
A Life Changing Opportunity
A Life Changing OpportunityA Life Changing Opportunity
A Life Changing Opportunity
 
Evaluation question 3
Evaluation question 3Evaluation question 3
Evaluation question 3
 
Huge UX Design School Exercise
Huge UX Design School ExerciseHuge UX Design School Exercise
Huge UX Design School Exercise
 
ART OBZOR
ART OBZORART OBZOR
ART OBZOR
 
Handsketches for Huge Design Exercise
Handsketches for Huge Design ExerciseHandsketches for Huge Design Exercise
Handsketches for Huge Design Exercise
 
Makalah Raeding
Makalah RaedingMakalah Raeding
Makalah Raeding
 
Makalah
MakalahMakalah
Makalah
 
Dampak IPTEK Terhadap kehidupan Sosial
Dampak IPTEK Terhadap kehidupan SosialDampak IPTEK Terhadap kehidupan Sosial
Dampak IPTEK Terhadap kehidupan Sosial
 
Makalah ISBD(manusia dan lingkungan)
Makalah ISBD(manusia dan lingkungan)Makalah ISBD(manusia dan lingkungan)
Makalah ISBD(manusia dan lingkungan)
 

Ähnlich wie Big Data and You: Preparing for the Future of Information

Linked Data Workshop Stanford University
Linked Data Workshop Stanford University Linked Data Workshop Stanford University
Linked Data Workshop Stanford University Talis Consulting
 
FAIR data: LOUD for all audiences
FAIR data: LOUD for all audiencesFAIR data: LOUD for all audiences
FAIR data: LOUD for all audiencesAlessandro Adamou
 
Semantic technologies for the Internet of Things
Semantic technologies for the Internet of Things Semantic technologies for the Internet of Things
Semantic technologies for the Internet of Things PayamBarnaghi
 
Hide the Stack: Toward Usable Linked Data
Hide the Stack:Toward Usable Linked DataHide the Stack:Toward Usable Linked Data
Hide the Stack: Toward Usable Linked Dataaba-sah
 
Toward universal information access on the digital object cloud
Toward universal information access on the digital object cloudToward universal information access on the digital object cloud
Toward universal information access on the digital object cloudNational Institute of Informatics
 
Big Data e tecnologie semantiche - Utilizzare i Linked data come driver d'int...
Big Data e tecnologie semantiche - Utilizzare i Linked data come driver d'int...Big Data e tecnologie semantiche - Utilizzare i Linked data come driver d'int...
Big Data e tecnologie semantiche - Utilizzare i Linked data come driver d'int...giuseppe_futia
 
The Social Semantic Server: A Flexible Framework to Support Informal Learning...
The Social Semantic Server: A Flexible Framework to Support Informal Learning...The Social Semantic Server: A Flexible Framework to Support Informal Learning...
The Social Semantic Server: A Flexible Framework to Support Informal Learning...tobold
 
The Social Semantic Server - A Flexible Framework to Support Informal Learnin...
The Social Semantic Server - A Flexible Framework to Support Informal Learnin...The Social Semantic Server - A Flexible Framework to Support Informal Learnin...
The Social Semantic Server - A Flexible Framework to Support Informal Learnin...Sebastian Dennerlein
 
EuropeanaTech 2018: A distributed network of digital heritage information
EuropeanaTech 2018: A distributed network of digital heritage informationEuropeanaTech 2018: A distributed network of digital heritage information
EuropeanaTech 2018: A distributed network of digital heritage informationEnno Meijers
 
Putting Linked Data to Use in a Large Higher-Education Organisation
Putting Linked Data to Use in a Large Higher-Education OrganisationPutting Linked Data to Use in a Large Higher-Education Organisation
Putting Linked Data to Use in a Large Higher-Education OrganisationMathieu d'Aquin
 
Overview of Big Data by Sunny
Overview of Big Data by SunnyOverview of Big Data by Sunny
Overview of Big Data by SunnyDignitasDigital1
 
Tds — big science dec 2021
Tds — big science dec 2021Tds — big science dec 2021
Tds — big science dec 2021Gérard Dupont
 

Ähnlich wie Big Data and You: Preparing for the Future of Information (20)

Linked Data Workshop Stanford University
Linked Data Workshop Stanford University Linked Data Workshop Stanford University
Linked Data Workshop Stanford University
 
The Web of Data: The W3C Semantic Web Initiative
The Web of Data: The W3C Semantic Web InitiativeThe Web of Data: The W3C Semantic Web Initiative
The Web of Data: The W3C Semantic Web Initiative
 
FAIR data: LOUD for all audiences
FAIR data: LOUD for all audiencesFAIR data: LOUD for all audiences
FAIR data: LOUD for all audiences
 
Linked (Open) Data
Linked (Open) DataLinked (Open) Data
Linked (Open) Data
 
Shifting the Burden from the User to the Data Provider
Shifting the Burden from the User to the Data ProviderShifting the Burden from the User to the Data Provider
Shifting the Burden from the User to the Data Provider
 
Semantic technologies for the Internet of Things
Semantic technologies for the Internet of Things Semantic technologies for the Internet of Things
Semantic technologies for the Internet of Things
 
Hide the Stack: Toward Usable Linked Data
Hide the Stack:Toward Usable Linked DataHide the Stack:Toward Usable Linked Data
Hide the Stack: Toward Usable Linked Data
 
Aggregation as tactic sm new
Aggregation as tactic sm newAggregation as tactic sm new
Aggregation as tactic sm new
 
Aggregation as Tactic
Aggregation as TacticAggregation as Tactic
Aggregation as Tactic
 
Toward universal information access on the digital object cloud
Toward universal information access on the digital object cloudToward universal information access on the digital object cloud
Toward universal information access on the digital object cloud
 
Big Data e tecnologie semantiche - Utilizzare i Linked data come driver d'int...
Big Data e tecnologie semantiche - Utilizzare i Linked data come driver d'int...Big Data e tecnologie semantiche - Utilizzare i Linked data come driver d'int...
Big Data e tecnologie semantiche - Utilizzare i Linked data come driver d'int...
 
Implementing Linked Data in Low-Resource Conditions
Implementing Linked Data in Low-Resource ConditionsImplementing Linked Data in Low-Resource Conditions
Implementing Linked Data in Low-Resource Conditions
 
Web Information Systems Introduction and Origin of World Wide Web
Web Information Systems Introduction and Origin of World Wide WebWeb Information Systems Introduction and Origin of World Wide Web
Web Information Systems Introduction and Origin of World Wide Web
 
The Social Semantic Server: A Flexible Framework to Support Informal Learning...
The Social Semantic Server: A Flexible Framework to Support Informal Learning...The Social Semantic Server: A Flexible Framework to Support Informal Learning...
The Social Semantic Server: A Flexible Framework to Support Informal Learning...
 
The Social Semantic Server - A Flexible Framework to Support Informal Learnin...
The Social Semantic Server - A Flexible Framework to Support Informal Learnin...The Social Semantic Server - A Flexible Framework to Support Informal Learnin...
The Social Semantic Server - A Flexible Framework to Support Informal Learnin...
 
EuropeanaTech 2018: A distributed network of digital heritage information
EuropeanaTech 2018: A distributed network of digital heritage informationEuropeanaTech 2018: A distributed network of digital heritage information
EuropeanaTech 2018: A distributed network of digital heritage information
 
Putting Linked Data to Use in a Large Higher-Education Organisation
Putting Linked Data to Use in a Large Higher-Education OrganisationPutting Linked Data to Use in a Large Higher-Education Organisation
Putting Linked Data to Use in a Large Higher-Education Organisation
 
Overview of Big Data by Sunny
Overview of Big Data by SunnyOverview of Big Data by Sunny
Overview of Big Data by Sunny
 
Tds — big science dec 2021
Tds — big science dec 2021Tds — big science dec 2021
Tds — big science dec 2021
 
Scaling the (evolving) web data –at low cost-
Scaling the (evolving) web data –at low cost-Scaling the (evolving) web data –at low cost-
Scaling the (evolving) web data –at low cost-
 

Kürzlich hochgeladen

Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 

Kürzlich hochgeladen (20)

Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 

Big Data and You: Preparing for the Future of Information

  • 1. Big Data and You Preparing Current & Future Information Specialists Sands Fish Data Scientist / MIT Libraries @sandsfish sands@mit.edu
  • 2.
  • 3.
  • 4. Knowing in the Age of Networked Knowledge
  • 5.
  • 6.
  • 7.
  • 8.
  • 9.
  • 10.
  • 11.
  • 14. Scholarly Primitives - Discovering - Annotating - Comparing - Referring - Sampling - Illustrating - Representing John Unsworth, 2000. http://people.brandeis.edu/~unsworth/Kings.5-00/primitives.html
  • 15. Complex Knowledge Objects - Have multiple representations & ways of being consumed - Can be a link in a chain, node in a graph, or ecosystem of knowledge. - Allow different perspectives or ways to ask questions of. (none of these are true of physical books)
  • 16. Complex Knowledge Objects Data Examples: • • • • • JSON, XML, etc. esp. from a URL that allows it to be updated Visualizations, sonifications, etc. (mind-maps, interactives) Geospatial data, layered, constrained by area APIs Linked Data, integrated with many other resources
  • 17. Complex Knowledge Objects • Tool / Platform Examples: • • • • • Integrated Data Platforms Courseware MOOCs Interactive Visualizations Commons-based peer production (wikis, reviews, software, etc.) • Tweets • Data analysis tools • Data Enclaves (limited access processing endpoints)
  • 18.
  • 19.
  • 20.
  • 21.
  • 22.
  • 23. Methods of Exploration In this diverse ecosystem, there is no one way of exploring a topic. - Manual Browsing - Automated Spidering (e.g. Berkman / Media Cloud) - Collection / Trawling (e.g. Browser Plugins) - Conventional Big Data (e.g. Hadoop, Map/Reduce) - Using Linked Data to branch out through related concepts - Algorithmic Data Processing (e.g. Topic Modeling)
  • 24.
  • 25.
  • 26.
  • 27.
  • 28.
  • 29.
  • 30. Problems of Completeness - When do you know that you have enough information? - What kind of compromises are made when information is more massive than anyone can consume?
  • 31. Problems of Integration When data comes from many different silos, in many different structures and formats, how do you bring all of this knowledge together? - One solution is RDF, which provides a common data generic data model. Collaborative ontology development can allow communities to work together. - Open standards. - Build tools and services that provide easy access to the underlying data.
  • 32. How To Get A Grip - Keep abreast of W3C developments and other standards bodies. - Don’t focus too much on single technologies. They will shift quickly. - Learn at least one data visualization technology. - Remember to frame questions of data in more than one way. - Ask your own questions of the data yourself. Understand it from the point of the user.

Hinweis der Redaktion

  1. We have crossed technological thresholds in the past, where the scale of operational parameters exceeded human conceptualization. 10-60rpms.
  2. 1000-6000rpms
  3. 40,000rpms
  4. We are crossing a similar threshold now with big data. Mathematical complexity in 3 dimensions is conceivable.
  5. Anything beyond that is basically impossible to visualize or conceptualize, even though the Euclidean geometry scales and continues to work for things like the Vector Space Model for representing high-dimensionality text for data mining.
  6. The scale of network complexity has crossed this threshold as well. (2007)
  7. Linked Open Data Cloud, 2010. Has scaled enormously since.
  8. This all leaves us in an environment where the knowledge we need to educate ourselves about a given topic are not static, or found in a stack of books, but live, connected, spanning the borders of conventional containers.
  9. We now have to contend with new forms of knowledge, new ways of discovering it, and new challenges to making use of it.Amazon deleted copies of Farrentheit 451 off of users’ Kindles. We are no longer in a world where knowledge is a simple, static asset. If authors can change their writing, the government can change their data.
  10. Unsworth’s Scholarly Primitives are worth considering when planning for high-level functionality in a world where we rely on abstractions in the form of tools and algorithms to represent things in a lower-dimensionality. What are the tasks we need to provide access to on top of great scale.
  11. Knowledge is being represented in vastly more complex structures than it used to be. Previously, books, tables, encyclopedias, and simple web pages were the standard. Now we have a heterogeneous mix of knowledge representation. These go beyond the page or document. They have the following properties.
  12. They range from the concrete to the abstract.
  13. Enigma.io, connecting disparate but linkable gov’t data sets.
  14. WikiData as a complex knowledge object / ecosystem.
  15. Tweet Metadatahttp://stackoverflow.com/questions/16600099/how-to-output-json-data-correctly-using-php
  16. Mind maps (even in planning for this talk) are complex knowledge objects.
  17. Browsing History Data; Civic Media; Catherine D’Ignazio doing work to learn about where you pay attention to and where you don’t. Browsing trackers are not independent, but networked.http://skyeome.net/wordpress/?paged=2
  18. Linked Data Wanderer: Archie Bunker -> Singing -> List of Sovereign States -> Republican Party -> Imigrants to Cuba -> Human Voice -> Homeward Bound (the album)
  19. Influence sub-graph on top of live DBpedia data. ( @sandsfish – http://dbpeople.herokuapp.com )
  20. Highly linked.Influence sub-graph on top of live DBpedia data. ( @sandsfish – http://dbpeople.herokuapp.com )
  21. Geo-parsing of place mentions in MIT Open Access collection. http://dspace.mit.edu
  22. Geo-parsing of place mentions in MIT Open Access collection. http://dspace.mit.edu
  23. For an Open Access collection to be useful to a wide population, it needs to be exposed for mining.I’m building an API to allow anyone to mine this information, instead of being limited to the representation of it on a web page or in a PDF. @sandsfish
  24. If you primarily work with raw data, or are a librarian, learn even the most basic methods of visualization. This will give you a vocabulary with which to interact with data owners or patrons, and provide a better way of conceptualizing what questions are possible with data.Shifting your perspective from geographic to temporal, or from a single answer to a range of possibilities will help expand the knowledge you can acquire about a topic. This is one of the benefits of knowledge being complex.