Bionic Info Pro:
New Takes on an Old Theme
Machine Learning, Taxonomy Creation, Big Data,
Competitive Intelligence, and the Human Element
Elaine M. Lasda Bergman
Annual Conference
Special Libraries Association
Vancouver, BC, Canada
Monday, June 9, 2014
Overview
• A little bit about Machine Learning
• A little bit about Taxonomies
• A little bit about Big Data
• A little bit about Hybrid Techniques
NOT NEW:
Machine Learning for CI
Mena, Jesus. (1996). Data Mining for
Competitive Intelligence, Competitive
Intelligence Review, 7(4):18-25.
Refinement of Machine Learning
• Decision Trees/Classification
• Clustering
• Anomaly Detection
Refinement of Machine Learning
• Support Vector Machines
– Predictive Classification
• Association Rules
– Market basket analysis
• Natural Language Processing
– Sentiment Analysis
Getting up to Speed
• http://efytimes.com
• 6 Video Tutorials and Playlists on
Machine Learning (January 2014)
NOT NEW: Taxonomies in
Information Retrieval
http://comsaad.blogspot.com/p/old-computer-photos.html
http://commons.wikimedia.org/wiki/File:A_Library_Primer_illustration_Joined_Hand.jpg
Need for Taxonomic Structures
http://farm9.staticflickr.com/8262/8673326413_4492b5dc68_o.jpg
NOT NEW: Datasets
http://www.conceptdraw.com/solution-park/resource/images/solutions/entity-relationship-diagram-(erd)/Diagramming-Crow's-Foot-ERD-Sample60.png
Enter BIG DATA
http://commons.wikimedia.org/wiki/File:DARPA_Big_Data.jpg
Big Data Sources and Analysis
Data Type | Qualities | Analysis Tools | Result
Social Media | Demographics | API integration | More profiles of like-minded users
“Social Influencers” | User Reviews | NLP, Text Analysis | Sentiment readings
“Internet of Things” | Logs/Sensors/Check-Ins | Parsing | Usage and behavior patterns
SaaS | Cloud/Web-based/Subscription software | Dist. data integration/in-memory caching technology/API integration | Usage behavior patterns, customer data, etc.
Public Data | e.g., Amazon Data Market, WorldBank, Wikipedia | All above (depends on data structure) | Depends on Dataset (and there are LOTS of them!)
Hadoop/MapReduce | Volume! | Parallel Processing/Parsing/Reduction | Big patterns, correlations, needles in haystacks
Data Warehouses | Internal transactional data | Likely same as above | Correlations, market basket, etc.
NoSQL/Columnar | Volume! | Fills gaps in Parallel processing tools | Real time activity and patterns
In-Stream Monitoring | Network traffic (streaming videos, system outages) | Packet evaluation, distributed query processing | Network/Stream usage patterns
Legacy Data | Usually PDFs & Documents/SemiStructured | Transformation tools (e.g., Xenos d2e) + above | Depends on content (could be all)
http://www.zdnet.com/top-10-categories-for-big-data-sources-and-mining-technologies-7000000926/
Why “Concept Hierarchies” in
an Unstructured Environment?
Advantages
• When a term sits too low in the hierarchy
to appear in frequent itemsets/rulesets
• Create more interesting rules using
more general, aggregated concepts
[DVD, wheat bread, home electronics,
electronics, food]
Kumar, T.S. (2005) Introduction to Data Science
Disadvantages
• How low and how high in the hierarchy
do you set the threshold?
• Increased computation time
• If threshold is too high, redundant rules
for more specific terms can be
summarized by rules using more
general terms
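The trade-off on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the method from the cited text: the hierarchy, the extra items ("TV", "white bread"), and the four transactions are all invented for the example. It shows a specific pair (DVD, wheat bread) falling below a support threshold while the rolled-up pair (electronics, food) clears it.

```python
# Hypothetical concept hierarchy: each item maps to its parent concept.
# Only DVD, wheat bread, home electronics, electronics and food come
# from the slide; "TV" and "white bread" are added for illustration.
PARENT = {
    "DVD": "home electronics",
    "TV": "home electronics",
    "home electronics": "electronics",
    "wheat bread": "food",
    "white bread": "food",
}

# Invented transactions, for illustration only.
TRANSACTIONS = [
    {"DVD", "wheat bread"},
    {"TV", "white bread"},
    {"DVD", "white bread"},
    {"TV", "wheat bread"},
]


def ancestors(item):
    """Return every more general concept above `item` in the hierarchy."""
    found = []
    while item in PARENT:
        item = PARENT[item]
        found.append(item)
    return found


def extend(transaction):
    """Add the ancestors of every item so general concepts can be counted."""
    extended = set(transaction)
    for item in transaction:
        extended.update(ancestors(item))
    return extended


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


extended = [extend(t) for t in TRANSACTIONS]
specific = support({"DVD", "wheat bread"}, TRANSACTIONS)
general = support({"electronics", "food"}, extended)
print(specific, general)
```

With a minimum support of 0.5, the specific pair is dropped and the general pair survives. The extended transactions are also larger than the originals, which is where the added computation time comes from.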
Hybrid Taxonomic Development
• Understand your auto-classification
model
• Work with domain experts to create
basic taxonomy
• Test Taxonomy in the Model
• Rinse, repeat
Wendy Pohs, ASIS&T Bulletin, 12/1/13
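One way to picture the test-and-repeat loop is the sketch below. It is an assumption-heavy illustration, not the process described in the cited article: simple keyword matching stands in for whatever auto-classification model is actually in use, and the taxonomy, the documents, and the function names (`classify`, `evaluate_taxonomy`) are all hypothetical.

```python
# Hypothetical starter taxonomy, as might be drafted with domain experts.
TAXONOMY = {
    "pricing": {"price", "discount", "margin"},
    "product launch": {"launch", "release"},
}

# Invented sample documents, for illustration only.
DOCUMENTS = [
    "Competitor announces price discount for spring",
    "New tablet launch planned for Q3",
    "Rival opens a distribution center in Ohio",
]


def classify(text, taxonomy):
    """Assign every category whose terms appear as whole words in the text."""
    words = set(text.lower().split())
    return sorted(cat for cat, terms in taxonomy.items() if words & terms)


def evaluate_taxonomy(documents, taxonomy):
    """Classify each document and list those that no category covers."""
    results = {doc: classify(doc, taxonomy) for doc in documents}
    unclassified = [doc for doc, cats in results.items() if not cats]
    return results, unclassified


# First pass: test the taxonomy in the model.
results, gaps = evaluate_taxonomy(DOCUMENTS, TAXONOMY)
print(gaps)

# Domain experts review the gaps and add a category; then repeat.
revised = dict(TAXONOMY, logistics={"distribution", "warehouse"})
results, gaps = evaluate_taxonomy(DOCUMENTS, revised)
print(gaps)
```

The documents left unclassified after each pass are what go back to the domain experts, so the human review step stays inside the loop rather than after it.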
Domain Knowledge
and Thick Data
• Thick Data analysis primarily relies on human brain power to
process a small “N” while big data analysis requires
computational power (of course with humans writing the
algorithms) to process a large “N”.
• Big Data reveals insights with a particular range of data
points, while Thick Data reveals the social context of and
connections between data points. Big Data delivers numbers;
thick data delivers stories. Big data relies on machine
learning; thick data relies on human learning.
http://ethnographymatters.net/blog/2013/05/13/big-data-needs-thick-data/ (Tricia Wang)
Data Driven CI is Meaningless
Without Human/Domain
Knowledge
http://www.wired.com/2014/04/your-big-data-is-worthless-if-you-dont-bring-it-into-the-real-world/
Recap
• Data Mining for CI is not new
• Refinement and Improvement
• Bigger, Weirder Data
Recap
• Where it’s at: Hybrid Schemas
• Thick Data, not just Big Data
• HUMAN ELEMENT IS ESSENTIAL
Questions?
Elaine Lasda Bergman
University at Albany
http://www.slideshare.net/librarian68
elasdabergman@albany.edu
@ElaineLibrarian

ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
 
4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx
 
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdfVirtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17
 
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
 
ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4
 

Bionic Info Pro: Machine Learning, Taxonomies and Big Data

  • 1. Bionic Info Pro: New Takes on an Old Theme Machine Learning, Taxonomy Creation, Big Data, Competitive Intelligence, and the Human Element Elaine M. Lasda Bergman Annual Conference Special Libraries Association Vancouver, BC, Canada Monday, June 9, 2014
  • 2. Overview • A little bit about Machine Learning • A little bit about Taxonomies • A little bit about Big Data • A little bit about Hybrid Techniques
  • 3. NOT NEW: Machine Learning for CI Mena, Jesus. (1996). Data Mining for Competitive Intelligence, Competitive Intelligence Review, 7(4):18-25.
  • 4. Refinement of Machine Learning • Decision Trees/Classification • Clustering • Anomaly Detection
  • 5. Refinement of Machine Learning • Support Vector Machines – Predictive Classification • Association Rules – Market basket analysis • Natural Language Processing – Sentiment Analysis
  • 6. Getting up to Speed • http://efytimes.com • 6 Video Tutorials and Playlists on Machine Learning (January 2014)
  • 7. NOT NEW: Taxonomies in Information Retrieval http://comsaad.blogspot.com/p/old-computer-photos.html http://commons.wikimedia.org/wiki/File:A_Library_Primer_illustration_Joined_Hand.jpg
  • 8. Need for Taxonomic Structures http://farm9.staticflickr.com/8262/8673326413_4492b5dc68_o.jpg
  • 11. BigData Sources and Analysis
      DataType | Qualities | Analysis Tools | Result
      Social Media | Demographics | API integration | More profiles of like-minded users
      “Social Influencers” | User Reviews | NLP, Text Analysis | Sentiment readings
      “Internet of Things” | Logs/Sensors/Check-Ins | Parsing | Usage and behavior patterns
      SaaS | Cloud/Web-based/Subscription software | Dist. data integration/in-memory caching technology/API integration | Usage behavior patterns, customer data, etc.
      Public Data | e.g., Amazon Data Market, WorldBank, Wikipedia | All above (depends on data structure) | Depends on Dataset (and there are LOTS of them!)
      Hadoop/MapReduce | Volume! | Parallel Processing/Parsing/Reduction | Big patterns, correlations, needles in haystacks
      Data Warehouses | Internal transactional data | Likely same as above | Correlations, marketbasket, etc.
      NoSQL/Columnar | Volume! | Fills gaps in Parallel processing tools | Real time activity and patterns
      In-Stream Monitoring | Network traffic (streaming videos, system outages) | Packet evaluation, distributed query processing | Network/Stream usage patterns
      Legacy Data | Usually PDFs & Documents/SemiStructured | Transformation tools (e.g., Xenos d2e) + above | Depends on content (could be all)
      http://www.zdnet.com/top-10-categories-for-big-data-sources-and-mining-technologies-7000000926/
  • 12. Why “Concept Hierarchies” in an Unstructured Environment?
  • 13. Advantages • When a term is too low in the hierarchy to appear in frequent itemsets/rulesets • Create more interesting rules using more general, aggregated concepts [DVD, wheat bread, home electronics, electronics, food] Kumar, T.S. (2005) Introduction to Data Science
  • 14. Disadvantages • How low and how high in the hierarchy do you set the threshold? • Increased computation time • If threshold is too high, redundant rules for more specific terms can be summarized by rules using more general terms
  • 15. Hybrid Taxonomic Development • Understand your auto-classification model • Work with domain experts to create basic taxonomy • Test Taxonomy in the Model • Rinse, repeat Wendy Pohs, ASIS&T Bulletin, 12/1/13
  • 16. Domain Knowledge and Thick Data • Thick Data analysis primarily relies on human brain power to process a small “N” while big data analysis requires computational power (of course with humans writing the algorithms) to process a large “N”. • Big Data reveals insights with a particular range of data points, while Thick Data reveals the social context of and connections between data points. Big Data delivers numbers; thick data delivers stories. Big data relies on machine learning; thick data relies on human learning. http://ethnographymatters.net/blog/2013/05/13/big-data-needs-thick-data/ (Tricia Wang)
  • 17. Data Driven CI is Meaningless Without Human/Domain Knowledge http://www.wired.com/2014/04/your-big-data-is-worthless-if-you-dont-bring-it-into-the-real-world/
  • 18. Recap • Data Mining for CI is not new • Refinement and Improvement • Bigger, Weirder Data
  • 19. Recap • Where it’s at: Hybrid Schemas • Thick Data, not just Big Data • HUMAN ELEMENT IS ESSENTIAL
  • 20. Questions? Elaine Lasda Bergman University at Albany http://www.slideshare.net/librarian68 elasdabergman@albany.edu @ElaineLibrarian
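
The trade-off on slides 13 and 14 can be illustrated with a small sketch. It assumes a hypothetical item-to-parent hierarchy (the `PARENT` mapping) and four made-up baskets; none of these names or numbers come from the talk. It shows how item pairs that are individually too rare to be "frequent" gain support once each item is rolled up one level in the concept hierarchy.

```python
from collections import Counter
from itertools import combinations

# Hypothetical concept hierarchy (item -> parent concept); invented for illustration.
PARENT = {
    "DVD": "home electronics",
    "headphones": "home electronics",
    "home electronics": "electronics",
    "wheat bread": "bread",
    "rye bread": "bread",
    "bread": "food",
}

def generalize(item, levels=1):
    """Climb `levels` steps up the hierarchy, stopping at the root."""
    for _ in range(levels):
        item = PARENT.get(item, item)
    return item

def pair_support(transactions, levels=0):
    """Fraction of transactions that contain each unordered pair of items."""
    counts = Counter()
    for basket in transactions:
        # Roll every item up, then de-duplicate within the basket.
        items = sorted({generalize(i, levels) for i in basket})
        counts.update(combinations(items, 2))
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items()}

# Made-up baskets: every specific pair occurs only once.
TRANSACTIONS = [
    {"DVD", "wheat bread"},
    {"headphones", "rye bread"},
    {"DVD", "rye bread"},
    {"headphones", "wheat bread"},
]

raw = pair_support(TRANSACTIONS, levels=0)     # specific items
rolled = pair_support(TRANSACTIONS, levels=1)  # one level more general

print(raw)
print(rolled)
```

With a minimum support of, say, 0.5, no specific pair survives, while the generalized pair does. Choosing how many levels to climb is exactly the threshold question raised under Disadvantages: climb too far and every rule collapses into something as uninformative as "electronics and food".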

Editor's notes

  1. Not an expert; I am a “LEARNER,” a student
  2. “automatic discovery of patterns using software to analyze vast amounts of records in a database” What else was going on in tech in 1996?
  3. The 1996 article mentioned transactional data, “all the rage”: marketing, inventory, risk mitigation, efficiency and waste. Allow us to formulate solutions in English
  4. “Library Hand” – we’ve been doing indexing, taxonomies, and classification since the beginning of our profession. Machine-created taxonomies are not new; text mining, extraction, and indexing have been automated since the 1960s. The earliest I could find was a paper published by the RAND Corporation in 1961
  5. Wider need for classification (Building Enterprise Taxonomies, Stewart). The pendulum – “searching” versus “browsing” paradigms. Search = lack of context, precision versus recall, relevancy ranking, choice of terminology, proper syntax for each search tool, where to search, spelling variants, bad labels. Where do we find taxonomies and ontologies today? Here are some of their natural habitats: web sites, discipline/domain classification, machine learning algorithms (training dataset and a testing dataset). As Heather points out in her book The Accidental Taxonomist, the efficacy of machine-created taxonomies improves dramatically with human quality control
  6. Relational DBs – ENTITY RELATIONSHIP. Legacy systems: hierarchical models, network models. The diagram for a relational database is in rows and columns: classes, variables, attributes, qualities, fields; observations, instances, records, cases
  7. NoSQL Multimedia Unstructured
  8. Andrew Brust; “Bigger data means weirder data” <- Jeffrey Stanton in his Intro to Data Science book
  9. Big Data: A Revolution That Will Transform How We Live, Work, and Think. Weed out data noise. Algorithms can be programmed with human quality control to account for redundancy and catch inconsistencies and different terms. http://it.toolbox.com/blogs/irm-blog/the-benefits-of-a-data-taxonomy-4916 https://www.earley.com/blog/why-taxonomy-critical-master-data-management-mdm
  10. Autoclassification model: Linguistic/lexical: gather and rank representative words and phrases that are associated with the concepts to be classified. Rules based: no common syntax for developing rules; varies by tool. Rules syntax can range from Boolean to the more complex syntax commonly used in programming languages. Because of this lack of consistency, the people who create and maintain these rules will have a more specialized skill set and will require more training. Machine learning/predictive: these systems rely on iteration to continuously validate. A traditional hierarchical taxonomy may not be needed, only reference terms or document sets to model. Maintenance of machine learning systems = repeated training, especially when you add new content. You will also help revise the larger machine-learning model as you learn more about your content.
  11. Examples of domain knowledge: the Big Data revolution book – building inspectors were needed to predict which buildings should have priority inspections. Web design for user-generated content – the system automatically categorizes user-driven content, but the taxonomy is refined by humans. As it is refined, the autoclassifier improves, “gets smarter.” We as knowledge experts fill in the gaps! We can be facilitators between those in the field/analysts and those programming the algorithms
  12. Example of meaningless data: Google Flu Trends. Scientific controlled experiments limit external sources; domain knowledge fills in the gaps in real-world data analysis. http://www.wired.com/2014/04/your-big-data-is-worthless-if-you-dont-bring-it-into-the-real-world/
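
The hybrid loop from slide 15 and the linguistic/lexical model in note 10 can be sketched together as follows. This is a toy under stated assumptions, not how any particular autoclassification product works: the starter taxonomy, the test documents, and the `classify` and `accuracy` helpers are all hypothetical, and matching is limited to single lowercase words.

```python
import re

def classify(text, taxonomy):
    """Return the concept whose terms match most often, or None if nothing matches."""
    words = re.findall(r"[a-z]+", text.lower())
    scores = {
        concept: sum(words.count(term) for term in terms)
        for concept, terms in taxonomy.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

def accuracy(labeled_docs, taxonomy):
    """Share of labeled test documents assigned to the expected concept."""
    hits = sum(classify(text, taxonomy) == label for text, label in labeled_docs)
    return hits / len(labeled_docs)

# Hypothetical starter taxonomy drafted with domain experts (single-word terms only).
taxonomy = {
    "pricing": {"price", "discount"},
    "staffing": {"hiring", "layoffs"},
}

# Hypothetical labeled test set used to check the taxonomy inside the model.
TEST_DOCS = [
    ("Competitor announces a discount on its flagship product", "pricing"),
    ("Rival firm confirms layoffs in its sales division", "staffing"),
    ("New markdown expected on last year's models", "pricing"),
]

before = accuracy(TEST_DOCS, taxonomy)

# "Rinse, repeat": a human reviewer spots the miss and adds a synonym.
taxonomy["pricing"].add("markdown")
after = accuracy(TEST_DOCS, taxonomy)

print(f"accuracy before refinement: {before:.2f}, after: {after:.2f}")
```

The third document is missed until a person who knows the domain adds "markdown" as a pricing term, which is the human quality-control step the notes argue for.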