Extract – Analyse – Search – Visualise
Text mining and machine learning for Research Data Management
Dr Tom Parsons and Mitchell Murphy
28/06/2017
2
DR. TOM PARSONS: Co-founder, RDM, Knowledge Management
WILL EVANS: React.js panel and Node.js
DR. STUART BOWE & MITCH MURPHY: Python/R data scientist, machine learning and computer vision
TIM VENISON: Co-founder, Software delivery
BARNABY KEENE: Python, architecture, processing pipeline
About Spotlight Data
Rapid development of innovative products
OUR AGILE CROSS-FUNCTIONAL TEAM
Developers, architects and researchers
POOL OF ASSOCIATES AND PLACEMENTS
3
Gathering and making sense of unstructured data captured from a variety of sources
We use charting, network graphs, maps and other techniques for data investigation
Mining data from archives, websites, social media and API sources
Analysis Tools
From simple interfaces and powerful searches to end-to-end large-scale processing systems
We utilise machine learning techniques to extract and investigate data.
What we do
Data science
Dark Data, Data Mining, Data Visualisation, Artificial Intelligence
4
Spotlight Data
Projects
• Large project with the UK Government and Durham University:
• Applying text mining and machine learning to large data sets
and document corpora
• Twitter and social media mining for ESRC Climate Change project
• Sensor data analysis and machine learning
5
The Nanowire system
Cloud or on-premise
Microservice containerised architecture
Diagram labels: Ingest, Discover, Process; User panel (x2); MQ; Pre-process; Workers; Storage
Data Processing – Natural Language Processing, text mining, classifiers, pattern recognition
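The slide names a message queue (MQ), a pre-process step, workers and storage, but gives no implementation details. The following is a minimal sketch of that pattern only, assuming an in-process `queue.Queue` as a stand-in for the MQ and a plain dictionary as a stand-in for storage. The function names (`preprocess`, `process`, `run_pipeline`) and the word-count "processing" are hypothetical and are not taken from Nanowire.

```python
import queue
import threading


def preprocess(raw_text):
    """Hypothetical pre-process step: collapse runs of whitespace."""
    return " ".join(raw_text.split())


def process(text):
    """Hypothetical processing step: a trivial word count stands in for NLP."""
    return {"text": text, "word_count": len(text.split())}


def worker(jobs, storage):
    """Take jobs off the queue until a None sentinel arrives."""
    while True:
        item = jobs.get()
        if item is None:
            jobs.task_done()
            break
        name, raw_text = item
        storage[name] = process(preprocess(raw_text))
        jobs.task_done()


def run_pipeline(files, n_workers=2):
    """Ingest files into the queue, let the workers process them, return storage."""
    jobs = queue.Queue()   # stand-in for the message queue
    storage = {}           # stand-in for the storage service
    threads = [
        threading.Thread(target=worker, args=(jobs, storage))
        for _ in range(n_workers)
    ]
    for thread in threads:
        thread.start()
    for item in files.items():
        jobs.put(item)
    for _ in threads:
        jobs.put(None)     # one stop sentinel per worker
    for thread in threads:
        thread.join()
    return storage


results = run_pipeline({
    "a.txt": "  text   mining  ",
    "b.txt": "machine learning for RDM",
})
```

In a containerised deployment each worker would more likely be its own service reading from a real broker; the queue-plus-sentinel structure is only meant to show how ingest and processing can be decoupled.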
6
Nanowire goals
Development targets
DATA PROCESSING CAPABILITY: Ability to process structured and unstructured data
ADAPTABILITY: Built to adapt to constantly evolving use cases through a microservice architecture
USER EXPERIENCE: Designed for all levels of user, with continual improvement
SCALING: Cloud and infrastructure agnostic, with the ability to scale from hundreds to millions of files
FAST DEPLOYMENT: The ability to change releases quickly on a fast and robust deployment system
TESTED: All components tested prior to release in a continuous integration and deployment cycle
OPEN SOURCE: Utilising open source libraries with a permissive licence
CONTAINERISED: All services provided as Docker containers by default, with no external dependencies
Introduction
Text mining
8
Text mining
What to do with this information:
• Mine information for research?
• Develop new products and drive innovation?
• Allow reuse of research data?
“The discovery by computer of new, previously unknown information, by automatically extracting information from different written resources. A key element is the linking ... of the extracted information ... to form new facts or new hypotheses to be explored further” (Hearst, 2003)
“An estimated 2.4 million scientific articles published every year” (Research Consulting TDM report)
9
Text mining
Extracting information
Choose sources → Extract text → Clean text → Analysis → Clustering → Results
Choose sources: DATABASES, FILES, FOLDERS, OFFICE 365
Clean text: STOP WORD REMOVAL, TOKENISATION
Analysis: NATURAL LANGUAGE PROCESSING – ENTITIES, CONCEPTS, TOPICS, KEYWORDS, SENTIMENT
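The clean-text and keyword steps on this slide can be sketched in a few lines. This is an illustration only, assuming a simple regular-expression tokeniser and a tiny hand-written stop word list. The names `tokenise`, `clean` and `top_keywords` and the sample sentence are hypothetical, and the slides do not say which libraries the authors used.

```python
import re
from collections import Counter

# Tiny illustrative stop word list; a real pipeline would use a fuller one.
STOP_WORDS = {"the", "of", "and", "a", "to", "in", "is", "for", "with", "on", "by"}


def tokenise(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def clean(tokens):
    """Remove stop words from a list of tokens."""
    return [token for token in tokens if token not in STOP_WORDS]


def top_keywords(text, n=2):
    """Return the n most frequent remaining tokens with their counts."""
    return Counter(clean(tokenise(text))).most_common(n)


sample = (
    "Research data management supports the reuse of research data "
    "and the sharing of data."
)
keywords = top_keywords(sample)
```

Frequency counting is the simplest possible keyword method; entity, concept, topic and sentiment extraction would need a dedicated NLP library on top of the same cleaned tokens.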
10
Results
Visualising data
11
Clusters
Graph databases
12
Enhanced data storage
JSON Linked Data format
{
  "@context": "http://schema.org",
  "@type": "DigitalDocument",
  "mentions": [
    {
      "@type": "Person",
      "email": "tom.parsons@nottingham.ac.uk"
    },
    {
      "@type": "Thing",
      "url": "http://admire.jiscinvolve.org/wp/"
    }
  ],
  "spatialCoverage": [
    {
      "@type": "Place",
      "name": "Manchester"
    },
    {
      "@type": "Place",
      "name": "British Library"
    },
    {
      "@type": "Place",
      "name": "Nottingham"
    }
  ],
  "keywords": "rdm,project,nottingham,support,research data",
  "inLanguage": {
    "@type": "Language",
    "name": "English"
  },
  "typicalAgeRange": ">=18"
}
ANALYSIS RESULTS VALIDATED JSON-LD
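A document in this shape could be assembled from extraction results roughly as follows. This is a sketch under assumptions: the helper `build_document` and its parameters are hypothetical, the input values are placeholders, and only the field names are taken from the JSON-LD shown on the slide.

```python
import json


def build_document(people_emails, urls, places, keywords, language="English"):
    """Assemble extraction results into a schema.org DigitalDocument (JSON-LD)."""
    mentions = [{"@type": "Person", "email": email} for email in people_emails]
    mentions += [{"@type": "Thing", "url": url} for url in urls]
    return {
        "@context": "http://schema.org",
        "@type": "DigitalDocument",
        "mentions": mentions,
        "spatialCoverage": [{"@type": "Place", "name": place} for place in places],
        "keywords": ",".join(keywords),
        "inLanguage": {"@type": "Language", "name": language},
    }


doc = build_document(
    people_emails=["researcher@example.org"],   # placeholder address
    urls=["http://example.org/project"],        # placeholder URL
    places=["Manchester", "Nottingham"],
    keywords=["rdm", "project", "research data"],
)

# Serialise for storage, then read the places back out again.
serialised = json.dumps(doc, indent=2)
place_names = [place["name"] for place in json.loads(serialised)["spatialCoverage"]]
```

Storing results this way keeps them readable by any JSON tool while the `@type` values let linked-data consumers interpret people, places and things consistently.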
13
Linking text to data
Relationships between data, articles and people
RESEARCH OUTPUTS
AUTHORS, ACADEMICS, PI/CO-I
UNIVERSITIES, LOCATIONS
14
Linking text to data
Typical metadata
15
Linking text to data
Data tables
Data set: https://www.repository.cam.ac.uk/handle/1810/32806
16
Linking text to data
Automated relationships between data, articles and people
RESEARCH OUTPUTS
AUTHORS, ACADEMICS, PI/CO-I
UNIVERSITIES, LOCATIONS
COMPACT SILTY-LOAM SOIL 2
COURTYARD DEPOSIT BY 2
DEPOSIT BY OVEN 2
DEPOSIT WHITE THIN 2
FI9710 ASHY COURTYARD 2
IIID 5705 FI9710 2
LAYER OF PHYTOLITHS 9
RESIDUE FROM POT 2
RM 4 RESIDUE 2
RM 97 BURNT 2
THIN LAYER OF 2
WHITE LAYER OF 7
WHITE THIN LAYER 2
Citation: Madella, M. (2004). Kilise Tepe Monograph Section F2 Phytolith Data Table 1
Madella, M.
URL: https://www.repository.cam.ac.uk/handle/1810/32806
Places: Europe, Turkey
Organisations: University of Cambridge
Densham, M.
URL: https://www.repository.cam.ac.uk/handle/1810/33130
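The phrase list on this slide appears to be three-word phrases with occurrence counts taken from a data table. One way such counts could be produced is sketched below, assuming whitespace tokenisation of each table cell. The function names and the sample cells are hypothetical and are not taken from the Kilise Tepe data set.

```python
from collections import Counter


def ngrams(tokens, n=3):
    """Return every run of n consecutive tokens, joined into one phrase."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_phrases(cells, n=3, min_count=2):
    """Count n-word phrases across table cells, keeping the repeated ones."""
    counts = Counter()
    for cell in cells:
        counts.update(ngrams(cell.upper().split(), n))
    return {phrase: count for phrase, count in counts.items() if count >= min_count}


# Hypothetical cell values written in the style of the table on the slide.
cells = [
    "white layer of phytoliths",
    "thin white layer of phytoliths",
    "residue from pot",
]
repeated = count_phrases(cells)
```

Repeated phrases like these can then be attached to the data set record alongside its citation, places and organisations, which is what allows a table to be linked to articles and people.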
17
Search and discovery
Graph databases
RESEARCH OUTPUTS RELATED TO PHYTOLITHS
AUTHORS CONNECTED TO MULTIPLE KILISE TEPE TOPICS
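The two queries captioned on this slide can be illustrated without a graph database. The sketch below assumes a small in-memory index built from (author, output, topic) records. All names and records are hypothetical, and a real deployment would express the same questions in the query language of whichever graph database is used.

```python
from collections import defaultdict

# Hypothetical (author, research output, topic) relationships.
records = [
    ("Author A", "Output 1", "phytoliths"),
    ("Author A", "Output 2", "soil deposits"),
    ("Author B", "Output 3", "phytoliths"),
]

outputs_by_topic = defaultdict(set)
topics_by_author = defaultdict(set)
for author, output, topic in records:
    outputs_by_topic[topic].add(output)
    topics_by_author[author].add(topic)


def outputs_related_to(topic):
    """Research outputs linked to a given topic."""
    return sorted(outputs_by_topic.get(topic, set()))


def authors_with_multiple_topics():
    """Authors connected to more than one topic."""
    return sorted(
        author for author, topics in topics_by_author.items() if len(topics) > 1
    )


phytolith_outputs = outputs_related_to("phytoliths")
multi_topic_authors = authors_with_multiple_topics()
```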
18
Results
Visualising data
19
Discussion
Text mining
• Discuss in groups for 10 minutes:
• Sources of text and data (files, images, video etc.)
• How could text mining be used for RDM?
• What do you struggle with?
• What are the top three priorities?
Introduction
Machine learning and text
21
Overview
• What is it?
• Why is it needed?
• Why is it useful for research data management?
• How does it work?
• Demo
Machine Learning
22
What Is It?
Machine Learning
• How does an athlete learn to become good at their sport?
• How does a machine learn how to predict outcomes?
• So what is a machine learning algorithm?
23
Why Is It Needed?
Machine Learning
24
Why Is It Useful For RDM?
Machine Learning
FORMS
25
How Does It Work?
Machine Learning
• Finding the topic of a file using linear regression
20/06/17
Words (x), Topics (y)
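The slide gives only the idea of regressing topics (y) on words (x). The following is a minimal one-feature sketch of that idea, assuming x is the number of topic-related words in a file and y is 1 for files on the topic and 0 otherwise. The word list, the training texts and the 0.5 decision threshold are all hypothetical choices made for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical vocabulary for an "archaeology" topic.
TOPIC_WORDS = {"soil", "deposit", "phytoliths", "layer"}


def count_topic_words(text):
    """Feature x: how many words in the text belong to the topic vocabulary."""
    return sum(1 for word in text.lower().split() if word in TOPIC_WORDS)


# Hypothetical labelled files: 1.0 means "on topic", 0.0 means "off topic".
training = [
    ("white layer of phytoliths in the soil deposit", 1.0),
    ("thin layer of soil by the oven deposit", 1.0),
    ("meeting agenda for the finance committee", 0.0),
    ("invoice for software licence renewal", 0.0),
]
xs = [count_topic_words(text) for text, _ in training]
ys = [label for _, label in training]
intercept, slope = fit_line(xs, ys)


def is_on_topic(text):
    """Predict the topic by thresholding the fitted line at 0.5."""
    return intercept + slope * count_topic_words(text) >= 0.5
```

For example, `is_on_topic("ashy courtyard deposit over compact soil")` returns `True` with this toy training set, because two vocabulary words are enough to push the fitted line past the threshold.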
26
Demo
Machine Learning
Introduction
Machine learning and images
28
Facial recognition
Machine learning across document content
Original image → Convert to grayscale → Extract face → Find possible matches
TRAINING THE DATA: Evaluation of algorithms: LBPH, Eigenfaces, Fisherfaces
FUTURE: Allow a user to search for faces within a document corpus or train the system to recognise individuals
MATCHING FACES IN THE TRAINED MODEL
TRAINING THE MODEL THEN TESTING
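Of the algorithms named on this slide, LBPH (local binary patterns histograms) is the easiest to show in plain code. The sketch below is a simplified illustration, not the authors' implementation: it skips face detection, uses a single histogram for the whole image where fuller implementations typically split the face into a grid of cells, and compares histograms with a chi-square distance. The striped test images and the labels `person_a` and `person_b` are hypothetical.

```python
def to_grayscale(rgb_image):
    """Convert rows of (r, g, b) pixels to luma values (ITU-R BT.601 weights)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for r, g, b in row]
            for row in rgb_image]


# Clockwise neighbour offsets (row, column), starting at the top-left pixel.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_histogram(gray):
    """256-bin histogram of local binary pattern codes over interior pixels."""
    hist = [0] * 256
    for r in range(1, len(gray) - 1):
        for c in range(1, len(gray[0]) - 1):
            code = 0
            for dr, dc in NEIGHBOURS:
                bit = 1 if gray[r + dr][c + dc] >= gray[r][c] else 0
                code = (code << 1) | bit
            hist[code] += 1
    return hist


def chi_square(h1, h2):
    """Chi-square distance between two histograms (0 means identical)."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)


def closest_match(query_hist, trained):
    """Return the label whose stored histogram is nearest to the query."""
    return min(trained, key=lambda label: chi_square(query_hist, trained[label]))


def make_image(pattern, size=6):
    """Hypothetical test image: grey RGB pixels generated by a pattern function."""
    return [[(pattern(r, c),) * 3 for c in range(size)] for r in range(size)]


# "Train" on two synthetic textures standing in for two people.
trained = {
    "person_a": lbp_histogram(to_grayscale(make_image(lambda r, c: 200 if c % 2 else 50))),
    "person_b": lbp_histogram(to_grayscale(make_image(lambda r, c: 200 if r % 2 else 50))),
}

# Query: the first texture again, at a different contrast.
query = make_image(lambda r, c: 210 if c % 2 else 40)
best = closest_match(lbp_histogram(to_grayscale(query)), trained)
```

Because each code depends only on whether a neighbour is brighter than the centre pixel, the query still matches `person_a` despite the change in contrast, which is one reason the method tolerates lighting differences.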
29
Facial recognition
Sometimes makes mistakes…
30
Image classifiers
TensorFlow machine learning
["submarine, pigboat, sub, U-boat", "0.989818"],
["indri, indris, Indri indri, Indri brevicaudatus", "0.00165158"],
["killer whale, killer, orca, grampus, sea wolf, Orcinus orca", "8.52245e-05"],
["steam locomotive", "8.31971e-05"]]},
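Classifier output like this is a list of label and score pairs, with the scores held as strings. A small sketch of turning it into usable tags follows, assuming a confidence threshold of 0.5 and taking the first synonym as the primary label. Both choices, and the name `confident_labels`, are hypothetical.

```python
# Label/score pairs as shown in the classifier output; scores arrive as strings.
predictions = [
    ["submarine, pigboat, sub, U-boat", "0.989818"],
    ["indri, indris, Indri indri, Indri brevicaudatus", "0.00165158"],
    ["killer whale, killer, orca, grampus, sea wolf, Orcinus orca", "8.52245e-05"],
    ["steam locomotive", "8.31971e-05"],
]


def confident_labels(pairs, threshold=0.5):
    """Return (primary label, score) for predictions at or above the threshold."""
    results = []
    for synonyms, score_text in pairs:
        score = float(score_text)
        if score >= threshold:
            primary = synonyms.split(",")[0].strip()
            results.append((primary, score))
    return sorted(results, key=lambda item: item[1], reverse=True)


tags = confident_labels(predictions)
```

Filtering on a threshold keeps low-scoring guesses such as "steam locomotive" out of the stored metadata, at the cost of leaving some images untagged.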
31
Review
Machine Learning
• What is it?
• Why is it needed?
• Why is it useful for research data management?
• How does it work?
32
Machine learning exercise
Discussion
Discuss in groups (10 mins):
• How could machine learning be used for RDM?
• Improving RDM:
• What are the ’painful’ manual tasks?
• What could be improved?
• What are the top three priorities?
Beyond an RDM repository
The future?
34
Spotlight Data
The future
• Deploy text mining/machine learning system within the UK Government
• Develop the ’next generation’ of data repository
• Mining data repositories and OA outputs
• Office 365 mining and optimisation
• Analysis of the data
35
EMAIL
mitch@spotlightdata.co.uk
tom@spotlightdata.co.uk
REGISTERED OFFICE
The Ingenuity Centre,
University of Nottingham Innovation Park,
Triumph Road, Nottingham,
NG7 2TU.
Strategic KM Ltd is a Company Registered in England and Wales,
Reg No. 06433359

Weitere ähnliche Inhalte

Was ist angesagt?

Archivematica for research data
Archivematica for research dataArchivematica for research data
Archivematica for research dataJisc RDM
 
Northumbria University case study
Northumbria University case studyNorthumbria University case study
Northumbria University case studyJisc RDM
 
Presenting RISE
Presenting RISEPresenting RISE
Presenting RISEJisc RDM
 
Lightning Talk - Angela Dappart
Lightning Talk - Angela DappartLightning Talk - Angela Dappart
Lightning Talk - Angela DappartJisc RDM
 
UKRDDS Phase 3 - 1st Webinar (April 2017)
UKRDDS Phase 3 - 1st Webinar (April 2017)UKRDDS Phase 3 - 1st Webinar (April 2017)
UKRDDS Phase 3 - 1st Webinar (April 2017)Christopher Brown
 
Recognising data sharing
Recognising data sharingRecognising data sharing
Recognising data sharingJisc RDM
 
European Open Science Cloud
European Open Science CloudEuropean Open Science Cloud
European Open Science CloudJisc RDM
 
Lightning Talks - Intro
Lightning Talks - IntroLightning Talks - Intro
Lightning Talks - IntroJisc RDM
 
RDM landscape in the Netherlands
RDM landscape in the NetherlandsRDM landscape in the Netherlands
RDM landscape in the NetherlandsJisc RDM
 
EOSC pilot STFC
EOSC pilot STFCEOSC pilot STFC
EOSC pilot STFCJisc RDM
 
RDN Lightning talk - Open Research Leeds (@OpenResLeeds): networks, metrics a...
RDN Lightning talk - Open Research Leeds (@OpenResLeeds): networks, metrics a...RDN Lightning talk - Open Research Leeds (@OpenResLeeds): networks, metrics a...
RDN Lightning talk - Open Research Leeds (@OpenResLeeds): networks, metrics a...Nick Sheppard
 
HESA data, describing research activity and #REF2021
HESA data, describing research activity and #REF2021HESA data, describing research activity and #REF2021
HESA data, describing research activity and #REF2021Jisc RDM
 
What I wish I’d known at the start!
What I wish I’d known at the start!What I wish I’d known at the start!
What I wish I’d known at the start!Jisc RDM
 
Grant Funding Programme
Grant Funding ProgrammeGrant Funding Programme
Grant Funding ProgrammeJisc RDM
 
Journal research data policy update
Journal research data policy updateJournal research data policy update
Journal research data policy updateJisc RDM
 
Jisc Research data shared service overview and update - May 2016
Jisc Research data shared service overview and update - May 2016Jisc Research data shared service overview and update - May 2016
Jisc Research data shared service overview and update - May 2016Jisc RDM
 
From Box to Hydra via Archivematica
From Box to Hydra via ArchivematicaFrom Box to Hydra via Archivematica
From Box to Hydra via ArchivematicaJisc RDM
 
Research at risk: developing a shared research data management service for UK...
Research at risk: developing a shared research data management service for UK...Research at risk: developing a shared research data management service for UK...
Research at risk: developing a shared research data management service for UK...Jisc RDM
 
Jisc Research Data Management Shared Service Workshop: An institutional persp...
Jisc Research Data Management Shared Service Workshop: An institutional persp...Jisc Research Data Management Shared Service Workshop: An institutional persp...
Jisc Research Data Management Shared Service Workshop: An institutional persp...Jisc RDM
 
Jisc research data shared service overview IDCC 2016
Jisc research data shared service overview IDCC 2016Jisc research data shared service overview IDCC 2016
Jisc research data shared service overview IDCC 2016Jisc RDM
 

Was ist angesagt? (20)

Archivematica for research data
Archivematica for research dataArchivematica for research data
Archivematica for research data
 
Northumbria University case study
Northumbria University case studyNorthumbria University case study
Northumbria University case study
 
Presenting RISE
Presenting RISEPresenting RISE
Presenting RISE
 
Lightning Talk - Angela Dappart
Lightning Talk - Angela DappartLightning Talk - Angela Dappart
Lightning Talk - Angela Dappart
 
UKRDDS Phase 3 - 1st Webinar (April 2017)
UKRDDS Phase 3 - 1st Webinar (April 2017)UKRDDS Phase 3 - 1st Webinar (April 2017)
UKRDDS Phase 3 - 1st Webinar (April 2017)
 
Recognising data sharing
Recognising data sharingRecognising data sharing
Recognising data sharing
 
European Open Science Cloud
European Open Science CloudEuropean Open Science Cloud
European Open Science Cloud
 
Lightning Talks - Intro
Lightning Talks - IntroLightning Talks - Intro
Lightning Talks - Intro
 
RDM landscape in the Netherlands
RDM landscape in the NetherlandsRDM landscape in the Netherlands
RDM landscape in the Netherlands
 
EOSC pilot STFC
EOSC pilot STFCEOSC pilot STFC
EOSC pilot STFC
 
RDN Lightning talk - Open Research Leeds (@OpenResLeeds): networks, metrics a...
RDN Lightning talk - Open Research Leeds (@OpenResLeeds): networks, metrics a...RDN Lightning talk - Open Research Leeds (@OpenResLeeds): networks, metrics a...
RDN Lightning talk - Open Research Leeds (@OpenResLeeds): networks, metrics a...
 
HESA data, describing research activity and #REF2021
HESA data, describing research activity and #REF2021HESA data, describing research activity and #REF2021
HESA data, describing research activity and #REF2021
 
What I wish I’d known at the start!
What I wish I’d known at the start!What I wish I’d known at the start!
What I wish I’d known at the start!
 
Grant Funding Programme
Grant Funding ProgrammeGrant Funding Programme
Grant Funding Programme
 
Journal research data policy update
Journal research data policy updateJournal research data policy update
Journal research data policy update
 
Jisc Research data shared service overview and update - May 2016
Jisc Research data shared service overview and update - May 2016Jisc Research data shared service overview and update - May 2016
Jisc Research data shared service overview and update - May 2016
 
From Box to Hydra via Archivematica
From Box to Hydra via ArchivematicaFrom Box to Hydra via Archivematica
From Box to Hydra via Archivematica
 
Research at risk: developing a shared research data management service for UK...
Research at risk: developing a shared research data management service for UK...Research at risk: developing a shared research data management service for UK...
Research at risk: developing a shared research data management service for UK...
 
Jisc Research Data Management Shared Service Workshop: An institutional persp...
Jisc Research Data Management Shared Service Workshop: An institutional persp...Jisc Research Data Management Shared Service Workshop: An institutional persp...
Jisc Research Data Management Shared Service Workshop: An institutional persp...
 
Jisc research data shared service overview IDCC 2016
Jisc research data shared service overview IDCC 2016Jisc research data shared service overview IDCC 2016
Jisc research data shared service overview IDCC 2016
 

Ähnlich wie Text mining and machine learning

Thoughts on Knowledge Graphs & Deeper Provenance
Thoughts on Knowledge Graphs  & Deeper ProvenanceThoughts on Knowledge Graphs  & Deeper Provenance
Thoughts on Knowledge Graphs & Deeper ProvenancePaul Groth
 
Putting Data to Work: Moving science forward together beyond where we thought...
Putting Data to Work: Moving science forward together beyond where we thought...Putting Data to Work: Moving science forward together beyond where we thought...
Putting Data to Work: Moving science forward together beyond where we thought...Erin Robinson
 
Moving forward data centric sciences weaving AI, Big Data & HPC
Moving forward data centric sciences  weaving AI, Big Data & HPCMoving forward data centric sciences  weaving AI, Big Data & HPC
Moving forward data centric sciences weaving AI, Big Data & HPCGenoveva Vargas-Solar
 
AGU Leptoukh Lecture: Putting Data to Work: Moving science forward together b...
AGU Leptoukh Lecture: Putting Data to Work: Moving science forward together b...AGU Leptoukh Lecture: Putting Data to Work: Moving science forward together b...
AGU Leptoukh Lecture: Putting Data to Work: Moving science forward together b...Erin Robinson
 
Why Data Science Matters - 2014 WDS Data Stewardship Award Lecture
Why Data Science Matters - 2014 WDS Data Stewardship Award LectureWhy Data Science Matters - 2014 WDS Data Stewardship Award Lecture
Why Data Science Matters - 2014 WDS Data Stewardship Award LectureXiaogang (Marshall) Ma
 
OSFair2017 training | Explore, model, analyze and visualize systematic resear...
OSFair2017 training | Explore, model, analyze and visualize systematic resear...OSFair2017 training | Explore, model, analyze and visualize systematic resear...
OSFair2017 training | Explore, model, analyze and visualize systematic resear...Open Science Fair
 
A New Partnership for Cross-Scale, Cross-Domain eScience
A New Partnership for Cross-Scale, Cross-Domain eScienceA New Partnership for Cross-Scale, Cross-Domain eScience
A New Partnership for Cross-Scale, Cross-Domain eScienceUniversity of Washington
 
Official resume titash_mandal_
Official resume titash_mandal_Official resume titash_mandal_
Official resume titash_mandal_Titash Mandal
 
The Materials Data Facility: A Distributed Model for the Materials Data Commu...
The Materials Data Facility: A Distributed Model for the Materials Data Commu...The Materials Data Facility: A Distributed Model for the Materials Data Commu...
The Materials Data Facility: A Distributed Model for the Materials Data Commu...Ben Blaiszik
 
Ontologies For the Modern Age - McGuinness' Keynote at ISWC 2017
Ontologies For the Modern Age - McGuinness' Keynote at ISWC 2017Ontologies For the Modern Age - McGuinness' Keynote at ISWC 2017
Ontologies For the Modern Age - McGuinness' Keynote at ISWC 2017Deborah McGuinness
 
Accelerating Discovery via Science Services
Accelerating Discovery via Science ServicesAccelerating Discovery via Science Services
Accelerating Discovery via Science ServicesIan Foster
 
Hughes RDAP11 Data Publication Repositories
Hughes RDAP11 Data Publication RepositoriesHughes RDAP11 Data Publication Repositories
Hughes RDAP11 Data Publication RepositoriesASIS&T
 
AI at Scale for Materials and Chemistry
AI at Scale for Materials and ChemistryAI at Scale for Materials and Chemistry
AI at Scale for Materials and ChemistryIan Foster
 
Ci2004-10.doc
Ci2004-10.docCi2004-10.doc
Ci2004-10.docbutest
 
2013 Melbourne Software Freedom Day talk - FOSS in Public Decision Making
2013 Melbourne Software Freedom Day talk - FOSS in Public Decision Making2013 Melbourne Software Freedom Day talk - FOSS in Public Decision Making
2013 Melbourne Software Freedom Day talk - FOSS in Public Decision MakingPatrick Sunter
 
Linked Open Data about Springer Nature conferences. The story so far
Linked Open Data about Springer Nature conferences. The story so farLinked Open Data about Springer Nature conferences. The story so far
Linked Open Data about Springer Nature conferences. The story so farAliaksandr Birukou
 
The Science of Data Science
The Science of Data Science The Science of Data Science
The Science of Data Science James Hendler
 
Software tools for high-throughput materials data generation and data mining
Software tools for high-throughput materials data generation and data miningSoftware tools for high-throughput materials data generation and data mining
Software tools for high-throughput materials data generation and data miningAnubhav Jain
 
Linked Open Data: Combining Data for the Social Sciences and Humanities (and ...
Linked Open Data: Combining Data for the Social Sciences and Humanities (and ...Linked Open Data: Combining Data for the Social Sciences and Humanities (and ...
Linked Open Data: Combining Data for the Social Sciences and Humanities (and ...Richard Zijdeman
 

Ähnlich wie Text mining and machine learning (20)

Thoughts on Knowledge Graphs & Deeper Provenance
Thoughts on Knowledge Graphs  & Deeper ProvenanceThoughts on Knowledge Graphs  & Deeper Provenance
Thoughts on Knowledge Graphs & Deeper Provenance
 
Putting Data to Work: Moving science forward together beyond where we thought...
Putting Data to Work: Moving science forward together beyond where we thought...Putting Data to Work: Moving science forward together beyond where we thought...
Putting Data to Work: Moving science forward together beyond where we thought...
 
Moving forward data centric sciences weaving AI, Big Data & HPC
Moving forward data centric sciences  weaving AI, Big Data & HPCMoving forward data centric sciences  weaving AI, Big Data & HPC
Moving forward data centric sciences weaving AI, Big Data & HPC
 
AGU Leptoukh Lecture: Putting Data to Work: Moving science forward together b...
AGU Leptoukh Lecture: Putting Data to Work: Moving science forward together b...AGU Leptoukh Lecture: Putting Data to Work: Moving science forward together b...
AGU Leptoukh Lecture: Putting Data to Work: Moving science forward together b...
 
Why Data Science Matters - 2014 WDS Data Stewardship Award Lecture
Why Data Science Matters - 2014 WDS Data Stewardship Award LectureWhy Data Science Matters - 2014 WDS Data Stewardship Award Lecture
Why Data Science Matters - 2014 WDS Data Stewardship Award Lecture
 
OSFair2017 training | Explore, model, analyze and visualize systematic resear...
OSFair2017 training | Explore, model, analyze and visualize systematic resear...OSFair2017 training | Explore, model, analyze and visualize systematic resear...
OSFair2017 training | Explore, model, analyze and visualize systematic resear...
 
A New Partnership for Cross-Scale, Cross-Domain eScience
A New Partnership for Cross-Scale, Cross-Domain eScienceA New Partnership for Cross-Scale, Cross-Domain eScience
A New Partnership for Cross-Scale, Cross-Domain eScience
 
Official resume titash_mandal_
Official resume titash_mandal_Official resume titash_mandal_
Official resume titash_mandal_
 
The Materials Data Facility: A Distributed Model for the Materials Data Commu...
The Materials Data Facility: A Distributed Model for the Materials Data Commu...The Materials Data Facility: A Distributed Model for the Materials Data Commu...
The Materials Data Facility: A Distributed Model for the Materials Data Commu...
 
Ontologies For the Modern Age - McGuinness' Keynote at ISWC 2017
Ontologies For the Modern Age - McGuinness' Keynote at ISWC 2017Ontologies For the Modern Age - McGuinness' Keynote at ISWC 2017
Ontologies For the Modern Age - McGuinness' Keynote at ISWC 2017
 
Accelerating Discovery via Science Services
Accelerating Discovery via Science ServicesAccelerating Discovery via Science Services
Accelerating Discovery via Science Services
 
Pl data science october 2017
Pl data science october 2017Pl data science october 2017
Pl data science october 2017
 
Hughes RDAP11 Data Publication Repositories
Hughes RDAP11 Data Publication RepositoriesHughes RDAP11 Data Publication Repositories
Hughes RDAP11 Data Publication Repositories
 
AI at Scale for Materials and Chemistry
AI at Scale for Materials and ChemistryAI at Scale for Materials and Chemistry
AI at Scale for Materials and Chemistry
 
Ci2004-10.doc
Ci2004-10.docCi2004-10.doc
Ci2004-10.doc
 
2013 Melbourne Software Freedom Day talk - FOSS in Public Decision Making
2013 Melbourne Software Freedom Day talk - FOSS in Public Decision Making2013 Melbourne Software Freedom Day talk - FOSS in Public Decision Making
2013 Melbourne Software Freedom Day talk - FOSS in Public Decision Making
 
Linked Open Data about Springer Nature conferences. The story so far
Linked Open Data about Springer Nature conferences. The story so farLinked Open Data about Springer Nature conferences. The story so far
Linked Open Data about Springer Nature conferences. The story so far
 
The Science of Data Science
The Science of Data Science The Science of Data Science
The Science of Data Science
 
Software tools for high-throughput materials data generation and data mining
Software tools for high-throughput materials data generation and data miningSoftware tools for high-throughput materials data generation and data mining
Software tools for high-throughput materials data generation and data mining
 
Linked Open Data: Combining Data for the Social Sciences and Humanities (and ...
Linked Open Data: Combining Data for the Social Sciences and Humanities (and ...Linked Open Data: Combining Data for the Social Sciences and Humanities (and ...
Linked Open Data: Combining Data for the Social Sciences and Humanities (and ...
 

Mehr von Jisc RDM

2019-06_Eunis_Burland
2019-06_Eunis_Burland2019-06_Eunis_Burland
2019-06_Eunis_BurlandJisc RDM
 
Jisc Research Data Shared Service Open Repositories 2018 Paper
Jisc Research Data Shared Service Open Repositories 2018 PaperJisc Research Data Shared Service Open Repositories 2018 Paper
Jisc Research Data Shared Service Open Repositories 2018 PaperJisc RDM
 
Jisc Research Data Shared Service Open Repositories 2018 24x7
Jisc Research Data Shared Service Open Repositories 2018 24x7Jisc Research Data Shared Service Open Repositories 2018 24x7
Jisc Research Data Shared Service Open Repositories 2018 24x7Jisc RDM
 
Jisc Research Data Shared Service - a Samvera case study
Jisc Research Data Shared Service - a Samvera case studyJisc Research Data Shared Service - a Samvera case study
Jisc Research Data Shared Service - a Samvera case studyJisc RDM
 
Building a national Data Repository Data Modelling
Building a national Data Repository Data ModellingBuilding a national Data Repository Data Modelling
Building a national Data Repository Data ModellingJisc RDM
 
Building a national Data Repository System Integration Architecture Overview
Building a national Data Repository System Integration Architecture OverviewBuilding a national Data Repository System Integration Architecture Overview
Building a national Data Repository System Integration Architecture OverviewJisc RDM
 
Building a National Data Service Open Repositories 2018
Building a National Data Service Open Repositories 2018Building a National Data Service Open Repositories 2018
Building a National Data Service Open Repositories 2018Jisc RDM
 
Research Data Toolkit
Research Data ToolkitResearch Data Toolkit
Research Data ToolkitJisc RDM
 
Pre jisc datachampday_260318
Pre jisc datachampday_260318Pre jisc datachampday_260318
Pre jisc datachampday_260318Jisc RDM
 
Stories from the Field: Data are Messy and that's (kind of) ok
Stories from the Field: Data are Messy and that's (kind of) okStories from the Field: Data are Messy and that's (kind of) ok
Stories from the Field: Data are Messy and that's (kind of) okJisc RDM
 
Fair data - dinkum research - by Andy Turner
Fair data -  dinkum research - by Andy TurnerFair data -  dinkum research - by Andy Turner
Fair data - dinkum research - by Andy TurnerJisc RDM
 
2018 03 codata - making the case
2018 03 codata - making the case2018 03 codata - making the case
2018 03 codata - making the caseJisc RDM
 
Research Data Shared Service update at DPC
Research Data Shared Service update at DPCResearch Data Shared Service update at DPC
Research Data Shared Service update at DPCJisc RDM
 
Research Data Shared Service Webinar #1
Research Data Shared Service Webinar #1Research Data Shared Service Webinar #1
Research Data Shared Service Webinar #1Jisc RDM
 
Managing data behind creative masterpieces -RCM
Managing data behind creative masterpieces -RCMManaging data behind creative masterpieces -RCM
Managing data behind creative masterpieces -RCMJisc RDM
 
Managing data behind creative masterpieces
Managing data behind creative masterpiecesManaging data behind creative masterpieces
Managing data behind creative masterpiecesJisc RDM
 
Lightning Talk - Andrew MacLellan
Lightning Talk - Andrew MacLellanLightning Talk - Andrew MacLellan
Lightning Talk - Andrew MacLellanJisc RDM
 
Lightning Talk - Nick Sheppard
Lightning Talk - Nick SheppardLightning Talk - Nick Sheppard
Lightning Talk - Nick SheppardJisc RDM
 
Lightning talk - Adam Harwood
Lightning talk - Adam HarwoodLightning talk - Adam Harwood
Lightning talk - Adam HarwoodJisc RDM
 
Lightning Talk - Chris Awre
Lightning Talk - Chris AwreLightning Talk - Chris Awre
Lightning Talk - Chris AwreJisc RDM
 

Mehr von Jisc RDM (20)

2019-06_Eunis_Burland
2019-06_Eunis_Burland2019-06_Eunis_Burland
2019-06_Eunis_Burland
 
Jisc Research Data Shared Service Open Repositories 2018 Paper
Jisc Research Data Shared Service Open Repositories 2018 PaperJisc Research Data Shared Service Open Repositories 2018 Paper
Jisc Research Data Shared Service Open Repositories 2018 Paper
 
Jisc Research Data Shared Service Open Repositories 2018 24x7
Jisc Research Data Shared Service Open Repositories 2018 24x7Jisc Research Data Shared Service Open Repositories 2018 24x7
Jisc Research Data Shared Service Open Repositories 2018 24x7
 
Jisc Research Data Shared Service - a Samvera case study
Jisc Research Data Shared Service - a Samvera case studyJisc Research Data Shared Service - a Samvera case study
Jisc Research Data Shared Service - a Samvera case study
 
Building a national Data Repository Data Modelling
Building a national Data Repository Data ModellingBuilding a national Data Repository Data Modelling
Building a national Data Repository Data Modelling
 
Building a national Data Repository System Integration Architecture Overview
Building a national Data Repository System Integration Architecture OverviewBuilding a national Data Repository System Integration Architecture Overview
Building a national Data Repository System Integration Architecture Overview
 
Building a National Data Service Open Repositories 2018
Building a National Data Service Open Repositories 2018Building a National Data Service Open Repositories 2018
Building a National Data Service Open Repositories 2018
 
Research Data Toolkit
Research Data ToolkitResearch Data Toolkit
Research Data Toolkit
 
Pre jisc datachampday_260318
Pre jisc datachampday_260318Pre jisc datachampday_260318
Pre jisc datachampday_260318
 
Stories from the Field: Data are Messy and that's (kind of) ok
Stories from the Field: Data are Messy and that's (kind of) okStories from the Field: Data are Messy and that's (kind of) ok
Stories from the Field: Data are Messy and that's (kind of) ok
 
Fair data - dinkum research - by Andy Turner
Fair data -  dinkum research - by Andy TurnerFair data -  dinkum research - by Andy Turner
Fair data - dinkum research - by Andy Turner
 
2018 03 codata - making the case
2018 03 codata - making the case2018 03 codata - making the case
2018 03 codata - making the case
 
Research Data Shared Service update at DPC
Research Data Shared Service update at DPCResearch Data Shared Service update at DPC
Research Data Shared Service update at DPC
 
Research Data Shared Service Webinar #1
Research Data Shared Service Webinar #1Research Data Shared Service Webinar #1
Research Data Shared Service Webinar #1
 
Managing data behind creative masterpieces -RCM
Managing data behind creative masterpieces -RCMManaging data behind creative masterpieces -RCM
Managing data behind creative masterpieces -RCM
 
Managing data behind creative masterpieces
Managing data behind creative masterpiecesManaging data behind creative masterpieces
Managing data behind creative masterpieces
 
Lightning Talk - Andrew MacLellan
Lightning Talk - Andrew MacLellanLightning Talk - Andrew MacLellan
Lightning Talk - Andrew MacLellan
 
Lightning Talk - Nick Sheppard
Lightning Talk - Nick SheppardLightning Talk - Nick Sheppard
Lightning Talk - Nick Sheppard
 
Lightning talk - Adam Harwood
Lightning talk - Adam HarwoodLightning talk - Adam Harwood
Lightning talk - Adam Harwood
 
Lightning Talk - Chris Awre
Lightning Talk - Chris AwreLightning Talk - Chris Awre
Lightning Talk - Chris Awre
 


Text mining and machine learning

  • 1. Extract – Analyse – Search – Visualise: text mining and machine learning for Research Data Management. Dr Tom Parsons and Mitchell Murphy, 28/06/2017
  • 2. About Spotlight Data: our agile cross-functional team, for rapid development of innovative products. Dr Tom Parsons (co-founder; RDM, knowledge management); Will Evans (React.js panel and Node.js); Dr Stuart Bowe and Mitch Murphy (Python/R data scientists; machine learning and computer vision); Tim Venison (co-founder; software delivery); Barnaby Keene (Python, architecture, processing pipeline). Pool of associates and placements: developers, architects and researchers.
  • 3. What we do (data science): Dark Data, Data Mining, Data Visualisation, Analysis Tools, Artificial Intelligence. Gathering and making sense of unstructured data captured from a variety of sources. We use charting, network graphs, maps and other techniques for data investigation. Mining data from archives, websites, social media and API sources. From simple interfaces and powerful searches to end-to-end, large-scale processing systems. We use machine learning techniques to extract and investigate data.
  • 4. Spotlight Data projects: a large project with the UK Government and Durham University, applying text mining and machine learning to large data sets and document corpora; Twitter and social media mining for the ESRC Climate Change project; sensor data analysis and machine learning.
  • 5. The Nanowire system: cloud or on premise; microservice containerised architecture. Stages: Ingest, Discover, Process. Components: user panels, pre-process, MQ, workers, storage. Data processing: natural language processing, text mining, classifiers, pattern recognition.
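The queue-and-workers arrangement on this slide can be sketched roughly as follows. This is only an illustration under stated assumptions: an in-process `queue.Queue` stands in for the MQ, threads stand in for containerised workers, and a word count stands in for the real processing. The function names are hypothetical and are not taken from Nanowire.

```python
import queue
import threading


def worker(jobs, results, lock):
    """Take documents off the queue until a None sentinel arrives."""
    while True:
        doc = jobs.get()
        if doc is None:  # sentinel: no more work for this worker
            break
        name, text = doc
        word_count = len(text.split())  # placeholder for real processing
        with lock:
            results[name] = word_count  # "storage" is a plain dict here


def process(documents, n_workers=2):
    """Ingest documents, fan them out to workers, and collect the results."""
    jobs = queue.Queue()
    results = {}
    lock = threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(jobs, results, lock))
        for _ in range(n_workers)
    ]
    for thread in threads:
        thread.start()
    for doc in documents:
        jobs.put(doc)
    for _ in threads:
        jobs.put(None)  # one sentinel per worker
    for thread in threads:
        thread.join()
    return results


print(process([("a.txt", "one two three"), ("b.txt", "four five")]))
```

Because the workers only ever see the queue, adding capacity in this sketch means raising `n_workers`, which loosely mirrors scaling out worker containers.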
  • 6. Nanowire goals (development targets). Data processing capability: ability to process structured and unstructured data. Adaptability: built to adapt, through a microservice architecture, to use cases that constantly evolve. User experience: designed for all levels of users, with continual improvement. Scaling: cloud and infrastructure agnostic, with the ability to scale from 100s to millions of files. Fast deployment: the ability to quickly change releases on a fast and robust deployment system. Tested: all components to be tested prior to release in a continuous integration and deployment cycle. Open source: utilising open source libraries with a permissive licence. Containerised: all services to be provided as Docker containers by default, with no external dependencies.
  • 8. Text mining: “The discovery by computer of new, previously unknown information, by automatically extracting information from different written resources. A key element is the linking ... of the extracted information ... to form new facts or new hypotheses to be explored further” (Hearst, 2003). “An estimated 2.4 million scientific articles published every year” (Research Consulting TDM report). What to do with this information: mine information for research? Develop new products and drive innovation. Allow reuse of research data?
  • 9. Text mining: extracting information. Steps: choose sources (databases, files, folders, Office 365); extract text; clean text (stop word removal, tokenisation); analysis (natural language processing: entities, concepts, topics, keywords, sentiment); clustering; results.
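The clean-text and analysis steps on this slide can be sketched in a few lines. This is a minimal sketch using only the standard library. The stop word list, the function names and the sample sentence are illustrative assumptions, and keyword frequency stands in for the fuller natural language processing the slide mentions.

```python
import re
from collections import Counter

# Illustrative stop word list; a real pipeline would use a much fuller one.
STOP_WORDS = {"a", "an", "and", "for", "in", "is", "of", "on", "the", "to"}


def extract_tokens(text):
    """Tokenisation: lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def clean_tokens(tokens):
    """Stop word removal: drop common words that carry little meaning."""
    return [token for token in tokens if token not in STOP_WORDS]


def top_keywords(text, n=3):
    """Analysis: return the n most frequent tokens left after cleaning."""
    counts = Counter(clean_tokens(extract_tokens(text)))
    return [word for word, _ in counts.most_common(n)]


sample = (
    "Research data management is the management of research data "
    "in the research lifecycle."
)
print(top_keywords(sample))
```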
  • 12. Enhanced data storage: JSON Linked Data format. Analysis results as validated JSON-LD: { "@context": "http://schema.org", "@type": "DigitalDocument", "mentions": [ { "@type": "Person", "email": "tom.parsons@nottingham.ac.uk" }, { "@type": "Thing", "url": "http://admire.jiscinvolve.org/wp/" } ], "spatialCoverage": [ { "@type": "Place", "name": "Manchester" }, { "@type": "Place", "name": "British Library" }, { "@type": "Place", "name": "Nottingham" } ], "keywords": "rdm,project,nottingham,support,research data", "inLanguage": { "@type": "Language", "name": "English" }, "typicalAgeRange": ">=18" }
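As a rough illustration of how analysis results might be turned into a record shaped like the one on this slide, the sketch below builds a schema.org `DigitalDocument` dictionary and serialises it. The `to_json_ld` helper and its parameters are hypothetical, and the sketch does not perform JSON-LD validation. It only reproduces the structure shown above.

```python
import json


def to_json_ld(people_emails, places, keywords, language="English"):
    """Build a schema.org DigitalDocument record from analysis results."""
    return {
        "@context": "http://schema.org",
        "@type": "DigitalDocument",
        "mentions": [{"@type": "Person", "email": e} for e in people_emails],
        "spatialCoverage": [{"@type": "Place", "name": p} for p in places],
        "keywords": ",".join(keywords),
        "inLanguage": {"@type": "Language", "name": language},
    }


record = to_json_ld(
    people_emails=["tom.parsons@nottingham.ac.uk"],
    places=["Manchester", "British Library", "Nottingham"],
    keywords=["rdm", "project", "nottingham", "support", "research data"],
)
print(json.dumps(record, indent=2))
```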
  • 13. Linking text to data: relationships between data, articles and people (research outputs; authors, academics, PI/Co-I; universities, locations).
  • 14. Linking text to data: typical metadata.
  • 15. Linking text to data: data tables. Data set: https://www.repository.cam.ac.uk/handle/1810/32806
  • 16. Linking text to data: automated relationships between data, articles and people (research outputs; authors, academics, PI/Co-I; universities, locations). COMPACT SILTY-LOAM SOIL 2; COURTYARD DEPOSIT BY 2; DEPOSIT BY OVEN 2; DEPOSIT WHITE THIN 2; FI9710 ASHY COURTYARD 2; IIID 5705 FI9710 2; LAYER OF PHYTOLITHS 9; RESIDUE FROM POT 2; RM 4 RESIDUE 2; RM 97 BURNT 2; THIN LAYER OF 2; WHITE LAYER OF 7; WHITE THIN LAYER 2. Citation: Madella, M. (2004). Kilise Tepe Monograph Section F2 Phytolith Data Table 1. Madella, M. URL: https://www.repository.cam.ac.uk/handle/1810/32806 Places: Europe, Turkey. Organisations: University of Cambridge. Densham, M. URL: https://www.repository.cam.ac.uk/handle/1810/33130
  • 17. Search and discovery: graph databases. Research outputs related to phytoliths; authors connected to multiple Kilise Tepe topics.
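The two queries named on this slide can be illustrated with a tiny in-memory graph. This is a sketch only: the slides do not say which graph database is used, so a dictionary of sets stands in for it. The author placeholders and the edges are invented for illustration, and the topic labels are borrowed from phrases on the earlier slide.

```python
from collections import defaultdict

# Hypothetical edges linking placeholder authors to topics.
EDGES = [
    ("Author A", "phytoliths"),
    ("Author A", "courtyard deposit"),
    ("Author B", "phytoliths"),
]


def build_graph(edges):
    """Store each undirected edge in both directions."""
    graph = defaultdict(set)
    for left, right in edges:
        graph[left].add(right)
        graph[right].add(left)
    return graph


def related_to(graph, node):
    """Everything directly connected to a node, e.g. a topic."""
    return sorted(graph[node])


def authors_with_multiple_topics(graph, authors):
    """Authors linked to more than one topic."""
    return sorted(a for a in authors if len(graph[a]) > 1)


graph = build_graph(EDGES)
print(related_to(graph, "phytoliths"))
print(authors_with_multiple_topics(graph, ["Author A", "Author B"]))
```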
  • 19. Discussion: text mining. Discuss in groups for 10 minutes: sources of text and data (files, images, video etc.); how could text mining be used for RDM? What do you struggle with? What are the top three priorities?
  • 21. Machine learning overview: What is it? Why is it needed? Why is it useful for research data management? How does it work? Demo.
  • 22. Machine learning: what is it? How does an athlete learn to become good at their sport? How does a machine learn how to predict outcomes? So what is a machine learning algorithm?
  • 23. Machine learning: why is it needed?
  • 24. Machine learning: why is it useful for RDM? Forms.
  • 25. Machine learning: how does it work? Finding the topic of a file using linear regression: words (x), topics (y). 20/06/17
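The slide gives only the axes (words as x, topics as y), so the following is a sketch under loud assumptions rather than the presenters' method. Here x is taken to be how often one chosen word appears in a file and y a hand-assigned topic score between 0 and 1. The training numbers are made up purely to show an ordinary least-squares fit.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Predicted topic score for a file with word count x."""
    return slope * x + intercept


# Made-up training data: word count per file and a hand-labelled topic score.
word_counts = [0, 1, 2, 3, 4]
topic_scores = [0.0, 0.0, 0.5, 1.0, 1.0]

slope, intercept = fit_line(word_counts, topic_scores)
print(predict(slope, intercept, 3))
```

In this toy setting a higher predicted score simply means the file looks more like the labelled topic. A real system would use many word features rather than one.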
  • 28. Facial recognition: machine learning across document content. Steps: original image; convert to grayscale; extract face; find possible matches. Training the data: evaluation of algorithms (LBPH, Eigenfaces, Fisherfaces). Training the model then testing; matching faces in the trained model. Future: allow a user to search for faces within a document corpus or train the system to recognise individuals.
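To make two of the steps above more concrete, the sketch below converts RGB pixels to grayscale and computes the local binary pattern histogram that LBPH (local binary pattern histograms) is built on. It is a pure Python illustration, not the library code used in the talk. The luminance weights are the common ITU-R BT.601 ones, and the clockwise bit ordering is an arbitrary convention chosen for this example.

```python
def to_grayscale(pixel):
    """Convert an (R, G, B) pixel to one luminance value (BT.601 weights)."""
    r, g, b = pixel
    return round(0.299 * r + 0.587 * g + 0.114 * b)


# Neighbour offsets, clockwise starting from the top-left pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(gray, row, col):
    """8-bit local binary pattern: 1 where a neighbour is >= the centre."""
    centre = gray[row][col]
    code = 0
    for d_row, d_col in OFFSETS:
        bit = 1 if gray[row + d_row][col + d_col] >= centre else 0
        code = (code << 1) | bit
    return code


def lbp_histogram(gray):
    """Histogram of LBP codes over all interior pixels of a grayscale grid."""
    hist = [0] * 256
    for row in range(1, len(gray) - 1):
        for col in range(1, len(gray[0]) - 1):
            hist[lbp_code(gray, row, col)] += 1
    return hist


patch = [[9, 1, 1], [1, 5, 1], [1, 1, 1]]
print(lbp_code(patch, 1, 1))
```

Matching then amounts to comparing the histogram of a new face with the histograms stored during training. The comparison metric is left out of this sketch.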
  • 29. Facial recognition: sometimes makes mistakes…
  • 30. Image classifiers: TensorFlow machine learning. Example output: [["submarine, pigboat, sub, U-boat", "0.989818"], ["indri, indris, Indri indri, Indri brevicaudatus", "0.00165158"], ["killer whale, killer, orca, grampus, sea wolf, Orcinus orca", "8.52245e-05"], ["steam locomotive", "8.31971e-05"]]
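The classifier output on this slide is a list of label and score pairs, with the scores stored as strings. A small sketch of how such output might be read follows. The `top_prediction` helper is hypothetical and the values are copied from the slide. No TensorFlow call is involved.

```python
def top_prediction(predictions):
    """Return (label, score) for the highest-scoring class."""
    label, score = max(predictions, key=lambda pair: float(pair[1]))
    return label, float(score)


predictions = [
    ["submarine, pigboat, sub, U-boat", "0.989818"],
    ["indri, indris, Indri indri, Indri brevicaudatus", "0.00165158"],
    ["killer whale, killer, orca, grampus, sea wolf, Orcinus orca", "8.52245e-05"],
    ["steam locomotive", "8.31971e-05"],
]

label, score = top_prediction(predictions)
print(label, score)
```

Converting with `float` before comparing matters here, because comparing the raw strings would rank "8.52245e-05" above "0.989818".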
  • 31. Machine learning review: What is it? Why is it needed? Why is it useful for research data management? How does it work? 20/06/17
  • 32. Machine learning exercise: discussion. Discuss in groups (10 mins): how could machine learning be used for RDM? Improving RDM: what are the 'painful' manual tasks? What could be improved? What are the top three priorities?
  • 33. Beyond an RDM repository: the future?
  • 34. Spotlight Data, the future: deploy a text mining/machine learning system within the UK Government; develop the 'next generation' of data repository; mining data repositories and OA outputs; Office365 mining and optimisation; analysis of the data.
  • 35. Email: mitch@spotlightdata.co.uk, tom@spotlightdata.co.uk. Registered office: The Ingenuity Centre, University of Nottingham Innovation Park, Triumph Road, Nottingham, NG7 2TU. Strategic KM Ltd is a company registered in England and Wales, Reg No. 06433359.