SlideShare ist ein Scribd-Unternehmen logo
1 von 43
TEXT MINING, TERM MINING,
AND VISUALIZATION
IMPROVING THE IMPACT OF SCHOLARLY
PUBLISHING
MONDAY 16 APRIL 2012
NICE, FRANCE
Marjorie M.K. Hlava, President
Jay Ven Eman, CEO
Access Innovations, Inc.
mhlava@accessinn.com
J_ven_eman@accessinn.com
1
What we will cover today
• Term and Text Mining
• The basics of visualization
• Case studies
• Using subject terms as metrics
• Applications
• Visualizing the results
Definitions
• Term Mining - a systematic comparison processing
algorithmic method to find patterns in text
• Text Mining – using controlled vocabulary tags in text to
find patterns and directions
• Term & text mining
 Many similarities
 Can be complimentary; not mutually exclusive
Term mining
• Precise
 Meaningful semantic relationships; contextual
 Replicable; repeatable; consistent
 Vetted; controlled
 Based on a controlled vocabulary
 Trends; gaps; relationship analysis; visualizations
 Less data processing load
Text mining
 Algorithmic; formulaic
 Neural nets, statistical, latent semantic, co -
occurrence
 Serendipitous relationships
 Sentiment; hot topics; trends
 False drops; noise;
 Misleading semantic relationships
 Heavy processing load
Why take a visual look?
• Humans can process information 17 times faster in visual
presentations
• Now data can be analyzed, manipulated and presented as visual
displays.
• To see the trends effectively we need to make the data into rich
graph-able formats
6
Visualization of data
• Needs
− Measurement
− Metrics
− Numbers
• Shows
− Adjacency
− Relationships
− Trends
− Co – occurrence
− Conceptual distance
• Is richer with
− Linking
− Semantic enrichment
− Classification
• Supports
− Forecasting
− Trend analysis
− Segmentation
− Distribution
7
Well Formed Data • Semantic Enrichment • Taxonomies • Access Innovations • Data Harmony
Man’s attention to
visual display to convey
knowledge is ancient
8
Well Formed Data • Semantic Enrichment • Taxonomies • Access Innovations • Data Harmony
The art in maps
is a
longstanding
tradition
9
Well Formed Data • Semantic Enrichment • Taxonomies • Access Innovations • Data Harmony
Super imposing data is now common
A mash up example
10
Traffic Injury Map
UK Data Archive
US National Highway
Safety Administration
Google Maps Base
Accident categories include
children
automobile
bicycle
etc.
Data
time
place
type
Source:
JISC TechWatch: Data Mash-ups September 2010
Well Formed Data • Semantic Enrichment • Taxonomies • Access Innovations • Data Harmony
Mash up of bird flight migrations and
weather patterns
http://www.youtube.com/watch?v=uPff1t4pXiI&feature=youtu.be
11
Well Formed Data • Semantic Enrichment • Taxonomies • Access Innovations • Data Harmony
http://www.youtube.com/watch?v=nokQBjk1s_8&feature=player_embedded
Well Formed Data • Semantic Enrichment • Taxonomies • Access Innovations • Data Harmony
How does it work?
 Develop controlled vocabulary
Âť Prefer one with hierarchy
 Apply to full text
» Or to the “heads”
 Decide on data points to convey information
 Divide the XML into graphable sections
Well Formed Data • Semantic Enrichment • Taxonomies • Access Innovations • Data Harmony
Start with data – like this XML file
14
Well Formed Data • Semantic Enrichment • Taxonomies • Access Innovations • Data Harmony
Index or tag using subject terms from
thesaurus or taxonomy
 date, category, taxonomy term, frequency
15
Well Formed Data • Semantic Enrichment • Taxonomies • Access Innovations • Data Harmony
Many views of one set of data
16
Well Formed Data • Semantic Enrichment • Taxonomies • Access Innovations • Data Harmony
Load to a visualization program
Like Prefuse
17
Well Formed Data • Semantic Enrichment • Taxonomies • Access Innovations • Data Harmony
Or Pajek
18
Well Formed Data • Semantic Enrichment • Taxonomies • Access Innovations • Data Harmony 19
Well Formed Data • Semantic Enrichment • Taxonomies • Access Innovations • Data Harmony
National Information Center
for Educational Media
Albuquerque’s own
Âť Sandia developed VxInsight
Âť Access Innovations = NICEM
Same data – several views
Primary and Secondary Education in US
Shows the US Valley of Science
Little Science taught in elementary years
20
Well Formed Data • Semantic Enrichment • Taxonomies • Access Innovations • Data Harmony
Well Formed Data • Semantic Enrichment • Taxonomies • Access Innovations • Data Harmony
Well Formed Data • Semantic Enrichment • Taxonomies • Access Innovations • Data Harmony
Using visualization to show
 From a society / publisher perspective
Âť Identify Core, Boundary and Cross Border
Âť Provides Indicators
 Activity
 Growth
 Relatedness
 Centrality
Âť Locates Journal domains
 From a thesaurus perspective
Âť Identifies terms that are too broadly defined
Âť Potential Improvements in thesaurus structure using topic
structures
23
Well Formed Data • Semantic Enrichment • Taxonomies • Access Innovations • Data Harmony
Case Study:
Mapping IEEE thesaurus space
 We are interested in an expanded map that
includes adjacencies to the IEEE data
Âť Expanded term set shows adjacent white space;
opportunities for expansion
 Overlaps and edges of the science
Âť We need comparison data
 Learn the directions in the field
Âť Low occurrence rate in IEEE documents?
Âť Linkage to terms in IEEE documents?
 Where do we find these terms? How can we
add them?
24
Well Formed Data • Semantic Enrichment • Taxonomies • Access Innovations • Data Harmony
The process
 Built a rule base to auto index IEEE content
» “90 % accuracy out of the box on journal data”*
» “80% out of the box on proceedings data”*
 The overlapping data sets
Âť Auto indexed 1.2 million Xplore records
Âť Auto indexed 10 years of US Patent data
Âť Auto indexed 10 years of Medline
 Term sets used
Âť IEEE thesaurus terms rule base
Âť Medical Subject Headings (MeSH) (and simple rule base)
Âť Defense Technical Information Center (DTIC) Thesaurus (
and simple rule base)
Âť Similar level of detail to current IEEE thesaurus terms
25
Well Formed Data • Semantic Enrichment • Taxonomies • Access Innovations • Data Harmony
Defining expanded term space
26
IEEE
2kterms
1.2M documents
1. The data - Select related corpus
14kDTIC
475k patents
24kMeSH
PubMed
525k docs
Well Formed Data • Semantic Enrichment • Taxonomies • Access Innovations • Data Harmony
Defining expanded term space
27
IEEE
2kterms
1.2M documents
2. Identify related terms
Use the IEEE Thesaurus to index the three collections
Well Formed Data • Semantic Enrichment • Taxonomies • Access Innovations • Data Harmony
Defining expanded term space
28
IEEE
2kterms
1.2M documents
2. Identify related terms
Use MESH and DTIC to also index the three collections
Well Formed Data • Semantic Enrichment • Taxonomies • Access Innovations • Data Harmony 29
IEEE
2kterms
1.2M documents
3. Resulting term set
The co-indexed items from the three collections
Defining expanded term space
Well Formed Data • Semantic Enrichment • Taxonomies • Access Innovations • Data Harmony 30
4. Term:Term Matrix
Where do the articles and their indexing intersect?
Defining expanded term space
Well Formed Data • Semantic Enrichment • Taxonomies • Access Innovations • Data Harmony 31
Visualization Strategies
Matrix
Visualization
Software
Well Formed Data • Semantic Enrichment • Taxonomies • Access Innovations • Data Harmony 32
All data up-posted to the top level
Well Formed Data • Semantic Enrichment • Taxonomies • Access Innovations • Data Harmony 33
Many map options
IEEE ExperiencePrevious Experience
Well Formed Data • Semantic Enrichment • Taxonomies • Access Innovations • Data Harmony 34
Sensors
Council
Nucl Plasma
Sci Soc
Nanotech
Council
Ultrason,
Ferro …
Prod Saf
Engng Soc
Oceanic
Engng Soc
Geosci Rem
Sens Soc
Council
Supercond
Compon,
Packag …
Instr
Measur Soc
Magnetics
Soc
Dielectr El
Insul Soc
Electromag
Compat Soc
Antennas
Propag Soc
Power
Electron Soc
Electron
Dev Soc
Circuits &
Systems
Power &
Energy Soc
Industry
Appl Soc
Solid St
Circuits Soc
Industr
Electr Soc
Microwave
Theory Soc
Aerosp
Electr
Sys Soc
Sys Man
Cyber
Society
Computer
Intelligence
Society
Systems
Council
Reliability
Society Education
Society
Prof
Commun
Society
Computer
Society
Robot
Autom Soc
Social
Impl Techn
Council Electr
Design Auto
Signal
Proc Soc
Intell Transp
Sys Soc
Commun
Soc
Info Theory
Soc
Vehicular
Techn Soc
Consumer
Electr Soc
Broadcast
Techn Soc
Photonics
Soc
Eng Med
Biol Sci
IEEE Portfolio
Well Formed Data • Semantic Enrichment • Taxonomies • Access Innovations • Data Harmony 35
Radial Visualization
Well Formed Data • Semantic Enrichment • Taxonomies • Access Innovations • Data Harmony 36
Publication Strategy
JASIST reference
Well Formed Data • Semantic Enrichment • Taxonomies • Access Innovations • Data Harmony 37
Conference Strategy
Well Formed Data • Semantic Enrichment • Taxonomies • Access Innovations • Data Harmony 38
Turbines
Measurement
Circuits
Amplifiers
Displays
Games
Toys
Flow
Cooling
Heating
Components
Gearing
Brakes
Dynamics
Vehicles,
Parts
Disk
Optics
Photochem
Molding
Conductors
Coatings
Lasers
Lamps
Motors
Plants,
Micro-orgs
Control
Boats
Oilfield
Services
Med Instruments
Welding
Conveyers
Rubber
Acyclic Comp
Footwear
Lubricants
Radiology
Catalysis
Macromolecules
Sprayers
Electrochem
Fitness
Hygiene
Cleaning
Printing
Paper
IC Engines
Magn/Elect
Magnets
Textiles
Layers
Medical
Devices
Clocks
Pipes
Valves
Blasting
Cables
Appliances
Outerwear
Exhaust
Pumps
Packaging
Aircraft
Semiconductors
Use a Thesaurus to Label Maps
Agriculture
Food
Consumer
Products
Construction
Automotive +
Defense
Industrial
Products
Leisure
Energy
Telecom Computer
HW/SW
Electronics
Chemicals
Pharma
Metals
Health Care
Well Formed Data • Semantic Enrichment • Taxonomies • Access Innovations • Data Harmony
Questions Answered
 Is there a way, using our own information, to forecast our
direction?
 Where is the industry headed? What about by technology
sector?
 Does our coverage match our mission and vision?
 Can we become smarter about our data and potential
markets using our collection in new ways?
Are the societies publishing and talking about what their
charter indicates they cover?
 What are the trends – are topics emerging/cooling?
 Can we use technology and our own data to explore these
questions while enhancing our data?
39
Well Formed Data • Semantic Enrichment • Taxonomies • Access Innovations • Data Harmony
The research team
 Access Innovations / Data Harmony
Âť Founded in 1978
Âť Data enrichment and normalization
Âť Suite of Semantic Enrichment tools
 SciTechStrategies
Âť Understanding data through visualization
 IEEE Indexing & Abstracting Group
40
Well Formed Data • Semantic Enrichment • Taxonomies • Access Innovations • Data Harmony
We looked at
visualization of data
 Finding the Metrics
Âť Measurement
Âť Numbers
Âť Terms as indicators
 Ways to show
Âť Adjacency
Âť Relationships
Âť Trends
» Co – occurrence
Âť Conceptual distance
 How to enrich with
Âť Linking
Âť Semantic enrichment
Âť Classification
 Maps supporting
Âť Forecasting
Âť Trend analysis
Âť Segmentation
Âť Distribution
41
Well Formed Data • Semantic Enrichment • Taxonomies • Access Innovations • Data Harmony
Effective maps require
 Contextual data
 Detailed data
 Classification methods
 At least two directions in the matrix
 A little art for fun
42
43
It just takes a little imagination
Thank you
Marjorie M.K. Hlava
President
mhlava@accessinn.com
Jay Ven Eman, CEO
J_ven_eman@accessinn.com
, Access Innovations
505-998-0800

Weitere ähnliche Inhalte

Was ist angesagt?

The Rensselaer IDEA: Data Exploration
The Rensselaer IDEA: Data Exploration The Rensselaer IDEA: Data Exploration
The Rensselaer IDEA: Data Exploration James Hendler
 
Capacity Building: Data Science in the University At Rensselaer Polytechnic ...
Capacity Building: Data Science in the University  At Rensselaer Polytechnic ...Capacity Building: Data Science in the University  At Rensselaer Polytechnic ...
Capacity Building: Data Science in the University At Rensselaer Polytechnic ...James Hendler
 
Public engagement while you sleep
Public engagement while you sleepPublic engagement while you sleep
Public engagement while you sleepUoLResearchSupport
 
Why science needs open data – Jisc and CNI conference 10 July 2014
Why science needs open data – Jisc and CNI conference 10 July 2014Why science needs open data – Jisc and CNI conference 10 July 2014
Why science needs open data – Jisc and CNI conference 10 July 2014Jisc
 
Meeting Federal Research Requirements for Data Management Plans, Public Acces...
Meeting Federal Research Requirements for Data Management Plans, Public Acces...Meeting Federal Research Requirements for Data Management Plans, Public Acces...
Meeting Federal Research Requirements for Data Management Plans, Public Acces...ICPSR
 
Instructional Data Sets from Q-step Launch Event (Univ of Exeter) 3-20-2014
Instructional Data Sets from Q-step Launch Event (Univ of Exeter) 3-20-2014Instructional Data Sets from Q-step Launch Event (Univ of Exeter) 3-20-2014
Instructional Data Sets from Q-step Launch Event (Univ of Exeter) 3-20-2014ICPSR
 
Data Sharing with ICPSR: Fueling the Cycle of Science through Discovery, Acce...
Data Sharing with ICPSR: Fueling the Cycle of Science through Discovery, Acce...Data Sharing with ICPSR: Fueling the Cycle of Science through Discovery, Acce...
Data Sharing with ICPSR: Fueling the Cycle of Science through Discovery, Acce...ICPSR
 
Introduction and E-Research Timeline Review
Introduction and E-Research Timeline ReviewIntroduction and E-Research Timeline Review
Introduction and E-Research Timeline ReviewKhadak Raj Adhikari
 
From Data Sharing to Data Stewardship
From Data Sharing to Data StewardshipFrom Data Sharing to Data Stewardship
From Data Sharing to Data StewardshipICPSR
 
Understanding ICPSR - An Orientation and Tours of ICPSR Data Services and Edu...
Understanding ICPSR - An Orientation and Tours of ICPSR Data Services and Edu...Understanding ICPSR - An Orientation and Tours of ICPSR Data Services and Edu...
Understanding ICPSR - An Orientation and Tours of ICPSR Data Services and Edu...ICPSR
 
Knowledge graph construction for research & medicine
Knowledge graph construction for research & medicineKnowledge graph construction for research & medicine
Knowledge graph construction for research & medicinePaul Groth
 
Research Data Management in Academic Libraries: Meeting the Challenge
Research Data Management in Academic Libraries: Meeting the ChallengeResearch Data Management in Academic Libraries: Meeting the Challenge
Research Data Management in Academic Libraries: Meeting the ChallengeSpencer Keralis
 
Knowledge Representation on the Web
Knowledge Representation on the WebKnowledge Representation on the Web
Knowledge Representation on the WebRinke Hoekstra
 
Data Science for Every Student at RPI
Data Science for Every Student at RPIData Science for Every Student at RPI
Data Science for Every Student at RPISteven Miller
 
IBM Watson Classroom Experience
IBM Watson Classroom ExperienceIBM Watson Classroom Experience
IBM Watson Classroom ExperienceSteven Miller
 
Machines are people too
Machines are people tooMachines are people too
Machines are people tooPaul Groth
 
From Open Data to Open Science, by Geoffrey Boulton
 From Open Data to Open Science, by Geoffrey Boulton From Open Data to Open Science, by Geoffrey Boulton
From Open Data to Open Science, by Geoffrey BoultonLEARN Project
 

Was ist angesagt? (20)

The Rensselaer IDEA: Data Exploration
The Rensselaer IDEA: Data Exploration The Rensselaer IDEA: Data Exploration
The Rensselaer IDEA: Data Exploration
 
Capacity Building: Data Science in the University At Rensselaer Polytechnic ...
Capacity Building: Data Science in the University  At Rensselaer Polytechnic ...Capacity Building: Data Science in the University  At Rensselaer Polytechnic ...
Capacity Building: Data Science in the University At Rensselaer Polytechnic ...
 
Urban Data Science at UW
Urban Data Science at UWUrban Data Science at UW
Urban Data Science at UW
 
Public engagement while you sleep
Public engagement while you sleepPublic engagement while you sleep
Public engagement while you sleep
 
Why science needs open data – Jisc and CNI conference 10 July 2014
Why science needs open data – Jisc and CNI conference 10 July 2014Why science needs open data – Jisc and CNI conference 10 July 2014
Why science needs open data – Jisc and CNI conference 10 July 2014
 
Meeting Federal Research Requirements for Data Management Plans, Public Acces...
Meeting Federal Research Requirements for Data Management Plans, Public Acces...Meeting Federal Research Requirements for Data Management Plans, Public Acces...
Meeting Federal Research Requirements for Data Management Plans, Public Acces...
 
Instructional Data Sets from Q-step Launch Event (Univ of Exeter) 3-20-2014
Instructional Data Sets from Q-step Launch Event (Univ of Exeter) 3-20-2014Instructional Data Sets from Q-step Launch Event (Univ of Exeter) 3-20-2014
Instructional Data Sets from Q-step Launch Event (Univ of Exeter) 3-20-2014
 
Data Sharing with ICPSR: Fueling the Cycle of Science through Discovery, Acce...
Data Sharing with ICPSR: Fueling the Cycle of Science through Discovery, Acce...Data Sharing with ICPSR: Fueling the Cycle of Science through Discovery, Acce...
Data Sharing with ICPSR: Fueling the Cycle of Science through Discovery, Acce...
 
Introduction and E-Research Timeline Review
Introduction and E-Research Timeline ReviewIntroduction and E-Research Timeline Review
Introduction and E-Research Timeline Review
 
From Data Sharing to Data Stewardship
From Data Sharing to Data StewardshipFrom Data Sharing to Data Stewardship
From Data Sharing to Data Stewardship
 
Understanding ICPSR - An Orientation and Tours of ICPSR Data Services and Edu...
Understanding ICPSR - An Orientation and Tours of ICPSR Data Services and Edu...Understanding ICPSR - An Orientation and Tours of ICPSR Data Services and Edu...
Understanding ICPSR - An Orientation and Tours of ICPSR Data Services and Edu...
 
Knowledge graph construction for research & medicine
Knowledge graph construction for research & medicineKnowledge graph construction for research & medicine
Knowledge graph construction for research & medicine
 
Data Science and Urban Science @ UW
Data Science and Urban Science @ UWData Science and Urban Science @ UW
Data Science and Urban Science @ UW
 
Research Data Management in Academic Libraries: Meeting the Challenge
Research Data Management in Academic Libraries: Meeting the ChallengeResearch Data Management in Academic Libraries: Meeting the Challenge
Research Data Management in Academic Libraries: Meeting the Challenge
 
Knoesis Student Achievement
Knoesis Student AchievementKnoesis Student Achievement
Knoesis Student Achievement
 
Knowledge Representation on the Web
Knowledge Representation on the WebKnowledge Representation on the Web
Knowledge Representation on the Web
 
Data Science for Every Student at RPI
Data Science for Every Student at RPIData Science for Every Student at RPI
Data Science for Every Student at RPI
 
IBM Watson Classroom Experience
IBM Watson Classroom ExperienceIBM Watson Classroom Experience
IBM Watson Classroom Experience
 
Machines are people too
Machines are people tooMachines are people too
Machines are people too
 
From Open Data to Open Science, by Geoffrey Boulton
 From Open Data to Open Science, by Geoffrey Boulton From Open Data to Open Science, by Geoffrey Boulton
From Open Data to Open Science, by Geoffrey Boulton
 

Ähnlich wie Text Mining, Term Mining, and Visualization - Improving the Impact of Scholarly Publishing

II-SDV 2012 Text Mining, Term Mining and Visualization - Improving the Impac...
II-SDV 2012 Text Mining, Term Mining and Visualization  - Improving the Impac...II-SDV 2012 Text Mining, Term Mining and Visualization  - Improving the Impac...
II-SDV 2012 Text Mining, Term Mining and Visualization - Improving the Impac...Dr. Haxel Consult
 
Visualization for Data Analysis: A New Way to Look at Content
Visualization for Data Analysis: A New Way to Look at Content Visualization for Data Analysis: A New Way to Look at Content
Visualization for Data Analysis: A New Way to Look at Content Access Innovations, Inc.
 
Research in Intelligent Systems and Data Science at the Knowledge Media Insti...
Research in Intelligent Systems and Data Science at the Knowledge Media Insti...Research in Intelligent Systems and Data Science at the Knowledge Media Insti...
Research in Intelligent Systems and Data Science at the Knowledge Media Insti...Enrico Motta
 
Found in Space: Creating and Visualizing IEEE Abstract Space for Publication ...
Found in Space: Creating and Visualizing IEEE Abstract Space for Publication ...Found in Space: Creating and Visualizing IEEE Abstract Space for Publication ...
Found in Space: Creating and Visualizing IEEE Abstract Space for Publication ...TSoholt
 
Big Data (SOCIOMETRIC METHODS FOR RELEVANCY ANALYSIS OF LONG TAIL SCIENCE D...
Big Data (SOCIOMETRIC METHODS FOR  RELEVANCY ANALYSIS OF LONG TAIL  SCIENCE D...Big Data (SOCIOMETRIC METHODS FOR  RELEVANCY ANALYSIS OF LONG TAIL  SCIENCE D...
Big Data (SOCIOMETRIC METHODS FOR RELEVANCY ANALYSIS OF LONG TAIL SCIENCE D...AKSHAY BHAGAT
 
Semantics as a service at EMBL-EBI
Semantics as a service at EMBL-EBISemantics as a service at EMBL-EBI
Semantics as a service at EMBL-EBISimon Jupp
 
Semantics-enhanced Geoscience Interoperability, Analytics, and Applications
Semantics-enhanced Geoscience Interoperability, Analytics, and ApplicationsSemantics-enhanced Geoscience Interoperability, Analytics, and Applications
Semantics-enhanced Geoscience Interoperability, Analytics, and ApplicationsArtificial Intelligence Institute at UofSC
 
Semantics-enhanced Cyberinfrastructure for ICMSE : Interoperability, Analyti...
Semantics-enhanced Cyberinfrastructure for ICMSE :  Interoperability, Analyti...Semantics-enhanced Cyberinfrastructure for ICMSE :  Interoperability, Analyti...
Semantics-enhanced Cyberinfrastructure for ICMSE : Interoperability, Analyti...Artificial Intelligence Institute at UofSC
 
we to deep learning
we to deep learning we to deep learning
we to deep learning TemesgenHabtamu
 
emantic web technologies and applications for Ins
emantic web technologies and applications for Insemantic web technologies and applications for Ins
emantic web technologies and applications for InsTemesgenHabtamu
 
Accelerating Discovery via Science Services
Accelerating Discovery via Science ServicesAccelerating Discovery via Science Services
Accelerating Discovery via Science ServicesIan Foster
 
NHM Data Portal: first steps toward the Graph-of-Life
NHM Data Portal: first steps toward the Graph-of-LifeNHM Data Portal: first steps toward the Graph-of-Life
NHM Data Portal: first steps toward the Graph-of-LifeEdward Baker
 
NHM Data Portal: first steps toward the Graph-of-Life
NHM Data Portal: first steps toward the Graph-of-LifeNHM Data Portal: first steps toward the Graph-of-Life
NHM Data Portal: first steps toward the Graph-of-LifeVince Smith
 
Infrastructure, relationships, trust, and RDA
Infrastructure, relationships, trust, and RDAInfrastructure, relationships, trust, and RDA
Infrastructure, relationships, trust, and RDAResearch Data Alliance
 
Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...
Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...
Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...BigData_Europe
 
Open Data is Not Enough: Making Data Sharing Work
Open Data is Not Enough: Making Data Sharing WorkOpen Data is Not Enough: Making Data Sharing Work
Open Data is Not Enough: Making Data Sharing WorkResearch Data Alliance
 
Broad Data (India 2015)
Broad Data (India 2015)Broad Data (India 2015)
Broad Data (India 2015)James Hendler
 

Ähnlich wie Text Mining, Term Mining, and Visualization - Improving the Impact of Scholarly Publishing (20)

II-SDV 2012 Text Mining, Term Mining and Visualization - Improving the Impac...
II-SDV 2012 Text Mining, Term Mining and Visualization  - Improving the Impac...II-SDV 2012 Text Mining, Term Mining and Visualization  - Improving the Impac...
II-SDV 2012 Text Mining, Term Mining and Visualization - Improving the Impac...
 
Visualization for Data Analysis: A New Way to Look at Content
Visualization for Data Analysis: A New Way to Look at Content Visualization for Data Analysis: A New Way to Look at Content
Visualization for Data Analysis: A New Way to Look at Content
 
Semantic Technologies for Big Sciences including Astrophysics
Semantic Technologies for Big Sciences including AstrophysicsSemantic Technologies for Big Sciences including Astrophysics
Semantic Technologies for Big Sciences including Astrophysics
 
Research in Intelligent Systems and Data Science at the Knowledge Media Insti...
Research in Intelligent Systems and Data Science at the Knowledge Media Insti...Research in Intelligent Systems and Data Science at the Knowledge Media Insti...
Research in Intelligent Systems and Data Science at the Knowledge Media Insti...
 
Found in Space: Creating and Visualizing IEEE Abstract Space for Publication ...
Found in Space: Creating and Visualizing IEEE Abstract Space for Publication ...Found in Space: Creating and Visualizing IEEE Abstract Space for Publication ...
Found in Space: Creating and Visualizing IEEE Abstract Space for Publication ...
 
Big Data (SOCIOMETRIC METHODS FOR RELEVANCY ANALYSIS OF LONG TAIL SCIENCE D...
Big Data (SOCIOMETRIC METHODS FOR  RELEVANCY ANALYSIS OF LONG TAIL  SCIENCE D...Big Data (SOCIOMETRIC METHODS FOR  RELEVANCY ANALYSIS OF LONG TAIL  SCIENCE D...
Big Data (SOCIOMETRIC METHODS FOR RELEVANCY ANALYSIS OF LONG TAIL SCIENCE D...
 
Semantics as a service at EMBL-EBI
Semantics as a service at EMBL-EBISemantics as a service at EMBL-EBI
Semantics as a service at EMBL-EBI
 
Semantics-enhanced Geoscience Interoperability, Analytics, and Applications
Semantics-enhanced Geoscience Interoperability, Analytics, and ApplicationsSemantics-enhanced Geoscience Interoperability, Analytics, and Applications
Semantics-enhanced Geoscience Interoperability, Analytics, and Applications
 
Semantics-enhanced Cyberinfrastructure for ICMSE : Interoperability, Analyti...
Semantics-enhanced Cyberinfrastructure for ICMSE :  Interoperability, Analyti...Semantics-enhanced Cyberinfrastructure for ICMSE :  Interoperability, Analyti...
Semantics-enhanced Cyberinfrastructure for ICMSE : Interoperability, Analyti...
 
we to deep learning
we to deep learning we to deep learning
we to deep learning
 
emantic web technologies and applications for Ins
emantic web technologies and applications for Insemantic web technologies and applications for Ins
emantic web technologies and applications for Ins
 
Accelerating Discovery via Science Services
Accelerating Discovery via Science ServicesAccelerating Discovery via Science Services
Accelerating Discovery via Science Services
 
Shifting the Burden from the User to the Data Provider
Shifting the Burden from the User to the Data ProviderShifting the Burden from the User to the Data Provider
Shifting the Burden from the User to the Data Provider
 
NHM Data Portal: first steps toward the Graph-of-Life
NHM Data Portal: first steps toward the Graph-of-LifeNHM Data Portal: first steps toward the Graph-of-Life
NHM Data Portal: first steps toward the Graph-of-Life
 
NHM Data Portal: first steps toward the Graph-of-Life
NHM Data Portal: first steps toward the Graph-of-LifeNHM Data Portal: first steps toward the Graph-of-Life
NHM Data Portal: first steps toward the Graph-of-Life
 
Infrastructure, relationships, trust, and RDA
Infrastructure, relationships, trust, and RDAInfrastructure, relationships, trust, and RDA
Infrastructure, relationships, trust, and RDA
 
Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...
Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...
Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...
 
Öppen data och forskningens genomslag
Öppen data och forskningens genomslagÖppen data och forskningens genomslag
Öppen data och forskningens genomslag
 
Open Data is Not Enough: Making Data Sharing Work
Open Data is Not Enough: Making Data Sharing WorkOpen Data is Not Enough: Making Data Sharing Work
Open Data is Not Enough: Making Data Sharing Work
 
Broad Data (India 2015)
Broad Data (India 2015)Broad Data (India 2015)
Broad Data (India 2015)
 

Mehr von Access Innovations, Inc.

Making AI Behave: Using Knowledge Domains to Produce Useful, Trustworthy Results
Making AI Behave: Using Knowledge Domains to Produce Useful, Trustworthy ResultsMaking AI Behave: Using Knowledge Domains to Produce Useful, Trustworthy Results
Making AI Behave: Using Knowledge Domains to Produce Useful, Trustworthy ResultsAccess Innovations, Inc.
 
ISO 25964-1Working Group ISO/TC 46/SC 9/WG 8
ISO 25964-1Working Group ISO/TC 46/SC 9/WG 8ISO 25964-1Working Group ISO/TC 46/SC 9/WG 8
ISO 25964-1Working Group ISO/TC 46/SC 9/WG 8Access Innovations, Inc.
 
Hindawi taxonomy and personalization 27.10 (1)
Hindawi taxonomy and personalization 27.10 (1)Hindawi taxonomy and personalization 27.10 (1)
Hindawi taxonomy and personalization 27.10 (1)Access Innovations, Inc.
 
Data harmonycloudpowerpointclientfacing
Data harmonycloudpowerpointclientfacingData harmonycloudpowerpointclientfacing
Data harmonycloudpowerpointclientfacingAccess Innovations, Inc.
 
Asco using ai-taxos-for meta-titles-february-2021
Asco using ai-taxos-for meta-titles-february-2021Asco using ai-taxos-for meta-titles-february-2021
Asco using ai-taxos-for meta-titles-february-2021Access Innovations, Inc.
 
Ai webinar 2 -what's in a name (consolidated pdf)
Ai webinar 2 -what's in a name (consolidated pdf)Ai webinar 2 -what's in a name (consolidated pdf)
Ai webinar 2 -what's in a name (consolidated pdf)Access Innovations, Inc.
 
Tagging overview - Why Keywords Don't Cut It
Tagging overview  - Why Keywords Don't Cut ItTagging overview  - Why Keywords Don't Cut It
Tagging overview - Why Keywords Don't Cut ItAccess Innovations, Inc.
 
Health Affairs - Why Keywords Don't Cut It
Health Affairs - Why Keywords Don't Cut ItHealth Affairs - Why Keywords Don't Cut It
Health Affairs - Why Keywords Don't Cut ItAccess Innovations, Inc.
 
DHUG 2018: Towards Web-Centric Repository Interoperability
DHUG 2018: Towards Web-Centric Repository InteroperabilityDHUG 2018: Towards Web-Centric Repository Interoperability
DHUG 2018: Towards Web-Centric Repository InteroperabilityAccess Innovations, Inc.
 
DHUG 2017 - Understanding ROI Just Enough to Get Your Project Funded
DHUG 2017 - Understanding ROI Just Enough to Get Your Project FundedDHUG 2017 - Understanding ROI Just Enough to Get Your Project Funded
DHUG 2017 - Understanding ROI Just Enough to Get Your Project FundedAccess Innovations, Inc.
 

Mehr von Access Innovations, Inc. (20)

Making AI Behave: Using Knowledge Domains to Produce Useful, Trustworthy Results
Making AI Behave: Using Knowledge Domains to Produce Useful, Trustworthy ResultsMaking AI Behave: Using Knowledge Domains to Produce Useful, Trustworthy Results
Making AI Behave: Using Knowledge Domains to Produce Useful, Trustworthy Results
 
ISO 25964-1Working Group ISO/TC 46/SC 9/WG 8
ISO 25964-1Working Group ISO/TC 46/SC 9/WG 8ISO 25964-1Working Group ISO/TC 46/SC 9/WG 8
ISO 25964-1Working Group ISO/TC 46/SC 9/WG 8
 
Smart submit
Smart submitSmart submit
Smart submit
 
Plos taxonomy beyond search dhug 2021
Plos taxonomy beyond search   dhug 2021Plos taxonomy beyond search   dhug 2021
Plos taxonomy beyond search dhug 2021
 
Hindawi taxonomy and personalization 27.10 (1)
Hindawi taxonomy and personalization 27.10 (1)Hindawi taxonomy and personalization 27.10 (1)
Hindawi taxonomy and personalization 27.10 (1)
 
Data harmonycloudpowerpointclientfacing
Data harmonycloudpowerpointclientfacingData harmonycloudpowerpointclientfacing
Data harmonycloudpowerpointclientfacing
 
Data harmony update 2021
Data harmony update 2021 Data harmony update 2021
Data harmony update 2021
 
Atypon dhug2021
Atypon dhug2021Atypon dhug2021
Atypon dhug2021
 
Asco using ai-taxos-for meta-titles-february-2021
Asco using ai-taxos-for meta-titles-february-2021Asco using ai-taxos-for meta-titles-february-2021
Asco using ai-taxos-for meta-titles-february-2021
 
Asce more than just topic taxonomies
Asce more than just topic taxonomiesAsce more than just topic taxonomies
Asce more than just topic taxonomies
 
Acs discoverability-dhug2021
Acs discoverability-dhug2021Acs discoverability-dhug2021
Acs discoverability-dhug2021
 
Ai webinar 2 -what's in a name (consolidated pdf)
Ai webinar 2 -what's in a name (consolidated pdf)Ai webinar 2 -what's in a name (consolidated pdf)
Ai webinar 2 -what's in a name (consolidated pdf)
 
Tagging overview - Why Keywords Don't Cut It
Tagging overview  - Why Keywords Don't Cut ItTagging overview  - Why Keywords Don't Cut It
Tagging overview - Why Keywords Don't Cut It
 
Health Affairs - Why Keywords Don't Cut It
Health Affairs - Why Keywords Don't Cut ItHealth Affairs - Why Keywords Don't Cut It
Health Affairs - Why Keywords Don't Cut It
 
Why Keywords Don't Cut It
Why Keywords Don't Cut ItWhy Keywords Don't Cut It
Why Keywords Don't Cut It
 
Data Harmony update 2020 final
Data Harmony update 2020 finalData Harmony update 2020 final
Data Harmony update 2020 final
 
Data Harmony Update 2020 final
Data Harmony Update 2020 finalData Harmony Update 2020 final
Data Harmony Update 2020 final
 
DHUG 2018: Towards Web-Centric Repository Interoperability
DHUG 2018: Towards Web-Centric Repository InteroperabilityDHUG 2018: Towards Web-Centric Repository Interoperability
DHUG 2018: Towards Web-Centric Repository Interoperability
 
DHUG 2018 - Florida Thesis OCR
DHUG 2018 - Florida Thesis OCRDHUG 2018 - Florida Thesis OCR
DHUG 2018 - Florida Thesis OCR
 
DHUG 2017 - Understanding ROI Just Enough to Get Your Project Funded
DHUG 2017 - Understanding ROI Just Enough to Get Your Project FundedDHUG 2017 - Understanding ROI Just Enough to Get Your Project Funded
DHUG 2017 - Understanding ROI Just Enough to Get Your Project Funded
 

KĂźrzlich hochgeladen

Micromeritics - Fundamental and Derived Properties of Powders
Micromeritics - Fundamental and Derived Properties of PowdersMicromeritics - Fundamental and Derived Properties of Powders
Micromeritics - Fundamental and Derived Properties of PowdersChitralekhaTherkar
 
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting DataJhengPantaleon
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Celine George
 
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991RKavithamani
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxmanuelaromero2013
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfchloefrazer622
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAssociation for Project Management
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxGaneshChakor2
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon AUnboundStockton
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxOH TEIK BIN
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityGeoBlogs
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Sapana Sha
 
URLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppURLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppCeline George
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
 

KĂźrzlich hochgeladen (20)

Micromeritics - Fundamental and Derived Properties of Powders
Micromeritics - Fundamental and Derived Properties of PowdersMicromeritics - Fundamental and Derived Properties of Powders
Micromeritics - Fundamental and Derived Properties of Powders
 
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
 
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across Sectors
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptx
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon A
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptx
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 
Staff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSDStaff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSD
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
URLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppURLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website App
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 

Text Mining, Term Mining, and Visualization - Improving the Impact of Scholarly Publishing

  • 1. TEXT MINING, TERM MINING, AND VISUALIZATION IMPROVING THE IMPACT OF SCHOLARLY PUBLISHING MONDAY 16 APRIL 2012 NICE, FRANCE Marjorie M.K. Hlava, President Jay Ven Eman, CEO Access Innovations, Inc. mhlava@accessinn.com J_ven_eman@accessinn.com 1
  • 2. What we will cover today • Term and Text Mining • The basics of visualization • Case studies • Using subject terms as metrics • Applications • Visualizing the results
  • 3. Definitions • Term Mining - a systematic comparison processing algorithmic method to find patterns in text • Text Mining – using controlled vocabulary tags in text to find patterns and directions • Term & text mining  Many similarities  Can be complimentary; not mutually exclusive
  • 4. Term mining • Precise  Meaningful semantic relationships; contextual  Replicable; repeatable; consistent  Vetted; controlled  Based on a controlled vocabulary  Trends; gaps; relationship analysis; visualizations  Less data processing load
  • 5. Text mining  Algorithmic; formulaic  Neural nets, statistical, latent semantic, co - occurrence  Serendipitous relationships  Sentiment; hot topics; trends  False drops; noise;  Misleading semantic relationships  Heavy processing load
  • 6. Why take a visual look? • Humans can process information 17 times faster in visual presentations • Now data can be analyzed, manipulated and presented as visual displays. • To see the trends effectively we need to make the data into rich graph-able formats 6
  • 7. Visualization of data • Needs − Measurement − Metrics − Numbers • Shows − Adjacency − Relationships − Trends − Co – occurrence − Conceptual distance • Is richer with − Linking − Semantic enrichment − Classification • Supports − Forecasting − Trend analysis − Segmentation − Distribution 7
  • 8. Well Formed Data • Semantic Enrichment • Taxonomies • Access Innovations • Data Harmony Man’s attention to visual display to convey knowledge is ancient 8
  • 9. Well Formed Data • Semantic Enrichment • Taxonomies • Access Innovations • Data Harmony The art in maps is a longstanding tradition 9
  • 10. Well Formed Data • Semantic Enrichment • Taxonomies • Access Innovations • Data Harmony Super imposing data is now common A mash up example 10 Traffic Injury Map UK Data Archive US National Highway Safety Administration Google Maps Base Accident categories include children automobile bicycle etc. Data time place type Source: JISC TechWatch: Data Mash-ups September 2010
  • 11. Well Formed Data • Semantic Enrichment • Taxonomies • Access Innovations • Data Harmony Mash up of bird flight migrations and weather patterns http://www.youtube.com/watch?v=uPff1t4pXiI&feature=youtu.be 11
  • 12. Well Formed Data • Semantic Enrichment • Taxonomies • Access Innovations • Data Harmony http://www.youtube.com/watch?v=nokQBjk1s_8&feature=player_embedded
  • 13. Well Formed Data • Semantic Enrichment • Taxonomies • Access Innovations • Data Harmony How does it work?  Develop controlled vocabulary Âť Prefer one with hierarchy  Apply to full text Âť Or to the “heads”  Decide on data points to convey information  Divide the XML into graphable sections
  • 14. Well Formed Data • Semantic Enrichment • Taxonomies • Access Innovations • Data Harmony Start with data – like this XML file 14
  • 15. Well Formed Data • Semantic Enrichment • Taxonomies • Access Innovations • Data Harmony Index or tag using subject terms from thesaurus or taxonomy  date, category, taxonomy term, frequency 15
  • 16. Well Formed Data • Semantic Enrichment • Taxonomies • Access Innovations • Data Harmony Many views of one set of data 16
  • 17. Well Formed Data • Semantic Enrichment • Taxonomies • Access Innovations • Data Harmony Load to a visualization program Like Prefuse 17
  • 18. Well Formed Data • Semantic Enrichment • Taxonomies • Access Innovations • Data Harmony Or Pajek 18
  • 19. Well Formed Data • Semantic Enrichment • Taxonomies • Access Innovations • Data Harmony 19
  • 20. Well Formed Data • Semantic Enrichment • Taxonomies • Access Innovations • Data Harmony National Information Center for Educational Media Albuquerque’s own Âť Sandia developed VxInsight Âť Access Innovations = NICEM Same data – several views Primary and Secondary Education in US Shows the US Valley of Science Little Science taught in elementary years 20
  • 21. Well Formed Data • Semantic Enrichment • Taxonomies • Access Innovations • Data Harmony
  • 22. Well Formed Data • Semantic Enrichment • Taxonomies • Access Innovations • Data Harmony
  • 23. Well Formed Data • Semantic Enrichment • Taxonomies • Access Innovations • Data Harmony Using visualization to show  From a society / publisher perspective Âť Identify Core, Boundary and Cross Border Âť Provides Indicators  Activity  Growth  Relatedness  Centrality Âť Locates Journal domains  From a thesaurus perspective Âť Identifies terms that are too broadly defined Âť Potential Improvements in thesaurus structure using topic structures 23
  • 24. Well Formed Data • Semantic Enrichment • Taxonomies • Access Innovations • Data Harmony Case Study: Mapping IEEE thesaurus space  We are interested in an expanded map that includes adjacencies to the IEEE data Âť Expanded term set shows adjacent white space; opportunities for expansion  Overlaps and edges of the science Âť We need comparison data  Learn the directions in the field Âť Low occurrence rate in IEEE documents? Âť Linkage to terms in IEEE documents?  Where do we find these terms? How can we add them? 24
  • 25. Well Formed Data • Semantic Enrichment • Taxonomies • Access Innovations • Data Harmony The process  Built a rule base to auto index IEEE content Âť “90 % accuracy out of the box on journal data”* Âť “80% out of the box on proceedings data”*  The overlapping data sets Âť Auto indexed 1.2 million Xplore records Âť Auto indexed 10 years of US Patent data Âť Auto indexed 10 years of Medline  Term sets used Âť IEEE thesaurus terms rule base Âť Medical Subject Headings (MeSH) (and simple rule base) Âť Defense Technical Information Center (DTIC) Thesaurus ( and simple rule base) Âť Similar level of detail to current IEEE thesaurus terms 25
  • 26. Well Formed Data • Semantic Enrichment • Taxonomies • Access Innovations • Data Harmony Defining expanded term space 26 IEEE 2kterms 1.2M documents 1. The data - Select related corpus 14kDTIC 475k patents 24kMeSH PubMed 525k docs
  • 27. Well Formed Data • Semantic Enrichment • Taxonomies • Access Innovations • Data Harmony Defining expanded term space 27 IEEE 2kterms 1.2M documents 2. Identify related terms Use the IEEE Thesaurus to index the three collections
  • 28. Well Formed Data • Semantic Enrichment • Taxonomies • Access Innovations • Data Harmony Defining expanded term space 28 IEEE 2kterms 1.2M documents 2. Identify related terms Use MESH and DTIC to also index the three collections
  • 29. Well Formed Data • Semantic Enrichment • Taxonomies • Access Innovations • Data Harmony 29 IEEE 2kterms 1.2M documents 3. Resulting term set The co-indexed items from the three collections Defining expanded term space
  • 30. Well Formed Data • Semantic Enrichment • Taxonomies • Access Innovations • Data Harmony 30 4. Term:Term Matrix Where do the articles and their indexing intersect? Defining expanded term space
  • 31. Well Formed Data • Semantic Enrichment • Taxonomies • Access Innovations • Data Harmony 31 Visualization Strategies Matrix Visualization Software
  • 32. Well Formed Data • Semantic Enrichment • Taxonomies • Access Innovations • Data Harmony 32 All data up-posted to the top level
  • 33. Well Formed Data • Semantic Enrichment • Taxonomies • Access Innovations • Data Harmony 33 Many map options IEEE ExperiencePrevious Experience
  • 34. Well Formed Data • Semantic Enrichment • Taxonomies • Access Innovations • Data Harmony 34 Sensors Council Nucl Plasma Sci Soc Nanotech Council Ultrason, Ferro … Prod Saf Engng Soc Oceanic Engng Soc Geosci Rem Sens Soc Council Supercond Compon, Packag … Instr Measur Soc Magnetics Soc Dielectr El Insul Soc Electromag Compat Soc Antennas Propag Soc Power Electron Soc Electron Dev Soc Circuits & Systems Power & Energy Soc Industry Appl Soc Solid St Circuits Soc Industr Electr Soc Microwave Theory Soc Aerosp Electr Sys Soc Sys Man Cyber Society Computer Intelligence Society Systems Council Reliability Society Education Society Prof Commun Society Computer Society Robot Autom Soc Social Impl Techn Council Electr Design Auto Signal Proc Soc Intell Transp Sys Soc Commun Soc Info Theory Soc Vehicular Techn Soc Consumer Electr Soc Broadcast Techn Soc Photonics Soc Eng Med Biol Sci IEEE Portfolio
  • 35. Well Formed Data • Semantic Enrichment • Taxonomies • Access Innovations • Data Harmony 35 Radial Visualization
  • 36. Well Formed Data • Semantic Enrichment • Taxonomies • Access Innovations • Data Harmony 36 Publication Strategy JASIST reference
  • 37. Well Formed Data • Semantic Enrichment • Taxonomies • Access Innovations • Data Harmony 37 Conference Strategy
  • 38. Well Formed Data • Semantic Enrichment • Taxonomies • Access Innovations • Data Harmony 38 Turbines Measurement Circuits Amplifiers Displays Games Toys Flow Cooling Heating Components Gearing Brakes Dynamics Vehicles, Parts Disk Optics Photochem Molding Conductors Coatings Lasers Lamps Motors Plants, Micro-orgs Control Boats Oilfield Services Med Instruments Welding Conveyers Rubber Acyclic Comp Footwear Lubricants Radiology Catalysis Macromolecules Sprayers Electrochem Fitness Hygiene Cleaning Printing Paper IC Engines Magn/Elect Magnets Textiles Layers Medical Devices Clocks Pipes Valves Blasting Cables Appliances Outerwear Exhaust Pumps Packaging Aircraft Semiconductors Use a Thesaurus to Label Maps Agriculture Food Consumer Products Construction Automotive + Defense Industrial Products Leisure Energy Telecom Computer HW/SW Electronics Chemicals Pharma Metals Health Care
  • 39. Well Formed Data • Semantic Enrichment • Taxonomies • Access Innovations • Data Harmony Questions Answered  Is there a way, using our own information, to forecast our direction?  Where is the industry headed? What about by technology sector?  Does our coverage match our mission and vision?  Can we become smarter about our data and potential markets using our collection in new ways? Are the societies publishing and talking about what their charter indicates they cover?  What are the trends – are topics emerging/cooling?  Can we use technology and our own data to explore these questions while enhancing our data? 39
  • 40. Well Formed Data • Semantic Enrichment • Taxonomies • Access Innovations • Data Harmony The research team  Access Innovations / Data Harmony Âť Founded in 1978 Âť Data enrichment and normalization Âť Suite of Semantic Enrichment tools  SciTechStrategies Âť Understanding data through visualization  IEEE Indexing & Abstracting Group 40
  • 41. Well Formed Data • Semantic Enrichment • Taxonomies • Access Innovations • Data Harmony We looked at visualization of data  Finding the Metrics Âť Measurement Âť Numbers Âť Terms as indicators  Ways to show Âť Adjacency Âť Relationships Âť Trends Âť Co – occurrence Âť Conceptual distance  How to enrich with Âť Linking Âť Semantic enrichment Âť Classification  Maps supporting Âť Forecasting Âť Trend analysis Âť Segmentation Âť Distribution 41
  • 42. Well Formed Data • Semantic Enrichment • Taxonomies • Access Innovations • Data Harmony Effective maps require  Contextual data  Detailed data  Classification methods  At least two directions in the matrix  A little art for fun 42
  • 43. 43 It just takes a little imagination Thank you Marjorie M.K. Hlava President mhlava@accessinn.com Jay Ven Eman, CEO J_ven_eman@accessinn.com , Access Innovations 505-998-0800

Hinweis der Redaktion

  1. Access Innovations and its software brand Data Harmony are known for the high caliber of data. It is clean, well formed and very accurately semantically enriched. They updated the IEEE thesaurus in 2005, building a rule base for use in indexing at the same time. The application of the terms to the IEEE content was 90% accurate – that is 90% of the terms suggested are what well trained indexers would use from a controlled vocabulary, and 80% accurate from the more difficult proceedings data at launch of the project. Since that time the rule base has improved over time and the IEEE production team only needs to spot check about 10% of the documents to insure a high standard of indexing is maintained. It has allowed IEEE to process a lot more documents with the same team and made the process more fun at the same time. The indexers are allowed time to think about the content, the thesaurus terms, what should be added and what other information can be collected to continue to enrich the files because the Data harmony software removes many of the clerical aspects of the indexing process, leveraging the mental processing of the staff. The accuracy is high enough that we simply indexed the entire contents of the eXplore database back to the earliest records in a single overnight process. Then to explore the edges of science we also indexed the 1.2 million records using Medical Subject headings and the defense Technical Information Center thesauri with similar accuracy results.