Analyzing trajectories of technological knowledge, Dr Arho Suominen
1. VTT TECHNICAL RESEARCH CENTRE OF FINLAND LTD
Analyzing trajectories of
technological knowledge
Topic modelling approach to knowledge depth and breadth
The 1st Annual International Conference of the
IEEE Technology and Engineering Management
Society
Dr. Arho Suominen
2.
KNOWLEDGE – THE CORE ASSET OF CORPORATIONS
MANAGING IT REQUIRES US TO KNOW WHAT INTERNAL AND EXTERNAL KNOWLEDGE IS AVAILABLE
3. 31/05/2017
INTRODUCTION
Technology management and planning require that we are
able to quantify the knowledge embedded in and outside the
organization.
Depth and breadth of knowledge are the main dimensions used
to do this.
Knowledge depth is defined as an actor's level of expertise or
sophistication.
Knowledge breadth is defined as an actor's capability to exploit
adjacent technologies, or the multi-dimensionality of its knowledge
base.
Knowledge depth and breadth have been shown to have a
significant impact on company performance.
4.
WHAT WE HAVE DONE BEFORE
THAT MIGHT HAVE SOME LIMITATIONS
Patent data, despite its caveats, has been seen as the most
practical vantage point into a company's knowledge.
Previous studies have operationalized companies' knowledge
structures by looking at patent classifications.
This approach has significant caveats, due to
classification errors,
overall noisiness,
challenges related to the taxonomy of patents, and
the classification system's inability to represent novelty, because it
forces new things into historical classes.
The above is written with the understanding that there have been
recent studies looking at keyword- and machine-learning-based
approaches to operationalizing patents.
5. TWO EXAMPLES WHY OUR APPROACH CAN ADD VALUE
OVERCOMING LIMITATIONS OF PATENT CLASSIFICATIONS AND ABSTRACT BASED ANALYSIS
6.
CLASSIFICATION VS. MACHINE LEARNING
MACHINE LEARNED TOPICS ALIGN POORLY WITH HUMAN CLASSIFICATION
[Figure: chart with values from 0.000 to 0.900 plotted across Topic1 to Topic73, one series per patent classification field: Analysis of biological materials; Audio-visual technology; Basic communication processes; Basic materials chemistry; Biotechnology; Chemical engineering.]
7.
CLASSIFICATION VS. MACHINE LEARNING
MACHINE LEARNED TOPICS ALIGN POORLY WITH HUMAN CLASSIFICATION
[Figure: chart with values from 0.000 to 0.900 plotted across Topic1 to Topic75, one series per patent classification field: Analysis of biological materials; Audio-visual technology; Basic communication processes; Basic materials chemistry; Biotechnology; Chemical engineering; Civil engineering; Computer technology; Control; Digital communication; Electrical machinery, apparatus, energy; Engines, pumps, turbines; Environmental technology; Food chemistry; Furniture, games; Handling.]
8.
EXAMPLE US9185203B2
Mobile device display management
Abstract
The display of a mobile device is managed
during a voice communication session using
a proximity sensor and an accelerometer. In
one example, the display of a mobile device
is turned off during a phone call on the
mobile device when a proximity sensor
detects an object is proximate the device
and an accelerometer determines the device
is in a first orientation.
In total 62 words
EXAMPLE US9185203B2
Mobile device display management
Description
Background…
Summary…
Brief Description of Drawings…
Detailed description…
In total 8886 words
CLASSIFICATION VS. MACHINE LEARNING
MACHINE LEARNED TOPICS ALIGN POORLY WITH HUMAN CLASSIFICATION
10.
Unsupervised learning
Produces an outcome based on an input without receiving any
feedback from the environment.
Relies on a formal framework that enables the algorithm to find
patterns.
Topic models "...can extract surprisingly interpretable and useful
structure without any explicit "understanding" of the language by
computer".
As a simplification, each document in a corpus is a random
mixture over latent topics, and each latent topic is characterized
by a distribution over words.
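The simplification above can be made concrete with a toy generative sketch. It is only an illustration under stated assumptions: the two-topic vocabulary, the Dirichlet parameter `alpha=0.5` and the helper names are invented here and are not taken from the study.

```python
import random

def sample_dirichlet(alpha, size, rng):
    """Draw one point from a symmetric Dirichlet via normalized Gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(topic_word, alpha, length, rng):
    """Generate a toy document: draw a topic mixture, then a topic and a word per token."""
    n_topics = len(topic_word)
    theta = sample_dirichlet(alpha, n_topics, rng)  # the document's mixture over topics
    words = []
    for _ in range(length):
        topic = rng.choices(range(n_topics), weights=theta)[0]
        vocab, probs = zip(*topic_word[topic].items())
        words.append(rng.choices(vocab, weights=probs)[0])
    return theta, words

# Hypothetical topics, each a distribution over words (invented for illustration).
TOPIC_WORD = [
    {"display": 0.5, "sensor": 0.3, "screen": 0.2},
    {"antenna": 0.6, "signal": 0.3, "network": 0.1},
]

rng = random.Random(0)
theta, words = generate_document(TOPIC_WORD, alpha=0.5, length=8, rng=rng)
print([round(p, 2) for p in theta], words)
```

Topic modelling runs this story in reverse: given only the words, it estimates the per-document mixtures and the per-topic word distributions.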
11.
DATA, PRE-PROCESSING, AND ANALYSIS
SAMPLE
From the telecommunication industry
Alcatel-Lucent, Apple, Google, Huawei, Microsoft, Nokia and Samsung
Electronics
The analysis was limited to a time period from 2001 to 2014.
METHOD
Analyzed the sample companies' knowledge bases with unsupervised
learning, using patent data as a proxy.
DATA SOURCE
Full-text patent descriptions filed at the USPTO, drawn from a
repository of approximately 6 million patents owned by Teqmine
Analytics Ltd.
Final data contains 157 718 records.
12.
DATA, PRE-PROCESSING, AND ANALYSIS
            Topic 1   Topic 2   …   Topic N
Patent 1    0.10      0.24          0.40
Patent 2    0.40      0.01          0.10
…
Patent N    0.01      0.80          0.01
[Figure: network diagram linking Patent 1, Patent 2 and Patent N to Topic 1, Topic 2 and Topic N.]
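One plausible way to get from such a document-topic matrix to a patent-topic network is an edge list. The sketch below reuses the illustrative values from the table. The 0.2 cut-off, the function names and the `Source`/`Target`/`Weight` column headers are assumptions made for this example, not details given in the slides.

```python
import csv
import io

# Values copied from the illustrative table; the elided "…" rows and columns are left out.
doc_topic = {
    "Patent 1": {"Topic 1": 0.10, "Topic 2": 0.24, "Topic N": 0.40},
    "Patent 2": {"Topic 1": 0.40, "Topic 2": 0.01, "Topic N": 0.10},
    "Patent N": {"Topic 1": 0.01, "Topic 2": 0.80, "Topic N": 0.01},
}

def to_edges(matrix, min_weight):
    """Keep patent-topic pairs whose probability is at least min_weight."""
    return [
        (patent, topic, prob)
        for patent, row in matrix.items()
        for topic, prob in row.items()
        if prob >= min_weight
    ]

def edges_to_csv(edges):
    """Serialize edges as CSV text with Source, Target and Weight columns."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["Source", "Target", "Weight"])
    writer.writerows(edges)
    return buffer.getvalue()

edges = to_edges(doc_topic, min_weight=0.2)  # 0.2 is an assumed threshold
csv_text = edges_to_csv(edges)
print(csv_text)
```

Because the classification is soft, a patent can keep edges to several topics at once, which is what distinguishes this view from a single patent class per document.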
13.
DATA, PRE-PROCESSING, AND ANALYSIS
ALGORITHM: LDA
The algorithm is based on an online
variational Bayes algorithm for LDA [9].
The number of topics was set to 75 using a
trial-and-error approach.
IMPLEMENTATION: Python
The Python implementation included pre-processing.
ANALYSIS: Gephi, Python, Excel
Gephi was used to create visuals from the
soft classification created by the algorithm.
Python was used to pivot the document-topic
probability matrix into a sum of probabilities
by company in a given year.
Excel was used to calculate TD defined as:
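The TD definition itself is not reproduced in this text. The pivot step that feeds it can be sketched as follows, as a minimal illustration only: the records, company labels and three-topic vectors are invented, and the function name is hypothetical.

```python
# Hypothetical records: (company, year, per-topic probabilities). All values are invented.
patents = [
    ("Company A", 2013, [0.10, 0.24, 0.66]),
    ("Company A", 2013, [0.40, 0.50, 0.10]),
    ("Company B", 2013, [0.01, 0.80, 0.19]),
    ("Company B", 2014, [0.30, 0.30, 0.40]),
]

def pivot_by_company_year(records):
    """Sum each topic's probability over all patents of a company in a given year."""
    totals = {}
    for company, year, probs in records:
        row = totals.setdefault((company, year), [0.0] * len(probs))
        for index, prob in enumerate(probs):
            row[index] += prob
    return totals

topic_sums = pivot_by_company_year(patents)
for (company, year), sums in sorted(topic_sums.items()):
    print(company, year, [round(s, 2) for s in sums])
```

Each output row is a company-year profile over topics, so its entries sum to the number of patents that company filed in that year.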
17.
INSIGHT: TELECOMMUNICATION INDUSTRY
Sample telecommunication companies with a decreasing technological diversity value. X-axis is years and Y-axis is
Technological Diversity (TD), calculated for each company.
18.
Insight on the telecommunication industry
Sample telecommunication companies with an increasing technological diversity value. X-axis is years and Y-axis is
Technological Diversity (TD), calculated for each company. The largest increase in TD is Google's, for which the linear
trend line is given with fit values.
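A linear trend line of the kind mentioned in the caption can be obtained with ordinary least squares. The sketch below assumes that "fit values" means slope, intercept and R². The yearly TD numbers are invented placeholders, not the values plotted for any company.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line y = intercept + slope * x, plus R squared."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1.0 - ss_res / ss_tot

# Hypothetical TD series for one company (placeholder numbers only).
years = [2001, 2002, 2003, 2004, 2005]
td = [1.0, 1.2, 1.1, 1.4, 1.5]

slope, intercept, r2 = fit_line(years, td)
print(f"slope per year = {slope:.3f}, R^2 = {r2:.3f}")
```

A positive slope corresponds to the increasing-diversity group shown here, a negative slope to the decreasing group on the previous slide.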
19.
INSIGHT: TELECOMMUNICATION INDUSTRY
Correlation between technological diversity and patent count:
the p-value is higher than 0.05, so the correlation was not
statistically significant (r(109) = 0.17, p = 0.077).
A multiple linear regression was calculated to predict
technological diversity based on patent count and company.
A significant regression equation was found (F(8, 102) = 35.99,
p < .001), with an R² of 0.73.
Google, Huawei and Microsoft were significant predictors. Patent
count, Apple, Motorola, Nokia and Samsung were not significant
predictors.
There is a clear trend in technological diversity.
Patent count is not a significant predictor in explaining
technological diversity.
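The reported statistics can be cross-checked with two textbook identities: the t statistic implied by a Pearson correlation, and the overall F statistic implied by R². This is a rough consistency check on the rounded figures from the slide, not a re-analysis; the function names are illustrative.

```python
import math

def t_from_r(r, df):
    """t statistic for a Pearson correlation r with df = n - 2 degrees of freedom."""
    return r * math.sqrt(df / (1.0 - r * r))

def f_from_r2(r2, df_model, df_resid):
    """Overall F statistic implied by R squared and the model/residual degrees of freedom."""
    return (r2 / df_model) / ((1.0 - r2) / df_resid)

t_stat = t_from_r(0.17, 109)       # values as reported: r(109) = 0.17
f_stat = f_from_r2(0.73, 8, 102)   # values as reported: F(8, 102), R squared = 0.73
print(f"t = {t_stat:.2f}, F = {f_stat:.2f}")
```

The t value comes out near 1.80, below the two-sided 5% critical value of roughly 1.98 for 109 degrees of freedom, which agrees with the reported non-significance. The F value comes out near 34.5, somewhat below the reported 35.99; rounding R² to two decimals is a plausible reason for the gap.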
20.
Natural language offers an important vantage point into interesting
phenomena that are not directly measurable.
This advantage is clear in the case of patent data analysis, where
abstracts are known to carry low information value and the use of
metadata has significant limitations.
The main finding is that, by using full text and LDA, we can create a
Technological Diversity value independent of patent count.
This analysis opens the possibility of using the approach in
more in-depth studies focusing on, for example, measuring the
impact of company knowledge depth and breadth on company
performance.
INSIGHT: TELECOMMUNICATION INDUSTRY
21.
THANK YOU
Dr. Arho Suominen
Senior Scientist, D.Sc. (Tech)
Academy of Finland Postdoctoral Researcher
stationed at
VTT TECHNICAL RESEARCH CENTRE OF
FINLAND
Innovations, Economy, and Policy
Vuorimiehentie 3, P.O. Box 1000, 02044 Espoo,
Finland
Tel. +358 50 5050 354
www.vtt.fi, arho.suominen@vtt.fi
https://www.linkedin.com/in/arhosuominen
Twitter @ArhoSuominen