Continuous Learning Algorithms - a Research Proposal Paper

Continuous Learning Algorithms in Machine Intelligence and their
Application to Less Consistent Decision Making Processes Like Name
Matching Analysis

Abstract

General-purpose software intelligence is still held to be beyond our current capacity to build.
While the definition of intelligence that we apply to machine learning and artificial
intelligence has generally expanded over time as our practical computational scale has
increased, little exploration has been conducted into another aspect of intelligence: the
capacity to learn and improve constantly through interaction with the environment. If we are to
define a software intelligence as an algorithm that is capable of interacting with its
environment and adapting to it over time, then this exploration is critical to the development
of such a system.

This body of research will attempt to take a first step into the area of continual feedback for
a machine learning algorithm, evaluating it against an area that has traditionally been
difficult for computers to emulate: Name Matching Analysis. If a machine learning algorithm can
be used to ‘tune’ a soft-search name matching algorithm based on continual feedback, drawn from
the results of that engine and from the judgments of human experts, then this technique of
constant feedback not only has immediate practical value but could also be explored further in
more ambitious research projects.

Final Research Paper, Information Science Extension Studies 4 (7867) Page 1 of 18
Tim Barlow (u3055036) Submitted 27 November 2011

Table of Contents

1. Introduction

2. Literature Review

2.1 A History of Artificial Intelligence

2.2 Modern AI Research into Data and Text Classification

2.3 A History of Soft Search Techniques for Identity Resolution

2.4 Issues in Name Matching and Existing Research in the Field

3. Research Problem or Knowledge Gap

4. Further Questions

5. Conclusion

6. References


1. Introduction

Dreams of a general artificial intelligence have been with us for some time (Turing, 1950;
McCarthy & Hayes, 1969). Initial experimentation with the game of chess (de Groot, 1965) led to
high expectations in the field, and this work also began to inform research into how the human
mind itself actually works (Simon & Chase, 1973). The problems associated with achieving a
general artificial intelligence have generally fallen into two categories: computational power
and training models.

Today, modern computing power and the advent of the internet, which can give any machine
learning algorithm access to massive amounts of information, are eroding the first restriction.
The use of this new power for machine learning is being actively explored (Gillick et al, 2006).

Traditional soft search name matching algorithms have also been in existence for some time. The
first documented linguistic approach to name matching came from a doctoral thesis (Hermansen,
1985) which also outlined a classification process using fuzzy logic matching counterbalanced
with a large array of ‘linguistic’ rules. At approximately the same time, a small company (Search
Software Australia) was developing fuzzy logic rules around the orthographic approach to name
matching (Halloway and Dunkerley, 1999).

Since the late 1980s, and particularly in the last 10 years, there has been an influx of academic
writing in the field of name matching, ranging from comparisons of existing techniques (Christen,
2006; Snae, 2007) to suggestions on how best to approach the problem (Oshika et al, 1988; Bilenko
et al, 2003; Freeman et al, 2006).

Name matching as a process, however, produces a more graduated result set (degrees of similarity
rather than a simple match or non-match), owing to the finer granularity of names generally and
the complete lack of an enforced global naming standard (Do & Rahm, 2007). Often a name that is
linguistically (means-like) similar to a search term is not orthographically (looks-like) or
phonetically (sounds-like) similar. For this reason the three approaches are often considered
contradictory, yet all three (and possibly more) are required for a successful name matching
algorithm (see Table 1).

Original Name   Orthographic Error   Phonetic Error   Linguistic Error
Sean            Saen                 Shaun            John
                Seam                 Shorn            Shane
Elizabeth       Elizadeth            Elisabeth        Bethany
                Ellixabeth           Ellizabef        Lisa

Table 1 – Name variations possible through error or natural variation
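To make the point of Table 1 concrete, the sketch below applies one phonetic measure, the classic American Soundex code, to every name in the table. It is a minimal illustration assuming the standard Soundex rules (first letter kept, consonants mapped to six digit classes, h and w not separating letters of the same class); Soundex stands in here for "a phonetic approach" generally and is not the method of any engine cited in this paper.

```python
# American Soundex, sketched for illustration: keep the first letter, map the
# remaining consonants to digit classes, then pad or truncate to four characters.
SOUNDEX_CLASSES = {
    **dict.fromkeys("bfpv", "1"),
    **dict.fromkeys("cgjkqsxz", "2"),
    **dict.fromkeys("dt", "3"),
    "l": "4",
    **dict.fromkeys("mn", "5"),
    "r": "6",
}


def soundex(name: str) -> str:
    letters = [ch for ch in name.lower() if ch.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    previous = SOUNDEX_CLASSES.get(letters[0], "")
    for ch in letters[1:]:
        digit = SOUNDEX_CLASSES.get(ch, "")  # vowels, y, h and w carry no digit
        if digit and digit != previous:
            code += digit
        if ch not in "hw":  # h and w do not separate letters of the same class
            previous = digit
    return (code + "000")[:4]


# The rows of Table 1, grouped by the kind of variation.
TABLE_1 = {
    "Sean": {"orthographic": ["Saen", "Seam"],
             "phonetic": ["Shaun", "Shorn"],
             "linguistic": ["John", "Shane"]},
    "Elizabeth": {"orthographic": ["Elizadeth", "Ellixabeth"],
                  "phonetic": ["Elisabeth", "Ellizabef"],
                  "linguistic": ["Bethany", "Lisa"]},
}

for original, columns in TABLE_1.items():
    for kind, variants in columns.items():
        for variant in variants:
            same = soundex(variant) == soundex(original)
            print(f"{original:<10} {kind:<13} {variant:<11} "
                  f"{soundex(variant)} {'same code' if same else 'different code'}")
```

Run over the table, Sean, Saen, Seam, Shaun and even Shane all receive S500, but Shorn (S650), John (J500) and the one-letter typo Elizadeth (E423, against E421 for Elizabeth) fall outside the group, and Bethany and Lisa receive unrelated codes. A single phonetic key catches some orthographic and linguistic variants by accident and misses others, which is why one measure alone is not enough.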

So, how do people learn to analyse name matches? While they can be given some initial training,
it is generally held within the industry that the best teacher is experience (Wang et al, 1995).
This approach of learning ‘on the job’ and over time does not appear to have been attempted
before in a name matching machine learning context.


2. Literature Review

This is a wide topic, drawing on several complementary disciplines. It is therefore prudent to
organise the material in this literature review under four topics. The first is the history of
research in Artificial Intelligence and the changing scope and definition of an artificially
intelligent system over time. Second, we address the current direction of AI research as it
applies to data analysis and the complexity of modern database holdings. Third is the history of
computerised soft name matching systems, and finally we discuss the linguistic challenges faced
by such algorithms and how these might be addressed by new technologies.

2.1 A History of Artificial Intelligence

Discussions about the possibility of computers possessing an intelligence that would allow them
to learn and adapt to their environment, at least in some limited way, date back as early as 1950
(Turing, 1950). The famous Turing Test is commonly used as a benchmark when attempting to create
systems that are capable of conversing with humans.

By 1969 it was clear that the computational power and storage required for an artificial
intelligence capable of interacting with the world at large were not available, and that
training such a system would likewise have taken prohibitive amounts of time and manual input.
Still, the ‘mathematisation’ of various fields of interest has since extended beyond chess, with
the intent of using computers to simulate and shed light on the thought processes not just of
the individual, but of society (Laland, 1993).

This more recent work implies that many of the issues around scale and complexity are no longer
insurmountable. Nevertheless, we do not have a functioning general machine intelligence. Machine
learning has generally been applied to computationally hard (NP-hard) problems (Dietterich,
2000) such as the travelling salesman problem (Dorigo & Gambardella, 1997). If the scale and
capacity problems can be solved, then the sole remaining issue appears to be the simulation of a
suitable learning mechanism. So, what is the current focus of research in this field, and how
effective has it been to date?
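To show why a problem such as the travelling salesman problem is hard, the sketch below counts the distinct closed tours through n cities, (n-1)!/2 when distances are symmetric, and solves a tiny instance by exhaustive search. The city coordinates are made up, and plain enumeration is used only to illustrate the growth that makes heuristics such as the ant colony system of Dorigo & Gambardella (1997) necessary; it is not their method.

```python
import itertools
import math


def tour_count(n_cities: int) -> int:
    """Distinct closed tours through n cities when distances are symmetric: (n-1)!/2."""
    return math.factorial(n_cities - 1) // 2


def tour_length(tour) -> float:
    # Sum of straight-line legs, including the leg back to the starting city.
    return sum(math.dist(tour[i], tour[(i + 1) % len(tour)]) for i in range(len(tour)))


def brute_force_tsp(cities):
    """Exhaustive search: fix the first city and try every ordering of the rest."""
    start, rest = cities[0], cities[1:]
    best = min(((start,) + perm for perm in itertools.permutations(rest)), key=tour_length)
    return tour_length(best), best


for n in (5, 10, 20):
    print(n, "cities:", tour_count(n), "distinct tours")

# Four made-up cities on the corners of a unit square; the best tour is the perimeter.
length, tour = brute_force_tsp([(0, 0), (0, 1), (1, 1), (1, 0)])
print(round(length, 6), tour)
```

Five cities give only 12 distinct tours, but 20 cities already give roughly 6 × 10^16, which is why exhaustive search stops being an option almost immediately.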

2.2 Modern AI Research into Data and Text Classification

Much has been written on the use of machine learning for text classification and categorisation
(Joachims, 2002). This research falls within the broader field of Natural Language Processing
(NLP). This work has two broad aims. The first is to allow a computer to categorise free-text
data intelligently, so that humans can read and absorb the text considered to be of higher
priority than the remainder. The second is to facilitate machine learning by allowing a computer
to categorise text in a manner that leads to contextual awareness of the content (Sebastiani,
2002). While there have been successes in this field, there are always ‘border cases’ (cases
where a body of text could easily belong to more than one category). For the most part, however,
the classification itself can be judged either ‘right’ or ‘wrong’ by the human expert(s)
(Sebastiani, 2002), which makes it easier to create a training model for a system designed to
categorise or classify text.

Teaching computers to understand text is considered essential to general intelligence models
(Jurafsky & Martin, 2000), and intelligent categorisation of data by computer would by definition
be a primary task of most machine learning approaches (Witten & Frank, 2010). If the ability to
classify and contextualise text is so important to a machine intelligence, and progress is being
made in this field, does that mean that name matching benefits from these advances?

2.3 A History of Soft Search Techniques for Identity Resolution

A wide array of papers has been published on name matching. Jack Hermansen’s seminal doctoral
thesis (Hermansen, 1985) marks the start of a new strand of computational linguistics devoted
specifically to the matching of names. Hermansen’s approach was to create a large database of
name variations against which every name could be compared, so that it could be grouped
appropriately. While it also performed some basic similarity tests across names to cater for
errors, the approach was primarily designed around names having a distinct meaning (the
Linguistic Approach).
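The linguistic approach can be sketched as a lookup into a table of name variants, each filed under a root name. The tiny table below is hypothetical and hand-built, containing only groupings that this paper itself mentions (Peggy and Maggie from Margaret, the John family, and the Elizabeth variants of Table 1); a real engine of the kind Hermansen describes would rely on a very large curated database.

```python
from typing import Optional

# Hypothetical, hand-built variant table (variant -> root name) containing only
# groupings mentioned in this paper; a real linguistic engine would use a large
# curated database of name variants.
VARIANT_ROOTS = {
    "margaret": "margaret", "peggy": "margaret", "maggie": "margaret",
    "john": "john", "sean": "john", "shane": "john", "ian": "john",
    "johan": "john", "juan": "john", "zane": "john",
    "giovanni": "john", "ivan": "john",
    "elizabeth": "elizabeth", "elisabeth": "elizabeth",
    "bethany": "elizabeth", "lisa": "elizabeth",
}


def root_name(name: str) -> Optional[str]:
    """Return the root a name is filed under, or None if the table does not know it."""
    return VARIANT_ROOTS.get(name.strip().lower())


def linguistic_match(a: str, b: str) -> bool:
    """Two names match linguistically if both are known and share a root."""
    root_a, root_b = root_name(a), root_name(b)
    return root_a is not None and root_a == root_b


print(linguistic_match("Peggy", "Maggie"))   # True: both filed under Margaret
print(linguistic_match("Sean", "Ivan"))      # True: both filed under John
print(linguistic_match("Maggie", "Magpie"))  # False: 'Magpie' is not in the table
```

The lookup knows nothing about spelling, so a single keying error turns a known name into an unknown one; that is the weakness the orthographic approach addresses.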

By 1999, several commercial firms such as Search Software Australia (Halloway and Dunkerley,
1999) were approaching the topic from a completely different angle, very loosely based on
Soundex. This approach attempts to draw similarity from the order and placement of letters
within the name, determining whether two names being compared have a ‘distance’ from each other
that is within acceptable limits, so that the candidate can be included as a potential match. In
this design the words in the name have no meaning whatsoever, and names are compared purely as a
series of letters (the Orthographic Approach).
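A minimal sketch of the orthographic idea, assuming Levenshtein edit distance as the ‘distance’ and a hypothetical acceptance limit of two edits. The commercial engines mentioned above use their own, more elaborate key-building and scoring rules, which this does not reproduce.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-letter insertions, deletions and substitutions."""
    a, b = a.lower(), b.lower()
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


MAX_EDITS = 2  # hypothetical 'acceptable limit'; a real engine would tune this


def orthographic_match(a: str, b: str, max_edits: int = MAX_EDITS) -> bool:
    return levenshtein(a, b) <= max_edits


for a, b in [("Sean", "Seam"), ("Sean", "Saen"), ("Maggie", "Magpie"), ("Peggy", "Maggie")]:
    print(a, b, levenshtein(a, b), orthographic_match(a, b))
```

Maggie and Magpie are one edit apart and are accepted, while Peggy and Maggie sit four edits apart and are rejected: exactly the pair a linguistic engine handles and a purely orthographic one cannot.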

These two approaches each have strengths and weaknesses. The linguistic approach would do well at
recognising that Peggy and Maggie may well be the same name (both are derived from Margaret), but
the orthographic approach would find it far simpler to match Maggie and Magpie, which a
linguistic engine may treat as two completely different words even though, judged by their
orthographic distance, one could easily be a typo of the other.

So, by removing meaning from names, an engine becomes more adept at picking up user errors and
simple mistakes in the data. Unfortunately, without meaning it is difficult to categorise a name
(Pfeiffer et al, 1996). Recent research (Christen, 2006) also demonstrates that no single
technique performs significantly better than the others across a reasonable data pool. Identity
resolution therefore still needs to deal with linguistic, phonetic AND orthographic errors,
because of the many different types of error that can be introduced during data capture.

While the emulation of human decisions and learning has been attempted in more commonly
understood domains like chess (Fürnkranz, 1996), it can be argued that success in such an
environment is inevitable because of the strict rules and objectives, no matter how complicated
they might be.

Name matching, on the other hand, has moved away from machine learning in the literature,
focusing instead on the different approaches and their comparison (Bilenko et al, 2003). One
gets the impression from the body of papers on the subject that there is a reluctance to
introduce machine learning into this field of research. But why?

2.4 Issues in Name Matching and Existing Research in the Field

Is the problem that names in databases are constantly changing? Certainly there is evidence that
the rules we take for granted in the use of names are not only changing but being broken. This
has now reached a point in general society where the recording of names is being considered more
carefully in specific domains such as the legal system (Emens, 2007). Most commercial name
matching systems, however, claim a flexible approach, which suggests that a machine learning
algorithm could build on that flexibility to get around the problem.

Is it that naming trends and fashions are constantly changing in different ways across multiple
cultures? Again, there is evidence that this is the case. A modern example is how naming
conventions changed in Indonesia during Dutch colonisation and after the country gained its
independence (Anderson, 1999). Add to that the experience of African Americans in modern times,
deliberately striving to create their own sense of culture by creating unique names for the next
generation that are devoid of any cultural attachments from either African or Western societies
(Lieberson & Mikelson, 1995).

In point of fact, the African American approach actually simplifies the process rather than
complicating it. As the names being used are new, there are no linguistic elements to consider,
leaving only the simpler orthographic and phonetic comparisons. Generally speaking, it is the
older names, rich in linguistic heritage and used differently across multiple cultures, that
cause the biggest headaches for a machine learning approach. After all, most lay people would not
know that John, Sean, Ian, Johan, Juan, Zane, Giovanni and Ivan are effectively the same name from
an original Hebrew source; it is even harder for a system to learn this from experience.

So is the problem that machine learning cannot cope with drift? Actually, it can. There are
already studies of learning drifting concepts (Klinkenberg, 2004), a problem similar to this one:
one would expect users to become more adept at selecting their name matches over time, and
therefore to consider a completely different set of names acceptable ten years after they first
started. Handling this requires constant learning feedback, which makes the system less subject
to dataset shift (Quiñonero-Candela et al, 2009) than the traditional use of fixed training and
testing sets, but other issues are introduced. For instance, how does machine learning cope with
changing answers, or even conflicting answers from different trainers?
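One simple way to follow drifting expert judgments is to weight each past decision by its age, in the spirit of the example-weighting strategies that Klinkenberg (2004) compares against example selection; this is a generic sketch, not his specific method. The `Feedback` record, the score band and the 180-day half-life are hypothetical choices, and the history below is made-up data.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Feedback:
    day: int          # day on which the expert made the decision
    score: float      # engine's similarity score for the candidate (0 to 1)
    accepted: bool    # True if the expert accepted the candidate as a match


def weighted_acceptance(history: Iterable[Feedback], today: int, low: float, high: float,
                        half_life_days: float = 180.0) -> Optional[float]:
    """Age-weighted share of candidates scoring in [low, high) that experts accepted.

    Each decision's weight halves every `half_life_days`, so recent judgements
    dominate and the estimate follows the experts as their standards drift.
    """
    total = accepted = 0.0
    for fb in history:
        if low <= fb.score < high:
            weight = 0.5 ** ((today - fb.day) / half_life_days)
            total += weight
            if fb.accepted:
                accepted += weight
    return accepted / total if total else None


# Made-up history: early on, experts rejected candidates scoring about 0.65;
# after some years of experience they accept them.
history = ([Feedback(day=0, score=0.65, accepted=False)] * 10
           + [Feedback(day=900, score=0.65, accepted=True)] * 10)

print(weighted_acceptance(history, today=1000, low=0.6, high=0.7))  # roughly 0.97
print(weighted_acceptance(history, today=1000, low=0.6, high=0.7,
                          half_life_days=1e9))                      # roughly 0.5
```

With a very long half-life every decision counts equally and the old and new opinions cancel out; with a shorter one the estimate follows the experts’ current practice. Choosing the half-life is itself a trade-off between stability and responsiveness.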


One possible solution is demonstrated in a case study (Doan et al, 2001) of a system trained by
people with different skills and experience in a way that allows the system to build a
meta-learner: an algorithm designed to learn how to learn, rather than to learn one specific
approach. This is similar to the approach I plan to use; however, the group led by Doan
specialised in semantic mappings between database schemas, which is a more consistent field of
inquiry.
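Doan et al (2001) combine the predictions of several base learners with a meta-learner. A minimal stacking sketch in that spirit, applied to name matching, is shown below: the orthographic, phonetic and linguistic scores of a candidate pair are combined by a logistic-regression meta-learner whose weights are learned from expert-labelled pairs. Logistic regression is a substitution chosen for brevity, not the method of Doan et al, and the scores and labels are made-up toy data.

```python
import math
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], int]  # (base-matcher scores, expert label 1/0)


def predict(weights: Sequence[float], bias: float, scores: Sequence[float]) -> float:
    """Probability that the combined evidence indicates a match (logistic function)."""
    z = bias + sum(w * s for w, s in zip(weights, scores))
    return 1.0 / (1.0 + math.exp(-z))


def train_meta_learner(examples: List[Example], epochs: int = 2000,
                       learning_rate: float = 0.5) -> Tuple[List[float], float]:
    """Learn how much to trust each base matcher, via stochastic gradient descent
    on the logistic loss, from expert-labelled candidate pairs."""
    weights, bias = [0.0] * len(examples[0][0]), 0.0
    for _ in range(epochs):
        for scores, label in examples:
            error = predict(weights, bias, scores) - label
            weights = [w - learning_rate * error * s for w, s in zip(weights, scores)]
            bias -= learning_rate * error
    return weights, bias


# Made-up training data: (orthographic, phonetic, linguistic) scores per pair.
EXAMPLES = [
    ((0.9, 1.0, 0.0), 1),  # typo-like pair that also sounds the same
    ((0.6, 1.0, 0.0), 1),  # sound-alike pair
    ((0.4, 0.0, 1.0), 1),  # nickname pair sharing a root name
    ((0.2, 0.0, 1.0), 1),  # distant spelling, same root name
    ((0.5, 0.0, 0.0), 0),  # unrelated pairs
    ((0.4, 0.0, 0.0), 0),
    ((0.3, 0.0, 0.0), 0),
    ((0.1, 0.0, 0.0), 0),
]

weights, bias = train_meta_learner(EXAMPLES)
print("learned weights (orthographic, phonetic, linguistic):", [round(w, 2) for w in weights])
print("new pair (0.3, 0.0, 1.0):", round(predict(weights, bias, (0.3, 0.0, 1.0)), 3))
```

The learned weights express how far the experts’ decisions can be explained by each base matcher, so the combination shifts as the feedback changes rather than being fixed by hand.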


3. Research Problem or Knowledge Gap

There are several gaps in the current body of knowledge which are directly relevant to my aims.

We already have name matching engines which perform soft searches to resolve identities in data
pools (Miller et al, 2008). We already have a body of knowledge in linguistics around names (the
specific study of which is known as Onomastics) and there is a significant body of research
around machine learning and artificial intelligence generally (Bishop, 2006). There is an
increasing body of research into text categorization using machine learning (Sebastiani, 2002),
but I could find no papers that discussed NAME categorization.

Papers that covered the linguistic association of names, such as Hermansen (1985) and Freeman et
al (2006), did so in the context of cross-cultural mappings. All the other papers I read that
discussed name matching techniques in detail tended to focus on measures like ‘edit distance’
(Cohen et al, 2003) and similar orthographic techniques. These techniques treat the name as no
more than a sequence of letters, and therefore the name carries no meaning.

So, can a machine intelligence continue to improve over time at problems for which the answers
appear at best inconsistent and at worst contradictory, if it is given a constant stream of
feedback to use as learning data? This question has not been explored conclusively and appears
to be a gap in the literature around name matching and artificial intelligence or machine
learning.
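A minimal simulation of the kind of continual feedback loop this question envisages, under loudly stated assumptions: each candidate has a single similarity score, a hypothetical expert accepts anything scoring 0.7 or above, and the system nudges its own acceptance threshold after every disagreement. A real experiment would replace the simulated expert with recorded human decisions, which would be far less consistent than this.

```python
import random


def expert_accepts(score: float) -> bool:
    """Hypothetical stand-in for a human expert: accepts candidates scoring 0.7 or more."""
    return score >= 0.7


def run_feedback_loop(n_rounds: int = 2000, threshold: float = 0.5,
                      step: float = 0.01, seed: int = 0):
    """Present candidates one at a time, get expert feedback, and nudge the threshold."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(n_rounds):
        score = rng.random()                 # similarity score of the next candidate
        system_accepts = score >= threshold
        if system_accepts == expert_accepts(score):
            correct += 1                     # agreement: nothing to learn from this one
        elif system_accepts:
            threshold += step                # false positive: become stricter
        else:
            threshold -= step                # false negative: become more lenient
    return threshold, correct / n_rounds


final_threshold, accuracy = run_feedback_loop()
print(f"threshold after feedback: {final_threshold:.2f}, accuracy over the run: {accuracy:.1%}")
```

Because the threshold only moves when the system and the expert disagree, it settles close to the expert’s implicit cut-off; the open question raised above is whether the same kind of loop keeps improving when the expert’s cut-off is neither fixed nor consistent.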


4. Further Questions

Can a computer emulate the learning style employed by humans in this field? If so, does this mean
that the name matching process is algorithmic in nature, despite the apparent inconsistencies and
contradictions that seem to occur within it? If it is, that would indicate that the problem is
not so much the ability of computers to emulate the apparently inconsistent decisions of humans,
but the inability of humans to articulate sufficiently complex algorithms in code.

The implication of such an outcome is that continual feedback training would not ‘overtrain’ a
process which is sufficiently complex that humans get better at it the more they learn about it
themselves. That in turn suggests that an AI which is constantly trained by being asked questions
and receiving feedback on its answers is more likely to behave like a human intelligence, because
this is closer to the way humans learn. We do not tend to learn a task by focusing on the training
alone; rather, we keep learning that task (even after we are taught it) through practice (McGeoch
& Irion, 1952).

This also bears on the debate between two famous cosmologists and mathematicians over whether or
not the universe is algorithmic by nature (Penrose, 1989; Hawking, 1988).

If a truly general machine intelligence could be built using this technique pioneered with name
matching algorithms, that would support Hawking’s view that the universe is algorithmic in nature
and that awareness is therefore a by-product of a sufficiently complex algorithm. On the other
hand, if all attempts to use constant feedback training on a general machine learning algorithm
failed, that could indicate that Penrose is correct and that awareness is due to the
non-algorithmic nature of insight, which he believes is a property possessed by all humans that
cannot be replicated within a computer program.


5. Conclusion

Because we process unstructured information gathered from our environment instinctively, it is
very easy to forget just how much of it our minds process every second. We not only process the
immediate information provided by our senses, but also have the capacity to process what we have
stored in our memories. The next step in the creation of a general machine intelligence is to see
whether a modern computing system is capable of a similar feat, and some research in this area is
already being conducted.

Research into the scale problem is already underway (Rosenbloom et al, 1991); my research,
however, will address the aspect of continual learning instead. Part of that is addressing the
perceived inconsistencies (irrational decisions) that we see in some aspects of what we do. Are
they a matter of emotions or insight disrupting an otherwise perfect (if complex) algorithm, or
are they a matter of an algorithm that is much more complex than we originally thought? Are
humans high-complexity, high-entropy beings whose choices are in fact easy to interpret as being
part of many different possible ordered states? Is this in turn why we find it so difficult to
understand the motivations of others when they choose to help or hinder our own efforts?

If we can build a self-tuning name matching algorithm that continues to improve over time through
the ‘experience’ provided by matching experts (rather than reaching a plateau, or degrading after
an initial improvement, which would represent the overtraining curve effect), then perhaps some of
these questions will be within range for future research topics.
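Whether such an algorithm ‘continues to improve’, plateaus or degrades could be monitored with something as simple as comparing mean accuracy over consecutive windows of expert feedback. The function below is a hypothetical sketch; the window size and tolerance are arbitrary choices, and a real evaluation would want a proper statistical test rather than a fixed tolerance.

```python
from statistics import mean
from typing import Sequence


def learning_trend(accuracy: Sequence[float], window: int = 50,
                   tolerance: float = 0.01) -> str:
    """Compare mean accuracy in the latest window of expert feedback with the
    window before it: 'improving', 'plateau' or 'degrading'."""
    if len(accuracy) < 2 * window:
        raise ValueError("need at least two full windows of history")
    recent = mean(accuracy[-window:])
    earlier = mean(accuracy[-2 * window:-window])
    if recent > earlier + tolerance:
        return "improving"
    if recent < earlier - tolerance:
        return "degrading"
    return "plateau"


print(learning_trend([0.6] * 50 + [0.8] * 50))  # improving
print(learning_trend([0.8] * 100))              # plateau
print(learning_trend([0.9] * 50 + [0.7] * 50))  # degrading
```

Applied repeatedly as feedback accumulates, a long run of ‘plateau’ or a switch to ‘degrading’ after early gains would be the overtraining signature described above.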

These questions inform the direction and intent of my proposed research.


6. References

Anderson, B. R. O'G. (1999). Indonesian Nationalism Today and in the Future. Indonesia, No. 67, pp. 1-11.

Bilenko, M., Mooney, R., Cohen, W., Ravikumar, P. & Fienberg, S. (2003). Adaptive name matching in information
integration. IEEE Intelligent Systems, Vol. 18, Issue 5, pp. 16-23.

Bishop, C. M. (2006). Pattern Recognition and Machine Learning. New York, NY: Springer.

Christen, P. (2006). A Comparison of Personal Name Matching: Techniques and Practical Issues. Sixth IEEE
International Conference on Data Mining Workshops, pp. 290-294.

Cohen, W., Ravikumar, P. & Fienberg, S. (2003). A comparison of string metrics for matching names and records.
KDD Workshop on Data Cleaning and Object Consolidation, Vol. 3, pp. 73-78.

Dietterich, T. (2000). Ensemble Methods in Machine Learning. Lecture Notes in Computer Science, Vol. 1857,
pp. 1-15. Berlin/Heidelberg: Springer.

Do, H. H. & Rahm, E. (2007). Matching Large Schemas: Approaches and Evaluation. Information Systems, Vol. 32,
Issue 6, pp. 857-885.

Doan, A., Domingos, P. & Halevy, A. Y. (2001). Reconciling schemas of disparate data sources: a machine-learning
approach. In T. Sellis (Ed.), Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data
(SIGMOD '01), pp. 509-520. New York, NY: ACM.

Dorigo, M. & Gambardella, L. M. (1997). Ant colony system: a cooperative learning approach to the traveling
salesman problem. IEEE Transactions on Evolutionary Computation, Vol. 1, Issue 1, pp. 53-66.

Emens, E. F. (2007). Changing Name Changing: Framing Rules and the Future of Marital Names. The University of
Chicago Law Review, Vol. 74, No. 3, pp. 761-863.

Freeman, A. T., Condon, S. L. & Ackerman, C. M. (2006). Cross linguistic name matching in English and Arabic: a
"one to many mapping" extension of the Levenshtein edit distance algorithm. In Proceedings of the Main Conference
on Human Language Technology Conference of the North American Chapter of the Association for Computational
Linguistics (HLT-NAACL '06), pp. 471-478. Stroudsburg, PA: Association for Computational Linguistics.

Fürnkranz, J. (1996). Machine Learning in Computer Chess: The Next Generation. International Computer Chess
Association Journal.

Gillick, D., Faria, A. & DeNero, J. (2006). MapReduce: Distributed Computing for Machine Learning.

de Groot, A. D. (1965). Thought and Choice in Chess. The Hague, The Netherlands: Mouton.

Halloway, G. & Dunkerley, M. (1999). The Math, Myth & Magic of Name Search and Matching. Search Software America.

Hawking, S. (1988). A Brief History of Time. Bantam Dell Publishing Group.

Hermansen, J. C. (1985). Automatic Name Searching in Large Databases of International Names. Ph.D. Thesis,
Georgetown University.

Joachims, T. (2002). Learning to Classify Text Using Support Vector Machines: Methods, Theory, and Algorithms.
Kluwer Academic Publishers.

Jurafsky, D. & Martin, J. H. (2000). Speech and Language Processing: An Introduction to Natural Language
Processing, Computational Linguistics, and Speech Recognition (University of Colorado, Boulder). Upper Saddle
River, NJ: Prentice Hall (Prentice Hall Series in Artificial Intelligence, edited by S. Russell and P. Norvig),
xxvi+934 pp.

Klinkenberg, R. (2004). Learning drifting concepts: Example selection vs. example weighting. Intelligent Data
Analysis, Vol. 8, No. 3, pp. 281-300.

Laland, K. N. (1993). The mathematical modelling of human culture and its implications for psychology and the
human sciences. British Journal of Psychology, Vol. 84, pp. 145-169.

Lieberson, S. & Mikelson, K. S. (1995). Distinctive African American Names: An Experimental, Historical, and
Linguistic Analysis of Innovation. American Sociological Review, Vol. 60, No. 6, pp. 928-946.

McCarthy, J. & Hayes, P. J. (1969). Some Philosophical Problems from the Standpoint of Artificial Intelligence.
Edinburgh University Press.

McGeoch, J. A. & Irion, A. L. (1952). The Psychology of Human Learning (2nd ed.). Oxford, England: Longmans,
Green & Co., xxii+596 pp.

Miller, K. J., Arehart, M., Ball, C., Polk, J., Rubenstein, A., Samuel, K., Schroeder, E., Vecchi, E. & Wolf, C.
(2008). An Infrastructure, Tools and Methodology for Evaluation of Multicultural Name Matching Systems.
Proceedings of the 6th International Conference on Language Resources and Evaluation, pp. 3179-3184.

Oshika, R., Machi, F., Evans, B. & Tom, J. (1988). Computational Techniques for Improved Name Search.
Proceedings of the Second Conference on Applied Natural Language Processing, pp. 203-210.

Penrose, R. (1989). The Emperor's New Mind: Concerning Computers, Minds, and the Laws of Physics. New York, NY:
Oxford University Press, 466 pp.

Pfeiffer, U., Poersch, T. & Fuhr, R. (1996). Retrieval Effectiveness of Proper Name Search Methods. Information
Processing & Management, Vol. 32, pp. 667-679.

Quiñonero-Candela, J., Sugiyama, M., Schwaighofer, A. & Lawrence, N. D. (2009). Dataset Shift in Machine
Learning. The MIT Press.

Rosenbloom, P. S., Laird, J. E., Newell, A. & McCarl, R. (1991). A preliminary analysis of the Soar architecture
as a basis for general intelligence. Artificial Intelligence, Vol. 47, Issues 1-3, pp. 289-325.

Sebastiani, F. (2002). Machine learning in automated text categorization. ACM Computing Surveys, Vol. 34, No. 1,
pp. 1-47. DOI: 10.1145/505282.505283, http://doi.acm.org/10.1145/505282.505283

Simon, H. A. & Chase, W. G. (1973). Skill in Chess: Experiments with chess playing tasks and computer simulation
of skilled performance throw light on some human perceptual and memory processes. American Scientist, Vol. 61,
No. 4.

Snae, C. (2007). Comparison and Analysis of Name Matching Algorithms. Proceedings of World Academy of Science,
Engineering and Technology, Vol. 21.

Turing, A. M. (1950). Computing Machinery and Intelligence. Mind (Oxford University Press), Vol. 59, No. 236.

Wang, L., Siegel, H. J. & Roychowdhury, V. P. (1997). Task matching and scheduling in heterogeneous computing
environments using a genetic-algorithm-based approach. Journal of Parallel and Distributed Computing, Vol. 47,
No. 1.

Wang, R. Y., Storey, V. C. & Firth, C. P. (1995). A framework for analysis of data quality research. IEEE
Transactions on Knowledge and Data Engineering, Vol. 7, Issue 4, pp. 623-640.

