APPLICATION OF GENETIC PROGRAMMING
                  TOWARDS WORD ALIGNERS



                                 BENJAMIN HEILERS




                Department of Electrical Engineering and Computer Science
                            University of California, Berkeley


                                     December 2004

                                        CS 294-5




keywords: Genetic Programming, Word Aligners, Machine Translation, Machine
Learning, Genetic Algorithms, Natural Language Processing, Artificial Intelligence
1.       Introduction

         This paper details the (as-of-yet-unfruitful-and-thus-determinedly-ongoing)

research into the application of genetic programming towards optimizing word aligners.

Popular belief holds the use of genetic programming in machine translation to be

infeasible. Regardless, it is the goal of the author, admittedly due to infatuation with

machine learning in general, to convince himself personally that such a widespread

sentiment is either well-chosen or wholly amiss.

         A word aligner is a program coupling words in sentence pairs, in effect

constructing a bilingual dictionary [1:484, 2]. Genetic Programming, a specific branch of

Genetic Algorithms, denotes search across a function space, where the

natural reproductive methods – selection, mutation, crossover – are mimicked in the

hopes that the great successes of evolution on living creatures may be repeated on

programs [3:47-56, 4]. Genetic Algorithms deal more broadly with evolving all types of

functions. Genetic Programming here takes a set of programs and selects those most
resembling word aligners, subjecting them to alterations in the hopes of finding yet better

candidates. The process by which programs are selected tests each program on a subset

of the sentence-pair corpus, thus qualifying this approach as supervised learning.

Another concept appearing in this paper and warranting a definition is that of the Abstract

Syntax Tree (AST), a representation for computer code which renders the code in a

particularly useful format for genetically reproductive processes [5:9]. Abstract Syntax

Trees are preferable because:

     •   it is conceptually far simpler to apply crossover and mutation to a tree
         representation than to program code in string form.




     •   a design pattern, the Visitor Pattern, suggests an easily implementable approach
         to traversing this representation of program code [9, 10].




Figure 1: An Eclipse Abstract Syntax Tree in Graphical and Textual Forms.




Note in Figure 1 that the Eclipse AST, the package used in this research, maintains some
information within nodes (such as the operator in infix expressions), whereas some AST
representations place this information in a child node.



2.       Literature Review




There is little literature on the application of genetic algorithms to word aligners.

Instead, we turn to the literature on genetic programming, where suggestions to

counteract various results-limiting phenomena abound.

       There are a myriad of decisions to make in implementing a genetic algorithm.

Fortunately, the literature provides enough detailed discussion to allow for preparations

against most common problems with genetic programming. Franz Rothlauf is the first to

write a book on the pros and cons of various representations in genetic algorithms [7].

Like many of his colleagues, he strongly recommends tree representations for genetic

programming. This representation eases the implementation of mutation and crossover

tremendously, compared with the traditional representation as bit strings, whereby the

chances that a mutated string still resembles working code are less than slim.




public Alignment alignSentencePair
                             (SentencePair sentencePair){
                      MISSING=2092010418 <= -1198423683;
                      alignment=new Alignment();
                      I4=addAlignment(alignment,I4,I4,B3);
                      B4=false;
                      I2=numEnglishWordsInSentence(sentencePair);
                      if (I2 < -594586326){
                             D2=getDouble(L3,I1);
                      } else {
                             addInt(L2,I1,I3);
                             getInt(L5,I2);
                             addBoolean(L3,I1,B2);
                             while (I2 < 1564864814){
                                    addDouble(L5,I5,D1);
                                    MISSING=664939021;
                                    alignment=getString(L1,I2);
                                    MISSING=311599999 * 1197784289;
                                    D1=-287916828;
                             }
                      }
                      I2=numFrenchWordsInSentence(sentencePair);
                      for (I3=0;I3 < I1;I3++){
                             I4=-1;
                             D1=0;
                             for (I5=0;I5 < I2;I5++){
                                    D2=50 / (1 + abs(I3 - I5));
                                    if (D2 >= D1){
                                           D1=D2;
                                           I4=I5;
                                    }
                             }
                             addAlignment(alignment,I4,I3,true);
                      }
                      return alignment;
               }


               Figure 2: Example of Bloat. Lines resembling original file are in
               bold. Lines colored red are added as result of bloat phenomenon.


       The phenomenon of bloat is widely mentioned, whereby each successive
generation displays a much larger file size than the previous, yet most of the added code
contributes little to no added functionality. With high rates of mutation, I have seen 450
lines (nine pages) of code introduced into an initially twenty-line file after fewer than ten
generations. There are several mechanisms in place to cope with bloat, as discussed later.
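One common bloat-control mechanism is a hard cap on program size: offspring whose trees exceed the cap are rejected before entering the population. The sketch below is illustrative only (the toy Node class stands in for a real AST; none of these names appear in the research code):

```java
// Illustrative sketch of a size cap as a bloat guard (not the research code).
public class BloatGuard {
    // A toy binary tree node standing in for a real program AST.
    public static class Node {
        public Node left, right;
        public Node(Node left, Node right) { this.left = left; this.right = right; }
    }

    // Count the nodes in the (sub)tree rooted at n.
    public static int size(Node n) {
        return n == null ? 0 : 1 + size(n.left) + size(n.right);
    }

    // Accept an offspring only if it stays within the cap.
    public static boolean withinLimit(Node root, int maxNodes) {
        return size(root) <= maxNodes;
    }
}
```

Rejecting oversized offspring keeps each generation's file sizes bounded regardless of how aggressively mutation inserts new code.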

       Another commonly observed phenomenon to cope with is over-fitting. This occurs when

the genetic process is allowed to run for too long. For example, the corpus used in this

research consists of 447 sentence pairs, pairing English and French sentences. If we are


to choose the first ten and evolve randomly generated programs to return alignments of

these, then at some point we may theoretically find a reasonable solution which not only

achieves superb results on the ten training sentence pairs, but on the 447 total sentence

pairs as well. However, if we continue to evolve past this point, chances are that our

population will become overfit for these ten sentences. This is similar, for example, to

hoping to find the equation y = x2, but instead achieving y = 1, with training data of only

(-1, 1) and (1, 1).
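The analogy can be made concrete. Both candidate functions below fit the two training points perfectly, so training error alone cannot distinguish them; only evaluation away from the training data reveals the difference (the names are mine, for illustration):

```java
import java.util.function.DoubleUnaryOperator;

public class OverfitDemo {
    // The function we hope to find, and a degenerate one that merely
    // memorizes the training data.
    public static double desired(double x)    { return x * x; } // y = x^2
    public static double degenerate(double x) { return 1.0; }   // y = 1

    // Sum of absolute errors over the training pairs (-1, 1) and (1, 1).
    public static double trainingError(DoubleUnaryOperator f) {
        double[][] train = { { -1, 1 }, { 1, 1 } };
        double err = 0;
        for (double[] p : train)
            err += Math.abs(f.applyAsDouble(p[0]) - p[1]);
        return err;
    }
}
```

Both functions score a training error of zero, yet at x = 2 one returns 4 and the other 1.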




     Figure 3: After fitness is reached, over-fitting to the training data may occur.




Figure 4: Example of Over-Fitting. The solid black line is y = x2, the red dotted line is
y = |x|, and the blue dashed line is y = 1. The training data is { (1, 1), (-1, 1) }, but the
desired function is { (x, y) : y = x2 }.




       The literature is also helpful in suggesting approximate values for the frequencies

at which to apply mutation and crossover to members of the population, though the

perfect values are apparently learned only by trial and error.



3.     General Overview of Algorithm

       In general, instead of searching across the solution space, we utilize GP to aid in

search across the function space. As the function space is of immense proportions, we

randomly sample the function space, and then search through not only these functions but

others similar to them. In the graph above, we may have a function y = x + 5. This

would lead us to searching similar functions such as y = x + 6, y = 3x + 5, y = x2 + 5, etc.

Since it is infeasible to evaluate every possible function with similar form to y = x + 5,

we must again find a method with which to decide which functions to search. This is the

basic concept of genetic programming, where a desired set of (input, output) pairs is

known, and we search for a function (possibly one of many) which produces these
outputs. Doing this search across program code is many times more complex than

across math equations.



       The flow chart shown here (Figure 5) is exactly the order in which genetic
programming is implemented in this research. An initial population is created by taking
files such as random.java in the Appendix and sending them through several generations
of high mutation. Since the current version of this GP process is still prone to producing
erroneous code, many more programs are generated than asked for. Each is then
evaluated according to the fitness function, and those which have compile or runtime
errors (at this point mostly due to invalid arguments, incorrect casting, and undeclared
variable names – see Results) are filtered out and thrown away.

                              Figure 5: Flow Chart of GP

Thus the GP process begins with only valid programs in its initial population. From here,

the population undergoes a number of iterations wherein each member is evaluated, then

the next generation is selected, then crossover and mutation is allowed to occur. The

rates for these are currently at 80% chance for 3-point crossover to occur (see Section 4),

and 0.05% for mutation, as suggested by most literature.
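The evaluate / select / crossover / mutate cycle just described can be sketched as follows. The members here are toy doubles standing in for programs, and the fitness, crossover, and mutation bodies are placeholders of my own invention, but the control flow and the 80% / 0.05% rates match the description above:

```java
import java.util.Random;

public class GpDriver {
    public static final double CROSSOVER_RATE = 0.80;   // 80% per pairing
    public static final double MUTATION_RATE  = 0.0005; // 0.05% per member

    // One evaluate / select / crossover / mutate cycle over a toy population.
    public static double[] step(double[] population, Random rng) {
        int n = population.length;
        double[] fitness = new double[n];
        double total = 0;
        for (int i = 0; i < n; i++) {            // 1. evaluate each member
            fitness[i] = 1.0 / (1.0 + Math.abs(population[i]));
            total += fitness[i];
        }
        double[] next = new double[n];
        for (int i = 0; i < n; i++) {            // 2. fitness-proportionate selection
            double dart = rng.nextDouble() * total;
            int j = 0;
            double cum = fitness[0];
            while (cum < dart && j < n - 1) cum += fitness[++j];
            next[i] = population[j];
        }
        for (int i = 0; i + 1 < n; i += 2)       // 3. crossover (here: a swap)
            if (rng.nextDouble() < CROSSOVER_RATE) {
                double t = next[i]; next[i] = next[i + 1]; next[i + 1] = t;
            }
        for (int i = 0; i < n; i++)              // 4. mutation (here: a nudge)
            if (rng.nextDouble() < MUTATION_RATE)
                next[i] += rng.nextGaussian();
        return next;
    }
}
```

In the real process the four placeholder bodies become the corpus-based fitness function, 3-point AST crossover, and the AST Mutator.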




4.        Details of Implementation Decisions

Since the design decisions are largely independent of one another, and many need to be
presented simultaneously, I have chosen to format this section by discussing each one on
its own, as orderly as possible.

          AST representation: The members of the population could be represented in a

myriad of ways. Many people implementing genetic programming choose to create an
original representation of their function. This, unfortunately, is due to the newness of the
field, and it hinders the progress of future work by letting researchers get bogged down in
minor details which could already be settled. I admire Franz

Rothlauf’s efforts to correct this problem, and agree with him that the best representation

for my purposes is to use an AST. This allows for an easy crossover implementation, and

only necessitates moderate work to implement mutation.

          Generational model: The generational model of a population allows for the

lifespan of each population member to be only a single generation, as opposed to the

Steady-State model, which not only selects members for reproduction but also selects

which member they will be replacing as well [8:134]. Both models use a constant-sized

population. I have chosen to go with the generational model here because of simplicity in

design.

          Initial Population: The usual approach is to start with a completely random set of

programs. This seems unreasonable. Why create programs which construct strings and

draw websites when we are looking for a program to add two numbers together? I have

taken an approach for which I could not find any literature, to start with a population of

programs similar to that under random.java in the Appendix. In some cases, I have even




placed some initial code within the for-loop, to make alignments based on superficial

traits. I do not rule out the possibility that this second strategy may in effect steer the
search in the wrong direction, and so I intend to run the program both with random.java
and with the other versions (entitled superficial.java). It is my hope that by providing some base code,

the completely random results will be avoided and thus a better chance of finding an

optimal solution is possible.

         Fitness function: The fitness function, the measure by which we decide which

members yield the most desirable results, and thus have the most potential for being

prototypes of our desired word aligner, seems obvious. The goal is to maximize the

precision and recall while minimizing the AER, as defined in [11:1]. Thus the fitness

function calls alignSentencePair of each population member on a small subset of the full

corpus, and returns the weighted sum of these numbers (where w1, w2, w3 are the

weights):

                        10 * [ w1 * P + w2 * R + w3 * (1 - AER) ]
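Transcribed directly into code (the class and method names are mine):

```java
public class FitnessFunction {
    // 10 * [ w1*P + w2*R + w3*(1 - AER) ], as defined above.
    public static double fitness(double p, double r, double aer,
                                 double w1, double w2, double w3) {
        return 10 * (w1 * p + w2 * r + w3 * (1 - aer));
    }
}
```

Note that with unit weights w1 = w2 = w3 = 1, the values P = 0.3658, R = 0.2258, AER = 0.6864 yield exactly the 9.0520 fitness entries appearing in Figure 9.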

         Countering Over-Fitting: An easy fix to countering over-fitting to the training

data used in the fitness function is to keep the training set dynamic. I have implemented

this by choosing a random set each time. Thus there is no worry of over-fitting to the

specific set of sentence pairs being learned on, since there is no specific set of sentence

pairs.
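A minimal sketch of that dynamic training set, under assumed names (the corpus elements here stand in for sentence pairs):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class TrainingSubset {
    // Draw a fresh random subset of the corpus for each generation's
    // fitness evaluation, so there is no fixed set to overfit.
    public static <T> List<T> sample(List<T> corpus, int k, Random rng) {
        List<T> shuffled = new ArrayList<>(corpus);
        Collections.shuffle(shuffled, rng);
        return new ArrayList<>(shuffled.subList(0, Math.min(k, shuffled.size())));
    }
}
```

Each call to sample with a different random source returns a different k-element subset of the 447 sentence pairs.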

         Fitness-Proportionate Selection: There are two methods of selection in

widespread use: tournament and proportionate fitness selection [3:37]. In tournament

selection several tournaments are held in which the fitness is calculated and the winner of

the tournament is selected for reproduction. As the fitness function here requires no




small amount of time, for matters of efficiency I chose the less computationally
expensive selection process, fitness-proportionate selection. Here each member of the
population is evaluated once, and then the new generation is randomly selected, with
probability proportional to the fitness of the member [see figure]. The risk generally is
that with a wide variety of fitness values, those with the lower fitness values will be
excluded from selection and the diversity of the population will disappear too early,
leading to premature convergence. It is my hope that with a non-random initial
population, the disparity in the fitness values will not be as dangerous as if the population
had been truly initialized randomly.
                                           Figure 6: The Proportionate Fitness Selection Process
                                                        is Akin to a Game of Darts.
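The dart-board analogy translates almost line for line into code. This is a generic sketch of fitness-proportionate selection with names of my own choosing, not the research implementation:

```java
import java.util.Random;

public class RouletteWheel {
    // Fitness-proportionate selection: throw a "dart" at a board whose
    // slices are proportional to each member's fitness.
    public static int select(double[] fitness, Random rng) {
        double total = 0;
        for (double f : fitness) total += f;
        double dart = rng.nextDouble() * total;
        double cumulative = 0;
        for (int i = 0; i < fitness.length; i++) {
            cumulative += fitness[i];
            if (dart < cumulative) return i;
        }
        return fitness.length - 1; // guard against rounding at the edge
    }
}
```

A member with zero fitness occupies no slice of the board and can never be selected, which is precisely the diversity risk noted above.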
       Elitist Strategy: This is the decision to leave the best-fit member of the population
in the next generation, unmodified, although this does not rule out including genetic
reproductions of this member as well.
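A sketch of the elitist copy (the names are mine): locate the best-fit index so that member can be carried over verbatim before selection fills the remaining slots.

```java
public class Elitism {
    // Index of the fittest member, to be copied unmodified into the
    // next generation.
    public static int bestIndex(double[] fitness) {
        int best = 0;
        for (int i = 1; i < fitness.length; i++)
            if (fitness[i] > fitness[best]) best = i;
        return best;
    }
}
```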

       Copying the best of each generation to file: Looking back to the chart in Section 2,
Literature Review, we acknowledge that quality declines when one allows the GP process
to run for too many generations. To alleviate this, a copy of the

member with the highest fitness value of each generation is written to file in a separate

folder. The end process, where we test the evolved word aligners, tests the best fit

member of each generation, not solely that of the final generation.




       N-point crossover: The two common methods of crossover are single-point and
n-point. In single-point crossover, a single point in the genome is chosen and the two
members have their code swapped at this point. N-point crossover allows for this to
happen at multiple points, and is much more suitable to crossover on trees, where we are
not dealing with the traditional fixed-length representation as in bit strings.

       Protected functions and variable types: To alleviate casting problems and protect
against null pointer exceptions, functions and variables are given safe, bounds-checked
implementations; refer to the WordAligner class in the Appendix, which is the superclass
of all other word aligner classes.

       Confine alterations:     To minimize the number of off-track members of the

population, alterations to the code are kept in the area where they matter the most. The

basic information needed by all word aligners is as is shown in the appendix for

random.java.     Every word aligner should align each French word to some word in

English. Thus, only the body of the for loop is altered.

       Halting problem: It may occur through mutation or crossover that infinite loops

are created [8:293-294]. To counter this, the fitness function makes use of threads and

halts after a reasonable amount of time has elapsed. This also serves as an additional

measure against excessive bloat.
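The thread-based guard can be sketched with java.util.concurrent; the zero-fitness fallback is my assumption of how a non-terminating or erroneous member would be scored:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimedFitness {
    // Evaluate a candidate on a worker thread, abandoning it if it runs
    // past the deadline (e.g. an evolved infinite loop).
    public static double evaluate(Callable<Double> candidate, long millis) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            return pool.submit(candidate).get(millis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return 0.0; // treat a non-halting member as worst fitness
        } catch (Exception e) {
            return 0.0; // runtime errors likewise score zero
        } finally {
            pool.shutdownNow(); // interrupt the stuck worker
        }
    }
}
```

Because runaway members are cut off after a fixed time, this also bounds how long a badly bloated program can occupy the evaluator.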



5.     Results

       As can be seen by the example code, mutation never worked as desired. Nearly

every mutation results in erroneous code. It seems the large majority are casting issues

(in the figure here, D3 is a double and S1 is a string, while H5 is a HashMap). Those

mutated programs which do compile include useless code, such as incrementing variables




used nowhere else in the program. Without mutation, the rest of the genetic
programming process is little more than searching over the different orderings of the
statements within the for loop, which holds no possible word aligners not already visible
at a first glance.

               Figure 7: Some Mutation Errors Still Mishandled.




Figure 8: Initial Code and Code with "Better" Result.



Multiple runs each proved futile. As can be seen in the figure above, the best

word aligner returned is only slightly improved in terms of performance. Looking at the

code it appears to be a fluke, due to the off-chance that using the number of English

words in place of the number of French words tends to give better recall results (due to

the generally smaller length of English sentences relative to their French versions; fewer
proposed alignments allow for fewer erroneous guesses).

       Indeed, generations proved of no use. In effect, the process merely stares at the
same program each generation, since crossover alone is not enough to

introduce variety, and the initial population is not truly random (not with the mutation

process in its current status). In the table below are shown the evaluation results of

members of each generation with the highest fitness.

              Gen.    Precision        Recall          AER Fitness
              0       0.3658           0.2258          0.6864    9.0520
              1       0.3658           0.2258          0.6864    9.0520
              2       0.3535           0.2909          0.6678    9.7660
              3       0.3658           0.2258          0.6864    9.0520
              4       0.3658           0.2258          0.6864    9.0520
              5       0.3658           0.2258          0.6864    9.0520
              6       0.3935           0.1889          0.6966    8.8580
              7       0.3658           0.2258          0.6864    9.0520
              8       0.3658           0.2258          0.6864    9.0520
              9       0.3658           0.2258          0.6864    9.0520
              10      0.3658           0.2250          0.6864    9.0520
              11      0.3658           0.2250          0.6864    9.0520
              12      0.3658           0.2250          0.6864    9.0520
              13      0.3658           0.2250          0.6864    9.0520

               Figure 9: Table of Fitness Function Results on a GP run




6.     Conclusion


In addition to deciphering the process of mutating code in a meaningful way,

there are a few other tricks which I did not have time to experiment with, but think may

prove useful.

       With regards to premature convergence, it would be interesting to add a feature

whereby the mutation rate is raised greatly for a generation to promote an increase in

diversity, triggered by a low standard deviation in the fitnesses.

       Since I was not able to get mutation working, I never actually started off with

random.java. I instead started off with the likeness of the file seen in Figure 8. From

what I have seen, it seems that initializing from this file yields a population
lacking in diversity, even with higher mutation rates in the initial generations. It would

be interesting to run a comparison between initializing from this file and from

random.java.

       It is widely acknowledged that choosing correct values for the mutation and
crossover rates, the number of generations, and the population size, as well as choosing
a heuristic function, are all decisions still made by trial and error. Studying the

exact effects of raising and lowering each of these values will consume quite an amount

of time, but is vital before much more work can be done in the area of genetic

programming in general.




Bibliography


1. Manning and Schütze. Foundations of Statistical Natural Language Processing,
     pg. 484
2. Automatic Construction of a Bilingual Lexicon:
     wwwhome.cs.utwente.nl/~irgroup/align/
3. Ghanea-Hercock. Applied Evolutionary Algorithms in Java
4. Genetic-Programming.Org: www.genetic-programming.org/
5. Grune, Bal, Jacobs, and Langendoen. Modern Compiler Design, pg. 9, 22, 52-55
6. Langdon and Poli. Foundations of Genetic Programming
7. Rothlauf. Representations for Genetic and Evolutionary Algorithms
8. Banzhaf, Nordin, Keller, and Francone. Genetic Programming: An Introduction
9. Gamma, Helm, Johnson, and Vlissides. Design Patterns
10. Visitor Pattern: http://en.wikipedia.org/wiki/Visitor_pattern
11. Assignment 4: Word Alignment Models:
     www.cs.berkeley.edu/~klein/cs294-5/cs294-5%20assignment%204.pdf


                                          Figures
                             (original unless otherwise noted)


1.      Abstract Syntax Tree in graphical and textual forms.

2.      Example of Bloat.

3.      After fitness is reached, overfitting to the training data may occur. [source:
        Schmiedle F, Drechsler N, Grosse D, Drechsler R. “Heuristic learning based on
        genetic programming.” Genetic Programming & Evolvable Machines, Vol. 3,
        Dec. 2002, pg 376]

4.      Example of Over-Fitting
5.      Flow Chart of GP Approach.         [chart source: Sette S, Boullart L. “Genetic
        programming: principles and applications.”             Engineering Applications of
        Artificial Intelligence, Vol. 14, Dec. 2001, pg 728]



6.     Proportionate Fitness Selection
7.     Some Mutation Errors Still Mishandled
8.     Initial Code and Code with "Better" Result.
9.     Table of Fitness Function Results


                                    Appended Code


Crossover: Takes two statements whose parents are of the same type (for/for,
while/while, etc.). The Eclipse AST toolkit requires that each node belong to a particular
tree, so simply switching subtrees between trees is not possible; instead we clone each
subtree under its new owner with the static copySubtree(targetAST, sourceNode) method.
private void crossover (int index1, Statement switch1,
                           int index2, Statement switch2) {

       CompilationUnit cu1 = newPop[index1];
       CompilationUnit cu2 = newPop[index2];

       AST ast1 = cu1.getAST();
       AST ast2 = cu2.getAST();

       ASTNode p1 = switch1.getParent();
       ASTNode p2 = switch2.getParent();

       Statement switch1_under_ast2 =
              (Statement) ASTNode.copySubtree(ast2,switch1);
       Statement switch2_under_ast1 =
              (Statement) ASTNode.copySubtree(ast1,switch2);

       switch (p1.getNodeType()) {
       case ASTNode.BLOCK:
              List m1 = ((Block) p1).statements();
              List m2 = ((Block) p2).statements();

              m1.set(m1.indexOf(switch1), switch2_under_ast1);
              m2.set(m2.indexOf(switch2), switch1_under_ast2);
              break;

       case ASTNode.IF_STATEMENT:
              if (switch1.getLocationInParent().getId()
                            .equals("elseStatement")) {
                     ((IfStatement) p2).setElseStatement(switch1_under_ast2);
                     ((IfStatement) p1).setElseStatement(switch2_under_ast1);
              } else {
                     ((IfStatement) p2).setThenStatement(switch1_under_ast2);
                     ((IfStatement) p1).setThenStatement(switch2_under_ast1);
              }
              break;

       case ASTNode.WHILE_STATEMENT:
              ((WhileStatement) p2).setBody(switch1_under_ast2);
              ((WhileStatement) p1).setBody(switch2_under_ast1);
              break;



case ASTNode.FOR_STATEMENT:
               ((ForStatement) p2).setBody(switch1_under_ast2);
               ((ForStatement) p1).setBody(switch2_under_ast1);
               break;

        default:
               throw new RuntimeException("unhandled crossover for nodeType: "
                                                + p1.getNodeType());

        }

}


Mutation: Uses the Visitor Pattern and extends
org.eclipse.jdt.internal.corext.dom.GenericVisitor with Mutator to implement mutation.
Mutator is a file much too long to display here. The essentials are that it randomly
changes register names and values in the code, as well as occasionally inserting newly
generated lines of code and making calls to safely defined methods (that is to say, a
divide that checks for division by zero, etc.).
public void mutate(int index) {
       // derive a mutation seed from the table of interchangeable symbols
       float seed = random.nextFloat()*interchangeableTable.size();
       Mutator mutator = new Mutator(seed, numRegisters);
       CompilationUnit cu = newPop[index];
       cu.accept(mutator);
}



WordAligner parent class: The following is edited due to length; redundant and
obvious methods have been abbreviated. The Statistics object contains data from an
initial pass over the corpus beforehand, gathering data such as is used in unsupervised
learning: Pr(f), Pr(e), Pr(f, e).
public class WordAligner {
       protected WordAligner (Statistics s) {
              statistics = s;
       }

        public Alignment alignSentencePair(SentencePair s) {
               return null;
        }

        public float prob_f(String f) {
               return (float) statistics.prob_f(f);
        }

        public float prob_e(String e) {
               return (float) statistics.prob_e(e);
        }

        public float prob_e_and_f(SentencePair s, String f, String e) {



return (float) statistics.prob_f_and_e(s,f,e);
}

public List getFrenchWords (SentencePair s) {
       return s.getFrenchWords();
}

public List getEnglishWords (SentencePair s) {
       return s.getEnglishWords();
}

public float abs (float i) {
       return Math.abs(i);
}

public float numFrenchWordsInSentence (SentencePair s) {
       return s.getFrenchWords().size();
}

public float numEnglishWordsInSentence (SentencePair s) {
       return s.getEnglishWords().size();
}

public float getSentenceID (SentencePair s) {
       return s.getSentenceID();
}

public boolean addAlignment(float englishPosition,
                           float frenchPosition, boolean sure) {
       int e = Math.round(englishPosition);
       int f = Math.round(frenchPosition);
       alignment.addAlignment(e, f, sure);
       return true;
}

/** GET methods **/
public String getString(List L, float i) {
       if (L== null || L.size() == 0)
              return "";
       if (i >= L.size())
              i = L.size()-1;
       if (i < 0)
              i = 0;
       return (String) L.get(Math.round(i));
}

Also: getBoolean, getNumber

/** ADD methods **/
public boolean addString (List L, float i, String o) {
       if (L == null)
              L = new ArrayList();
       if (i >= L.size())
              i = L.size()-1;
       if (i < 0)
              i = 0;
       L.add(Math.round(i), o);
       return true;
}

Also: addBoolean, addNumber




/** FIELDS **/
       public LinkedList         L1   =   new   LinkedList();
       public LinkedList         L2   =   new   LinkedList();
       public LinkedList         L3   =   new   LinkedList();
       public LinkedList         L4   =   new   LinkedList();
       public LinkedList         L5   =   new   LinkedList();

       public   float   N1   =   0;             public   float   N2   =   0;
       public   float   N3   =   0;             public   float   N4   =   0;
       public   float   N5   =   0;             public   float   N6   =   0;
       public   float   N7   =   0;             public   float   N8   =   0;
       public   float   N9   =   0;             public   float   N0   =   0;

       public boolean B1 = true; public boolean B2 = true;
       public boolean B3 = true; public boolean B4 = true;
       public boolean B5 = true;

       public String S1 = "";                   public String S2 = "";
       public String S3 = "";                   public String S4 = "";
       public String S5 = "";

       public Alignment alignment = new Alignment();
       public static Statistics statistics;
}


An extension of WordAligner class: This is the base class for the random initialization.
Several instances of this class are made, then subjected to many generations at a higher
than normal mutation rate. Mutation occurs within the for-loop.
public class random extends WordAligner {
        public Alignment alignSentencePair(SentencePair sentencePair) {
             alignment = new Alignment();
             N1 = numEnglishWordsInSentence(sentencePair);
             N2 = numFrenchWordsInSentence(sentencePair);
             for (N3 = 0; N3 < N2; N3++) {
               B5 = addAlignment(N4, N3, true);
             }
             return alignment;
        }

        public random(Statistics s) {
              super(s);
        }

}





  • 3. a design pattern, the Visitor Pattern, suggests an easily implementable approach to traversing this representation for program code [9, 10] Figure 1: An Eclipse Abstract Syntax Tree in Graphical and Textual Forms. Note in figure 1 that the Eclipse AST, the package used in this research, maintains some information within nodes (such as the operator in infix expressions), whereas some AST representations place formulate this information a child node. 2. Literature Review 3
  • 4. There is little literature on the application of genetic algorithms to word aligners. Instead, we turn to the literature on genetic programming, where suggestions to counteract various results-limiting phenomena abound. There are a myriad of decisions to make in implementing a genetic algorithm. Fortunately, literature provides enough detailed discussion to allow for preparations against most common problems with genetic programming. Franz Rothlauf is the first to write a book on the pros and cons of various representations in genetic algorithms [7]. Like many of his colleagues, he highly suggests tree representations for genetic programming. This representation eases the implementation of mutation and crossover tremendously, compared with the traditional representation as bit strings, whereby the chances that a mutated string still resembles working code are less than slim. 4
  • 5. public Alignment alignSentencePair (SentencePair sentencePair){ MISSING=2092010418 <= -1198423683; alignment=new Alignment(); I4=addAlignment(alignment,I4,I4,B3); B4=false; I2=numEnglishWordsInSentence(sentencePair); if (I2 < -594586326){ D2=getDouble(L3,I1); } else { addInt(L2,I1,I3); getInt(L5,I2); addBoolean(L3,I1,B2); while (I2 < 1564864814){ addDouble(L5,I5,D1); MISSING=664939021; alignment=getString(L1,I2); MISSING=311599999 * 1197784289; D1=-287916828; } } I2=numFrenchWordsInSentence(sentencePair); for (I3=0;I3 < I1;I3++){ I4=-1; D1=0; for (I5=0;I5 < I2;I5++){ D2=50 / (1 + abs(I3 - I5)); if (D2 >= D1){ D1=D2; I4=I5; } } addAlignment(alignment,I4,I3,true); } return alignment; } Figure 2: Example of Bloat. Lines resembling original file are in bold. Lines colored red are added as result of bloat phenomenon. The phenomena of bloat is widely mentioned, whereby each successive generation displays a much larger file size than the previous, yet most of the added code contribute little to no added functionality. With high rates of mutation, I have seen 450 lines (nine pages) of code introduced to an initially twenty-line file, after less than ten generations. There are several mechanisms in place to cope with bloat, as discussed later. Another commonly observed fact to cope with is over-fitting. This occurs when the genetic process is allowed to run for too long. For example, the corpus used in this research consists of 447 sentence pairs, pairing English and French sentences. If we are 5
  • 6. to choose the first ten and evolve randomly generated programs to return alignments of these, then at some point we may theoretically find a reasonable solution which not only achieves superb results on the ten training sentence pairs, but on the 447 total sentence pairs as well. However, if we continue to evolve past this point, chances are that our population will become over fit for these ten sentences. This is similar, for example, to hoping to find the equation y = x2, but instead achieving y = 1, with training data of only (-1, 1) and (1, 1). Figure 3: After fitness is reached, over-fitting to the training data may occur. 6
  • 7. Figure 4: Example of Over-Fitting. The solid black line is y = x2, the red dotted line is y = |x|, and the blue dashed line is y = 1. The training data is { (1, 1), (-1, 1) }, but the desired function is { (x, y) : y = x2 } The literature is also helpful in suggesting approximate values for the frequencies at which to apply mutation and crossover to members of the population, though the perfect values are apparently learned only by trial and error. 3. General Overview of Algorithm In general, instead of searching across the solution space, we utilize GP to aid in search across the function space. As the function space is of immense proportions, we randomly sample the function space, and then search through not only these functions but others similar to them. In the graph above, we may have a function y = x + 5. This would lead us to searching similar functions such as y = x + 6, y = 3x + 5, y = x 2 + 5, etc. Since it is infeasible to evaluate every possible function with similar form to y = x + 5, we must again find a method with which to decide which functions to search. This is the basic concept of genetic programming, where a desired set of (input, output) pairs is known, but we search for the function (or possibly one of many functions) which causes this return. Doing this search across program code is many times more complex than across math equations. 7
  • 8. The flow chart shown here is exactly the order in which genetic programming is implemented in this research. An initial population is created, by taking files such as random.java in the Appendix and sending them through several generations of high mutation. Since the current version of this GP process is still prone to producing erroneous code, Figure 5: Flow Chart of GP many more programs are generated than asked for. Each is then evaluated according to the fitness function, and those which have compile and runtime errors, (at this point mostly due to invalid arguments, incorrect casting, and undeclared variable names – see Results), is filtered out and thrown away. Thus the GP process begins with only valid programs in its initial population. From here, the population undergoes a number of iterations wherein each member is evaluated, then the next generation is selected, then crossover and mutation is allowed to occur. The rates for these are currently at 80% chance for 3-point crossover to occur (see Section 4), and 0.05% for mutation, as suggested by most literature. 8
  • 9. 4. Details of Implementation Decisions Since each design decision is independent of each other, and many need to be presented simultaneously, I have chosen to format this section by discussing each one on its own, as orderly as possible. AST representation: The members of the population could be represented in a myriad of ways. Many people implementing genetic programming choose to create a an original representation of their function. This unfortunately, is due to the newness of the field, and hinders the progress of future work by allowing for the researchers to get bogged down in minor details which could potentially be settled already. I admire Franz Rothlauf’s efforts to correct this problem, and agree with him that the best representation for my purposes is to use an AST. This allows for an easy crossover implementation, and only necessitates moderate work to implement mutation. Generational model: The generational model of a population allows for the lifespan of each population member to be only a single generation, as opposed to the Steady-State model, which not only selects members for reproduction but also selects which member they will be replacing as well [8:134]. Both models use a constant-sized population. I have chosen to go with the generational model here because of simplicity in design. Initial Population: The usual approach is to start with a completely random set of programs. This seems unreasonable. Why create programs which construct strings and draw websites when we are looking for a program to add two numbers together? I have taken an approach for which I could not find any literature, to start with a population of programs similar to that under random.java in the Appendix. In some cases, I have even 9
  • 10. placed some initial code within the for-loop, to make alignments based on superficial traits. I do not rule out the possibility that this second strategy in effect may steer me into the wrong direction, by intending to run the program both with random.java and with the other versions (entitled superficial.java). It is my hope that by providing some base code, the completely random results will be avoided and thus a better chance of finding an optimal solution is possible. Fitness function: The fitness function, the measure by which we decide which members yield the most desirable results, and thus have the most potential for being prototypes of our desired word aligner, seems obvious. The goal is to maximize the precision and recall while minimizing the AER, as defined in [11:1]. Thus the fitness function calls alignSentencePair of each population member on a small subset of the full corpus, and returns the weighted sum of these numbers (where w1, w2, w3 are the weights): 10 * [ w1 * P + w2 * R + w3 * (1 – AER) ] Countering Over-Fitting: An easy fix to countering over-fitting to the training data used in the fitness function is to keep the training set dynamic. I have implemented this by choosing a random set each time. Thus there is no worry of over-fitting to the specific set of sentence pairs being learned on, since there is no specific set of sentence pairs. Fitness-Proportionate Selection: There are two methods of selection in widespread use: tournament and proportionate fitness selection [3:37]. In tournament selection several tournaments are held in which the fitness is calculated and the winner of the tournament is selected for reproduction. As the fitness function here requires no 10
  • 11. small amount of time, for matters of efficiency I chose the less computationally expensive selection process, fitness-proportionate selection. Here each member of the population is evaluated once, and then the new generation is randomly selected, with probability proportional to the fitness of the member [see figure]. The risk here generally is that with a wide variety of fitness values, those with the lower fitness values will be excluded from selection, the diversity of the population will disappear prematurely, leading to premature convergence. It is my hope that with a non-random initial population, the disparity in the fitness values will not be as dangerous as if the populations had been truly initialized randomly. Figure 6: The Proportionate Fitness Selection Process is Akin to a Game of Darts. Elitist Strategy: This is the decision to leave the best fit member of the population in the next generation, unmodified, although this does not rule out the possibility of including genetic reproductions of this member as well. Copies to file best of each generation: Looking back to the chart in Section 2, Literature Review, we acknowledge that there exists a reduction in quality when one allows the GP process to run too many generations. To alleviate this, a copy of the member with the highest fitness value of each generation is written to file in a separate folder. The end process, where we test the evolved word aligners, tests the best fit member of each generation, not solely that of the final generation. 11
  • 12. N-point crossover: The two common methods of crossover are uniform and n- point. In uniform crossover, a single point in the genome is chosen and two members have their code swapped at this point. N-point crossover allows for this to happen at multiple points, and is much more suitable to crossover on trees, where we are not dealing with the traditional fixed-length representation as in bit strings. Protected functions and variable types: To alleviate casting problems and protect against null pointer exceptions, it is easier to Refer to WordAligner class in Appendix, which is the super class of all other word aligner classes. Confine alterations: To minimize the number of off-track members of the population, alterations to the code are kept in the area where they matter the most. The basic information needed by all word aligners is as is shown in the appendix for random.java. Every word aligner should align each French word to some word in English. Thus, only the body of the for loop is altered. Halting problem: It may occur through mutation or crossover that infinite loops are created [8:293-294]. To counter this, the fitness function makes use of threads and halts after a reasonable amount of time has elapsed. This also serves as an additional measure against excessive bloat. 5. Results As can be seen by the example code, mutation never worked as desired. Nearly every mutation results in erroneous code. It seems the large majority are casting issues (in the figure here, D3 is a double and S1 is a string, while H5 is a HashMap). Those mutated programs which do compile include useless code, such as incrementing variables 12
  • 13. used nowhere else in the program. Without mutation, the rest of the genetic programming process is little more than searching over the different orderings of Figure 2: Some Mutation Errors Still Mishandled. the statements within the for loop, which holds no possible word aligners not already visible at a first glance. Figure 5: Initial Code and Code with "Better" Result. 13
Multiple runs each proved futile. As can be seen in the figure above, the best word aligner returned is only slightly improved in terms of performance. Looking at the code, it appears to be a fluke: by chance, using the number of English words in place of the number of French words tends to give better recall results (since English sentences are generally shorter than their French counterparts, fewer proposed alignments allow for fewer erroneous guesses). Indeed, the generations proved of no use. In effect, the run merely stares at effectively the same program each generation, since crossover alone is not enough to introduce variety, and the initial population is not truly random (not with the mutation process in its current state). The table below shows the evaluation results for the member of each generation with the highest fitness.

Gen.  Precision  Recall  AER     Fitness
 0    0.3658     0.2258  0.6864  9.0520
 1    0.3658     0.2258  0.6864  9.0520
 2    0.3535     0.2909  0.6678  9.7660
 3    0.3658     0.2258  0.6864  9.0520
 4    0.3658     0.2258  0.6864  9.0520
 5    0.3658     0.2258  0.6864  9.0520
 6    0.3935     0.1889  0.6966  8.8580
 7    0.3658     0.2258  0.6864  9.0520
 8    0.3658     0.2258  0.6864  9.0520
 9    0.3658     0.2258  0.6864  9.0520
10    0.3658     0.2250  0.6864  9.0520
11    0.3658     0.2250  0.6864  9.0520
12    0.3658     0.2250  0.6864  9.0520
13    0.3658     0.2250  0.6864  9.0520

Figure 9: Table of Fitness Function Results on a GP Run

6. Conclusion
In addition to deciphering the process of mutating code in a meaningful way, there are a few other tricks which I did not have time to experiment with, but which may prove useful. With regard to premature convergence, it would be interesting to add a feature whereby the mutation rate is raised greatly for one generation to promote an increase in diversity, triggered by a low standard deviation in the fitnesses. Since I was not able to get mutation working, I never actually started off from random.java; I instead started off from the likeness of the file seen in Figure 8. From what I have seen, initializing from this file appears to yield a population lacking in diversity, even with higher mutation rates in the initial generations. It would be interesting to run a comparison between initializing from this file and from random.java. It is widely acknowledged that finding correct values for the mutation and crossover rates, the number of generations, and the population size, as well as choosing a heuristic function, are all decisions still made by trial and error. Studying the exact effects of raising and lowering each of these values will consume a considerable amount of time, but is vital before much more work can be done in the area of genetic programming in general.
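The diversity-triggered mutation boost proposed above can be sketched in a few lines; the class name, threshold, and rate values here are hypothetical, chosen only to illustrate the mechanism, not values the paper tested.

```java
public class AdaptiveMutation {

    // Return the mutation rate for the next generation: boosted for one
    // generation when the standard deviation of the population's fitnesses
    // falls below a threshold, signalling premature convergence.
    public static double nextMutationRate(double[] fitnesses, double baseRate,
                                          double boostedRate, double stdDevThreshold) {
        double mean = 0.0;
        for (double f : fitnesses) mean += f;
        mean /= fitnesses.length;

        double variance = 0.0;
        for (double f : fitnesses) variance += (f - mean) * (f - mean);
        double stdDev = Math.sqrt(variance / fitnesses.length);

        return (stdDev < stdDevThreshold) ? boostedRate : baseRate;
    }
}
```

In the runs reported above, nearly every generation's best fitness was identical (9.0520), so a trigger of this kind would have fired almost immediately.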
Bibliography

1. Manning and Schütze. Foundations of Statistical Natural Language Processing, pg. 484.
2. Automatic Construction of a Bilingual Lexicon: wwwhome.cs.utwente.nl/~irgroup/align/
3. Ghanea-Hercock. Applied Evolutionary Algorithms in Java.
4. Genetic-Programming.Org: www.genetic-programming.org/
5. Grune, Bal, Jacobs, and Langendoen. Modern Compiler Design, pg. 9, 22, 52-55.
6. Langdon and Poli. Foundations of Genetic Programming.
7. Rothlauf. Representations for Genetic and Evolutionary Algorithms.
8. Banzhaf, Nordin, Keller, and Francone. Genetic Programming: An Introduction.
9. Gamma, Helm, Johnson, and Vlissides. Design Patterns.
10. Visitor Pattern: http://en.wikipedia.org/wiki/Visitor_pattern
11. Assignment 4: Word Alignment Models: www.cs.berkeley.edu/~klein/cs294-5/cs294-5%20assignment%204.pdf

Figures (original unless otherwise noted)

1. Abstract Syntax Tree in graphical and textual forms.
2. Example of Bloat.
3. After fitness is reached, overfitting to the training data may occur. [source: Schmiedle F, Drechsler N, Grosse D, Drechsler R. "Heuristic learning based on genetic programming." Genetic Programming & Evolvable Machines, Vol. 3, Dec. 2002, pg. 376]
4. Example of Over-Fitting.
5. Flow Chart of GP Approach. [chart source: Sette S, Boullart L. "Genetic programming: principles and applications." Engineering Applications of Artificial Intelligence, Vol. 14, Dec. 2001, pg. 728]
6. Proportionate Fitness Selection.
7. Some Mutation Errors Still Mishandled.
8. Initial Code and Code with "Better" Result.
9. Table of Fitness Function Results.

Appended Code

Crossover: Takes two statements whose parents are of the same type (for/for, while/while, etc.). The Eclipse AST toolkit requires that each node belong to a particular tree, so simply switching trees is not possible; instead, we clone each subtree under its new owner with the static copySubtree(targetAST, sourceNode) method.

private void crossover(int index1, Statement switch1,
                       int index2, Statement switch2) {
    CompilationUnit cu1 = newPop[index1];
    CompilationUnit cu2 = newPop[index2];
    AST ast1 = cu1.getAST();
    AST ast2 = cu2.getAST();
    ASTNode p1 = switch1.getParent();
    ASTNode p2 = switch2.getParent();
    // Clone each statement under the other program's AST before swapping.
    Statement switch1_under_ast2 = (Statement) ASTNode.copySubtree(ast2, switch1);
    Statement switch2_under_ast1 = (Statement) ASTNode.copySubtree(ast1, switch2);
    switch (p1.getNodeType()) {
        case ASTNode.BLOCK:
            List m1 = ((Block) p1).statements();
            List m2 = ((Block) p2).statements();
            m1.set(m1.indexOf(switch1), switch2_under_ast1);
            m2.set(m2.indexOf(switch2), switch1_under_ast2);
            break;
        case ASTNode.IF_STATEMENT:
            if (switch1.getLocationInParent().getId().equals("elseStatement")) {
                ((IfStatement) p2).setElseStatement(switch1_under_ast2);
                ((IfStatement) p1).setElseStatement(switch2_under_ast1);
            } else {
                ((IfStatement) p2).setThenStatement(switch1_under_ast2);
                ((IfStatement) p1).setThenStatement(switch2_under_ast1);
            }
            break;
        case ASTNode.WHILE_STATEMENT:
            ((WhileStatement) p2).setBody(switch1_under_ast2);
            ((WhileStatement) p1).setBody(switch2_under_ast1);
            break;
        case ASTNode.FOR_STATEMENT:
            ((ForStatement) p2).setBody(switch1_under_ast2);
            ((ForStatement) p1).setBody(switch2_under_ast1);
            break;
        default:
            throw new RuntimeException("unhandled crossover for nodeType: "
                                       + p1.getNodeType());
    }
}

Mutation: Uses the Visitor Pattern, extending org.eclipse.jdt.internal.corext.dom.GenericVisitor with Mutator to implement mutation. Mutator is a file much too long to display here. The essentials are that it randomly changes register names and values in the code, as well as occasionally inserting newly generated lines of code and making calls to safely defined methods (that is to say, a divide that checks for division by zero, etc.).

public void mutate(int index) {
    Mutator mutator = new Mutator(seed, numRegisters);
    CompilationUnit cu = newPop[index];
    cu.accept(mutator);
}

WordAligner parent class: The following is edited for length; redundant and obvious methods have been abbreviated. The Statistics object contains data gathered in an initial pass over the corpus beforehand, such as is used in unsupervised learning: Pr(f), Pr(e), Pr(f, e).

public class WordAligner {

    protected WordAligner(Statistics s) {
        statistics = s;
    }

    public Alignment alignSentencePair(SentencePair s) {
        return null;
    }

    public float prob_f(String f) {
        return (float) statistics.prob_f(f);
    }

    public float prob_e(String e) {
        return (float) statistics.prob_e(e);
    }

    public float prob_e_and_f(SentencePair s, String f, String e) {
        return (float) statistics.prob_f_and_e(s, f, e);
    }

    public List getFrenchWords(SentencePair s) {
        return s.getFrenchWords();
    }

    public List getEnglishWords(SentencePair s) {
        return s.getEnglishWords();
    }

    public float abs(float i) {
        return Math.abs(i);
    }

    public float numFrenchWordsInSentence(SentencePair s) {
        return s.getFrenchWords().size();
    }

    public float numEnglishWordsInSentence(SentencePair s) {
        return s.getEnglishWords().size();
    }

    public float getSentenceID(SentencePair s) {
        return s.getSentenceID();
    }

    public boolean addAlignment(float englishPosition, float frenchPosition, boolean sure) {
        int e = Math.round(englishPosition);
        int f = Math.round(frenchPosition);
        alignment.addAlignment(e, f, sure);
        return true;
    }

    /** GET methods **/

    // Indices are rounded and then clamped into range so that evolved
    // code cannot throw an IndexOutOfBoundsException.
    public String getString(List L, float i) {
        if (L == null || L.size() == 0) return "";
        int idx = Math.round(i);
        if (idx >= L.size()) idx = L.size() - 1;
        if (idx < 0) idx = 0;
        return (String) L.get(idx);
    }

    // Also: getBoolean, getNumber

    /** ADD methods **/

    public boolean addString(List L, float i, String o) {
        if (L == null) L = new ArrayList();
        int idx = Math.round(i);
        if (idx >= L.size()) idx = L.size() - 1;
        if (idx < 0) idx = 0;
        L.add(idx, o);
        return true;
    }

    // Also: addBoolean, addNumber
    /** FIELDS **/

    // Registers available to evolved code: five lists, ten floats,
    // five booleans, and five strings.
    public LinkedList L1 = new LinkedList();
    public LinkedList L2 = new LinkedList();
    public LinkedList L3 = new LinkedList();
    public LinkedList L4 = new LinkedList();
    public LinkedList L5 = new LinkedList();
    public float N1 = 0;
    public float N2 = 0;
    public float N3 = 0;
    public float N4 = 0;
    public float N5 = 0;
    public float N6 = 0;
    public float N7 = 0;
    public float N8 = 0;
    public float N9 = 0;
    public float N0 = 0;
    public boolean B1 = true;
    public boolean B2 = true;
    public boolean B3 = true;
    public boolean B4 = true;
    public boolean B5 = true;
    public String S1 = "";
    public String S2 = "";
    public String S3 = "";
    public String S4 = "";
    public String S5 = "";
    public Alignment alignment = new Alignment();
    public static Statistics statistics;
}

An extension of the WordAligner class: This is the base class for the random initialization. Several instances of this class are made, then subjected to many generations at a higher than normal mutation rate. Mutation occurs within the for loop.

public class random extends WordAligner {

    public Alignment alignSentencePair(SentencePair sentencePair) {
        alignment = new Alignment();
        N1 = numEnglishWordsInSentence(sentencePair);
        N2 = numFrenchWordsInSentence(sentencePair);
        // Only the body of this loop is subject to mutation.
        for (N3 = 0; N3 < N2; N3++) {
            B5 = addAlignment(N4, N3, true);
        }
        return alignment;
    }

    public random(Statistics s) {
        super(s);
    }
}
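The "safely defined methods" referenced in the mutation description above (e.g. a divide that checks for division by zero) are standard protected primitives in genetic programming. A minimal sketch follows; the class name and return conventions are illustrative, not the paper's actual Mutator targets, though the index clamping mirrors WordAligner.getString.

```java
public class ProtectedOps {

    // Protected division: returns 1 when the denominator is zero, so
    // evolved code cannot produce Infinity/NaN or crash mid-evaluation.
    public static float div(float a, float b) {
        return (b == 0f) ? 1f : a / b;
    }

    // Protected logarithm: defined for non-positive arguments as well.
    public static float log(float a) {
        return (a <= 0f) ? 0f : (float) Math.log(a);
    }

    // Protected list indexing: round the float register, then clamp it
    // into [0, size-1] rather than allowing an out-of-bounds access.
    public static float clampIndex(float i, int size) {
        int idx = Math.round(i);
        if (idx >= size) idx = size - 1;
        if (idx < 0) idx = 0;
        return idx;
    }
}
```

The design choice here is that every primitive is total: whatever registers mutation wires together, the call returns some value instead of throwing, so only the fitness function (not an exception) decides a candidate's fate.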