SlideShare ist ein Scribd-Unternehmen logo
1 von 31
Downloaden Sie, um offline zu lesen
Build Moses on Ubuntu (64-bit) in VirtualBox: recorded by Aaron
[ http://www.linkedin.com/in/aaronhan ]
This document introduces the record by Aaron when running Moses translation systems
(http://www.statmt.org/moses/). If you want to get more information about the Moses please see the
official website of Moses or Moses-manual.pdf (http://www.statmt.org/moses/manual/manual.pdf).
1. Download virtual box from https://www.virtualbox.org/wiki/Downloads
2. Install virtual box by double click the “VirtualBox-4.2.18-88781-Win.exe” file. [To make the
Ubuntu fluent, take the following actions.
A1.download and install the VirtualBox Extension Pack outside the VM:
https://www.virtualbox.org/wiki/Downloads
A2.Ubuntu->Devices -> Install guest additions…]
3. Press “New” button to guide your “driver” (I use nlp2ct-Linux.vdi) into the virtual box. During
the selection, chose the Ubuntu64-bit. Setting the base memory around 7GB.

4. Press the “start” to lunch the Ubuntu system in the virtual box.
5. Download Giza++ (word alignment model) from https://code.google.com/p/gizapp/downloads/list
6. Download IRSTLM (language model) from http://hlt.fbk.eu/en/irstlm
7. Download Moses (Statistical Machine Translation decoder) from
http://www.statmt.org/moses/?n=Moses.Releases
8. Install the Giza++. If you store the Giza++ in the address Home/Aaron/Moses/giza-pp, open the
“Terminal”, get into this address (press “ls” to show the items under your current position; press
“cd xx” to enter the xx file; press “cd ..” to jump out current file), then press “make” command.
If show “g++ command not found”, type “g++”, then type “sudo apt-get install g++” to install g++.
After the install of g++, type “make” command to install Giza++ again.

9. Install IRSTLM. Get into the location of the IRSTLM file. (install guideline
http://sourceforge.net/apps/mediawiki/irstlm/index.php?title=Installation_Guidelines) Type “sh
regenerate-makefiles.sh”. if shows “command not found aclocal failed” type “sudo apt-get
install automake”, if show “libtoolize: no such file” type “sudo apt-get install libtool”. Then type
“bash regenerate-makefiles.sh” again.
Type “./configure –prefix=/Home/Aaron/Moses/irstlm-5.80.03” to generate and locate the
“Makefile”

Type “make” for compilation. If show “zlib.h: No such file” type “sudo apt-get install zlib1g-dev”
to install zlib. Type “make” again.
Type “sudo make install” for installation.

The IRSTLM library and commands are generated respectively under the address
“/Home/Aaron/Moses/irstlm-5.80.03”
10. Install Boost (C++ libraries).
Download it from ( http://www.boost.org/users/history/version_1_52_0.html ).
cd boost_1_52_0
./bootstrap.sh –prefix=/Home/Aaron/Moses/boost_1_52_0
Type “./b2” to install boost.

11. Install other dependencies (gcc, zlib, bzip2).
Type “sudo apt-get install build-essential libz-dev libbz2-dev” to install.
12. Install Moses.
Type “git clone git://github.com/moses-smt/mosesdecoder.git”.

Compile Moses. To examine the options you want, type “cd ~/mosesdecoder”, “./bjam --help”.
There will be automatic updates.

13. Run Moses with an example.
Type the following commands to run the example:
cd ~/mosesdecoder
wget http://www.statmt.org/moses/download/sample-models.tgz
tar xzzf sample-models.tgz
cd sample-models
~/Aaron/Moses/mosesdecoder/bin/moses –f phrase-model/moses.ini < phrase-model/in > out
The translation results of source sentence “das ist ein kleines haus” will be shown in the file of
out as “it is a small house”. Succeed!

To test the Chart decoder, type the following command, the translation result will be shown in
out.stt file:
~/Aaron/Moses/mosesdecoder/bin/moses_chart –f string-to-tree/moses.ini < string-to-tree/in >
out.stt
To test the tree-to-tree demo, type the following command, the translation result will be shown
in out.ttt file:
~/Aaron/Moses/mosesdecoder/bin/moses_chart –f tree-to-tree/moses.ini < tree-to-tree/in.xml >
out.ttt
======================================================
Another way to install GIZA++:
Type the following commands to install GIZA++.
wget http://giza-pp.googlecode.com/files/giza-pp-v1.0.7.tar.gz
tar xzvf giza-pp-v1.0.7.tar.gz
cd giza-pp
make
above commands will create the binaries ~/giza-pp/GIZA++-v2/GIZA++, ~/giza-pp/GIZA++v2/snt2cooc.out and ~/giza-pp/mkcls-v2/mkcls. To automatically copy these files to somewhere
that Moses can find, type the following commands:
cd ~/mosesdecoder
mkdir tools
cp ~/Aaron/Moses/giza-pp/GIZA++-v2/GIZA++ ~/Aaron/Moses/giza-pp/GIZA++-v2/snt2cooc.out
~/Aaron/Moses/giza-pp/mkcls-v2/mkcls tools
above commands will copy the three binaries “GIZA++, snt2cooc.out, and mkcls” into the
directory “~/mosesdecoder/tools”.
When you come to run the training, you need to tell the training script where GIZA++ was
installed using the –external-bin-dir argument as below:
train-model.perl –external-bin-dir $HOME/mosesdecoder/tools
========================================================
Another way to install IRSTLM:
Download the latest version of IRSTLM.
tar zxvf irstlm-5.80.03.tgz
cd irstlm-5.80.03
./regenerate-makefiles.sh
./configure –prefix=$Home/Aaron/Moses/irstlm-5.80.03
sudo make install
========================================================
Corpus preparation:
mkdir corpus
cd corpus
wget http://www.statmt.org/wmt13/training-parallel-nc-v8.tgz
tar zxvf training-parallel-nc-v8.tgz
corpus tokenization (add spaces between words and punctuations):
type the following command for corpus tokenization:
~/Aaron/Moses/mosesdecoder/scripts/tokenizer/tokenizer.perl -l en <
~/Aaron/corpus/training/news-commentary-v8.fr-en.en > ~/Aaron/corpus/news-commentaryv8.fr-en.tok.en
~/Aaron/Moses/mosesdecoder/scripts/tokenizer/tokenizer.perl -l fr <
~/Aaron/corpus/training/news-commentary-v8.fr-en.fr > ~/Aaron/corpus/news-commentaryv8.fr-en.tok.fr
The tokenized files will be generated in the rectory file “~/Aaron/corpus”
Truecasing (to reduce data sparsity, convert the initial words of the sentence into the most
probable casing):
To get the truecasing training models that are the statistics data extracted from text, type the
following commands:
~/Aaron/Moses/mosesdecoder/scripts/recaser/train-truecaser.perl --model
~/Aaron/corpus/trucase-model.en --corpus ~/Aaron/corpus/news-commentary-v8.fr-en.tok.en
~/Aaron/Moses/mosesdecoder/scripts/recaser/train-truecaser.perl --model
~/Aaron/corpus/trucase-model.fr --corpus ~/Aaron/corpus/news-commentary-v8.fr-en.tok.fr
Above commands will generate the model files “truecase-model.en” and “truecase-model.fr”
under the directory “/Aaron/corpus”
Using the extracted truecasing training models to perform the truecase function as below
commands:
~/Aaron/Moses/mosesdecoder/scripts/recaser/truecase.perl --model ~/Aaron/corpus/trucasemodel.en < ~/Aaron/corpus/news-commentary-v8.fr-en.tok.en > ~/Aaron/corpus/newscommentary-v8.fr-en.true.en
~/Aaron/Moses/mosesdecoder/scripts/recaser/truecase.perl --model ~/Aaron/corpus/trucasemodel.fr < ~/Aaron/corpus/news-commentary-v8.fr-en.tok.fr > ~/Aaron/corpus/newscommentary-v8.fr-en.true.fr
Above commands will generate the files “news-commentary-v8.fr-en.true.ed” and “newscommentary-v8.fr-en.true.fr” under directory “Aaron/corpus”
Cleaning (to remove the mis-aligned sentencs, long sentences and empty sentences, which
may cause problems with the training pipeline)
Type the following command to delete the sentences whose length is larger than 80:
~/Aaron/Moses/mosesdecoder/scripts/training/clean-corpus-n.perl ~/Aaron/corpus/newscommentary-v8.fr-en.true fr en ~/Aaron/corpus/news-commentary-v8.fr-en.clean 1 80

The files “news-commentary-v8.fr-en.clean.en” and “news-commentary-v8.fr-en.clean.fr” will
be generated in the directory “~/Aaron/corpus”.
========================================================
Language model training (built on the target language to ensure fluent output translation):
cd ~/Aaron
mkdir lm
cd lm
~/Aaron/Moses/irstlm-5.80.03/scripts/add-start-end.sh < ~/Aaron/corpus/news-commentaryv8.fr-en.true.en > news-commentary-v8.fr-en.sb.en
The above commends will generate the file “news-commentary-v8.fr-en.sb.en” in the directory
“Aaron/lm”. In this file, each sentence is added with the start and end symbol “<s> </s>”.
Type the following command to generate the language model of English.
export IRSTLM=$Home/Aaron/Moses/irstlm-5.80.03; ~/Aaron/Moses/irstlm5.80.03/scripts/build-lm.sh –i news-commentary-v8.fr-en.sb.en –t ./tmp –p –s improved-kneserney –o news-commentary-v8.fr-en.lm.en

Above commend generate the file “news-commentary-v8.fr-en.lm.en.gz” in the directory
“Aaron/lm”.
Type the following command to generate the file “news-commentary-v8.fr-en.arpa.en” in the
directory “Aaron/lm”
~/Aaron/Moses/irstlm-5.80.03/src/compile-lm –text news-commentary-v8.fr-en.lm.en.gz newscommentary-v8.fr-en.arpa.en

To make the faster loading, type the following command to binary the arpa.en file
“~/Aaron/Moses/mosesdecoder/bin/build_binary news-commentary-v8.fr-en.arpa.en newscommentary-v8.fr-en.blm.en”

To check the language model, type the following command: “$ echo “is this an English
sentence ?” | ~/Aaron/Moses/mosesdecoder/bin/query news-commentary-v8.fr-en.arpa.en”. it
will show the following result.
If using the following command: “$ echo “is this an English sentence ?” |
~/Aaron/Moses/mosesdecoder/bin/query news-commentary-v8.fr-en.blm.en”. it will show the
following result:

========================================================
Training translation system (run word alignment, phrase extraction and scoring, create
lexicalized reordering tables and create Moses configuration file):
Type the following command to train the translation system using language model:
cd ~/Aaron
mkdir working
cd working
~/Aaron/Moses/mosesdecoder/scripts/training/train-model.perl -root-dir train -corpus
~/Aaron/corpus/news-commentary-v8.fr-en.clean -f fr -e en -alignment grow-diag-final-and
-reordering msd-bidirectional-fe -lm 0:3:$HOME/Aaron/lm/news-commentary-v8.fr-en.blm.en:8
-external-bin-dir ~/Aaron/Moses/mosesdecoder/tools >& training.out &
The running details are written into the ~/working/training.out file step by step as below:

If you want to see the running thread, type command “top” to show the running details in the
window as below:

Type command “ctrl+c” to shut down the current running (top) shown in the window as below
[type "ctrl+z" to pause the run; type "jobs" show the running; type "bg + job-num" to put the detail
background; type "fg + job-num" to show runing thread (bring to fore-ground)]:
Around 2 hours and 10 minutes later, the translation training stage will be finished. By
command “ctrl+c”, the following figure show the cmd window content:

The following content is shown in the end of the file “~/working/training.out”:
Type “fg 2”, it will show that the progress is finished:

There will be a “train” file generated at “~/Aaron/working/train”. The “train” file contains 4 files
including “corpus, giza.en-fr, giza.fr-en, and model”. The 4 files contain the below files
respectively:
========================================================
Tuning translation system:
The weights used by Moses to weight the different models against each other are not optimized,
as shown in the moses.ini file. To tune the parameters, it requires a small amount of parallel
data, which is separated from the training data.
First, we download the WMT08 data, tokenise and truecase the corpus. The WMT08 corpus is
used as development set in WMT12.
Make a mew file “tune” to store the tuning corpus, then
cd ~/corpus/tune
wget http://www.statmt.org/wmt12/dev.tgz
tar zxvf dev.tgz
~/Aaron/corpus/tune$ ~/Aaron/Moses/mosesdecoder/scripts/tokenizer/tokenizer.perl -l en <
dev/news-test2008.en > news-test2008.tok.en
~/Aaron/corpus/tune$ ~/Aaron/Moses/mosesdecoder/scripts/tokenizer/tokenizer.perl -l fr <
dev/news-test2008.en > news-test2008.tok.fr
~/Aaron/Moses/mosesdecoder/scripts/recaser/truecase.perl --model ~/Aaron/corpus/truecasemodel.en < news-test2008.tok.en > news-test2008.true.en
~/Aaron/Moses/mosesdecoder/scripts/recaser/truecase.perl --model ~/Aaron/corpus/truecasemodel.fr < news-test2008.tok.fr > news-test2008.true.fr
After above commands, there will be the following files generated for tuning:
Now, we begin the tuning stage using the MERT (minimum error rate training) method.
cd ~/Aaron/working
~/Aaron/Moses/mosesdecoder/scripts/training/mert-moses.pl ~/Aaron/corpus/tune/newstest2008.true.fr ~/Aaron/corpus/tune/news-test2008.true.en
~/Aaron/Moses/mosesdecoder/bin/moses train/model/moses.ini --mertdir
~/Aaron/Moses/mosesdecoder/bin/ &> mert.out &
top,

Type “Ctrl+c” to exit the showing, “jobs” to show the job number, “ctrl+z” to pause the job, “fg
[num_job]” to show the running job foreground, “ps” to show running, the jobs are paused as
below:

To save the time, type the following command to run the tuning with 6 threads,
~/Aaron/Moses/mosesdecoder/scripts/training/mert-moses.pl ~/Aaron/corpus/tune/newstest2008.true.fr ~/Aaron/corpus/tune/news-test2008.true.en
~/Aaron/Moses/mosesdecoder/bin/moses train/model/moses.ini --mertdir
~/Aaron/Moses/mosesdecoder/bin/ --decoder-flags="-threads 6" &> mert.out &

Type “ctrl+c”, “ps -aux” to show detailed running,

Type “ps –aux|grep moses” to show the running that only related to moses:
A long time later, type “top” showing there will be no running of “moses”, which means the
tuning is finished:

There will be one document “mert.out” generated under working file, at the same time, a
“mert-work” file containing the tuning results “moses.ini” is generated under working file as
below:

Begin the tuning at 2013-10-29-17:00, finish the tuning and generate the “moses.ini” file at
2013-10-30-09:44, it takes around 16 hours (using 7 threads). The “moses.ini” file contains the
tuned parameters and other information as following:
In the file “working/mert.out”, it shows:
========================================================
Testing the translation model:
This stage is to test the translation quality of the built translation model. The translation quality
is measured by the automatic evaluation metric score, such as the metric BLEU(2002),
METEOR(2005), and LEOPR(2012), etc.
Type the command:
cd ~/Aaron
~/Aaron/Moses/mosesdecoder/bin/moses -f ~/Aaron/working/mert-work/moses.ini

It will take some minutes to show the finish the initializing LexicalReordering and reading
phrase-table as following:
Type a French sentence “c'est une petite maison.(//this is a small house.)” and “enter”, it will
translate it as below, which is not a fully translated sentence:

Type another French sentence “vous êtes beau (// you are handsome)”, it will translate it into
“you are beautiful” as below:

Type “ctrl+c” to exit the job.
To make the translation faster, we should binary the lexicalized reordering models and the
phrase-table.
mkdir ~/Aaron/working/binarised-model
Type the following command to binaries the phrase-table:
~/Aaron/Moses/mosesdecoder/bin/processPhraseTable -ttable 0 0 train/model/phrase-table.gz
-nscores 5 -out binarised-model/phrase-table

It will generate the following files:
To binary the lexical reordering-table, type the following command:
~/Aaron/Moses/mosesdecoder/bin/processLexicalTable -in train/model/reordering-table.wbemsd-bidirectional-fe.gz -out binarised-model/reordering-table

It will generate the files:
Copy the ~/working/mert-work/moses.ini document into the ~/binarised-model. Then change
the moses.ini content as below to point to the binarised files:

Using the binarised tables, the loading will be very fast using the following command:
~/Aaron/Moses/mosesdecoder/bin/moses -f ~/Aaron/working/binarised-model/moses.ini

If you type the above two french sentences again it will generate the same translations as
before.
Type the commands to prepare the testing data as below:
mkdir ~/Aaron/corpus/test
cd ~/corpus/test
~/Aaron/Moses/mosesdecoder/scripts/tokenizer/tokenizer.perl -l en
<~/Aaron/corpus/tune/dev/newstest2011.en > newstest2011.tok.en
~/Aaron/Moses/mosesdecoder/scripts/tokenizer/tokenizer.perl -l fr <
~/Aaron/corpus/tune/dev/newstest2011.fr > newstest2011.tok.fr
~/Aaron/Moses/mosesdecoder/scripts/recaser/truecase.perl --model ~/Aaron/corpus/truecasemodel.en < newstest2011.tok.en > newstest2011.true.en
~/Aaron/Moses/mosesdecoder/scripts/recaser/truecase.perl --model ~/Aaron/corpus/truecasemodel.fr < newstest2011.tok.fr > newstest2011.true.fr
To make the translation faster, we can filter the trained translation model to retain the entries
that are only needed to translate the offered test corpus:
VirtualBox:~/Aaron/working$ ~/Aaron/Moses/mosesdecoder/scripts/training/filter-modelgiven-input.pl filtered-newstest2011 mert-work/moses.ini
~/Aaron/corpus/test/newstest2011.true.fr -Binarizer
~/Aaron/Moses/mosesdecoder/bin/processPhraseTable
The filtering and binary stage finished as below:

Above command will generate a “~/Aaron/working/filtered-newstest2011” file that contains the
following documents:
Then we translate the testing corpus and score the translation quality using BLEU metric:
nlp2ct@nlp2ct-VirtualBox:~/Aaron/working$ ~/Aaron/Moses/mosesdecoder/bin/moses -f
~/Aaron/working/filtered-newstest2011/moses.ini < ~/Aaron/corpus/test/newstest2011.true.fr >
~/Aaron/working/newstest2011.translated.en 2> ~/Aaron/working/newstest2011.out

It takes abound 25 minutes to finish the translation. There will be two documents generated
“~/working/newstest2011.out & ~/working/newstest2011.translated.en”.
The document newstest2011.translated.en contains the translated output sentences; the
document newstest2011.out contains the detailed translation procedure and its volume is larger:
Type the following command to test the BLEU score of the automatic translation, as compared
with the reference translation:
VirtualBox:~/Aaron/working$ ~/Aaron/Moses/mosesdecoder/scripts/generic/multi-bleu.perl -lc
~/Aaron/corpus/test/newstest2011.true.en < ~/Aaron/working/newstest2011.translated.en

It shows that the BLEU score is 23.41.

============================================================
Reference:
Moses manual: [http://www.statmt.org/moses/manual/manual.pdf], accessed 2013.10.31
TianLiang’ blog: [http://www.tianliang123.com/moses_installation], accessed 2013.10.31.

Weitere ähnliche Inhalte

Was ist angesagt?

[Foss4 g2013 korea]postgis와 geoserver를 이용한 대용량 공간데이터 기반 일기도 서비스 구축 사례
[Foss4 g2013 korea]postgis와 geoserver를 이용한 대용량 공간데이터 기반 일기도 서비스 구축 사례[Foss4 g2013 korea]postgis와 geoserver를 이용한 대용량 공간데이터 기반 일기도 서비스 구축 사례
[Foss4 g2013 korea]postgis와 geoserver를 이용한 대용량 공간데이터 기반 일기도 서비스 구축 사례BJ Jang
 
Natural language processing: feature extraction
Natural language processing: feature extractionNatural language processing: feature extraction
Natural language processing: feature extractionGabriel Hamilton
 
[GAN by Hung-yi Lee]Part 1: General introduction of GAN
[GAN by Hung-yi Lee]Part 1: General introduction of GAN[GAN by Hung-yi Lee]Part 1: General introduction of GAN
[GAN by Hung-yi Lee]Part 1: General introduction of GANNAVER Engineering
 
Spell checker using Natural language processing
Spell checker using Natural language processing Spell checker using Natural language processing
Spell checker using Natural language processing Sandeep Wakchaure
 
Object Oriented Software Engineering (OOSE) presentation on SOFTWARE MAINTENANCE
Object Oriented Software Engineering (OOSE) presentation on SOFTWARE MAINTENANCEObject Oriented Software Engineering (OOSE) presentation on SOFTWARE MAINTENANCE
Object Oriented Software Engineering (OOSE) presentation on SOFTWARE MAINTENANCEVipin Kumar
 
Map Reduce 〜入門編:仕組みの理解とアルゴリズムデザイン〜
Map Reduce 〜入門編:仕組みの理解とアルゴリズムデザイン〜Map Reduce 〜入門編:仕組みの理解とアルゴリズムデザイン〜
Map Reduce 〜入門編:仕組みの理解とアルゴリズムデザイン〜Takahiro Inoue
 
Visualizaing and understanding convolutional networks
Visualizaing and understanding convolutional networksVisualizaing and understanding convolutional networks
Visualizaing and understanding convolutional networksSungminYou
 
[232] TensorRT를 활용한 딥러닝 Inference 최적화
[232] TensorRT를 활용한 딥러닝 Inference 최적화[232] TensorRT를 활용한 딥러닝 Inference 최적화
[232] TensorRT를 활용한 딥러닝 Inference 최적화NAVER D2
 
OpenModelica tutorials_1(超初級チュートリアル1 解析モデルの作成と実行)
OpenModelica tutorials_1(超初級チュートリアル1 解析モデルの作成と実行)OpenModelica tutorials_1(超初級チュートリアル1 解析モデルの作成と実行)
OpenModelica tutorials_1(超初級チュートリアル1 解析モデルの作成と実行)Shigenori Ueda
 
CMake - Introduction and best practices
CMake - Introduction and best practicesCMake - Introduction and best practices
CMake - Introduction and best practicesDaniel Pfeifer
 
공간정보거점대학 PostGIS 고급과정
공간정보거점대학 PostGIS 고급과정공간정보거점대학 PostGIS 고급과정
공간정보거점대학 PostGIS 고급과정JungHwan Yun
 
OpenModelica tutorials_7 PlantModel(超初級チュートリアル7.プラントモデル)
OpenModelica tutorials_7 PlantModel(超初級チュートリアル7.プラントモデル)OpenModelica tutorials_7 PlantModel(超初級チュートリアル7.プラントモデル)
OpenModelica tutorials_7 PlantModel(超初級チュートリアル7.プラントモデル)Shigenori Ueda
 
Introduction to Diffusion Models
Introduction to Diffusion ModelsIntroduction to Diffusion Models
Introduction to Diffusion ModelsSangwoo Mo
 
Decomposition technique In Software Engineering
Decomposition technique In Software Engineering Decomposition technique In Software Engineering
Decomposition technique In Software Engineering Bilal Hassan
 
Native hook mechanism in Android Bionic linker
Native hook mechanism in Android Bionic linkerNative hook mechanism in Android Bionic linker
Native hook mechanism in Android Bionic linkerKevin Mai-Hsuan Chia
 
Esoteric, Obfuscated, Artistic Programming in Ruby
Esoteric, Obfuscated, Artistic Programming in RubyEsoteric, Obfuscated, Artistic Programming in Ruby
Esoteric, Obfuscated, Artistic Programming in Rubymametter
 
Lect2 conventional software management
Lect2 conventional software managementLect2 conventional software management
Lect2 conventional software managementmeena466141
 
"Making OpenCV Code Run Fast," a Presentation from Intel
"Making OpenCV Code Run Fast," a Presentation from Intel"Making OpenCV Code Run Fast," a Presentation from Intel
"Making OpenCV Code Run Fast," a Presentation from IntelEdge AI and Vision Alliance
 

Was ist angesagt? (20)

[Foss4 g2013 korea]postgis와 geoserver를 이용한 대용량 공간데이터 기반 일기도 서비스 구축 사례
[Foss4 g2013 korea]postgis와 geoserver를 이용한 대용량 공간데이터 기반 일기도 서비스 구축 사례[Foss4 g2013 korea]postgis와 geoserver를 이용한 대용량 공간데이터 기반 일기도 서비스 구축 사례
[Foss4 g2013 korea]postgis와 geoserver를 이용한 대용량 공간데이터 기반 일기도 서비스 구축 사례
 
Types of Error in PHP
Types of Error in PHPTypes of Error in PHP
Types of Error in PHP
 
Natural language processing: feature extraction
Natural language processing: feature extractionNatural language processing: feature extraction
Natural language processing: feature extraction
 
[GAN by Hung-yi Lee]Part 1: General introduction of GAN
[GAN by Hung-yi Lee]Part 1: General introduction of GAN[GAN by Hung-yi Lee]Part 1: General introduction of GAN
[GAN by Hung-yi Lee]Part 1: General introduction of GAN
 
Spell checker using Natural language processing
Spell checker using Natural language processing Spell checker using Natural language processing
Spell checker using Natural language processing
 
Object Oriented Software Engineering (OOSE) presentation on SOFTWARE MAINTENANCE
Object Oriented Software Engineering (OOSE) presentation on SOFTWARE MAINTENANCEObject Oriented Software Engineering (OOSE) presentation on SOFTWARE MAINTENANCE
Object Oriented Software Engineering (OOSE) presentation on SOFTWARE MAINTENANCE
 
Software process
Software processSoftware process
Software process
 
Map Reduce 〜入門編:仕組みの理解とアルゴリズムデザイン〜
Map Reduce 〜入門編:仕組みの理解とアルゴリズムデザイン〜Map Reduce 〜入門編:仕組みの理解とアルゴリズムデザイン〜
Map Reduce 〜入門編:仕組みの理解とアルゴリズムデザイン〜
 
Visualizaing and understanding convolutional networks
Visualizaing and understanding convolutional networksVisualizaing and understanding convolutional networks
Visualizaing and understanding convolutional networks
 
[232] TensorRT를 활용한 딥러닝 Inference 최적화
[232] TensorRT를 활용한 딥러닝 Inference 최적화[232] TensorRT를 활용한 딥러닝 Inference 최적화
[232] TensorRT를 활용한 딥러닝 Inference 최적화
 
OpenModelica tutorials_1(超初級チュートリアル1 解析モデルの作成と実行)
OpenModelica tutorials_1(超初級チュートリアル1 解析モデルの作成と実行)OpenModelica tutorials_1(超初級チュートリアル1 解析モデルの作成と実行)
OpenModelica tutorials_1(超初級チュートリアル1 解析モデルの作成と実行)
 
CMake - Introduction and best practices
CMake - Introduction and best practicesCMake - Introduction and best practices
CMake - Introduction and best practices
 
공간정보거점대학 PostGIS 고급과정
공간정보거점대학 PostGIS 고급과정공간정보거점대학 PostGIS 고급과정
공간정보거점대학 PostGIS 고급과정
 
OpenModelica tutorials_7 PlantModel(超初級チュートリアル7.プラントモデル)
OpenModelica tutorials_7 PlantModel(超初級チュートリアル7.プラントモデル)OpenModelica tutorials_7 PlantModel(超初級チュートリアル7.プラントモデル)
OpenModelica tutorials_7 PlantModel(超初級チュートリアル7.プラントモデル)
 
Introduction to Diffusion Models
Introduction to Diffusion ModelsIntroduction to Diffusion Models
Introduction to Diffusion Models
 
Decomposition technique In Software Engineering
Decomposition technique In Software Engineering Decomposition technique In Software Engineering
Decomposition technique In Software Engineering
 
Native hook mechanism in Android Bionic linker
Native hook mechanism in Android Bionic linkerNative hook mechanism in Android Bionic linker
Native hook mechanism in Android Bionic linker
 
Esoteric, Obfuscated, Artistic Programming in Ruby
Esoteric, Obfuscated, Artistic Programming in RubyEsoteric, Obfuscated, Artistic Programming in Ruby
Esoteric, Obfuscated, Artistic Programming in Ruby
 
Lect2 conventional software management
Lect2 conventional software managementLect2 conventional software management
Lect2 conventional software management
 
"Making OpenCV Code Run Fast," a Presentation from Intel
"Making OpenCV Code Run Fast," a Presentation from Intel"Making OpenCV Code Run Fast," a Presentation from Intel
"Making OpenCV Code Run Fast," a Presentation from Intel
 

Andere mochten auch

How to run_moses 2
How to run_moses 2How to run_moses 2
How to run_moses 2Mahmoud Eid
 
LEPOR: an augmented machine translation evaluation metric - Thesis PPT
LEPOR: an augmented machine translation evaluation metric - Thesis PPT LEPOR: an augmented machine translation evaluation metric - Thesis PPT
LEPOR: an augmented machine translation evaluation metric - Thesis PPT Lifeng (Aaron) Han
 
Machine Learning and Data Mining: 19 Mining Text And Web Data
Machine Learning and Data Mining: 19 Mining Text And Web DataMachine Learning and Data Mining: 19 Mining Text And Web Data
Machine Learning and Data Mining: 19 Mining Text And Web DataPier Luca Lanzi
 
Machine translation from English to Hindi
Machine translation from English to HindiMachine translation from English to Hindi
Machine translation from English to HindiRajat Jain
 
TAUS Machine Translation Showcase, The Simplified Guide to Getting Started in...
TAUS Machine Translation Showcase, The Simplified Guide to Getting Started in...TAUS Machine Translation Showcase, The Simplified Guide to Getting Started in...
TAUS Machine Translation Showcase, The Simplified Guide to Getting Started in...TAUS - The Language Data Network
 
Hindi –tamil text translation
Hindi –tamil text translationHindi –tamil text translation
Hindi –tamil text translationVaibhav Agarwal
 

Andere mochten auch (6)

How to run_moses 2
How to run_moses 2How to run_moses 2
How to run_moses 2
 
LEPOR: an augmented machine translation evaluation metric - Thesis PPT
LEPOR: an augmented machine translation evaluation metric - Thesis PPT LEPOR: an augmented machine translation evaluation metric - Thesis PPT
LEPOR: an augmented machine translation evaluation metric - Thesis PPT
 
Machine Learning and Data Mining: 19 Mining Text And Web Data
Machine Learning and Data Mining: 19 Mining Text And Web DataMachine Learning and Data Mining: 19 Mining Text And Web Data
Machine Learning and Data Mining: 19 Mining Text And Web Data
 
Machine translation from English to Hindi
Machine translation from English to HindiMachine translation from English to Hindi
Machine translation from English to Hindi
 
TAUS Machine Translation Showcase, The Simplified Guide to Getting Started in...
TAUS Machine Translation Showcase, The Simplified Guide to Getting Started in...TAUS Machine Translation Showcase, The Simplified Guide to Getting Started in...
TAUS Machine Translation Showcase, The Simplified Guide to Getting Started in...
 
Hindi –tamil text translation
Hindi –tamil text translationHindi –tamil text translation
Hindi –tamil text translation
 

Ähnlich wie Build Moses on Ubuntu (64-bit) in VirtualBox: recorded by Aaron

Build moses on ubuntu (64 bit) system in virtubox recorded by aaron _v2longer
Build moses on ubuntu (64 bit) system in virtubox recorded by aaron _v2longerBuild moses on ubuntu (64 bit) system in virtubox recorded by aaron _v2longer
Build moses on ubuntu (64 bit) system in virtubox recorded by aaron _v2longerLifeng (Aaron) Han
 
Making environment for_infrastructure_as_code
Making environment for_infrastructure_as_codeMaking environment for_infrastructure_as_code
Making environment for_infrastructure_as_codeSoshi Nemoto
 
macos installation automation
macos installation automationmacos installation automation
macos installation automationJon Fuller
 
Oracle11g On Fedora14
Oracle11g On Fedora14Oracle11g On Fedora14
Oracle11g On Fedora14kmsa
 
Install websphere message broker 8 RHEL 6 64 bits
Install websphere message broker 8 RHEL 6 64 bitsInstall websphere message broker 8 RHEL 6 64 bits
Install websphere message broker 8 RHEL 6 64 bitsManuel Vega
 
Vagrant development environment
Vagrant   development environmentVagrant   development environment
Vagrant development environmentHiraq Citra M
 
Scripting for infosecs
Scripting for infosecsScripting for infosecs
Scripting for infosecsnancysuemartin
 
Creating your own AtoM demo data set for re-use with Vagrant
Creating your own AtoM demo data set for re-use with VagrantCreating your own AtoM demo data set for re-use with Vagrant
Creating your own AtoM demo data set for re-use with VagrantArtefactual Systems - AtoM
 
Project 2 how to install and compile os161
Project 2 how to install and compile os161Project 2 how to install and compile os161
Project 2 how to install and compile os161Xiao Qin
 
Getting started with robonova in ubuntu
Getting started with robonova in ubuntuGetting started with robonova in ubuntu
Getting started with robonova in ubuntuSandeep Saini
 
Dependencies Managers in C/C++. Using stdcpp 2014
Dependencies Managers in C/C++. Using stdcpp 2014Dependencies Managers in C/C++. Using stdcpp 2014
Dependencies Managers in C/C++. Using stdcpp 2014biicode
 
HowTo Install openMPI on Ubuntu
HowTo Install openMPI on UbuntuHowTo Install openMPI on Ubuntu
HowTo Install openMPI on UbuntuA Jorge Garcia
 
Fedora Atomic Workshop handout for Fudcon Pune 2015
Fedora Atomic Workshop handout for Fudcon Pune  2015Fedora Atomic Workshop handout for Fudcon Pune  2015
Fedora Atomic Workshop handout for Fudcon Pune 2015rranjithrajaram
 

Ähnlich wie Build Moses on Ubuntu (64-bit) in VirtualBox: recorded by Aaron (20)

Build moses on ubuntu (64 bit) system in virtubox recorded by aaron _v2longer
Build moses on ubuntu (64 bit) system in virtubox recorded by aaron _v2longerBuild moses on ubuntu (64 bit) system in virtubox recorded by aaron _v2longer
Build moses on ubuntu (64 bit) system in virtubox recorded by aaron _v2longer
 
Making environment for_infrastructure_as_code
Making environment for_infrastructure_as_codeMaking environment for_infrastructure_as_code
Making environment for_infrastructure_as_code
 
Bash Scripting
Bash ScriptingBash Scripting
Bash Scripting
 
macos installation automation
macos installation automationmacos installation automation
macos installation automation
 
Oracle11g On Fedora14
Oracle11g On Fedora14Oracle11g On Fedora14
Oracle11g On Fedora14
 
Oracle11g on fedora14
Oracle11g on fedora14Oracle11g on fedora14
Oracle11g on fedora14
 
Install websphere message broker 8 RHEL 6 64 bits
Install websphere message broker 8 RHEL 6 64 bitsInstall websphere message broker 8 RHEL 6 64 bits
Install websphere message broker 8 RHEL 6 64 bits
 
Composer
ComposerComposer
Composer
 
Vagrant development environment
Vagrant   development environmentVagrant   development environment
Vagrant development environment
 
Scripting for infosecs
Scripting for infosecsScripting for infosecs
Scripting for infosecs
 
Creating your own AtoM demo data set for re-use with Vagrant
Creating your own AtoM demo data set for re-use with VagrantCreating your own AtoM demo data set for re-use with Vagrant
Creating your own AtoM demo data set for re-use with Vagrant
 
Project 2 how to install and compile os161
Project 2 how to install and compile os161Project 2 how to install and compile os161
Project 2 how to install and compile os161
 
Getting started with robonova in ubuntu
Getting started with robonova in ubuntuGetting started with robonova in ubuntu
Getting started with robonova in ubuntu
 
Linux
LinuxLinux
Linux
 
Installation instructions for R
Installation instructions for RInstallation instructions for R
Installation instructions for R
 
Unix Administration 2
Unix Administration 2Unix Administration 2
Unix Administration 2
 
Dependencies Managers in C/C++. Using stdcpp 2014
Dependencies Managers in C/C++. Using stdcpp 2014Dependencies Managers in C/C++. Using stdcpp 2014
Dependencies Managers in C/C++. Using stdcpp 2014
 
HowTo Install openMPI on Ubuntu
HowTo Install openMPI on UbuntuHowTo Install openMPI on Ubuntu
HowTo Install openMPI on Ubuntu
 
Fedora Atomic Workshop handout for Fudcon Pune 2015
Fedora Atomic Workshop handout for Fudcon Pune  2015Fedora Atomic Workshop handout for Fudcon Pune  2015
Fedora Atomic Workshop handout for Fudcon Pune 2015
 
005 skyeye
005 skyeye005 skyeye
005 skyeye
 

Mehr von Lifeng (Aaron) Han

WMT2022 Biomedical MT PPT: Logrus Global and Uni Manchester
WMT2022 Biomedical MT PPT: Logrus Global and Uni ManchesterWMT2022 Biomedical MT PPT: Logrus Global and Uni Manchester
WMT2022 Biomedical MT PPT: Logrus Global and Uni ManchesterLifeng (Aaron) Han
 
Measuring Uncertainty in Translation Quality Evaluation (TQE)
Measuring Uncertainty in Translation Quality Evaluation (TQE)Measuring Uncertainty in Translation Quality Evaluation (TQE)
Measuring Uncertainty in Translation Quality Evaluation (TQE)Lifeng (Aaron) Han
 
Meta-Evaluation of Translation Evaluation Methods: a systematic up-to-date ov...
Meta-Evaluation of Translation Evaluation Methods: a systematic up-to-date ov...Meta-Evaluation of Translation Evaluation Methods: a systematic up-to-date ov...
Meta-Evaluation of Translation Evaluation Methods: a systematic up-to-date ov...Lifeng (Aaron) Han
 
HOPE: A Task-Oriented and Human-Centric Evaluation Framework Using Profession...
HOPE: A Task-Oriented and Human-Centric Evaluation Framework Using Profession...HOPE: A Task-Oriented and Human-Centric Evaluation Framework Using Profession...
HOPE: A Task-Oriented and Human-Centric Evaluation Framework Using Profession...Lifeng (Aaron) Han
 
HOPE: A Task-Oriented and Human-Centric Evaluation Framework Using Professio...
 HOPE: A Task-Oriented and Human-Centric Evaluation Framework Using Professio... HOPE: A Task-Oriented and Human-Centric Evaluation Framework Using Professio...
HOPE: A Task-Oriented and Human-Centric Evaluation Framework Using Professio...Lifeng (Aaron) Han
 
Meta-evaluation of machine translation evaluation methods
Meta-evaluation of machine translation evaluation methodsMeta-evaluation of machine translation evaluation methods
Meta-evaluation of machine translation evaluation methodsLifeng (Aaron) Han
 
Monte Carlo Modelling of Confidence Intervals in Translation Quality Evaluati...
Monte Carlo Modelling of Confidence Intervals in Translation Quality Evaluati...Monte Carlo Modelling of Confidence Intervals in Translation Quality Evaluati...
Monte Carlo Modelling of Confidence Intervals in Translation Quality Evaluati...Lifeng (Aaron) Han
 
Apply chinese radicals into neural machine translation: deeper than character...
Apply chinese radicals into neural machine translation: deeper than character...Apply chinese radicals into neural machine translation: deeper than character...
Apply chinese radicals into neural machine translation: deeper than character...Lifeng (Aaron) Han
 
cushLEPOR uses LABSE distilled knowledge to improve correlation with human tr...
cushLEPOR uses LABSE distilled knowledge to improve correlation with human tr...cushLEPOR uses LABSE distilled knowledge to improve correlation with human tr...
cushLEPOR uses LABSE distilled knowledge to improve correlation with human tr...Lifeng (Aaron) Han
 
Chinese Character Decomposition for Neural MT with Multi-Word Expressions
Chinese Character Decomposition for  Neural MT with Multi-Word ExpressionsChinese Character Decomposition for  Neural MT with Multi-Word Expressions
Chinese Character Decomposition for Neural MT with Multi-Word ExpressionsLifeng (Aaron) Han
 
Detection of Verbal Multi-Word Expressions via Conditional Random Fields with...
Detection of Verbal Multi-Word Expressions via Conditional Random Fields with...Detection of Verbal Multi-Word Expressions via Conditional Random Fields with...
Detection of Verbal Multi-Word Expressions via Conditional Random Fields with...Lifeng (Aaron) Han
 
AlphaMWE: Construction of Multilingual Parallel Corpora with MWE Annotations ...
AlphaMWE: Construction of Multilingual Parallel Corpora with MWE Annotations ...AlphaMWE: Construction of Multilingual Parallel Corpora with MWE Annotations ...
AlphaMWE: Construction of Multilingual Parallel Corpora with MWE Annotations ...Lifeng (Aaron) Han
 
MultiMWE: Building a Multi-lingual Multi-Word Expression (MWE) Parallel Corpora
MultiMWE: Building a Multi-lingual Multi-Word Expression (MWE) Parallel CorporaMultiMWE: Building a Multi-lingual Multi-Word Expression (MWE) Parallel Corpora
MultiMWE: Building a Multi-lingual Multi-Word Expression (MWE) Parallel CorporaLifeng (Aaron) Han
 
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.Lifeng (Aaron) Han
 
A deep analysis of Multi-word Expression and Machine Translation
A deep analysis of Multi-word Expression and Machine TranslationA deep analysis of Multi-word Expression and Machine Translation
A deep analysis of Multi-word Expression and Machine TranslationLifeng (Aaron) Han
 
machine translation evaluation resources and methods: a survey
machine translation evaluation resources and methods: a surveymachine translation evaluation resources and methods: a survey
machine translation evaluation resources and methods: a surveyLifeng (Aaron) Han
 
Incorporating Chinese Radicals Into Neural Machine Translation: Deeper Than C...
Incorporating Chinese Radicals Into Neural Machine Translation: Deeper Than C...Incorporating Chinese Radicals Into Neural Machine Translation: Deeper Than C...
Incorporating Chinese Radicals Into Neural Machine Translation: Deeper Than C...Lifeng (Aaron) Han
 
Chinese Named Entity Recognition with Graph-based Semi-supervised Learning Model
Chinese Named Entity Recognition with Graph-based Semi-supervised Learning ModelChinese Named Entity Recognition with Graph-based Semi-supervised Learning Model
Chinese Named Entity Recognition with Graph-based Semi-supervised Learning ModelLifeng (Aaron) Han
 
Quality Estimation for Machine Translation Using the Joint Method of Evaluati...
Quality Estimation for Machine Translation Using the Joint Method of Evaluati...Quality Estimation for Machine Translation Using the Joint Method of Evaluati...
Quality Estimation for Machine Translation Using the Joint Method of Evaluati...Lifeng (Aaron) Han
 
PubhD talk: MT serving the society
PubhD talk: MT serving the societyPubhD talk: MT serving the society
PubhD talk: MT serving the societyLifeng (Aaron) Han
 

Mehr von Lifeng (Aaron) Han (20)

WMT2022 Biomedical MT PPT: Logrus Global and Uni Manchester
WMT2022 Biomedical MT PPT: Logrus Global and Uni ManchesterWMT2022 Biomedical MT PPT: Logrus Global and Uni Manchester
WMT2022 Biomedical MT PPT: Logrus Global and Uni Manchester
 
Measuring Uncertainty in Translation Quality Evaluation (TQE)
Measuring Uncertainty in Translation Quality Evaluation (TQE)Measuring Uncertainty in Translation Quality Evaluation (TQE)
Measuring Uncertainty in Translation Quality Evaluation (TQE)
 
Meta-Evaluation of Translation Evaluation Methods: a systematic up-to-date ov...
Meta-Evaluation of Translation Evaluation Methods: a systematic up-to-date ov...Meta-Evaluation of Translation Evaluation Methods: a systematic up-to-date ov...
Meta-Evaluation of Translation Evaluation Methods: a systematic up-to-date ov...
 
HOPE: A Task-Oriented and Human-Centric Evaluation Framework Using Profession...
HOPE: A Task-Oriented and Human-Centric Evaluation Framework Using Profession...HOPE: A Task-Oriented and Human-Centric Evaluation Framework Using Profession...
HOPE: A Task-Oriented and Human-Centric Evaluation Framework Using Profession...
 
HOPE: A Task-Oriented and Human-Centric Evaluation Framework Using Professio...
 HOPE: A Task-Oriented and Human-Centric Evaluation Framework Using Professio... HOPE: A Task-Oriented and Human-Centric Evaluation Framework Using Professio...
HOPE: A Task-Oriented and Human-Centric Evaluation Framework Using Professio...
 
Meta-evaluation of machine translation evaluation methods
Meta-evaluation of machine translation evaluation methodsMeta-evaluation of machine translation evaluation methods
Meta-evaluation of machine translation evaluation methods
 
Monte Carlo Modelling of Confidence Intervals in Translation Quality Evaluati...
Monte Carlo Modelling of Confidence Intervals in Translation Quality Evaluati...Monte Carlo Modelling of Confidence Intervals in Translation Quality Evaluati...
Monte Carlo Modelling of Confidence Intervals in Translation Quality Evaluati...
 
Apply chinese radicals into neural machine translation: deeper than character...
Apply chinese radicals into neural machine translation: deeper than character...Apply chinese radicals into neural machine translation: deeper than character...
Apply chinese radicals into neural machine translation: deeper than character...
 
cushLEPOR uses LABSE distilled knowledge to improve correlation with human tr...
cushLEPOR uses LABSE distilled knowledge to improve correlation with human tr...cushLEPOR uses LABSE distilled knowledge to improve correlation with human tr...
cushLEPOR uses LABSE distilled knowledge to improve correlation with human tr...
 
Chinese Character Decomposition for Neural MT with Multi-Word Expressions
Chinese Character Decomposition for  Neural MT with Multi-Word ExpressionsChinese Character Decomposition for  Neural MT with Multi-Word Expressions
Chinese Character Decomposition for Neural MT with Multi-Word Expressions
 
Detection of Verbal Multi-Word Expressions via Conditional Random Fields with...
Detection of Verbal Multi-Word Expressions via Conditional Random Fields with...Detection of Verbal Multi-Word Expressions via Conditional Random Fields with...
Detection of Verbal Multi-Word Expressions via Conditional Random Fields with...
 
AlphaMWE: Construction of Multilingual Parallel Corpora with MWE Annotations ...
AlphaMWE: Construction of Multilingual Parallel Corpora with MWE Annotations ...AlphaMWE: Construction of Multilingual Parallel Corpora with MWE Annotations ...
AlphaMWE: Construction of Multilingual Parallel Corpora with MWE Annotations ...
 
MultiMWE: Building a Multi-lingual Multi-Word Expression (MWE) Parallel Corpora
MultiMWE: Building a Multi-lingual Multi-Word Expression (MWE) Parallel CorporaMultiMWE: Building a Multi-lingual Multi-Word Expression (MWE) Parallel Corpora
MultiMWE: Building a Multi-lingual Multi-Word Expression (MWE) Parallel Corpora
 
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.
 
A deep analysis of Multi-word Expression and Machine Translation
A deep analysis of Multi-word Expression and Machine TranslationA deep analysis of Multi-word Expression and Machine Translation
A deep analysis of Multi-word Expression and Machine Translation
 
machine translation evaluation resources and methods: a survey
machine translation evaluation resources and methods: a surveymachine translation evaluation resources and methods: a survey
machine translation evaluation resources and methods: a survey
 
Incorporating Chinese Radicals Into Neural Machine Translation: Deeper Than C...
Incorporating Chinese Radicals Into Neural Machine Translation: Deeper Than C...Incorporating Chinese Radicals Into Neural Machine Translation: Deeper Than C...
Incorporating Chinese Radicals Into Neural Machine Translation: Deeper Than C...
 
Chinese Named Entity Recognition with Graph-based Semi-supervised Learning Model
Chinese Named Entity Recognition with Graph-based Semi-supervised Learning ModelChinese Named Entity Recognition with Graph-based Semi-supervised Learning Model
Chinese Named Entity Recognition with Graph-based Semi-supervised Learning Model
 
Quality Estimation for Machine Translation Using the Joint Method of Evaluati...
Quality Estimation for Machine Translation Using the Joint Method of Evaluati...Quality Estimation for Machine Translation Using the Joint Method of Evaluati...
Quality Estimation for Machine Translation Using the Joint Method of Evaluati...
 
PubhD talk: MT serving the society
PubhD talk: MT serving the societyPubhD talk: MT serving the society
PubhD talk: MT serving the society
 

Kürzlich hochgeladen

Portal Kombat : extension du réseau de propagande russe
Portal Kombat : extension du réseau de propagande russePortal Kombat : extension du réseau de propagande russe
Portal Kombat : extension du réseau de propagande russe中 央社
 
API Governance and Monetization - The evolution of API governance
API Governance and Monetization -  The evolution of API governanceAPI Governance and Monetization -  The evolution of API governance
API Governance and Monetization - The evolution of API governanceWSO2
 
Decarbonising Commercial Real Estate: The Role of Operational Performance
Decarbonising Commercial Real Estate: The Role of Operational PerformanceDecarbonising Commercial Real Estate: The Role of Operational Performance
Decarbonising Commercial Real Estate: The Role of Operational PerformanceIES VE
 
Introduction to FIDO Authentication and Passkeys.pptx
Introduction to FIDO Authentication and Passkeys.pptxIntroduction to FIDO Authentication and Passkeys.pptx
Introduction to FIDO Authentication and Passkeys.pptxFIDO Alliance
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2
 
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptxHarnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptxFIDO Alliance
 
2024 May Patch Tuesday
2024 May Patch Tuesday2024 May Patch Tuesday
2024 May Patch TuesdayIvanti
 
Modernizing Legacy Systems Using Ballerina
Modernizing Legacy Systems Using BallerinaModernizing Legacy Systems Using Ballerina
Modernizing Legacy Systems Using BallerinaWSO2
 
The Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and InsightThe Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and InsightSafe Software
 
Design Guidelines for Passkeys 2024.pptx
Design Guidelines for Passkeys 2024.pptxDesign Guidelines for Passkeys 2024.pptx
Design Guidelines for Passkeys 2024.pptxFIDO Alliance
 
Simplifying Mobile A11y Presentation.pptx
Simplifying Mobile A11y Presentation.pptxSimplifying Mobile A11y Presentation.pptx
Simplifying Mobile A11y Presentation.pptxMarkSteadman7
 
Tales from a Passkey Provider Progress from Awareness to Implementation.pptx
Tales from a Passkey Provider  Progress from Awareness to Implementation.pptxTales from a Passkey Provider  Progress from Awareness to Implementation.pptx
Tales from a Passkey Provider Progress from Awareness to Implementation.pptxFIDO Alliance
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)Paige Cruz
 
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...WSO2
 
AI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by AnitarajAI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by AnitarajAnitaRaj43
 
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...TrustArc
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 

Kürzlich hochgeladen (20)

Portal Kombat : extension du réseau de propagande russe
Portal Kombat : extension du réseau de propagande russePortal Kombat : extension du réseau de propagande russe
Portal Kombat : extension du réseau de propagande russe
 
API Governance and Monetization - The evolution of API governance
API Governance and Monetization -  The evolution of API governanceAPI Governance and Monetization -  The evolution of API governance
API Governance and Monetization - The evolution of API governance
 
Decarbonising Commercial Real Estate: The Role of Operational Performance
Decarbonising Commercial Real Estate: The Role of Operational PerformanceDecarbonising Commercial Real Estate: The Role of Operational Performance
Decarbonising Commercial Real Estate: The Role of Operational Performance
 
Introduction to FIDO Authentication and Passkeys.pptx
Introduction to FIDO Authentication and Passkeys.pptxIntroduction to FIDO Authentication and Passkeys.pptx
Introduction to FIDO Authentication and Passkeys.pptx
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptxHarnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
 
2024 May Patch Tuesday
2024 May Patch Tuesday2024 May Patch Tuesday
2024 May Patch Tuesday
 
Modernizing Legacy Systems Using Ballerina
Modernizing Legacy Systems Using BallerinaModernizing Legacy Systems Using Ballerina
Modernizing Legacy Systems Using Ballerina
 
The Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and InsightThe Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and Insight
 
Design Guidelines for Passkeys 2024.pptx
Design Guidelines for Passkeys 2024.pptxDesign Guidelines for Passkeys 2024.pptx
Design Guidelines for Passkeys 2024.pptx
 
Simplifying Mobile A11y Presentation.pptx
Simplifying Mobile A11y Presentation.pptxSimplifying Mobile A11y Presentation.pptx
Simplifying Mobile A11y Presentation.pptx
 
Tales from a Passkey Provider Progress from Awareness to Implementation.pptx
Tales from a Passkey Provider  Progress from Awareness to Implementation.pptxTales from a Passkey Provider  Progress from Awareness to Implementation.pptx
Tales from a Passkey Provider Progress from Awareness to Implementation.pptx
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)
 
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...
 
AI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by AnitarajAI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by Anitaraj
 
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 

Build Moses on Ubuntu (64-bit) in VirtualBox: recorded by Aaron

  • 1. Build Moses on Ubuntu (64-bit) in VirtualBox: recorded by Aaron [ http://www.linkedin.com/in/aaronhan ] This document introduces the record by Aaron when running Moses translation systems (http://www.statmt.org/moses/). If you want to get more information about the Moses please see the official website of Moses or Moses-manual.pdf (http://www.statmt.org/moses/manual/manual.pdf). 1. Download virtual box from https://www.virtualbox.org/wiki/Downloads 2. Install virtual box by double click the “VirtualBox-4.2.18-88781-Win.exe” file. [To make the Ubuntu fluent, take the following actions. A1.download and install the VirtualBox Extension Pack outside the VM: https://www.virtualbox.org/wiki/Downloads A2.Ubuntu->Devices -> Install guest additions…] 3. Press “New” button to guide your “driver” (I use nlp2ct-Linux.vdi) into the virtual box. During the selection, chose the Ubuntu64-bit. Setting the base memory around 7GB. 4. Press the “start” to lunch the Ubuntu system in the virtual box. 5. Download Giza++ (word alignment model) from https://code.google.com/p/gizapp/downloads/list 6. Download IRSTLM (language model) from http://hlt.fbk.eu/en/irstlm 7. Download Moses (Statistical Machine Translation decoder) from http://www.statmt.org/moses/?n=Moses.Releases
  • 2. 8. Install the Giza++. If you store the Giza++ in the address Home/Aaron/Moses/giza-pp, open the “Terminal”, get into this address (press “ls” to show the items under your current position; press “cd xx” to enter the xx file; press “cd ..” to jump out current file), then press “make” command. If show “g++ command not found”, type “g++”, then type “sudo apt-get install g++” to install g++. After the install of g++, type “make” command to install Giza++ again. 9. Install IRSTLM. Get into the location of the IRSTLM file. (install guideline http://sourceforge.net/apps/mediawiki/irstlm/index.php?title=Installation_Guidelines) Type “sh regenerate-makefiles.sh”. if shows “command not found aclocal failed” type “sudo apt-get install automake”, if show “libtoolize: no such file” type “sudo apt-get install libtool”. Then type “bash regenerate-makefiles.sh” again. Type “./configure –prefix=/Home/Aaron/Moses/irstlm-5.80.03” to generate and locate the “Makefile” Type “make” for compilation. If show “zlib.h: No such file” type “sudo apt-get install zlib1g-dev” to install zlib. Type “make” again.
  • 3. Type “sudo make install” for installation. The IRSTLM library and commands are generated respectively under the address “/Home/Aaron/Moses/irstlm-5.80.03” 10. Install Boost (C++ libraries). Download it from ( http://www.boost.org/users/history/version_1_52_0.html ). cd boost_1_52_0 ./bootstrap.sh –prefix=/Home/Aaron/Moses/boost_1_52_0
  • 4. Type “./b2” to install boost. 11. Install other dependencies (gcc, zlib, bzip2). Type “sudo apt-get install build-essential libz-dev libbz2-dev” to install.
  • 5. 12. Install Moses. Type “git clone git://github.com/moses-smt/mosesdecoder.git”. Compile Moses. To examine the options you want, type “cd ~/mosesdecoder”, “./bjam --help”. There will be automatic updates. 13. Run Moses with an example. Type the following commands to run the example: cd ~/mosesdecoder wget http://www.statmt.org/moses/download/sample-models.tgz tar xzzf sample-models.tgz cd sample-models ~/Aaron/Moses/mosesdecoder/bin/moses –f phrase-model/moses.ini < phrase-model/in > out
  • 6. The translation results of source sentence “das ist ein kleines haus” will be shown in the file of out as “it is a small house”. Succeed! To test the Chart decoder, type the following command, the translation result will be shown in out.stt file: ~/Aaron/Moses/mosesdecoder/bin/moses_chart –f string-to-tree/moses.ini < string-to-tree/in > out.stt
  • 7. To test the tree-to-tree demo, type the following command, the translation result will be shown in out.ttt file: ~/Aaron/Moses/mosesdecoder/bin/moses_chart –f tree-to-tree/moses.ini < tree-to-tree/in.xml > out.ttt
  • 8. ====================================================== Another way to install GIZA++: Type the following commands to install GIZA++. wget http://giza-pp.googlecode.com/files/giza-pp-v1.0.7.tar.gz tar xzvf giza-pp-v1.0.7.tar.gz cd giza-pp make above commands will create the binaries ~/giza-pp/GIZA++-v2/GIZA++, ~/giza-pp/GIZA++v2/snt2cooc.out and ~/giza-pp/mkcls-v2/mkcls. To automatically copy these files to somewhere that Moses can find, type the following commands: cd ~/mosesdecoder mkdir tools
  • 9. cp ~/Aaron/Moses/giza-pp/GIZA++-v2/GIZA++ ~/Aaron/Moses/giza-pp/GIZA++-v2/snt2cooc.out ~/Aaron/Moses/giza-pp/mkcls-v2/mkcls tools above commands will copy the three binaries “GIZA++, snt2cooc.out, and mkcls” into the directory “~/mosesdecoder/tools”. When you come to run the training, you need to tell the training script where GIZA++ was installed using the –external-bin-dir argument as below: train-model.perl –external-bin-dir $HOME/mosesdecoder/tools ======================================================== Another way to install IRSTLM: Download the latest version of IRSTLM. tar zxvf irstlm-5.80.03.tgz cd irstlm-5.80.03 ./regenerate-makefiles.sh ./configure –prefix=$Home/Aaron/Moses/irstlm-5.80.03 sudo make install ======================================================== Corpus preparation: mkdir corpus cd corpus wget http://www.statmt.org/wmt13/training-parallel-nc-v8.tgz tar zxvf training-parallel-nc-v8.tgz corpus tokenization (add spaces between words and punctuations): type the following command for corpus tokenization: ~/Aaron/Moses/mosesdecoder/scripts/tokenizer/tokenizer.perl -l en < ~/Aaron/corpus/training/news-commentary-v8.fr-en.en > ~/Aaron/corpus/news-commentaryv8.fr-en.tok.en ~/Aaron/Moses/mosesdecoder/scripts/tokenizer/tokenizer.perl -l fr < ~/Aaron/corpus/training/news-commentary-v8.fr-en.fr > ~/Aaron/corpus/news-commentaryv8.fr-en.tok.fr The tokenized files will be generated in the rectory file “~/Aaron/corpus” Truecasing (to reduce data sparsity, convert the initial words of the sentence into the most probable casing):
  • 10. To get the truecasing training models that are the statistics data extracted from text, type the following commands: ~/Aaron/Moses/mosesdecoder/scripts/recaser/train-truecaser.perl --model ~/Aaron/corpus/trucase-model.en --corpus ~/Aaron/corpus/news-commentary-v8.fr-en.tok.en ~/Aaron/Moses/mosesdecoder/scripts/recaser/train-truecaser.perl --model ~/Aaron/corpus/trucase-model.fr --corpus ~/Aaron/corpus/news-commentary-v8.fr-en.tok.fr Above commands will generate the model files “truecase-model.en” and “truecase-model.fr” under the directory “/Aaron/corpus” Using the extracted truecasing training models to perform the truecase function as below commands: ~/Aaron/Moses/mosesdecoder/scripts/recaser/truecase.perl --model ~/Aaron/corpus/trucasemodel.en < ~/Aaron/corpus/news-commentary-v8.fr-en.tok.en > ~/Aaron/corpus/newscommentary-v8.fr-en.true.en ~/Aaron/Moses/mosesdecoder/scripts/recaser/truecase.perl --model ~/Aaron/corpus/trucasemodel.fr < ~/Aaron/corpus/news-commentary-v8.fr-en.tok.fr > ~/Aaron/corpus/newscommentary-v8.fr-en.true.fr Above commands will generate the files “news-commentary-v8.fr-en.true.ed” and “newscommentary-v8.fr-en.true.fr” under directory “Aaron/corpus” Cleaning (to remove the mis-aligned sentencs, long sentences and empty sentences, which may cause problems with the training pipeline) Type the following command to delete the sentences whose length is larger than 80: ~/Aaron/Moses/mosesdecoder/scripts/training/clean-corpus-n.perl ~/Aaron/corpus/newscommentary-v8.fr-en.true fr en ~/Aaron/corpus/news-commentary-v8.fr-en.clean 1 80 The files “news-commentary-v8.fr-en.clean.en” and “news-commentary-v8.fr-en.clean.fr” will be generated in the directory “~/Aaron/corpus”.
  • 11. ======================================================== Language model training (built on the target language to ensure fluent output translation): cd ~/Aaron mkdir lm cd lm ~/Aaron/Moses/irstlm-5.80.03/scripts/add-start-end.sh < ~/Aaron/corpus/news-commentaryv8.fr-en.true.en > news-commentary-v8.fr-en.sb.en The above commends will generate the file “news-commentary-v8.fr-en.sb.en” in the directory “Aaron/lm”. In this file, each sentence is added with the start and end symbol “<s> </s>”. Type the following command to generate the language model of English. export IRSTLM=$Home/Aaron/Moses/irstlm-5.80.03; ~/Aaron/Moses/irstlm5.80.03/scripts/build-lm.sh –i news-commentary-v8.fr-en.sb.en –t ./tmp –p –s improved-kneserney –o news-commentary-v8.fr-en.lm.en Above commend generate the file “news-commentary-v8.fr-en.lm.en.gz” in the directory “Aaron/lm”. Type the following command to generate the file “news-commentary-v8.fr-en.arpa.en” in the directory “Aaron/lm”
  • 12. ~/Aaron/Moses/irstlm-5.80.03/src/compile-lm –text news-commentary-v8.fr-en.lm.en.gz newscommentary-v8.fr-en.arpa.en To make the faster loading, type the following command to binary the arpa.en file “~/Aaron/Moses/mosesdecoder/bin/build_binary news-commentary-v8.fr-en.arpa.en newscommentary-v8.fr-en.blm.en” To check the language model, type the following command: “$ echo “is this an English sentence ?” | ~/Aaron/Moses/mosesdecoder/bin/query news-commentary-v8.fr-en.arpa.en”. it will show the following result.
  • 13. If using the following command: “$ echo “is this an English sentence ?” | ~/Aaron/Moses/mosesdecoder/bin/query news-commentary-v8.fr-en.blm.en”. it will show the following result: ======================================================== Training translation system (run word alignment, phrase extraction and scoring, create lexicalized reordering tables and create Moses configuration file): Type the following command to train the translation system using language model: cd ~/Aaron mkdir working cd working ~/Aaron/Moses/mosesdecoder/scripts/training/train-model.perl -root-dir train -corpus ~/Aaron/corpus/news-commentary-v8.fr-en.clean -f fr -e en -alignment grow-diag-final-and -reordering msd-bidirectional-fe -lm 0:3:$HOME/Aaron/lm/news-commentary-v8.fr-en.blm.en:8 -external-bin-dir ~/Aaron/Moses/mosesdecoder/tools >& training.out &
  • 14. The running details are written into the ~/working/training.out file step by step as below: If you want to see the running thread, type command “top” to show the running details in the window as below: Type command “ctrl+c” to shut down the current running (top) shown in the window as below [type "ctrl+z" to pause the run; type "jobs" show the running; type "bg + job-num" to put the detail background; type "fg + job-num" to show runing thread (bring to fore-ground)]:
  • 15. Around 2 hours and 10 minutes later, the translation training stage will be finished. By command “ctrl+c”, the following figure show the cmd window content: The following content is shown in the end of the file “~/working/training.out”:
  • 16. Type “fg 2”, it will show that the progress is finished: There will be a “train” file generated at “~/Aaron/working/train”. The “train” file contains 4 files including “corpus, giza.en-fr, giza.fr-en, and model”. The 4 files contain the below files respectively:
  • 17.
  • 18. ======================================================== Tuning translation system: The weights used by Moses to weight the different models against each other are not optimized, as shown in the moses.ini file. To tune the parameters, it requires a small amount of parallel data, which is separated from the training data. First, we download the WMT08 data, tokenise and truecase the corpus. The WMT08 corpus is used as development set in WMT12. Make a mew file “tune” to store the tuning corpus, then cd ~/corpus/tune wget http://www.statmt.org/wmt12/dev.tgz tar zxvf dev.tgz ~/Aaron/corpus/tune$ ~/Aaron/Moses/mosesdecoder/scripts/tokenizer/tokenizer.perl -l en < dev/news-test2008.en > news-test2008.tok.en ~/Aaron/corpus/tune$ ~/Aaron/Moses/mosesdecoder/scripts/tokenizer/tokenizer.perl -l fr < dev/news-test2008.en > news-test2008.tok.fr ~/Aaron/Moses/mosesdecoder/scripts/recaser/truecase.perl --model ~/Aaron/corpus/truecasemodel.en < news-test2008.tok.en > news-test2008.true.en ~/Aaron/Moses/mosesdecoder/scripts/recaser/truecase.perl --model ~/Aaron/corpus/truecasemodel.fr < news-test2008.tok.fr > news-test2008.true.fr After above commands, there will be the following files generated for tuning:
  • 19. Now, we begin the tuning stage using the MERT (minimum error rate training) method. cd ~/Aaron/working ~/Aaron/Moses/mosesdecoder/scripts/training/mert-moses.pl ~/Aaron/corpus/tune/newstest2008.true.fr ~/Aaron/corpus/tune/news-test2008.true.en ~/Aaron/Moses/mosesdecoder/bin/moses train/model/moses.ini --mertdir ~/Aaron/Moses/mosesdecoder/bin/ &> mert.out & top, Type “Ctrl+c” to exit the showing, “jobs” to show the job number, “ctrl+z” to pause the job, “fg [num_job]” to show the running job foreground, “ps” to show running, the jobs are paused as below: To save the time, type the following command to run the tuning with 6 threads,
  • 20. ~/Aaron/Moses/mosesdecoder/scripts/training/mert-moses.pl ~/Aaron/corpus/tune/newstest2008.true.fr ~/Aaron/corpus/tune/news-test2008.true.en ~/Aaron/Moses/mosesdecoder/bin/moses train/model/moses.ini --mertdir ~/Aaron/Moses/mosesdecoder/bin/ --decoder-flags="-threads 6" &> mert.out & Type “ctrl+c”, “ps -aux” to show detailed running, Type “ps –aux|grep moses” to show the running that only related to moses:
  • 21. A long time later, type “top” showing there will be no running of “moses”, which means the tuning is finished: There will be one document “mert.out” generated under working file, at the same time, a “mert-work” file containing the tuning results “moses.ini” is generated under working file as below: Begin the tuning at 2013-10-29-17:00, finish the tuning and generate the “moses.ini” file at 2013-10-30-09:44, it takes around 16 hours (using 7 threads). The “moses.ini” file contains the tuned parameters and other information as following:
  • 22. In the file “working/mert.out”, it shows:
  • 23. ======================================================== Testing the translation model: This stage is to test the translation quality of the built translation model. The translation quality is measured by the automatic evaluation metric score, such as the metric BLEU(2002), METEOR(2005), and LEOPR(2012), etc. Type the command: cd ~/Aaron ~/Aaron/Moses/mosesdecoder/bin/moses -f ~/Aaron/working/mert-work/moses.ini It will take some minutes to show the finish the initializing LexicalReordering and reading phrase-table as following:
  • 24. Type a French sentence “c'est une petite maison.(//this is a small house.)” and “enter”, it will translate it as below, which is not a fully translated sentence: Type another French sentence “vous êtes beau (// you are handsome)”, it will translate it into “you are beautiful” as below: Type “ctrl+c” to exit the job. To make the translation faster, we should binary the lexicalized reordering models and the phrase-table.
  • 25. mkdir ~/Aaron/working/binarised-model Type the following command to binaries the phrase-table: ~/Aaron/Moses/mosesdecoder/bin/processPhraseTable -ttable 0 0 train/model/phrase-table.gz -nscores 5 -out binarised-model/phrase-table It will generate the following files:
  • 26. To binary the lexical reordering-table, type the following command: ~/Aaron/Moses/mosesdecoder/bin/processLexicalTable -in train/model/reordering-table.wbemsd-bidirectional-fe.gz -out binarised-model/reordering-table It will generate the files:
  • 27. Copy the ~/working/mert-work/moses.ini document into the ~/binarised-model. Then change the moses.ini content as below to point to the binarised files: Using the binarised tables, the loading will be very fast using the following command: ~/Aaron/Moses/mosesdecoder/bin/moses -f ~/Aaron/working/binarised-model/moses.ini If you type the above two french sentences again it will generate the same translations as before. Type the commands to prepare the testing data as below: mkdir ~/Aaron/corpus/test cd ~/corpus/test ~/Aaron/Moses/mosesdecoder/scripts/tokenizer/tokenizer.perl -l en <~/Aaron/corpus/tune/dev/newstest2011.en > newstest2011.tok.en ~/Aaron/Moses/mosesdecoder/scripts/tokenizer/tokenizer.perl -l fr < ~/Aaron/corpus/tune/dev/newstest2011.fr > newstest2011.tok.fr ~/Aaron/Moses/mosesdecoder/scripts/recaser/truecase.perl --model ~/Aaron/corpus/truecasemodel.en < newstest2011.tok.en > newstest2011.true.en ~/Aaron/Moses/mosesdecoder/scripts/recaser/truecase.perl --model ~/Aaron/corpus/truecasemodel.fr < newstest2011.tok.fr > newstest2011.true.fr To make the translation faster, we can filter the trained translation model to retain the entries that are only needed to translate the offered test corpus: VirtualBox:~/Aaron/working$ ~/Aaron/Moses/mosesdecoder/scripts/training/filter-modelgiven-input.pl filtered-newstest2011 mert-work/moses.ini ~/Aaron/corpus/test/newstest2011.true.fr -Binarizer ~/Aaron/Moses/mosesdecoder/bin/processPhraseTable
  • 28. The filtering and binary stage finished as below: Above command will generate a “~/Aaron/working/filtered-newstest2011” file that contains the following documents:
  • 29. Then we translate the testing corpus and score the translation quality using BLEU metric: nlp2ct@nlp2ct-VirtualBox:~/Aaron/working$ ~/Aaron/Moses/mosesdecoder/bin/moses -f ~/Aaron/working/filtered-newstest2011/moses.ini < ~/Aaron/corpus/test/newstest2011.true.fr > ~/Aaron/working/newstest2011.translated.en 2> ~/Aaron/working/newstest2011.out It takes abound 25 minutes to finish the translation. There will be two documents generated “~/working/newstest2011.out & ~/working/newstest2011.translated.en”. The document newstest2011.translated.en contains the translated output sentences; the document newstest2011.out contains the detailed translation procedure and its volume is larger:
  • 30. Type the following command to test the BLEU score of the automatic translation, as compared with the reference translation: VirtualBox:~/Aaron/working$ ~/Aaron/Moses/mosesdecoder/scripts/generic/multi-bleu.perl -lc ~/Aaron/corpus/test/newstest2011.true.en < ~/Aaron/working/newstest2011.translated.en It shows that the BLEU score is 23.41. ============================================================
  • 31. Reference: Moses manual: [http://www.statmt.org/moses/manual/manual.pdf], accessed 2013.10.31 TianLiang’ blog: [http://www.tianliang123.com/moses_installation], accessed 2013.10.31.