3. Japanese Characteristics
› No spaces between words
› Written in a mixture of kana and kanji
Thus, translation requires
› Automatic segmentation of the text into its components
However, to keep the dictionary from growing too large
› Regulations can be set on the form of the input text:
Kana texts, in which no kanji are used
Kana-kanji texts, in which kanji are used wherever possible, following the official directives on the use of kana and kanji
› This is “pre-editing”
4. Each kana will be Romanized
› To preserve a one-to-one correspondence between the kana and their corresponding Roman letters
› Text is better analyzed in Roman letters than in kana:
Fewer varieties of suffixes
Fewer rules of permissible combinations with canonical stems
Fewer possibilities of homographic verbal stems
Each kanji will be replaced with an irreducible unit token
› No kanji will contain more than one “morpheme”
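The pre-processing described on this slide might be sketched as follows. This is only an illustration under stated assumptions, not the original system: the kana table is a tiny hypothetical sample, the Roman spellings, the `<...>` format for kanji tokens, and the names `KANA_TO_ROMAN`, `is_kanji` and `romanize` are all invented for the example.

```python
# Hypothetical, deliberately tiny kana table; a real one would cover the
# whole syllabary. Each kana maps to a distinct Roman string, so the
# mapping can be inverted (one-to-one correspondence).
KANA_TO_ROMAN = {
    "か": "ka", "き": "ki", "く": "ku",
    "た": "ta", "ま": "ma", "し": "si",
    "を": "wo", "む": "mu", "よ": "yo",
}


def is_kanji(ch: str) -> bool:
    """True for characters in the basic CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


def romanize(text: str) -> list[str]:
    """Turn a kana-kanji string into a list of tokens.

    Kana become Roman letters; each kanji becomes one irreducible token
    (written here as <kanji>, an assumed notation).
    """
    tokens = []
    for ch in text:
        if is_kanji(ch):
            tokens.append(f"<{ch}>")
        elif ch in KANA_TO_ROMAN:
            tokens.append(KANA_TO_ROMAN[ch])
        else:
            raise ValueError(f"no romanization for {ch!r}")
    return tokens


print(romanize("本を読む"))  # ['<本>', 'wo', '<読>', 'mu']
```

One way to see the benefit of Roman letters: よむ and よまない share no kana-level stem longer than よ, whereas their Romanized forms yomu and yomanai share yom-, which is presumably the kind of saving in suffix varieties the slide has in mind.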
5. Segmentation of a continuous run of tokens
› Based on the following expectations:
Auxiliary items will be shorter in length and fewer in number
No problem will be caused by:
assuming every “phrase” in a sentence begins with a dictionary item
including “prefixes” in the category of dictionary items
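A segmentation pass built on these assumptions could look roughly like the sketch below. The slide does not say how matches are chosen, so a greedy longest-match strategy without backtracking is swapped in for illustration; the two word lists and the names `DICTIONARY`, `AUXILIARIES`, `longest_match` and `segment` are hypothetical.

```python
# Hypothetical word lists over Romanized text.
DICTIONARY = {"watasi", "hon", "yom"}   # dictionary items (stems, nouns, prefixes)
AUXILIARIES = {"wa", "ga", "wo", "u"}   # auxiliary items (particles, suffixes)


def longest_match(text, start, items):
    """Return the longest item that matches text at position start, or None."""
    best = None
    for item in items:
        if text.startswith(item, start) and (best is None or len(item) > len(best)):
            best = item
    return best


def segment(text):
    """Split text into phrases: one dictionary item, then any auxiliary items."""
    phrases, pos = [], 0
    while pos < len(text):
        # Every phrase is assumed to begin with a dictionary item.
        head = longest_match(text, pos, DICTIONARY)
        if head is None:
            raise ValueError(f"no dictionary item at position {pos}")
        phrase = [head]
        pos += len(head)
        # Attach auxiliary items until none matches.
        while pos < len(text):
            aux = longest_match(text, pos, AUXILIARIES)
            if aux is None:
                break
            phrase.append(aux)
            pos += len(aux)
        phrases.append(phrase)
    return phrases


print(segment("watasiwahonwoyomu"))
# [['watasi', 'wa'], ['hon', 'wo'], ['yom', 'u']]
```

A greedy pass like this can pick the wrong split when an auxiliary item is also the start of a dictionary item; a fuller implementation would need to keep alternatives.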
7. Predictive analysis:
› Originally by Rhodes
Peculiarity of Japanese:
› It is more convenient to start from the end of the sentence:
Words that can occupy the final position in a sentence are limited
Particles which show case, prepositional or conjunctional relationships always follow the words, phrases or clauses to which they are attached
Attributive words, phrases and clauses always stand before the substantives which they modify
8. Each word in a sentence will be assigned
› An essence which has been fulfilled by it
› A linkage number which shows by which word it has been predicted
› A group number which shows to which clause in the sentence it belongs
Another peculiarity of Japanese:
› The subject of a sentence is very often omitted
Hence, in this analysis:
› Subject marker and relative subject marker predictions are essential
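The three items assigned to each word can be pictured as a small record. The sketch below is an assumption-laden illustration: the class and function names, the essence labels given to the sample words, the linkage values, and the convention that linkage 0 means "predicted by the initial prediction" are all invented for the example and are not taken from the source.

```python
from dataclasses import dataclass


@dataclass
class AnalyzedWord:
    text: str
    essence: str  # the prediction this word fulfilled (labels are illustrative)
    linkage: int  # 1-based number of the predicting word; 0 = initial prediction
    group: int    # number of the clause the word belongs to


def predictor_of(words, index):
    """Return the word that predicted words[index], or None if it was the initial prediction."""
    link = words[index].linkage
    return None if link == 0 else words[link - 1]


def has_subject(words, group):
    """True if some word in the clause fulfilled a subject-master prediction."""
    subject = ("subject master", "relative subject master")
    return any(w.group == group and w.essence in subject for w in words)


# Hypothetical analysis of "watasi wa hon wo yomu", scanned from the end,
# so each word is predicted by a word that comes later in the sentence.
SENTENCE = [
    AnalyzedWord("watasi", "subject master", 2, 1),
    AnalyzedWord("wa", "subject marker", 5, 1),
    AnalyzedWord("hon", "object master", 4, 1),
    AnalyzedWord("wo", "object marker", 5, 1),
    AnalyzedWord("yomu", "predicate head", 0, 1),
]

print(predictor_of(SENTENCE, 0).text)  # wa
print(has_subject(SENTENCE, 1))        # True
print(has_subject(SENTENCE[2:], 1))    # False: subject omitted
```

A check like `has_subject` shows where a clause's subject prediction was left unfulfilled, which is the situation the omitted-subject remark on this slide refers to.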
11. This stage deals with the synthesis of the target language (TL)
In brief:
› Words with the same group number are gathered
› The word order is then transformed
Concretely:
› Subject marker, object marker and relative subject marker are omitted
› Subject master or relative subject master comes first within each group
› followed by predicate head or relative predicate head
› and then by object master
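The reordering rules above can be sketched as a small function. This is a minimal illustration, assuming each word arrives as a `(text, essence, group)` tuple whose text has already been replaced by its target-language equivalent; the names `OMITTED`, `ORDER` and `synthesize` are hypothetical, and placing words with any other essence last is an assumption the slide does not state.

```python
from collections import defaultdict

# Essences whose words are dropped in the target sentence.
OMITTED = {"subject marker", "object marker", "relative subject marker"}

# Rank of each essence within its group.
ORDER = {
    "subject master": 0, "relative subject master": 0,
    "predicate head": 1, "relative predicate head": 1,
    "object master": 2,
}


def synthesize(words):
    """Gather words by group number and reorder each group.

    words: list of (text, essence, group) tuples in source order.
    Returns a dict mapping each group number to its texts in target order.
    """
    groups = defaultdict(list)
    for text, essence, group in words:
        if essence in OMITTED:
            continue
        groups[group].append((text, essence))

    result = {}
    for group, members in groups.items():
        # sorted() is stable, so words of equal rank keep their source order.
        # Essences not listed in ORDER are placed last (an assumption).
        ranked = sorted(members, key=lambda m: ORDER.get(m[1], len(ORDER)))
        result[group] = [text for text, _ in ranked]
    return result


words = [
    ("I", "subject master", 1),
    ("wa", "subject marker", 1),
    ("book", "object master", 1),
    ("wo", "object marker", 1),
    ("read", "predicate head", 1),
]
print(synthesize(words))  # {1: ['I', 'read', 'book']}
```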
13. Readings in Machine Translation
› Edited by Sergei Nirenburg, Harold Somers, and Yorick Wilks
› The MIT Press