Dark Circle - Translation : The cool way to contribute to F/OSS #2 (Ubuntu and Translation Story, Part 2) (2011Y10M29D)

Seong-ho, Cho
darkcircle . 0426 at gmail dot com
GNOME Korean local team
Xfce Korean local committer
Translation :
The cool way to contribute to F/OSS
Part 2. Translation Automation (1)

From the engineer's point of view . . .
• How long do I have to keep typing all of this out by hand?
• If the machine translator's output is a mess, should I just build my own translator?
-> GO AHEAD AND DO IT! IF YOU CAN.

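Making a translator is easy, but …
• Foundational subjects worth learning before building a translator
• Linguistics of the languages to be translated ( Korean, English, introduction to linguistics )
• Discrete mathematics ( sets, graphs, topology, etc. )
• Data structures ( + algorithms )
• Programming language theory
• Compilers ( + automata and formal languages )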

Before we translate any sentence
• Things we need to know
• Grammar
• Grammar production rules
• The kinds of sentence components
• How sentence structure maps between different languages

Token Analysis
• 나 는 학교 에 가 는 중 이다
나 : subject / 는 : particle / 학교 : noun / 에 : particle / 가 : verb / 는 : particle / 중 : adjective / 이다 : predicate
• I am going to school
I : subject / am : verb / going : adjective / to : preposition / school : noun
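
The tagging on this slide can be sketched as a plain table lookup. This is only an illustration: the two lexicons below are hypothetical, hand-written from the labels on the slide, and the code assumes the sentence is already split on spaces. Real Korean morphological analysis needs far more than this.

```python
# Hypothetical mini-lexicons (token -> role), copied from the slide's labels.
# 는 appears twice with the same label, so a plain dict is enough here.
KOREAN_ROLES = {
    "나": "subject", "는": "particle", "학교": "noun", "에": "particle",
    "가": "verb", "중": "adjective", "이다": "predicate",
}
ENGLISH_ROLES = {
    "I": "subject", "am": "verb", "going": "adjective",
    "to": "preposition", "school": "noun",
}

def tag_tokens(sentence, roles):
    """Split on whitespace and pair each token with its role ('unknown' if missing)."""
    return [(tok, roles.get(tok, "unknown")) for tok in sentence.split()]

korean = tag_tokens("나 는 학교 에 가 는 중 이다", KOREAN_ROLES)
english = tag_tokens("I am going to school", ENGLISH_ROLES)
for tok, role in korean + english:
    print(tok, role)
```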

Rearranging the components of
each sentence after replacing words
• 나 ( 는 ) 학교 에 가는중 이다
• How can we model these steps to build a translator?
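
One naive way to model "replace the words, then rearrange them" is sketched here. The phrase table `PHRASES` and the fixed reordering rule in `rearrange` are assumptions made up for this one sentence; they are not a general method and not necessarily what the author had in mind.

```python
# Hypothetical phrase table covering only the slide's example sentence.
PHRASES = {"나": "I", "학교": "school", "에": "to", "가는중": "going", "이다": "am"}

def replace_words(tokens):
    # Tokens missing from the table (the parentheses and the optional 는) are dropped.
    return [PHRASES[t] for t in tokens if t in PHRASES]

def rearrange(words):
    """Reorder [subject, object, postposition, verb, copula]
    into English order [subject, copula, verb, preposition, object]."""
    subject, obj, postposition, verb, copula = words
    return [subject, copula, verb, postposition, obj]

tokens = "나 ( 는 ) 학교 에 가는중 이다".split()
replaced = replace_words(tokens)
sentence = " ".join(rearrange(replaced))
print(replaced)
print(sentence)
```

A fixed five-slot rule like this breaks as soon as the sentence shape changes, which is why the following slides move on to trees and grammar rules.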

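Abstract Syntax Tree
( like a binary tree structure )
[Figure: two syntax trees. English tree: "am going to" at the root, with children "I" and "school". Korean tree: nodes 나, 학교, 에, 가는중, 이다.]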

Semantic Analysis
• By using a "grammar rule"
• In a single hop
• In multiple hops
[Figure: tree diagrams. Single hop: Predicate over Subject and Object. Multi hop: nested Predicate / Subject / Object nodes, one of them labelled Object Phrase.]

Sentence generation (finish)
• Korean
• 나 ( 는 ) 학교에 가는중이다
• English
• I am going to school

Compiler Phases
• Front-End
• Lexical Analysis
• Syntax Analysis
• Semantic Analysis
• Intermediate Code Generation
• Back-End
• Code Optimization
• Executable Binary Generation

Languages
• Natural Language
• A language used for communication between humans or animals
• Formal Language
• A "formal" language used by a "virtual Turing machine" that is built to imitate a human
Type  Grammar                       Automaton                         Prod. Rule
0     Recursively Enumerable        Turing Machine                    No restriction
1     Context-Sensitive             Linear-bounded non-deterministic  αAβ -> αγβ
2     Context-Free                  Non-deterministic Push-down       A -> γ
3     Regular (Regular Expression)  Finite State                      A -> αB or A -> Bα, A -> α

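Regular Expression
• A pattern expression for validating strings
• [] : matches one character from a set ( - : range, e.g. A-Z : A through Z )
• () : groups part of a pattern
• {x,y} : repeats at least x times and at most y times
• | : OR operator
• . : any one character
• * : 0 or more
• + : 1 or more
• ? : 0 or 1
• ^blabla : a string that starts with blabla
• blabla$ : a string that ends with blabla
• \d : a digit, 0 to 9
• \D : any character that is not a digit
• \w : an upper- or lower-case letter or a digit ( most engines also include _ )
• \W : any character not matched by \w
• \s : whitespace
• \S : any character that is not whitespace
• \( special character ) : the special character itself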
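
The operators above can be tried out with Python's standard `re` module. The sample patterns and strings are made up for illustration; other regex engines differ in small details.

```python
import re

checks = {
    # [] with a range, {x,y} repetition, ^ and $ anchors: 2 to 4 capital letters.
    "range_and_repeat": bool(re.search(r"^[A-Z]{2,4}$", "ABC")),
    "too_long": bool(re.search(r"^[A-Z]{2,4}$", "ABCDE")),
    # () grouping, | alternation, ? for an optional character.
    "group_or": bool(re.search(r"^(cat|dog)s?$", "dogs")),
    # . is any one character, * repeats it zero or more times.
    "dot_star": bool(re.search(r"^a.*z$", "a--z")),
    # \d with + : one or more digits.
    "one_or_more_digits": bool(re.search(r"^\d+$", "2011")),
    # \S excludes whitespace, so a string containing a space does not match.
    "non_space": bool(re.search(r"^\S+$", "no spaces")),
    # A backslash makes the special character + literal.
    "escaped_special": bool(re.search(r"^1\+1$", "1+1")),
}
words = re.findall(r"\w+", "msgid: hello_world 42")

for name, matched in checks.items():
    print(name, matched)
print(words)
```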

Terminology
• Token : a single character ( 1~2 bytes )
• Symbol : a single Token, or a string, that carries meaning
• Lookahead : peeking at the next character to work out what a token means
• Parse : building a tree for syntax analysis or evaluation

Classes of the Symbol
• ID
• Variable names, function names
• [a-zA-Z][a-zA-Z0-9_]*
• Keyword
• Type names, reserved words, control statements
• Digit
• [0-9]
• Operator
• +, -, *, /, %, ==, !=, >=, <=, <, >, !, …
• Delimiter
• ;, (, ), [, ], {, }, ., ->, ", ', …
• <, > ( for Templates in C++ or Generics in Java )
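
The symbol classes above map naturally onto a small regex-based lexer. This is a minimal sketch under several assumptions: the keyword set is a hypothetical sample, `=` is added as an operator, digits are grouped into runs, and quote characters and template brackets are left out to keep it short.

```python
import re

KEYWORDS = {"int", "if", "else", "while", "return"}  # hypothetical sample, not a full list

TOKEN_RE = re.compile(r"""
    (?P<DIGIT>[0-9]+)                        # a run of digits
  | (?P<ID>[a-zA-Z][a-zA-Z0-9_]*)            # identifier pattern from the slide
  | (?P<DELIMITER>->|[;()\[\]{}.])           # tried before OPERATOR so -> wins over -
  | (?P<OPERATOR>==|!=|>=|<=|[+\-*/%<>!=])   # two-character operators are tried first
  | (?P<SKIP>\s+)                            # whitespace is discarded
""", re.VERBOSE)

def tokenize(source):
    """Return a list of (class, text) pairs, or raise SyntaxError on an unknown character."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        kind, text = match.lastgroup, match.group()
        pos = match.end()
        if kind == "SKIP":
            continue
        if kind == "ID" and text in KEYWORDS:
            kind = "KEYWORD"  # keywords look like IDs, so they are reclassified here
        tokens.append((kind, text))
    return tokens

tokens = tokenize("if (count1 >= 10) return x->y;")
for kind, text in tokens:
    print(kind, text)
```

Trying `>=` before `>` plays the role of a one-character lookahead: the lexer cannot decide what `>` means until it has seen the character after it.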

Grammar
• Every grammar starts from the start symbol S
• Non-terminals are written in upper case, terminals in lower case
• Terminals also include ε (empty string)
• Going from a non-terminal to a terminal is called a derivation (derive)
• Going from a terminal back to a non-terminal is called a reduction (reduce); the name comes from the observation that the meaningful word is longer than the non-terminal
• The final result must be a string of terminal symbols
• BNF : Backus-Naur Form
• S -> E+E | E-E | E*E | E/E
• E -> S | (S) | F
• F -> 0|1|2|3|4|5|6|7|8|9
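
To make "derive" concrete, the sketch below replays a leftmost derivation of `2+3` using the BNF rules above. The `GRAMMAR` dictionary and the helper `leftmost_derivation` are illustrative names, and each symbol is assumed to be a single character.

```python
# The slide's grammar, one character per symbol; upper case means non-terminal.
GRAMMAR = {
    "S": ["E+E", "E-E", "E*E", "E/E"],
    "E": ["S", "(S)", "F"],
    "F": list("0123456789"),
}

def leftmost_derivation(start, steps):
    """Apply each (non-terminal, replacement) step to the leftmost non-terminal.

    Returns every sentential form, from the start symbol to the final string.
    """
    forms = [start]
    current = start
    for nonterminal, replacement in steps:
        index = next(i for i, ch in enumerate(current) if ch.isupper())
        if current[index] != nonterminal:
            raise ValueError(f"leftmost non-terminal is {current[index]}, not {nonterminal}")
        if replacement not in GRAMMAR[nonterminal]:
            raise ValueError(f"{nonterminal} -> {replacement} is not a rule")
        current = current[:index] + replacement + current[index + 1:]
        forms.append(current)
    return forms

forms = leftmost_derivation(
    "S", [("S", "E+E"), ("E", "F"), ("F", "2"), ("E", "F"), ("F", "3")]
)
print(" => ".join(forms))
```

Reading the same list from right to left gives the corresponding sequence of reductions.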

Eliminate Ambiguity
• If we have a given grammar
• S-> E+E | E*E
• E-> S | F
• F-> 0|1|2|3|4|5|6|7|8|9
• We can generate two different parse trees for 2 + 3 * 5

      +              *
     / \            / \
    2   *          +   5
       / \        / \
      3   5      2   3
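
Why the ambiguity matters can be shown by evaluating both readings of 2 + 3 * 5. The tuple encoding `(operator, left, right)` is just a convenient representation chosen for this sketch.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul}

def evaluate(tree):
    """Evaluate a tree written as an int leaf or an (operator, left, right) tuple."""
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    return OPS[op](evaluate(left), evaluate(right))

plus_at_root = ("+", 2, ("*", 3, 5))    # reads as 2 + (3 * 5)
times_at_root = ("*", ("+", 2, 3), 5)   # reads as (2 + 3) * 5

print(evaluate(plus_at_root))
print(evaluate(times_at_root))
```

The same input string yields 17 under one tree and 25 under the other, so the grammar has to be changed until only one tree is possible.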

Eliminate Ambiguity
• Result (Abstract Syntax Tree)

      +
     / \
    2   *
       / \
      3   5
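
One standard way to obtain this single tree is to split the grammar by precedence, for example S -> T | S+T, T -> F | T*F, F -> digit, and parse it by recursive descent. The slides do not say which method the author used, so the rewritten grammar and the parser below are an assumption for illustration only.

```python
def parse(text):
    """Parse single-digit + and * expressions into (operator, left, right) tuples."""
    tokens = [ch for ch in text if not ch.isspace()]
    pos = 0

    def peek():
        # One token of lookahead; None marks the end of input.
        return tokens[pos] if pos < len(tokens) else None

    def factor():
        nonlocal pos
        tok = peek()
        if tok is None or tok not in "0123456789":
            raise SyntaxError(f"expected a digit, got {tok!r}")
        pos += 1
        return int(tok)

    def term():
        # T -> F ('*' F)* : multiplication binds tighter than addition.
        nonlocal pos
        node = factor()
        while peek() == "*":
            pos += 1
            node = ("*", node, factor())
        return node

    def expr():
        # S -> T ('+' T)* : addition is handled last, so it ends up at the root.
        nonlocal pos
        node = term()
        while peek() == "+":
            pos += 1
            node = ("+", node, term())
        return node

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree

print(parse("2 + 3 * 5"))
print(parse("2 * 3 + 5"))
```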

Classes for Grammar
• LL(x) and LR(x)
• First L : the input is read from Left to right
• Second L or R : Leftmost or Rightmost derivation ( left or right parse tree )
• x is the number of lookahead tokens
• LL(x) Grammar
• Programming Language
• Mathematical Expression
• LR(x) Grammar
• English-like Language

Tools for Generating Analyzers
• Lex and Yacc
• Lex : LEXical analyzer generator
• Yacc : Yet Another Compiler-Compiler
• Flex and Bison
• Flex : Fast LEXical analyzer generator; it is not part of the GNU Project
• Bison : a parser generator that is part of the GNU Project

Reference Code
• Simple Calculator
• svn://darkcircle.myhome.tv/Calc
• CER Browser for IRCBot
• svn://darkcircle.myhome.tv/ExchangeList

The Plan for Building a Translator

Requirements
• Some SQL database
To store many words and maintain a matching table.
• Better not to rely on regular expressions
There are too many cases to handle when selecting a word.
• Need a normalized terminology matching list
Such as a dictionary for computer science.
• Language?
• Maybe C is better.
• If you are more comfortable with another language, go ahead.
• Such as Perl, Python, Java, and so on.
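
The "SQL database plus normalized terminology list" requirement could look roughly like this with Python's built-in `sqlite3` module. The table name `glossary`, its columns, and the sample rows are all hypothetical; the slides do not specify a schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database, for demonstration only
conn.execute("""
    CREATE TABLE glossary (
        source_term TEXT NOT NULL,
        target_term TEXT NOT NULL,
        domain      TEXT NOT NULL,
        PRIMARY KEY (source_term, domain)
    )
""")
conn.executemany(
    "INSERT INTO glossary VALUES (?, ?, ?)",
    [
        ("file", "파일", "computing"),
        ("window", "창", "computing"),
        ("window", "창문", "general"),
    ],
)

def lookup(term, domain="computing"):
    """Return the target term for (term, domain), or None if there is no entry."""
    row = conn.execute(
        "SELECT target_term FROM glossary WHERE source_term = ? AND domain = ?",
        (term, domain),
    ).fetchone()
    return row[0] if row else None

print(lookup("window"))
print(lookup("window", domain="general"))
print(lookup("kernel"))
```

Keying on both the term and a domain is one way to keep word selection out of regular expressions: the choice between candidates becomes a table lookup.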

Program Structure
[Flowchart: nodes PO File, Get msgid, Extr. word, Ins. to Symb. Bucket, Gen. Tree, Dictionary, Replace word, Rearrange, Completed? ( Y / N ), Write final msgstr.]
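
A heavily simplified pass over a PO file might be sketched as follows. The order of the steps, the `GLOSSARY` entries, and the word-reversal "rearrange" rule are assumptions made for illustration; the sketch also handles only single-line `msgid` / `msgstr` pairs without escaped quotes, and skips the tree-building step entirely.

```python
import re

GLOSSARY = {"Open": "열기", "File": "파일"}  # hypothetical dictionary entries

# Matches one single-line msgid immediately followed by its msgstr.
ENTRY_RE = re.compile(r'^msgid "(?P<msgid>.*)"\nmsgstr "(?P<msgstr>.*)"$', re.MULTILINE)

def translate(msgid):
    words = msgid.split()                        # extract words
    replaced = [GLOSSARY.get(w) for w in words]  # replace words via the dictionary
    if any(r is None for r in replaced):         # not completed: leave it untranslated
        return ""
    return " ".join(reversed(replaced))          # rearrange: naive verb-object swap

def fill_po(po_text):
    """Fill every empty msgstr whose msgid can be fully translated."""
    def fill(match):
        msgid = match.group("msgid")
        msgstr = match.group("msgstr") or translate(msgid)  # keep existing translations
        return f'msgid "{msgid}"\nmsgstr "{msgstr}"'
    return ENTRY_RE.sub(fill, po_text)

po = 'msgid "Open File"\nmsgstr ""\n\nmsgid "Close Tab"\nmsgstr ""\n'
result = fill_po(po)
print(result)
```

Entries whose words are not all in the dictionary keep an empty `msgstr`, so a human translator can still pick them up later.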
