TAUS MACHINE TRANSLATION SHOWCASE


Strategies for Building Competitive
Advantage and Revenue from
Machine Translation

14:40 – 15:00
Wednesday, 10 April 2013

Dion Wiggins
Asia Online
Business Strategies for Building
Strategic Advantage and Revenue from
Machine Translation

Dion Wiggins
Chief Executive Officer
dion.wiggins@asiaonline.net




Copyright © 2013, Asia Online Pte Ltd
•  Human Resources
   –  Linguistic
      •  Language / Translation
      •  Natural Language Processing (NLP)
   –  Technical
      •  Operating System
      •  Software installation and support
   –  Programming
      •  Tailoring to needs of the business
      •  Integration with other tools and platforms
•  Infrastructure
   –  Hardware
      •  Hosted, purchased
   –  Software
      •  Licensed, Hosted, Open Source
•  Data Requirements
   –  Third party
      •  Free, Commercial
   –  Internal data
   –  Data manufacturing
   –  Clean vs. Dirty Data SMT
   –  Rules vs. SMT vs. Hybrid
•  Skill Development
   –  Hosted - basic skills
   –  Onsite Moses – comprehensive
•  TMS / Workflow Integration
   –  Pre-built, custom development
•  Document Format Support
   –  Wide, limited
•  Translation Costs
   –  Monthly fee, per word, human resources
•  Customization Costs
   –  Up front, embedded in translation costs, human resources
•  Management Costs
   –  Oversight, improvement
•  Control
   –  Extensive, limited
•  Data Security
   –  Contract, internal
•  Project Type
   –  Language Pair
   –  Domain
•  Risk
   –  Managed by expert
   –  Managed by your team
   –  Likelihood of failure
•  Time to Quality
   –  Trained by professionals, learned skills
•  Cost of Post Editing
   –  Higher quality MT should result in lower cost of editing
MT
Machine Translation
50 Years of eMpTy Promises

Q  Why does an industry that has spent 50 years failing to deliver on its promises still exist?

A  An infinite demand – a well defined and growing problem that has always been looking for a solution – what was missing was …
Quality
Control
Focus
1. Customize
   Create a new custom engine using foundation data and your own language assets.
2. Measure
   Measure the quality of the engine for rating and future improvement comparisons.
3. Improve
   Provide corrective feedback removing potential for translation errors.
4. Manage
   Manage translation projects while generating corrective data for quality improvement.




Quality requires an understanding of the data
There is no exception to this rule


1. Click Training Data tab.
2. Click on Upload and select TMX files.
3. Click Training Data tab.
4. Click Build

Some even brag that it is this simple.

“Seriously, that’s it!”

Perhaps it should have been

“Seriously, that’s it????”
  

•  Simply upload your data and magic happens to create a custom MT engine in hours/minutes.
•  Seriously, that’s it!

Flaws in the One Button Instant MT Approach
•  MT cannot read your mind.
•  It cannot determine which writing style, target audience, formats, vocabulary or capitalization you want.
•  It cannot determine what is missing and whether your data is suitable for your goal.
•  You don’t know which is the right data
  
Just Add Water Upload Data
If it was really this easy, don’t you think custom MT success stories would be everywhere?
  


“The ready availability of the Moses MT engine under an open source license enables everybody to create statistical MT engines from parallel data with a moderate amount of effort.”

•  Moses Case study that describes the effort in detail: http://slidesha.re/KwkdUH
•  Summary:
   –  Needs expert programmer, expert project manager
   –  Requires very powerful hardware
   –  Large amounts of software development
   –  TAUS Data Association membership EUR 15,000 for data
   –  360 man hours to set up first pilot
   –  Multi-year effort with considerable funding required
   –  Translation quality close to that of Bing

“With self-serve MT, clients without the necessary MT and computing expertise to install Moses themselves, have for the first time the ability to build an MT system based on their own user requirements pretty much instantly.”
  
•  Do it yourself Moses and Self Service Moses primarily target and solve the engineering complexity of deploying a basic Moses system
•  There are many other technical and data requirements necessary
•  Many additional technology components are needed. Some have not yet been developed such as TMS integration, XML tag handling etc.

For a good blog entry and discussion on this topic see
http://bit.ly/rWAxG7
  
1.  What is the right data to upload for my MT system?
2.  How should I prepare my data?
3.  What cleaning can I do that the magic 1 click button does not do?
4.  What impact will my data have on the MT system?
5.  Will the data I upload improve or decrease quality?
6.  What will mixing data from multiple domains do to my MT system?
7.  Should I add some or all of the TAUS data to my system?
8.  Once I have a system, how can I make it better?
9.  When I see an error in my MT output, how can I know the cause of the error?
10. When I see an error in my MT output, how can I fix the error?
11. …



•  Definition
   –  Domain
   –  Target Audience
   –  Preferred Writing Style
   –  Glossaries, Non-Translatable Terms, Preferred Capitalization
   –  Special Formatting Requirements
   –  Quality Requirements
•  Data Gathering (provided by client and gathered from third parties)
   –  Source data in domain
   –  Bilingual data to support domain
   –  Monolingual data to support domain
•  Data Analysis
   –  Gap analysis
   –  High frequency terms
   –  Term extraction
•  Data Generation
   –  Supporting grammar structures
   –  Source Data Analysis
•  Cleaning of Data
•  Tuning and Test Set Preparation
•  Diagnostic Engine
   –  Fine tuning
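
The Data Analysis step above can be pictured with a small sketch. It assumes plain-text input, a deliberately naive tokenizer, and hypothetical function names (`tokens`, `gap_analysis`); it is an illustration of gap analysis and high frequency term counting, not the tooling used by any vendor named in this deck.

```python
from collections import Counter
import re

# Runs of letters only; a naive tokenizer chosen for illustration.
TOKEN = re.compile(r"[^\W\d_]+", re.UNICODE)


def tokens(text):
    """Return lowercased word tokens from one line of text."""
    return [t.lower() for t in TOKEN.findall(text)]


def gap_analysis(source_to_translate, training_source_side, top_n=5):
    """Return (high frequency terms, terms absent from the training data)."""
    wanted = Counter(t for line in source_to_translate for t in tokens(line))
    known = {t for line in training_source_side for t in tokens(line)}
    high_freq = wanted.most_common(top_n)
    gaps = sorted(t for t in wanted if t not in known)
    return high_freq, gaps


if __name__ == "__main__":
    to_translate = ["Replace the brake pads", "Check the brake fluid level"]
    training = ["Check the oil level", "Replace the filter"]
    high_freq, gaps = gap_analysis(to_translate, training)
    print(high_freq)  # most frequent terms in the content to be translated
    print(gaps)       # terms the training data never covers
```

Terms that are both frequent in the content to be translated and missing from the training data are the natural first candidates for data gathering or data generation.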
  
•  Near human quality automated translation designed for the professional translation industry
   –  Many customers have achieved quality levels where more than 50% of raw machine translation requires no editing at all
   –  Case studies of customers that have achieved 3 x margin with 1/3 the human resources
   –  Regularly replacing competitors’ pre-existing installations
•  Machine + Human approach delivers higher quality than a human only approach
   –  More consistent writing style and more accurate terminology
•  Rapid ongoing translation quality improvement
   –  Post edited machine translation is fed back to the engine which learns from its previous errors by analyzing the corrections
   –  Live feedback as new content is published
•  Enable clients to control preferred terminology, vocabulary and writing style

“We found that 52% of the raw original output from Asia Online had no errors at all – which is great for an initial engine.”
– Kevin Nelson, Managing Director, Omnilingua Worldwide

Complete Stylistic Control
Two different output styles for the same input sentence

Spanish Original (Before Translation):
Se necesitó una gran maniobra política muy prudente a fin de facilitar una cita de los dos enemigos históricos.

Business News (After Translation):
Significant amounts of cautious political maneuvering were required in order to facilitate a rendezvous between the two bitter historical opponents.

Children’s Books (After Translation):
A lot of care was taken to not upset others when organizing the meeting between the two long time enemies.
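
The "requires no editing" share quoted on this slide is a simple ratio, and the sketch below shows one way it could be computed. It assumes you hold the raw MT output and the post-edited version of the same segments as two aligned lists; the function name `no_edit_rate` and the whitespace normalization are assumptions, not a description of how the quoted figures were measured.

```python
def no_edit_rate(raw_mt, post_edited):
    """Fraction of segments the post-editor left unchanged.

    Whitespace differences are ignored; any other difference counts as an edit.
    """
    if len(raw_mt) != len(post_edited):
        raise ValueError("segment lists must be the same length")
    if not raw_mt:
        return 0.0
    unchanged = sum(
        1
        for mt, pe in zip(raw_mt, post_edited)
        if " ".join(mt.split()) == " ".join(pe.split())
    )
    return unchanged / len(raw_mt)


if __name__ == "__main__":
    raw = ["Fui al banco", "Nadé en la orilla", "Deposité mi dinero", "Hola"]
    edited = ["Fui al banco", "Nadé hasta la orilla", "Deposité mi dinero", "Buenos días"]
    print(f"{no_edit_rate(raw, edited):.0%}")
```

Tracking this number per engine and per release gives a concrete basis for the "Measure" step of the customize, measure, improve, manage cycle.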
  
LP: EN-ES
Top-Level Domain: Automotive
Engines/Sub-Domains:
•  Honda
   –  Cars
      •  User Manuals
      •  Engineering Service Manuals
   –  Motorbikes
      •  User Manuals
      •  Engineering Service Manuals
•  Toyota
   –  Marketing
   –  Service Reports
  




Dirty Data SMT Model
•  Data
   –  Gathered from as many sources as possible.
   –  Domain of knowledge does not matter.
   –  Data quality is not important.
   –  Data quantity is important.
•  Theory
   –  Good data will be more statistically relevant.

Clean Data SMT Model
•  Data
   –  Gathered from a small number of trusted quality sources.
   –  Domain of knowledge must match target.
   –  Data quality is very important.
   –  Data quantity is less important.
•  Theory
   –  Bad or undesirable patterns cannot be learned if they don’t exist in the data.
  
•  There is no magic in MT; human effort is required.
•  The quality of the output and its suitability for purpose are directly proportional to the amount of human effort.
•  Without human direction, MT will cost more in the long term and is more likely to fail.
  
•  Bad translations
•  Out of domain text
•  Unbalanced / Biased
   –  Too much text from other domains
•  Mixed / Wrong language
•  Junk and noise
•  Broken HTML
•  Mixed Encoding
•  Missing diacritics
   –  café vs. cafe
•  OCR Text
•  Machine translated text
•  Anything that is not high quality and in domain

Put Simply: If a bad pattern does not exist in your training data, you cannot generate such a bad pattern as translation output.
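
A few of the problems listed above can be caught mechanically. The sketch below is a minimal filter over (source, target) segment pairs; the function names, the reasons returned, and the 3.0 length-ratio threshold are all assumptions chosen for illustration. It does not detect bad translations, out of domain text, wrong language, or machine translated text, which need language identification or human review.

```python
import re
import unicodedata

TAG = re.compile(r"<[^>]+>")  # crude check for leftover HTML/XML markup


def reject_reason(source, target, max_ratio=3.0):
    """Return a short reason if the pair looks dirty, otherwise None."""
    src, tgt = source.strip(), target.strip()
    if not src or not tgt:
        return "empty side"
    if TAG.search(src) or TAG.search(tgt):
        return "markup residue"
    if "\ufffd" in src or "\ufffd" in tgt:
        return "encoding damage"  # U+FFFD marks undecodable bytes
    if src == tgt:
        return "untranslated copy"
    longer, shorter = max(len(src), len(tgt)), min(len(src), len(tgt))
    if longer / shorter > max_ratio:
        return "length mismatch"
    return None


def clean(pairs):
    """Keep the first copy of each pair that passes every check."""
    kept, seen = [], set()
    for source, target in pairs:
        key = (
            unicodedata.normalize("NFC", source.strip()),
            unicodedata.normalize("NFC", target.strip()),
        )
        if key in seen or reject_reason(*key):
            continue
        seen.add(key)
        kept.append(key)
    return kept


if __name__ == "__main__":
    sample = [("Hello", "Hola"), ("Hello", "Hola"), ("<b>Hi</b>", "Hola"), ("OK", "OK")]
    print(clean(sample))
```

The "untranslated copy" check will also reject legitimate pairs such as product names, so in practice each rule would be tuned against a reviewed sample of the data.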
  
English Source | Human Translation | Google Translation | Google Context
I went to the bank | Fui al banco | Fui al banco | Bank as in finance
I went to the bank to deposit money | Fui al banco para depositar dinero | Fui al banco a depositar el dinero | Bank as in finance
I went to the bank of the turn in my car | Fui en coche a la inclinación de la vuelta | Fui a la orilla de la vuelta en mi coche | Bank as in river bank
I put my car into the bank of the turn | Puse mi coche en la inclinación de la vuelta. | Pongo mi coche en el banco de la vuelta | Bank as in finance
I swam to the bank of the river | Nadé en la orilla del río | Nadé hasta la orilla del río | Bank as in river bank
I banked my money | Deposité mi dinero | Yo depositado mi dinero | Banked as in finance
I banked my car into the turn | Incliné mi coche en la vuelta | Yo depositado mi coche en la vuelta | Banked as in finance
I banked my plane into a steep dive | Incliné mi avión en para una zambullida. | Yo depositado en mi avión en picada | Banked as in finance

Issue: The above examples show that Google is biased towards the banking and finance domain.

Cause: There is much more multilingual banking and finance data available to learn from than there is aeronautical or water sports data available.
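
The cause described above can be illustrated with a toy frequency count. The example counts below (90 finance, 8 geography, 2 aviation) are invented for illustration and the "pick the most frequent translation" rule is a simplification of how a statistical engine scores candidates; the point is only that the majority domain in the data wins unless the data is restricted to the target domain.

```python
from collections import Counter

# Hypothetical training examples: (domain, source word, observed translation).
EXAMPLES = (
    [("finance", "bank", "banco")] * 90
    + [("geography", "bank", "orilla")] * 8
    + [("aviation", "bank", "inclinación")] * 2
)


def most_frequent_translation(examples, word, domain=None):
    """Most common translation of `word`, optionally within a single domain."""
    counts = Counter(
        target
        for dom, source, target in examples
        if source == word and (domain is None or dom == domain)
    )
    return counts.most_common(1)[0][0] if counts else None


if __name__ == "__main__":
    print(most_frequent_translation(EXAMPLES, "bank"))                      # all data mixed
    print(most_frequent_translation(EXAMPLES, "bank", domain="aviation"))   # in-domain only
```

With all domains mixed, the finance sense dominates; restricting the data to one domain changes the outcome without adding a single new example.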
  
•  Competitors require additional data amounting to 20% or more of the initial training data to show notable improvements.
   –  This could take years for most LSPs
   –  This is the dirty little secret of the Dirty Data SMT approach that is frequently acknowledged.

Typical Dirty Data SMT engines will have between 2 million and 20 million sentences in the initial training data.

•  Asia Online has reference customers that have had notable improvements with just 1 day’s work of post editing.
   –  Only possible with Clean Data SMT

< 0.1%: Improvements daily based on edits
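
To make the scale of the 20% figure concrete, the sketch below applies it to the 2 million and 20 million sentence range given on this slide. The function name is hypothetical and the calculation is plain arithmetic on the slide's own numbers.

```python
def additional_sentences_needed(initial_sentences, percent=20):
    """Sentences of new data needed to reach `percent` of the initial training set."""
    return initial_sentences * percent // 100


if __name__ == "__main__":
    for initial in (2_000_000, 20_000_000):
        extra = additional_sentences_needed(initial)
        print(f"{initial:,} initial sentences -> {extra:,} additional sentences")
```

That is 400,000 to 4,000,000 additional sentences before a notable improvement, which is why the slide says this could take years for most LSPs.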
  



Language Studio Allows:
•  Automated identification of areas of weakness
•  Post Editing Feedback focusing directly on areas of weakness
•  Automated error pattern analysis and correction
•  Analysis and Resolution of Unknown Words
•  Determination and resolution of high frequency phrases
•  Terminology Extraction
•  Balancing Bilingual Phrases against Monolingual Data
•  Runtime glossary
•  Runtime spelling dictionary
•  Pattern handling and adjustments
•  Incremental Improvement Training
•  Automated Quality Measurement
•  Human Quality Measurement
•  Quality Confidence Scores for each segment

Competitors Statistical MT
•  Get more dirty data
•  Human translate more data

Competitors Rule Base MT
•  Add dictionary entries (limit 20K words)
•  Train a language model to fix broken rules output (limit 40K phrases)
  
Clean Data SMT
•  Typically about 10-20 examples for each clean word or phrase.
•  Each correction has statistical relevance and impact can be clearly seen.
•  Corrections usually involve adding data to fill gaps.
•  Far less correction of actual errors.
•  Clean data means cause of errors can be understood and corrected.
•  Concordance used to create unbiased examples/phrases and ensure scope covered.

Dirty Data SMT
•  Large volumes of dirty data prohibit manual correction.
•  Individual corrections are not statistically relevant.
•  Manual corrections must compete against 1,000’s of bad examples. Impractical to create enough examples manually.
•  Understanding the cause of errors is difficult.
•  Slows training and overall processing time. Requires more resources to process excess data.
•  Only solution is to acquire more dirty data and hope problem is fixed. But may get worse or cause new errors.
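
The "statistical relevance" contrast above can be sketched as a relative frequency. The model here is an assumption made for illustration: it treats the weight of a corrected translation as its share of all examples for that phrase, whereas a real SMT engine combines several model scores. The counts (5 and 2,000 competing examples, 15 corrections) are hypothetical values picked to match the orders of magnitude on the slide.

```python
def corrected_share(competing_examples, corrections):
    """Share of examples for a phrase that carry the corrected translation."""
    return corrections / (competing_examples + corrections)


if __name__ == "__main__":
    # Clean data: a handful of competing examples, 15 corrections added.
    print(f"clean: {corrected_share(5, 15):.1%}")
    # Dirty data: thousands of bad examples, the same 15 corrections.
    print(f"dirty: {corrected_share(2000, 15):.1%}")
```

Under these assumed counts the same 15 corrections dominate the clean case and are nearly invisible in the dirty one, which is the sense in which individual corrections are "not statistically relevant" against thousands of bad examples.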
  

[Figure: four panels labeled 1960’s, 1980’s, 1990’s and 2012]
  
Before Machine Translation
Source text is processed and modified.

Pre-Translation JavaScript (JS)
- Complex pre-processing can be customized via JavaScript.
Pre-Translation Corrections (PTC)
- A list of terms that adjust the source text, fixing common issues and making it more suitable for translation.
Non-Translatable Terms (NTT)
- A list of monolingual terms that are used to ensure key terms are not translated.
Runtime Glossary (GLO)
- A list of bilingual terms that are used to ensure terminology is translated a specific way.

After Machine Translation
Target text is processed and modified.

Post Translation Adjustment (PTA)
- A list of terms in the target language that modify the translated output. This is very useful for normalization of target terms.
Post Translation JavaScript (JS)
- Complex post-processing can be customized via JavaScript.

Runtime customizations can be applied in 2 forms:
Default: Applied to all jobs.
Job Specific: A different set of customizations can be applied for different clients.
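The order of the runtime steps above can be sketched roughly as follows. This is an illustration in Python, not the product's JavaScript hooks or its real configuration format: the function names, the `ptc`/`ntt`/`glo`/`pta` dictionary keys, the placeholder tokens and the stand-in `engine` callable are all assumptions made for the example.

```python
import re


def apply_term_map(text, term_map):
    """Replace whole-word occurrences of each key with its mapped value."""
    for term, replacement in term_map.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        text = re.sub(pattern, lambda _m, r=replacement: r, text)
    return text


def protect_terms(text, term_targets):
    """Swap listed source terms for placeholders the engine should pass through."""
    slots = {}
    for i, (term, target) in enumerate(term_targets.items()):
        token = f"__TERM{i}__"  # hypothetical placeholder format
        if term in text:
            text = text.replace(term, token)
            slots[token] = target
    return text, slots


def restore_terms(text, slots):
    """Put the required target-side term back in place of each placeholder."""
    for token, target in slots.items():
        text = text.replace(token, target)
    return text


def run_job(source, engine, defaults, job=None):
    """Apply default customizations, overridden per key by job-specific ones."""
    cfg = {**defaults, **(job or {})}
    # Before MT: corrections first, then protect NTT and glossary terms.
    text = apply_term_map(source, cfg.get("ptc", {}))
    term_targets = {t: t for t in cfg.get("ntt", [])}  # NTT: keep term as-is
    term_targets.update(cfg.get("glo", {}))            # GLO: force a translation
    text, slots = protect_terms(text, term_targets)
    target = engine(text)
    # After MT: restore protected terms, then adjust the target text.
    target = restore_terms(target, slots)
    return apply_term_map(target, cfg.get("pta", {}))


if __name__ == "__main__":
    defaults = {"ptc": {"Pls": "Please"}, "ntt": ["Language Studio"]}
    # str.upper stands in for a real MT engine purely for demonstration.
    print(run_job("Pls open Language Studio", str.upper, defaults))
```

Merging the two dictionaries is one simple way to model "Default" versus "Job Specific": a client-specific job replaces only the lists it supplies and inherits the rest.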
  
  
[Figure: gauge of Words Per Day Per Translator, scale 0 to 28,000 (ticks at 3,000; 6,000; 9,000; 12,000; 15,000; 18,000; 21,000; 25,000; 28,000), with markers for Human Translation, Typical MT + Post Editing Speed, and *]

Average person reads 200-250 words per minute: 96,000-120,000 in 8 hours, ~35 times faster than human translation.

*Fastest MT + Post Editing Speed reported by clients.
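The reading-speed arithmetic on this slide can be checked with a few lines of Python. The 3,000 words per day used for human translation is an assumption for the example (it is the lowest tick on the gauge); the slide does not state the baseline behind "~35 times".

```python
MINUTES_PER_DAY = 8 * 60  # an 8-hour working day


def words_read_per_day(words_per_minute, minutes=MINUTES_PER_DAY):
    """Words a person could read in one working day at a steady pace."""
    return words_per_minute * minutes


def reading_vs_translation(reading_wpm, translation_words_per_day):
    """How many times faster reading is than translating, per day."""
    return words_read_per_day(reading_wpm) / translation_words_per_day


ASSUMED_HUMAN_TRANSLATION_RATE = 3_000  # words/day, assumed, not from the slide

if __name__ == "__main__":
    for wpm in (200, 250):
        daily = words_read_per_day(wpm)
        ratio = reading_vs_translation(wpm, ASSUMED_HUMAN_TRANSLATION_RATE)
        print(f"{wpm} wpm -> {daily:,} words/day, {ratio:.0f}x translation")
```

Under that assumed baseline the ratio falls between 32 and 40, which brackets the slide's "~35 times".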
  
Metrics That Really Count
•  Productivity – Words per day per human resource
•  Margin – 2-3 times the profit margin is commonplace
•  Consistency – Writing style and terminology
     ✓  MT + Human delivers higher quality than a human only approach
•  Deals
     ✓  New deals not accessible with a human only approach
     ✓  Deals where you could offer a more competitive bid due to MT than your competitors
     ✓  Deals that would have been lost to a competitor without the advantages that MT offers

Examples of other “Useful” Quality Indicators
Automated Metrics (Good indicators, but not absolute)
•  BLEU (Bilingual Evaluation Understudy)
•  NIST
•  F-Measure (F1 Score or F-Score)
•  METEOR (Metric for Evaluation of Translation with Explicit ORdering)
Manual Quality Metrics (Most not designed for MT, more for HT)
•  Edit Distance (Does not take into account complexity of edit)
•  SAE-J2450 (Industry specific)

Productivity is the Best Quality Metric
Raw MT often has a greater number of errors than first pass human translation.
However:
Language Studio™ MT is stylised to a specific domain, customer and target audience, so quality is considerably higher than other MT systems.
This means that:
1.  MT errors are easy to see and easy to fix (i.e. simple grammar).
2.  MT provides more accurate and consistent terminology than human translators, especially when more than 1 human works on a project.
3.  Human errors may be fewer, but harder to see and harder to fix.
Counting the number of errors only offers no value as a metric, as the complexity of the error is not taken into account.
MT with more errors is often faster to edit and fix than first pass human translations with fewer errors.

[Figure: chart labelled Margin and Time]
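To make the edit distance caveat concrete, here is a minimal word-level Levenshtein distance in Python. It is a generic textbook formulation, not a metric taken from Language Studio™ or any tool named in the slides, and the function name is chosen for this example.

```python
def word_edit_distance(mt_output, post_edited):
    """Minimum number of word insertions, deletions and substitutions
    needed to turn the MT output into the post-edited text."""
    a, b = mt_output.split(), post_edited.split()
    prev = list(range(len(b) + 1))  # distance from empty prefix of a
    for i, word_a in enumerate(a, start=1):
        curr = [i]
        for j, word_b in enumerate(b, start=1):
            cost = 0 if word_a == word_b else 1
            curr.append(min(prev[j] + 1,          # delete word_a
                            curr[j - 1] + 1,      # insert word_b
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # A missing article and a wrong-meaning fix both count as one edit.
    print(word_edit_distance("the cat sat on mat", "the cat sat on the mat"))
    print(word_edit_distance("he sat by the bank", "he sat by the shore"))
```

Both pairs score the same single edit, although adding an article is trivial and correcting a wrong meaning is not. That is the limitation the slide points at: the count says nothing about how hard each edit was.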
  
  
Wait for a Project That Requires MT
•  Opportunistic approach
•  Many LSPs are interested in MT, but not willing to take the plunge without a paying client.
•  Limited to one client
•  Harder to sell – longer sales cycle
•  Often build one language pair to try, before committing to others

Revenue
Recurring revenues from words translated
One time revenues from resale of customization
Post editing
Preparing source data
Terminology definition
Non-Translatable terms
Unknown and high-frequency phrase resolution

Create a Product For Resale
•  Proactive approach
•  Leverages existing translation assets
•  Can be sold to many clients
•  Easier to sell - test and show
•  Can sell multiple language pairs at the same time
•  Generally a higher Return On Investment (ROI)

Revenue
Recurring revenues from words translated
Preparing source data
Post editing
Runtime glossary preparation
Non-translatable terms definition
  
  
Reduce Project Costs
•  Helps to manage margin squeeze:
     –  compete with competitors using cheaper (perhaps lower quality) resources or competitors using MT
•  Helps to cost justify business cases that may not be viable using a human only approach
•  Can be used behind the scenes (like a translation memory) or disclosed to client
•  More client work in other areas as a result of left over translation budget.

Revenue
Depending on project or product model, revenues will vary. See previous slide.
Additional revenues from client getting more ROI and willing to invest in new languages.

Faster Translation Delivery
•  New projects that could not have been delivered on due to time and resource constraints
•  Helps clients that want to simultaneously ship product in multiple languages
•  New clients in research, analysis, data mining and discovery markets
•  New clients that need real-time or near real-time translation

Revenue
Preparing source data
Runtime glossary preparation
Non-translatable terms definition
Post editing
Recurring revenues from words translated
  



  
Expand Existing Relationships
•  Opportunities to translate additional material for markets that may not have been cost viable with a human only approach
•  Reuse custom MT for multiple purposes
•  Enable clients to better compete in markets that were only partially addressed due to cost and time

Revenue
Preparing source data
Terminology definition
Non-Translatable terms
Unknown and high-frequency phrase resolution
Post editing
Recurring revenues from words translated
One time revenues from resale of customization

Added Functionality
•  Expand service offerings with new features such as multilingual customer support
•  Integrate machine translation into existing client technologies, products and services

Revenue
Same as for Expanding Existing Relations
Additionally able to charge various service fees relating to the new services offered. For example, translating common Q&A for customer support and a commission on integrated multilingual support products.
  
  
Post Editing Cost
MT learns from post editing feedback and quality of translation constantly improves.
Cost of post editing progressively reduces as MT quality increases after each engine learning iteration.

[Figure: Cost Per Word (scale 1 to 6) against Engine Learning Iteration (1 to 6); series: Post Editing (Human Translation), MT Post Editing]

Post Editing Effort Reduces Over Time
The post editing and cleanup effort gets easier as the MT engine improves.
Initial efforts should focus on error analysis and correction of a representative sample data set.
Each successive project should get easier and more efficient.

[Figure: Quality against Engine Learning Iteration (1 to 6); labels: Publication Quality Target, Post Editing Effort, Raw MT Quality]
  
  
How Omnilingua Measures Quality
     –  Triangulate to find the data
     –  Raw MT J2450 v. Historical Human Quality J2450
     –  Time Study Measurements
     –  OmniMT EffortScore™
Everything must be measured by effort first
     –  All other metrics support effort metrics
     –  Productivity is key
∆ Effort > MT System Cost + Value Chain Sharing
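One way to read the inequality above is as a break-even test. The Python sketch below prices the effort saved and compares it with the two costs. It rests on assumptions the slide does not state: effort is measured in hours and priced at an hourly rate, "Value Chain Sharing" is treated as the part of the saving passed on to others in the chain, and all figures are invented.

```python
def effort_saving(hours_human_only, hours_mt_post_edit, hourly_rate):
    """Monetary value of the effort saved by MT plus post editing."""
    return (hours_human_only - hours_mt_post_edit) * hourly_rate


def mt_pays_off(hours_human_only, hours_mt_post_edit, hourly_rate,
                mt_system_cost, value_chain_sharing):
    """True when the effort saved exceeds MT cost plus shared savings."""
    saved = effort_saving(hours_human_only, hours_mt_post_edit, hourly_rate)
    return saved > mt_system_cost + value_chain_sharing


if __name__ == "__main__":
    # Invented figures: 100 hours without MT, 40 hours with MT, rate of 50.
    print(mt_pays_off(100, 40, 50, mt_system_cost=1000, value_chain_sharing=1500))
```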
  
  
•  Built as a Human Assessment System:
     –  Provides 7 defined and actionable error classifications.
     –  2 severity levels to identify severe and minor errors.
•  Provides a Measurement Score Between 1 and 0:
     –  A lower score indicates fewer errors.
     –  Objective is to achieve a score as close to 0 (no errors/issues) as possible.
•  Provides Scores at Multiple Levels:
     –  Composite scores across an entire set of data.
     –  Scores for logical units such as sentences and paragraphs.
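A score with these properties can be sketched as weighted errors divided by word count. The Python below shows only the shape of such a calculation: the severity weights, the category labels and the cap at 1 are assumptions made for the example and are not values from the slides or from the published metric.

```python
# Hypothetical weights: the slide only says there are two severity levels.
SEVERITY_WEIGHTS = {"severe": 1.0, "minor": 0.5}


def error_score(errors, word_count):
    """Score one unit (sentence, paragraph); 0 means no errors were logged.

    errors is a list of (category, severity) pairs recorded by a reviewer.
    """
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    total = sum(SEVERITY_WEIGHTS[severity] for _category, severity in errors)
    return min(1.0, total / word_count)  # cap at 1 is an assumption


def composite_score(units):
    """Score a whole data set from (errors, word_count) pairs per unit."""
    all_errors = [error for errors, _count in units for error in errors]
    total_words = sum(count for _errors, count in units)
    return error_score(all_errors, total_words)


if __name__ == "__main__":
    sentence = [("wrong term", "severe"), ("punctuation", "minor")]
    print(error_score(sentence, word_count=10))
    print(composite_score([(sentence, 10), ([], 10)]))
```

Scoring each unit separately and then pooling errors and words gives both levels the slide mentions: per-sentence scores and a composite for the set.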
  	
  




  
Asia Online v. Competing MT System      Factor
Total Raw J2450 Errors                  2x Fewer
Raw J2450 Score                         2x Better
Total PE J2450 Errors                   5.3x Fewer
PE J2450 Score                          4.8x Better
PE Rate                                 32% Faster

“There were far fewer errors produced by the Language Studio™ custom MT engine than the competitor's legacy MT engine. Notably there were fewer wrong meanings, structural errors and wrong terms in the Language Studio™ custom MT engine, that were "typical SMT problems" in the competitor's legacy MT engine.”

“We found that 52% of the raw original output from Asia Online had no errors at all – which is great for an initial engine.”

“The final translation quality after post-editing was better with the new Language Studio™ custom MT engine than the competitor's legacy MT engine and also better than a human only translation approach. Terminology was more consistent with a combined Language Studio™ custom MT engine plus human post editing approach.”

– Kevin Nelson, Managing Director, Omnilingua Worldwide
  
  
•  LSP: Sajan
•  End Client Profile:
     –  Large global multinational corporation in the IT domain.
     –  Has developed its own proprietary MT system that has been developed over many years.
•  Project Goals
     –  Eliminate the need for full translation and limit it to MT + Post-editing
•  Language Pair:
     –  English -> Simplified Chinese.
     –  English -> European Spanish.
     –  English -> European French.
•  Domain: IT
•  2nd Iteration of Customized Engine
     –  Customized initial engine, followed by an incremental improvement based on client feedback.
•  Data
     –  Client provided ~3,000,000 phrase pairs.
     –  26% were rejected in cleaning process as unsuitable for SMT training.
•  Measurements:
     –  Cost
     –  Timeframe
     –  Quality
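The slides do not say which cleaning rules rejected 26% of the data. As a rough illustration only, the Python sketch below applies three common heuristics for parallel training data: dropping empty sides, exact duplicates, and pairs whose lengths differ too much. The thresholds and function names are assumptions, and whitespace word counts are a simplification that would not suit Chinese.

```python
def is_clean(source, target, max_ratio=3.0, max_words=100):
    """Heuristic check that a phrase pair looks usable for training."""
    src, tgt = source.split(), target.split()
    if not src or not tgt:
        return False  # one side is empty
    if len(src) > max_words or len(tgt) > max_words:
        return False  # overly long segment
    ratio = max(len(src), len(tgt)) / min(len(src), len(tgt))
    return ratio <= max_ratio  # reject badly mismatched lengths


def clean_pairs(pairs, **limits):
    """Keep the first copy of each clean pair, preserving order."""
    seen, kept = set(), []
    for source, target in pairs:
        key = (source.strip(), target.strip())
        if key in seen or not is_clean(*key, **limits):
            continue
        seen.add(key)
        kept.append(key)
    return kept


def rejection_rate(total, kept):
    """Share of the supplied pairs that were dropped."""
    return 1 - kept / total


if __name__ == "__main__":
    supplied = 3_000_000  # approximate figure from the slide
    rejected = round(supplied * 0.26)
    print(f"{rejected:,} rejected, {supplied - rejected:,} kept")
```

At the slide's figures, a 26% rejection rate on about 3,000,000 pairs leaves roughly 2,220,000 pairs for training.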
  

  
•  Quality
     –  Client performed their own metrics
     –  Asia Online Language Studio™ was considerably better than the client's own MT solution.
     –  Significant quality improvement after providing feedback – 65 BLEU score.
     –  Chinese scored better than first pass human translation as per client’s feedback and was faster and easier to edit.
•  Result
     –  Client extremely impressed with result especially when compared to the output of their own MT engine.
     –  Client has commissioned Sajan to work with more languages

60% Cost Saving
70% Time Saving

LRC have uploaded Sajan’s slides and video presentation from the recent LRC conference:
Slides: http://bit.ly/r6BPkT     Video: http://bit.ly/trsyhg
  
  
Travel & Leisure Vertical

English to Spanish Language Pair

Custom MT engines built and programmatically consumed

A human post edit step was included in workflow and measurement

Scientific measures of productivity for all phases of process
  




  
Base training materials provided and catalogued

Asia Online trained the engine and released to a diagnostic stage

First pass of new content through diagnostic engine yielded positive results

Asia Online provided advanced data generation technologies to the diagnostic engine through monolingual data crawling, application of runtime rules, and pre-translation adjustments

Even further progress achieved from extracting and applying an industry specific high frequency term list from the source
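Extracting a high frequency term list from source text can be approximated by simple counting. The Python sketch below uses the standard library `collections.Counter`. It is not the data generation technology described on the slide: it handles single words only, and the stopword list, thresholds and sample segments are invented for the example.

```python
import re
from collections import Counter

# Letters only (any script), allowing internal hyphens and apostrophes.
WORD = re.compile(r"[^\W\d_]+(?:[-'][^\W\d_]+)*")


def high_frequency_terms(segments, top_n=10, min_count=2, stopwords=frozenset()):
    """Return (term, count) pairs for the most frequent source words."""
    counts = Counter()
    for segment in segments:
        words = WORD.findall(segment.lower())
        counts.update(w for w in words if w not in stopwords)
    return [(term, n) for term, n in counts.most_common(top_n) if n >= min_count]


if __name__ == "__main__":
    sample = [
        "Book a hotel room",
        "Hotel room with sea view",
        "Late checkout from the hotel",
    ]
    print(high_frequency_terms(sample, stopwords={"a", "the", "with", "from"}))
```

A list produced this way would still need human review before its entries are given fixed translations in a runtime glossary.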
  



  
58% of segments required no edits
  




  
Post Edit Productivity Analysis
Productivity Percentage: 328% Increase
Productivity Rate: 8,208 words a day
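The slide does not give the baseline rate behind these two figures. Assuming "328% increase" means the new rate is the baseline plus 328% of the baseline (that is, 4.28 times it), the implied baseline can be backed out as in this Python sketch. If the figure instead means 3.28 times the baseline, the result would differ.

```python
def baseline_from_increase(new_rate, percent_increase):
    """Baseline rate implied by a new rate and a percentage increase."""
    return new_rate / (1 + percent_increase / 100)


if __name__ == "__main__":
    baseline = baseline_from_increase(8208, 328)
    print(f"Implied baseline: {baseline:,.0f} words a day")
```

Under that reading the implied baseline is roughly 1,918 words a day.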
  




               	
  	
  
               	
  

  
Business Strategies for Building Strategic Advantage and Revenue from Machine Translation

Dion Wiggins
Chief Executive Officer
dion.wiggins@asiaonline.net
  
                            	
  




  

Weitere ähnliche Inhalte

Was ist angesagt?

ProductCamp Vancouver 2013
ProductCamp Vancouver 2013ProductCamp Vancouver 2013
ProductCamp Vancouver 2013Dave Sharrock
 
AT&T Telepresence Solution
AT&T Telepresence SolutionAT&T Telepresence Solution
AT&T Telepresence SolutionDavid Santos
 
WebSphere Portal | The Front End Of SOA
WebSphere Portal | The Front End Of SOAWebSphere Portal | The Front End Of SOA
WebSphere Portal | The Front End Of SOAJason Faszholz
 
Five best practices for ensuring uptime with Data Center Infrastructure Manag...
Five best practices for ensuring uptime with Data Center Infrastructure Manag...Five best practices for ensuring uptime with Data Center Infrastructure Manag...
Five best practices for ensuring uptime with Data Center Infrastructure Manag...CA Nimsoft
 
About Sovereign Business Resources
About Sovereign Business ResourcesAbout Sovereign Business Resources
About Sovereign Business Resourcesmosabu
 
Hp Fortify Cloud Application Security
Hp Fortify Cloud Application SecurityHp Fortify Cloud Application Security
Hp Fortify Cloud Application SecurityEd Wong
 
Tdwi agile data warehouse - dv, what is the buzz about
Tdwi   agile data warehouse - dv, what is the buzz aboutTdwi   agile data warehouse - dv, what is the buzz about
Tdwi agile data warehouse - dv, what is the buzz aboutPrudenza B.V
 
Avnet Analyst Day 2010 Presentation 2 Path to Premier
Avnet Analyst Day 2010 Presentation 2 Path to PremierAvnet Analyst Day 2010 Presentation 2 Path to Premier
Avnet Analyst Day 2010 Presentation 2 Path to PremierAvnet Electronics Marketing
 
A balanced metrics set for software business
A balanced metrics set for software businessA balanced metrics set for software business
TAUS MT SHOWCASE, Strategies for Building Competitive Advantage and Revenue from Machine Translation, Dion Wiggins, Asia Online, 10 April 2013

  • 1. TAUS MACHINE TRANSLATION SHOWCASE. Strategies for Building Competitive Advantage and Revenue from Machine Translation. 14:40 – 15:00, Wednesday, 10 April 2013. Dion Wiggins, Asia Online
  • 2. Business Strategies for Building Strategic Advantage and Revenue from Machine Translation. Dion Wiggins, Chief Executive Officer, dion.wiggins@asiaonline.net. Copyright © 2013, Asia Online Pte Ltd
  • 4.
      • Human Resources
          – Linguistic
              • Language / Translation
              • Natural Language Processing (NLP)
          – Technical
              • Operating System
              • Software installation and support
          – Programming
              • Tailoring to needs of the business
              • Integration with other tools and platforms
      • Infrastructure
          – Hardware
              • Hosted, purchased
          – Software
              • Licensed, Hosted, Open Source
      • Data Requirements
          – Third party
              • Free, Commercial
          – Internal data
          – Data manufacturing
          – Clean vs. Dirty Data SMT
          – Rules vs. SMT vs. Hybrid
      • Skill Development
          – Hosted: basic skills
          – Onsite Moses: comprehensive
      • TMS / Workflow Integration
          – Pre-built, custom development
      • Document Format Support
          – Wide, limited
  • 5.
      • Translation Costs
          – Monthly fee, per word, human resources
      • Customization Costs
          – Up front, embedded in translation costs, human resources
      • Management Costs
          – Oversight, improvement
      • Control
          – Extensive, limited
      • Data Security
          – Contract, internal
      • Project Type
          – Language Pair
          – Domain
      • Risk
          – Managed by expert
          – Managed by your team
          – Likelihood of failure
      • Time to Quality
          – Trained by professionals, learned skills
      • Cost of Post Editing
          – Higher quality MT should result in lower cost of editing
  • 6. MT: Machine Translation, 50 Years of eMpTy Promises. Q: Why does an industry that has spent 50 years failing to deliver on its promises still exist? A: An infinite demand – a well-defined and growing problem that has always been looking for a solution – what was missing was …
  • 7. Quality; Control; Focus
  • 8.
      1. Customize: Create a new custom engine using foundation data and your own language assets.
      2. Measure: Measure the quality of the engine for rating and future improvement comparisons.
      3. Improve: Provide corrective feedback, removing potential for translation errors.
      4. Manage: Manage translation projects while generating corrective data for quality improvement.
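The "Measure" step above can be illustrated with a minimal sketch. The slides do not say which metric is used, so this is an assumption: a plain word-level edit distance between raw MT output and its post-edited version, divided by the post-edited length. The function names `word_edit_distance` and `edit_rate` are hypothetical, and the measure is a simplified stand-in (no phrase shifts, no tokenization beyond whitespace), not the vendor's method.

```python
def word_edit_distance(hyp, ref):
    """Levenshtein distance over whitespace-separated tokens."""
    h, r = hyp.split(), ref.split()
    prev = list(range(len(r) + 1))  # distance from empty hypothesis
    for i, hw in enumerate(h, start=1):
        curr = [i]  # distance to empty reference
        for j, rw in enumerate(r, start=1):
            cost = 0 if hw == rw else 1
            curr.append(min(prev[j] + 1,          # delete a hypothesis word
                            curr[j - 1] + 1,      # insert a reference word
                            prev[j - 1] + cost))  # keep or substitute
        prev = curr
    return prev[-1]


def edit_rate(mt_output, post_edited):
    """Edits per post-edited word; 0.0 means the MT output was left unchanged."""
    ref_len = len(post_edited.split())
    if ref_len == 0:
        return 0.0 if not mt_output.split() else float("inf")
    return word_edit_distance(mt_output, post_edited) / ref_len


# Illustrative sentences only, not real engine output.
rate = edit_rate("the cat sat", "the dog sat down")
print(rate)
```

Tracking a number like this per engine version on the same test set gives a baseline for the "future improvement comparisons" the slide mentions; a falling rate suggests less post-editing effort.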
  • 9. Quality requires an understanding of the data. There is no exception to this rule.
  • 10.
      1. Click Training Data tab.
      2. Click on Upload and select TMX files.
      3. Click Training Data tab.
      4. Click Build.
      Some even brag that it is this simple: “Seriously, that’s it!” Perhaps it should have been “Seriously, that’s it????”
  • 11.
      •  Simply upload your data and magic happens to create a custom MT engine in hours/minutes.
      •  Seriously, that’s it!
      Flaws in the One Button Instant MT Approach
      •  MT cannot read your mind.
      •  It cannot determine which writing style, target audience, formats, vocabulary or capitalization you want.
      •  It cannot determine what is missing and whether your data is suitable for your goal.
      •  You don’t know which is the right data
  • 12. Just Add Water Upload Data
      If it was really this easy, don’t you think custom MT success stories would be everywhere?
  • 13. “The ready availability of the Moses MT engine under an open source license enables everybody to create statistical MT engines from parallel data with a moderate amount of effort.”
      •  Moses case study that describes the effort in detail: http://slidesha.re/KwkdUH
      •  Summary:
          –  Needs expert programmer, expert project manager
          –  Requires very powerful hardware
          –  Large amounts of software development
          –  TAUS Data Association membership EUR 15,000 for data
          –  360 man hours to set up first pilot
          –  Multi-year effort with considerable funding required
          –  Translation quality close to that of Bing
      “With self-serve MT, clients without the necessary MT and computing expertise to install Moses themselves, have for the first time the ability to build an MT system based on their own user requirements pretty much instantly.”
  • 14.
      •  Do it yourself Moses and Self Service Moses primarily target and solve the engineering complexity of deploying a basic Moses system
      •  There are many other technical and data requirements necessary
      •  Many additional technology components are needed. Some have not yet been developed, such as TMS integration, XML tag handling etc.
      For a good blog entry and discussion on this topic see http://bit.ly/rWAxG7
  • 15.
      1.  What is the right data to upload for my MT system?
      2.  How should I prepare my data?
      3.  What cleaning can I do that the magic 1 click button does not do?
      4.  What impact will my data have on the MT system?
      5.  Will the data I upload improve or decrease quality?
      6.  What will mixing data from multiple domains do to my MT system?
      7.  Should I add some or all of the TAUS data to my system?
      8.  Once I have a system, how can I make it better?
      9.  When I see an error in my MT output, how can I know the cause of the error?
      10.  When I see an error in my MT output, how can I fix the error?
      11.  …
  • 16.
      •  Definition
          –  Domain
          –  Target Audience
          –  Preferred Writing Style
          –  Glossaries, Non-Translatable Terms, Preferred Capitalization
          –  Special Formatting Requirements
          –  Quality Requirements
      •  Data Gathering (provided by client and gathered from third parties)
          –  Source data in domain
          –  Bilingual data to support domain
          –  Monolingual data to support domain
      •  Data Analysis
          –  Gap analysis
          –  High frequency terms
          –  Term extraction
      •  Data Generation
          –  Supporting grammar structures
          –  Source Data Analysis
      •  Cleaning of Data
      •  Tuning and Test Set Preparation
      •  Diagnostic Engine
          –  Fine tuning
  • 17.
      •  Near human quality automated translation designed for the professional translation industry
          –  Many customers have achieved quality levels where more than 50% of raw machine translation requires no editing at all
          –  Case studies of customers that have achieved 3 x margin with 1/3 the human resources
          –  Regularly replacing competitors’ pre-existing installations
      •  Machine + Human approach delivers higher quality than a human only approach
          –  More consistent writing style and more accurate terminology
      •  Rapid ongoing translation quality improvement
          –  Post edited machine translation is fed back to the engine, which learns from its previous errors by analyzing the corrections
          –  Live feedback as new content is published
      •  Enable clients to control preferred terminology, vocabulary and writing style
      “We found that 52% of the raw original output from Asia Online had no errors at all – which is great for an initial engine.”
          –  Kevin Nelson, Managing Director, Omnilingua Worldwide
      Complete Stylistic Control: two different output styles for the same input sentence
          Spanish Original (Before Translation): Se necesitó una gran maniobra política muy prudente a fin de facilitar una cita de los dos enemigos históricos.
          Business News (After Translation): Significant amounts of cautious political maneuvering were required in order to facilitate a rendezvous between the two bitter historical opponents.
          Children’s Books (After Translation): A lot of care was taken to not upset others when organizing the meeting between the two long time enemies.
  • 18. LP | Top-Level Domain | Engines/Sub-Domains
      EN-ES | Automotive |
          •  Honda
              –  Cars: User Manuals, Engineering Service Manuals
              –  Motorbikes: User Manuals, Engineering Service Manuals
          •  Toyota
              –  Marketing
              –  Service Reports
  • 20. Dirty Data SMT Model
      •  Data
          –  Gathered from as many sources as possible.
          –  Domain of knowledge does not matter.
          –  Data quality is not important.
          –  Data quantity is important.
      •  Theory
          –  Good data will be more statistically relevant.
      Clean Data SMT Model
      •  Data
          –  Gathered from a small number of trusted quality sources.
          –  Domain of knowledge must match target.
          –  Data quality is very important.
          –  Data quantity is less important.
      •  Theory
          –  Bad or undesirable patterns cannot be learned if they don’t exist in the data.
  • 22.
      •  There is no magic in MT; human effort is required.
      •  The quality of the output and suitability for purpose is directly in proportion to the amount of human effort.
      •  Without human direction, MT will cost more in the long term and is more likely to fail.
  • 23.
      •  Bad translations
      •  Out of domain text
      •  Unbalanced / Biased
          –  Too much text from other domains
      •  Mixed / Wrong language
      •  Junk and noise
      •  Broken HTML
      •  Mixed Encoding
      •  Missing diacritics
          –  café vs. cafe
      •  OCR Text
      •  Machine translated text
      •  Anything that is not high quality and in domain
      Put Simply: If a bad pattern does not exist in your training data, you cannot generate such a bad pattern as translation output.
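A few of the dirty-data categories listed on slide 23 can be screened for mechanically before training. The following is only a minimal sketch under stated assumptions: the function names, the reason labels and the length-ratio threshold are hypothetical and are not taken from Language Studio or Moses; real cleaning pipelines apply many more checks.

```python
import re
import unicodedata

# Hypothetical heuristic: stray angle brackets or HTML entities suggest markup residue.
MARKUP_RE = re.compile(r"[<>]|&[a-zA-Z]+;")


def strip_accents(text):
    """Remove combining marks, e.g. 'café' -> 'cafe'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def reject_reason(source, target, max_ratio=3.0):
    """Return a reason string if a segment pair looks dirty, else None."""
    if not source.strip() or not target.strip():
        return "empty side"
    if "\ufffd" in source or "\ufffd" in target:
        return "encoding damage"      # U+FFFD marks a failed decode
    if MARKUP_RE.search(source) or MARKUP_RE.search(target):
        return "markup residue"       # broken HTML left in the text
    if source.strip() == target.strip():
        return "untranslated copy"    # target is probably in the wrong language
    src_len, tgt_len = len(source.split()), len(target.split())
    if max(src_len, tgt_len) / min(src_len, tgt_len) > max_ratio:
        return "length mismatch"      # possible bad or partial translation
    return None


def diacritic_conflicts(vocabulary):
    """Map unaccented words to an accented variant seen in the same vocabulary."""
    accented = {strip_accents(w): w for w in vocabulary if strip_accents(w) != w}
    return {w: accented[w] for w in vocabulary if w in accented}
```

For example, `diacritic_conflicts({"café", "cafe", "río"})` flags `cafe` as a likely missing-diacritic variant of `café`. Checks like these only catch surface noise; out-of-domain or biased text still needs the human analysis the slides describe.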
  • 24. English Source | Human Translation | Google Translation | Google Context
      I went to the bank | Fui al banco | Fui al banco | Bank as in finance
      I went to the bank to deposit money | Fui al banco para depositar dinero | Fui al banco a depositar el dinero | Bank as in finance
      I went to the bank of the turn in my car | Fui en coche a la inclinación de la vuelta | Fui a la orilla de la vuelta en mi coche | Bank as in river bank
      I put my car into the bank of the turn | Puse mi coche en la inclinación de la vuelta. | Pongo mi coche en el banco de la vuelta | Bank as in finance
      I swam to the bank of the river | Nadé en la orilla del río | Nadé hasta la orilla del río | Bank as in river bank
      I banked my money | Deposité mi dinero | Yo depositado mi dinero | Banked as in finance
      I banked my car into the turn | Incliné mi coche en la vuelta | Yo depositado mi coche en la vuelta | Banked as in finance
      I banked my plane into a steep dive | Incliné mi avión en para una zambullida. | Yo depositado en mi avión en picada | Banked as in finance
      Issue: The above examples show that Google is biased towards the banking and finance domain.
      Cause: There is much more multilingual banking and finance data available to learn from than there is aeronautical or water sports data available.
  • 25.
      •  Competitors require 20% or more additional data than the initial training data to show notable improvements.
          –  This could take years for most LSPs
          –  This is the dirty little secret of the Dirty Data SMT approach that is frequently acknowledged.
      •  Asia Online has reference customers that have had notable improvements with just 1 day’s work of post editing.
          –  Only possible with Clean Data SMT
      Typical Dirty Data SMT engines will have between 2 million and 20 million sentences in the initial training data.
      [Callouts: “< 0.1%”; “Improvements daily based on edits”]
  • 26. Competitors Statistical MT
      •  Get more dirty data
      •  Human translate more data
      Competitors Rule Base MT
      •  Add dictionary entries (limit 20K words)
      •  Train a language model to fix broken rules output (limit 40K phrases)
      Language Studio Allows:
      •  Automated identification of areas of weakness
      •  Post Editing Feedback focusing directly on areas of weakness
      •  Automated error pattern analysis and correction
      •  Analysis and Resolution of Unknown Words
      •  Determination and resolution of high frequency phrases
      •  Terminology Extraction
      •  Balancing Bilingual Phrases against Monolingual Data
      •  Runtime glossary
      •  Runtime spelling dictionary
      •  Pattern handling and adjustments
      •  Incremental Improvement Training
      •  Automated Quality Measurement
      •  Human Quality Measurement
      •  Quality Confidence Scores for each segment
  • 27. Clean data:
      •  Typically about 10-20 examples for each clean word or phrase.
      •  Each correction has statistical relevance and impact can be clearly seen.
      •  Corrections usually involve adding data to fill gaps.
      •  Far less correction of actual errors.
      •  Clean data means cause of errors can be understood and corrected.
      •  Concordance used to create unbiased examples/phrases and ensure scope covered.
      Dirty data:
      •  Large volumes of dirty data prohibit manual correction.
      •  Individual corrections are not statistically relevant.
      •  Manual corrections must compete against 1,000’s of bad examples. Impractical to create enough examples manually.
      •  Understanding the cause of errors is difficult.
      •  Slows training and overall processing time. Requires more resources to process excess data.
      •  Only solution is to acquire more dirty data and hope problem is fixed. But may get worse or cause new errors.
  • 28. 1960’s   1980’s   1990’s   2012
  • 29. Before Machine Translation: source text is processed and modified.
      •  Pre-Translation JavaScript (JS)
          –  Complex pre-processing can be customized via JavaScript.
      •  Pre-Translation Corrections (PTC)
          –  A list of terms that adjust the source text, fixing common issues and making it more suitable for translation.
      •  Non-Translatable Terms (NTT)
          –  A list of monolingual terms that are used to ensure key terms are not translated.
      •  Runtime Glossary (GLO)
          –  A list of bilingual terms that are used to ensure terminology is translated a specific way.
      After Machine Translation: target text is processed and modified.
      •  Post Translation Adjustment (PTA)
          –  A list of terms in the target language that modify the translated output. This is very useful for normalization of target terms.
      •  Post Translation JavaScript (JS)
          –  Complex post-processing can be customized via JavaScript.
      Runtime customizations can be applied in 2 forms:
      •  Default: Applied to all jobs.
      •  Job Specific: A different set of customizations can be applied for different clients.
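The order of the runtime customizations on slide 29 can be illustrated with a small sketch. Everything here is an assumption made for illustration: the function names, the placeholder-token mechanism and the stand-in `engine` callable are hypothetical and do not describe how Language Studio implements PTC, NTT, GLO or PTA internally. The JavaScript hooks are omitted.

```python
import re


def apply_term_list(text, replacements):
    """Apply whole-word replacements (used here for both PTC and PTA lists)."""
    for old, new in replacements.items():
        pattern = r"\b" + re.escape(old) + r"\b"
        text = re.sub(pattern, lambda match, new=new: new, text)
    return text


def protect_terms(text, terms):
    """Swap terms for placeholder tokens so the engine cannot alter them."""
    slots = {}
    # Longest terms first, so a short term never splits a longer one.
    ordered = sorted(terms.items(), key=lambda item: -len(item[0]))
    for index, (term, rendering) in enumerate(ordered):
        token = f"__TERM{index}__"
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, text):
            text = re.sub(pattern, token, text)
            slots[token] = rendering
    return text, slots


def translate_with_customizations(source, engine, ptc, ntt, glossary, pta):
    """PTC -> protect NTT/GLO terms -> engine -> restore terms -> PTA."""
    text = apply_term_list(source, ptc)       # Pre-Translation Corrections
    terms = {term: term for term in ntt}      # NTT: keep the source form
    terms.update(glossary)                    # GLO: force a target form
    text, slots = protect_terms(text, terms)
    output = engine(text)                     # hypothetical MT engine callable
    for token, rendering in slots.items():
        output = output.replace(token, rendering)
    return apply_term_list(output, pta)       # Post Translation Adjustment
```

A "Default" and a "Job Specific" configuration, as described on the slide, would then simply be two different sets of `ptc`, `ntt`, `glossary` and `pta` arguments passed to the same function.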
  • 30. [Gauge: Words Per Day Per Translator, scale 0 to 28,000, with markers for Human Translation, Typical MT + Post Editing Speed, and * Fastest MT + Post Editing Speed reported by clients]
      Average person reads 200-250 words per minute: 96,000-120,000 in 8 hours, ~35 times faster than human translation.
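The reading-speed figures on slide 30 follow from simple arithmetic, sketched below. The human translation baseline of 3,000 words per day is an assumption chosen for illustration; the transcript does not state the exact figure behind the "~35 times" claim.

```python
MINUTES_PER_WORKING_DAY = 8 * 60  # an 8-hour day, as on the slide


def words_read_per_day(words_per_minute):
    """Words an average reader covers in one 8-hour day."""
    return words_per_minute * MINUTES_PER_WORKING_DAY


low = words_read_per_day(200)    # 96,000 words
high = words_read_per_day(250)   # 120,000 words

# Assumption for illustration only: a human translator produces 3,000 words/day.
ASSUMED_HUMAN_TRANSLATION_WORDS_PER_DAY = 3_000

speedup_low = low / ASSUMED_HUMAN_TRANSLATION_WORDS_PER_DAY
speedup_high = high / ASSUMED_HUMAN_TRANSLATION_WORDS_PER_DAY
print(f"Reading is {speedup_low:.0f}x to {speedup_high:.0f}x faster than translating")
```

Under that assumed baseline the ratio spans 32x to 40x, which brackets the slide's "~35 times" figure.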
  • 31. Metrics That Really Count
      •  Productivity – Words per day per human resource
      •  Margin – 2-3 times the profit margin is commonplace
      •  Consistency – Writing style and terminology
          ✓  MT + Human delivers higher quality than a human only approach
      •  Deals
          ✓  New deals not accessible with a human only approach
          ✓  Deals where you could offer a more competitive bid due to MT than your competitors
          ✓  Deals that would have been lost to a competitor without the advantages that MT offers
      Examples of other “Useful” Quality Indicators
      Automated Metrics (Good indicators, but not absolute)
      •  BLEU (Bilingual Evaluation Understudy)
      •  NIST
      •  F-Measure (F1 Score or F-Score)
      •  METEOR (Metric for Evaluation of Translation with Explicit ORdering)
      Manual Quality Metrics (Most not designed for MT, more for HT)
      •  Edit Distance (Does not take into account complexity of edit)
      •  SAE-J2450 (Industry specific)
      Productivity is the Best Quality Metric
      Raw MT often has a greater number of errors than first pass human translation. However, Language Studio™ MT is stylised to a specific domain, customer and target audience, so quality is considerably higher than other MT systems. This means that:
      1.  MT errors are easy to see and easy to fix (i.e. simple grammar).
      2.  MT provides more accurate and consistent terminology than human translators, especially when more than 1 human works on a project.
      3.  Human errors may be fewer, but harder to see and harder to fix.
      Counting the number of errors only offers no value as a metric, as the complexity of the error is not taken into account.
      MT with more errors is often faster to edit and fix than first pass human translations with fewer errors.
      [Chart axis labels: Margin, Time]
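Two of the indicators named on slide 31, edit distance and F-measure, are easy to compute, which also makes their limits visible. The sketch below uses word-level Levenshtein distance and a unigram-overlap F1; these are generic textbook formulations chosen for illustration, not the specific variants any vendor uses. The sample sentences are the "I banked my money" row from slide 24.

```python
from collections import Counter


def word_edit_distance(mt_output, post_edited):
    """Word-level Levenshtein distance: inserts, deletes and substitutions."""
    a, b = mt_output.split(), post_edited.split()
    previous = list(range(len(b) + 1))
    for i, word_a in enumerate(a, start=1):
        current = [i]
        for j, word_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (0 if word_a == word_b else 1)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def unigram_f1(candidate, reference):
    """F1 over word overlap between a candidate and a reference translation."""
    cand, ref = Counter(candidate.split()), Counter(reference.split())
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


machine = "Yo depositado mi dinero"
human = "Deposité mi dinero"
print(word_edit_distance(machine, human))    # 2: drop "Yo", fix the verb
print(round(unigram_f1(machine, human), 3))  # 0.571
```

Both numbers treat every edit as equal, which is the slide's objection: a two-word fix that takes seconds scores the same as a two-word fix that requires rethinking the sentence. That is why the deck ranks measured productivity above such counts.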
  • 32. Wait for a Project That Requires MT
      •  Opportunistic approach
      •  Many LSPs are interested in MT, but not willing to take the plunge without a paying client.
      •  Limited to one client
      •  Harder to sell – longer sales cycle
      •  Often build one language pair to try, before committing to others
      Revenue
          –  Recurring revenues from words translated
          –  Preparing source data
          –  Post editing
          –  Runtime glossary preparation
          –  Non-translatable terms definition
      Create a Product For Resale
      •  Proactive approach
      •  Leverages existing translation assets
      •  Can be sold to many clients
      •  Easier to sell - test and show
      •  Can sell multiple language pairs at the same time
      •  Generally a higher Return On Investment (ROI)
      Revenue
          –  Recurring revenues from words translated
          –  One time revenues from resale of customization
          –  Post editing
          –  Preparing source data
          –  Terminology definition
          –  Non-Translatable terms
          –  Unknown and high-frequency phrase resolution
  • 33. Reduce Project Costs
      •  Helps to manage margin squeeze:
          –  compete with competitors using cheaper (perhaps lower quality) resources or competitors using MT
      •  Helps to cost justify business cases that may not be viable using a human only approach
      •  Can be used behind the scenes (like a translation memory) or disclosed to client
      •  More client work in other areas as a result of left over translation budget.
      Revenue
          –  Depending on project or product model, revenues will vary. See previous slide.
          –  Additional revenues from client getting more ROI and willing to invest in new languages.
      Faster Translation Delivery
      •  New projects that could not have been delivered on due to time and resource constraints
      •  Helps clients that want to simultaneously ship product in multiple languages
      •  New clients in research, analysis, data mining and discovery markets
      •  New clients that need real-time or near real-time translation
      Revenue
          –  Preparing source data
          –  Runtime glossary preparation
          –  Non-translatable terms definition
          –  Post editing
          –  Recurring revenues from words translated
  • 34. Expand Existing Relationships
      •  Opportunities to translate additional material for markets that may not have been cost viable with a human only approach
      •  Reuse custom MT for multiple purposes
      •  Enable clients to better compete in markets that were only partially addressed due to cost and time
      Revenue
          –  Preparing source data
          –  Terminology definition
          –  Non-Translatable terms
          –  Unknown and high-frequency phrase resolution
          –  Post editing
          –  Recurring revenues from words translated
          –  One time revenues from resale of customization
      Added Functionality
      •  Expand service offerings with new features such as multilingual customer support
      •  Integrate machine translation into existing client technologies, products and services
      Revenue
          –  Same as for Expanding Existing Relations
          –  Additionally able to charge various service fees relating to the new services offered. For example, translating common Q&A for customer support and a commission on integrated multilingual support products.
  • 35. Post Editing Cost
      [Chart: Cost Per Word (1 to 6) against Engine Learning Iteration (1 to 6); series: Post Editing (Human Translation), MT Post Editing]
      •  MT learns from post editing feedback and quality of translation constantly improves.
      •  Cost of post editing progressively reduces as MT quality increases after each engine learning iteration.
      Post Editing Effort Reduces Over Time
      [Chart: Quality against Engine Learning Iteration (1 to 6); labels: Publication Quality Target, Post Editing Effort, Raw MT Quality]
      •  The post editing and cleanup effort gets easier as the MT engine improves.
      •  Initial efforts should focus on error analysis and correction of a representative sample data set.
      •  Each successive project should get easier and more efficient.
  • 36. How Omnilingua Measures Quality
          –  Triangulate to find the data
          –  Raw MT J2450 v. Historical Human Quality J2450
          –  Time Study Measurements
          –  OmniMT EffortScore™
      Everything must be measured by effort first
          –  All other metrics support effort metrics
          –  Productivity is key
      ∆ Effort > MT System Cost + Value Chain Sharing
  • 37.
      •  Built as a Human Assessment System:
          –  Provides 7 defined and actionable error classifications.
          –  2 severity levels to identify severe and minor errors.
      •  Provides a Measurement Score Between 1 and 0:
          –  A lower score indicates fewer errors.
          –  Objective is to achieve a score as close to 0 (no errors/issues) as possible.
      •  Provides Scores at Multiple Levels:
          –  Composite scores across an entire set of data.
          –  Scores for logical units such as sentences and paragraphs.
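The scoring scheme described on slide 37 (seven error classes, two severity levels, lower is better) can be sketched as a weighted error count normalized by text length. The category names and weights below are illustrative assumptions only; authoritative definitions and weights must be taken from the SAE J2450 standard itself, and the function name is hypothetical.

```python
# Illustrative weights only (assumption): seven categories, two severity levels.
ASSUMED_WEIGHTS = {
    "wrong_term":     {"serious": 5, "minor": 2},
    "syntactic":      {"serious": 4, "minor": 2},
    "omission":       {"serious": 4, "minor": 2},
    "word_structure": {"serious": 4, "minor": 2},
    "misspelling":    {"serious": 3, "minor": 1},
    "punctuation":    {"serious": 2, "minor": 1},
    "miscellaneous":  {"serious": 3, "minor": 1},
}


def weighted_error_score(errors, source_word_count):
    """Sum of error weights divided by source length; 0 means no errors."""
    if source_word_count <= 0:
        raise ValueError("source_word_count must be positive")
    total = sum(ASSUMED_WEIGHTS[category][severity] for category, severity in errors)
    return total / source_word_count


# One serious terminology error and one minor punctuation error in 100 words.
sample_errors = [("wrong_term", "serious"), ("punctuation", "minor")]
print(weighted_error_score(sample_errors, 100))
```

Because the function accepts any list of errors and any word count, the same calculation gives a composite score for a whole data set or a score for a single sentence or paragraph, matching the "multiple levels" point on the slide.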
  • 38. Factor | Asia Online v. Competing MT System
      Total Raw J2450 Errors | 2x Fewer
      Raw J2450 Score | 2x Better
      Total PE J2450 Errors | 5.3x Fewer
      PE J2450 Score | 4.8x Better
      PE Rate | 32% Faster
      “There were far fewer errors produced by the Language Studio™ custom MT engine than the competitor's legacy MT engine. Notably there were fewer wrong meanings, structural errors and wrong terms in the Language Studio™ custom MT engine, that were "typical SMT problems" in the competitor's legacy MT engine.”
      “We found that 52% of the raw original output from Asia Online had no errors at all – which is great for an initial engine.”
      “The final translation quality after post-editing was better with the new Language Studio™ custom MT engine than the competitor's legacy MT engine and also better than a human only translation approach. Terminology was more consistent with a combined Language Studio™ custom MT engine plus human post editing approach.”
          –  Kevin Nelson, Managing Director, Omnilingua Worldwide
  • 39.
      •  LSP: Sajan
      •  End Client Profile:
          –  Large global multinational corporation in the IT domain.
          –  Has developed its own proprietary MT system that has been developed over many years.
      •  Project Goals
          –  Eliminate the need for full translation and limit it to MT + Post-editing
      •  Language Pair:
          –  English -> Simplified Chinese.
          –  English -> European Spanish.
          –  English -> European French.
      •  Domain: IT
      •  2nd Iteration of Customized Engine
          –  Customized initial engine, followed by an incremental improvement based on client feedback.
      •  Data
          –  Client provided ~3,000,000 phrase pairs.
          –  26% were rejected in cleaning process as unsuitable for SMT training.
      •  Measurements:
          –  Cost
          –  Timeframe
          –  Quality
  • 40.
      •  Quality
          –  Client performed their own metrics
          –  Asia Online Language Studio™ was considerably better than the client’s own MT solution.
          –  Significant quality improvement after providing feedback – 65 BLEU score.
          –  Chinese scored better than first pass human translation as per client’s feedback and was faster and easier to edit.
      •  Result
          –  Client extremely impressed with result, especially when compared to the output of their own MT engine.
          –  Client has commissioned Sajan to work with more languages
      [Callouts: 60% Cost Saving; 70% Time Saving]
      LRC have uploaded Sajan’s slides and video presentation from the recent LRC conference:
      Slides: http://bit.ly/r6BPkT   Video: http://bit.ly/trsyhg
  • 41.
      •  Travel & Leisure Vertical
      •  English to Spanish Language Pair
      •  Custom MT engines built and programmatically consumed
      •  A human post edit step was included in workflow and measurement
      •  Scientific measures of productivity for all phases of process
  • 42.
      •  Base training materials provided and catalogued
      •  Asia Online trained the engine and released to a diagnostic stage
      •  First pass of new content through diagnostic engine yielded positive results
      •  Asia Online provided advanced data generation technologies to the diagnostic engine through monolingual data crawling, application of runtime rules, and pre-translation adjustments
      •  Even further progress achieved from extracting and applying an industry-specific high frequency term list from the source
  • 43. 58% of segments required no edits
  • 44. Post Edit Productivity Analysis
      Productivity Percentage: 328% Increase
      Productivity Rate: 8,208 words a day
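The two figures on slide 44 imply a baseline productivity, but the slide does not say how the percentage was calculated. The sketch below works through both plausible readings; which one the case study used is an assumption either way, and the function name is hypothetical.

```python
def implied_baseline(new_rate, percent, percent_is_increase=True):
    """Back out the pre-MT words/day implied by a reported percentage.

    percent_is_increase=True  reads "328%" as new = old * (1 + 3.28).
    percent_is_increase=False reads "328%" as new = old * 3.28.
    """
    factor = percent / 100
    if percent_is_increase:
        factor += 1
    return new_rate / factor


reported_rate = 8_208      # words a day, from the slide
reported_percent = 328     # "328% Increase", from the slide

as_increase = implied_baseline(reported_rate, reported_percent, True)
as_ratio = implied_baseline(reported_rate, reported_percent, False)
print(f"Baseline if 328% is an increase: {as_increase:.0f} words/day")
print(f"Baseline if 328% is a ratio:     {as_ratio:.0f} words/day")
```

The first reading gives a baseline of roughly 1,918 words a day and the second roughly 2,502, so any comparison against another deployment should state which definition of the percentage is in use.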
  • 45. Business Strategies for Building Strategic Advantage and Revenue from Machine Translation
      Dion Wiggins
      Chief Executive Officer
      dion.wiggins@asiaonline.net