The document discusses the limitations of automated metrics such as BLEU for evaluating machine translation quality and proposes alternatives such as professional human annotation of machine translation errors using frameworks like MQM/DQF. It presents error profiles of machine translation systems derived from such annotations, which provide more useful information than automated scores for understanding issues and improving translation quality. The document advocates moving beyond fully automated metrics toward human analysis of errors and source-language phenomena, building test suites that relate source-side barriers to target-side errors and thereby improve machine translation systems.
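To make the contrast concrete, the sketch below computes a corpus-level BLEU score and then builds a small MQM/DQF-style error profile from human annotations. It assumes the `sacrebleu` package; the example sentences, error categories, and severity weights are illustrative placeholders, not the canonical MQM penalty table.

```python
# A minimal sketch contrasting an automated BLEU score with an
# MQM/DQF-style error profile. Assumes `sacrebleu` is installed;
# annotations and severity weights below are illustrative only.
from collections import Counter

import sacrebleu

hypotheses = ["The cat sit on the mat .", "He go to school yesterday ."]
references = [["The cat sits on the mat .", "He went to school yesterday ."]]

# BLEU yields a single opaque number: it says the output is imperfect,
# but not why or where.
bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU: {bleu.score:.1f}")

# Hypothetical MQM/DQF-style annotations from a human reviewer,
# each a (category, severity) pair.
annotations = [
    ("Fluency/Grammar", "minor"),
    ("Fluency/Grammar", "major"),
    ("Accuracy/Mistranslation", "major"),
]

# The error profile: per-category counts show *what* kinds of issues
# dominate, which is actionable for improving the system.
profile = Counter(category for category, _ in annotations)
for category, count in profile.most_common():
    print(f"{category}: {count}")

# Illustrative severity weights (not the official MQM defaults):
# fold the annotations into a single per-word quality score.
weights = {"minor": 1, "major": 5, "critical": 10}
word_count = sum(len(h.split()) for h in hypotheses)
penalty = sum(weights[severity] for _, severity in annotations)
mqm_style_score = 1 - penalty / word_count
print(f"MQM-style quality score: {mqm_style_score:.2f}")
```

Unlike the single BLEU number, the category counts point directly at recurring problems (here, grammar errors), which is the kind of diagnostic signal the document argues for.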