Dr. Tom Conway discusses how to monitor the quality of pathogen genome data used in public health and clinical settings. A typical genomics workflow involves sample preparation, sequencing, and analysis of the resulting sequence data. Sequence quality is usually measured by aligning reads to a reference sequence and counting mismatches. An alternative is to count the words that occur in the sequence fragments: as more fragments accumulate, the number of distinct words rises toward an asymptote, and the word frequency distribution becomes more informative. Separating true words from false words in these counts gives estimates of genome size and of the true word fraction, both of which can be used for quality control. The approach is quantifiable, efficient, and interpretable, which makes it valuable for clinical and public health applications across species.
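
The word-counting idea summarized above can be illustrated with a minimal Python sketch. It is not the speaker's implementation, and it rests on several assumptions that the summary does not state: "words" are taken to be overlapping fixed-length substrings of length `k`; a "true" word is one that occurs in the genome and a "false" word is one created by a sequencing error; and a simple count threshold (`min_count`) stands in for whatever statistical model the talk uses to separate the two. The genome and reads are simulated, and all function names and parameter values are hypothetical.

```python
import random
from collections import Counter


def words(seq, k):
    """Yield every overlapping length-k word in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def simulate_reads(genome, n_reads, read_len, error_rate, rng):
    """Sample fragments uniformly and substitute bases at error_rate.

    Purely illustrative: real sequencing errors are not uniform.
    """
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(genome) - read_len + 1)
        bases = list(genome[start:start + read_len])
        for i, base in enumerate(bases):
            if rng.random() < error_rate:
                bases[i] = rng.choice([c for c in "ACGT" if c != base])
        reads.append("".join(bases))
    return reads


def summarize(reads, k, min_count):
    """Count words across reads and derive simple quality metrics.

    Assumption: words seen at least min_count times are treated as true,
    and rarer words as false (error-derived). This is a crude stand-in
    for a proper model of the word frequency distribution.
    """
    counts = Counter()
    distinct_curve = []  # distinct words seen after each read is added
    for read in reads:
        counts.update(words(read, k))
        distinct_curve.append(len(counts))

    true_words = {w for w, c in counts.items() if c >= min_count}
    total_occurrences = sum(counts.values())
    true_occurrences = sum(counts[w] for w in true_words)
    return {
        "distinct_curve": distinct_curve,
        # Each genome position contributes roughly one distinct true word.
        "genome_size_estimate": len(true_words),
        # Share of all observed word occurrences attributed to true words.
        "true_word_fraction": true_occurrences / total_occurrences,
    }


rng = random.Random(0)
genome = "".join(rng.choice("ACGT") for _ in range(2000))
reads = simulate_reads(genome, n_reads=2000, read_len=100,
                       error_rate=0.01, rng=rng)
result = summarize(reads, k=15, min_count=5)

print("distinct true words in genome:", len(set(words(genome, 15))))
print("genome size estimate:", result["genome_size_estimate"])
print("true word fraction:", round(result["true_word_fraction"], 3))
```

In this sketch, `distinct_curve` records the total number of distinct words after each read. Because the total includes false words, it keeps growing slowly with depth; it is the number of distinct words above the threshold that levels off near the genome size. The threshold is the weakest part of the sketch: it undercounts words at the poorly covered ends of the simulated genome and would need to scale with sequencing depth on real data.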