MCB 432 Final Table PP 01.06.16

Keegan McAuliffe
MCB 432: Computing in
Molecular Biology
The following is my final presentation for MCB 432, detailing the process our
group undertook to determine the identity of an unknown bacterium. We were
provided with raw sequence reads from the bacterium, which we converted into
contigs and scaffolds. We assembled the data into a genome, then
annotated it for potential genes to determine the identity of the
bacterium as Bacteroides vulgatus str. 3975.

Keegan McAuliffe
Henry Chen
Andrew Storm
Dominic Gentile
Team 10 Results and Discussion
Introduction:
The advent of high-throughput sequencing has increased our ability to analyze genetic information.
In this project, we demonstrate how to use raw sequence data from sampled organisms for genetic and
genomic analysis. With the raw sequence reads provided by the PI, we assembled a genome for our
unknown microorganism using the A5ud assembler program (Table 1). From the assembly, we determined
the total number of contigs and scaffolds and used them to predict and annotate genes (Table 2).
With the assembled genome in hand, we could search and analyze predicted genes in order to
characterize our unknown organism. We used the Prodigal algorithm for gene prediction. Prodigal
generates gene and protein predictions but does not indicate what those predicted genes
and proteins represent. We therefore needed other programs to annotate our
predictions, and because genes are so varied, each program had to be chosen for a specific kind of
analysis. For instance, EMBOSS programs search for alignments and patterns between the
assembly and databases of well-known genes, HMM and BLAST searches compare protein
homology, and other programs search for features such as tRNAs and signal peptides.
Below, we describe how we applied these tools to our genome and present
our results.

Results: (Optional tasks)
The objective of Optional Task 1 was to determine the GC content of each gene. To obtain
this information, we first had to assemble our reads into contigs and scaffolds, which was the
objective of Mandatory Task 1. We first decompressed the read data using the “gunzip”
command and then ran the A5ud assembler on it. The assembler generated files for the quality
trimming report, assembly report, initial scaffolding report, final scaffold quality check,
error-corrected reads, contigs, crude scaffolds, broken scaffolds, and final scaffolds. The
assembly report contained the GC content for each contig, which we added to Table 3. The average
GC content for all contigs is 0.407. Because G-C base pairs are more thermally stable than A-T
base pairs, we expect our genome to be somewhat less thermally stable than a genome with a GC
content greater than 0.500.
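To illustrate the per-gene GC calculation, the following Python sketch computes the GC fraction of each sequence in a FASTA-formatted input. It does not reproduce the A5ud report; the function names and the two toy sequences are assumptions made only for this example.

```python
def gc_fraction(seq):
    """Return the fraction of G and C among unambiguous A/C/G/T bases."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the record name.
            name, chunks = (line[1:].split() or [""])[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


# Toy input (hypothetical sequences, not taken from our assembly).
example = [">gene_a", "ATGCGC", "GGAT", ">gene_b", "ATATATAT"]
gc_by_gene = {name: round(gc_fraction(seq), 3) for name, seq in read_fasta(example)}
print(gc_by_gene)
```

To apply it to a real file, the list of lines could be replaced by an open file handle.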
The objective of Optional Task 3 was to determine the best BlastP match for our proteins against
the NR database. The first step was to determine the proper command to generate a single best
match from the NR database for each predicted protein, with an E-value less than 1e-10, as well as
the organism to which it belongs, the accession number, and percent identity. The command we used was:
blastp -db nr -query TeamProject.faa -out TeamProject.br -evalue 1E-10 -outfmt 6 -max_target_seqs 1
This command gave us the E-value, accession number, and percent identity for the best blastp match
of each query. However, we still needed the organism name and a description of the gene. For this,
we used the program efetch.pl. Using a list of accession names as input, efetch.pl generated the
organism name and gene annotation for each gene of interest. These data were recorded in Table 5.
This task was also instrumental in determining the genus, species, and strain most closely related
to our scaffolds.
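The tabular report written with -outfmt 6 can also be filtered after the search. The sketch below assumes the default twelve-column layout of that format and keeps, for each query, the hit with the highest bit score whose E-value is below 1e-10. The query names, subject names, and scores are invented for illustration and are not taken from our report.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(handle, max_evalue=1e-10):
    """Map each query to (subject, percent identity, E-value, bit score)."""
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < len(COLUMNS) or row[0].startswith("#"):
            continue
        hit = dict(zip(COLUMNS, row))
        evalue, bitscore = float(hit["evalue"]), float(hit["bitscore"])
        if evalue >= max_evalue:
            continue  # not significant enough
        query = hit["qseqid"]
        if query not in best or bitscore > best[query][3]:
            best[query] = (hit["sseqid"], float(hit["pident"]), evalue, bitscore)
    return best


# Hypothetical report with two hits for query_1 and one weak hit for query_2.
report = io.StringIO(
    "query_1\tsubject_A\t100.0\t202\t0\t0\t1\t202\t1\t202\t8e-88\t265\n"
    "query_1\tsubject_B\t95.0\t200\t10\t0\t1\t200\t1\t200\t1e-80\t240\n"
    "query_2\tsubject_C\t40.0\t50\t30\t0\t1\t50\t1\t50\t1e-3\t30\n"
)
hits = best_hits(report)
print(hits)
```

Filtering in a script like this makes the threshold explicit and does not depend on how the search itself limited the number of reported targets.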

The best blastp match for each predicted CDS was from the genus Bacteroides, and the overwhelming
majority of the species-level matches were to Bacteroides vulgatus. More specifically, the strain
Bacteroides vulgatus str. 3975 RP4 was the best match for 9 of the 104 predicted CDSs. This
represents 60% of the 15 blast results specific enough to indicate strain. These data led us to
conclude that Bacteroides vulgatus str. 3975 is the most closely related strain.
The objective of Optional Tasks 4 and 5 was to analyze the CDSs for possible proteins and genes.
The scaffold sequences were analyzed using PFAM to determine possible protein matches and TIGRFAM
to determine possible gene matches. The hmmscan for the PFAM matches used the Pfam-A database and
TeamProject.faa. The hmmscan for the TIGRFAM matches used the TIGRFAMs_14.0.HMM database and
TeamProject.faa. The results were compiled into Table 6 and Table 7 from TeamProject_pfam.txt and
TeamProject_tigrfam.txt. Only the best match for each CDS was added to Table 3. The PFAM hmmscan
revealed that many of the CDSs had at least one related protein. For CDSs with multiple matches,
the predicted proteins were all closely related. For example, all the predicted proteins for the
1_83 CDS are from the Glycosyl transferase family 2. The TIGRFAM search returned fewer matches:
only 33, compared with 191 for the PFAM search. Most of the CDSs with TIGRFAM matches have only
one match. Only CDSs 1_15, 1_39, 1_82, and 1_85 have multiple matches. Each of these CDSs had only
two TIGRFAM matches, whereas several CDSs had four or five PFAM matches. For the CDSs that had
both TIGRFAM and PFAM matches, the two searches predicted similar functions.
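Selecting the best match for each CDS can be scripted. The sketch below assumes that hmmscan was run with its --tblout option, which writes one whitespace-delimited line per hit with the target name in the first column, the query name in the third, and the full-sequence E-value in the fifth. Whether our TeamProject_pfam.txt and TeamProject_tigrfam.txt files use this layout is an assumption, and the family and CDS names below are hypothetical.

```python
def best_match_per_query(lines):
    """Return {query: (target, E-value, description)} keeping the lowest E-value."""
    best = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        # The first 18 columns are fixed; the rest is a free-text description.
        fields = line.split(None, 18)
        if len(fields) < 18:
            continue
        target, query, evalue = fields[0], fields[2], float(fields[4])
        description = fields[18].strip() if len(fields) > 18 else ""
        if query not in best or evalue < best[query][1]:
            best[query] = (target, evalue, description)
    return best


# Hypothetical table: two families hit cds_1, one family hits cds_2.
table = [
    "# target name  accession  query name  accession  E-value  score  bias ...",
    "Family_A  PF00001.1  cds_1  -  1e-25  85.0  0.1  2e-25  84.0  0.1  1.0  1  0  0  1  1  1  1  Example family A",
    "Family_B  PF00002.1  cds_1  -  3e-08  30.0  0.0  4e-08  29.5  0.0  1.1  1  0  0  1  1  1  1  Example family B",
    "Family_C  PF00003.1  cds_2  -  5e-12  42.0  0.0  6e-12  41.0  0.0  1.0  1  0  0  1  1  1  1  Example family C",
]
best = best_match_per_query(table)
print(best)
```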

Optional Task 6 used PHYRE2 to analyze CDSs 1.1_1, 1.1_4, 1.1_14, 1.1_19, 1.1_32, 1.1_54, 1.1_57,
1.1_60, 1.1_68, and 2.1_8. All CDSs except 1.1_1 and 1.1_32 had a confidence of 100.0; these two
had confidence values of 61.1 and 49.4, respectively. The PHYRE2 predicted proteins agree with the
PFAM predictions for all CDSs except 1.1_1, 1.1_32, 1.1_57, and 1.1_60. For these four, the other
possible PHYRE2 matches also differed from the PFAM results. This may be because the structures of
the PFAM matches are not in the PHYRE2 database.
For Optional Task 7, we looked for more specific features such as signal peptides. By comparing our
assembled scaffold (team.fasta) to a reference database of gram-negative prokaryotes, we were able
to identify potential signal peptides and determine their lengths. We compared our data to
gram-negative prokaryotes because our previous blast analysis identified genes and proteins that
matched those found in the gram-negative genus Bacteroides. The output data (located in the file
TeamProj_SigP_Summary.txt) denoted the presence or absence of signal peptides and the cutoff
points of those peptides (C-value). This allowed us to determine the predicted lengths of the
peptides. The results can be found in Table 3.
The objective of Optional Task 8 was to analyze the presence of rho-independent transcriptional
terminators. This is a particularly useful application, as intrinsic terminators typically denote
genes that are actively transcribed. To accomplish this task, we ran our genome assembly
(team.fasta) through a rho-independent terminator search while supplying the search with
predicted gene coordinates. These coordinates were determined through our EMBOSS infoseq analysis
of the predicted proteins on our assembly and restructured into the TeamProj.coords file for
use with the terminator analysis program. The report generated can be found in the file
TeamProj_tt + TeamProj_tt.txt, and the predicted genes with identifiable rho-independent
terminators are listed in Table 3.
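Restructuring gene coordinates into a coordinates file can be sketched as follows. The four-column layout used here (gene name, start, end, sequence identifier, with start and end swapped for genes on the minus strand) is an assumption about what the terminator program expects, and the sequence identifier is a placeholder. The two example genes use the coordinates listed for 1_1 and 1_5 in Table 3.

```python
def coords_line(name, start, stop, strand, seq_id):
    """Format one gene as 'name<TAB>start<TAB>end<TAB>seq_id'.

    Assumed convention: a minus-strand gene is written with its larger
    coordinate first, so the strand is implied by the coordinate order.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    low, high = min(start, stop), max(start, stop)
    first, second = (low, high) if strand == "+" else (high, low)
    return f"{name}\t{first}\t{second}\t{seq_id}"


# (name, start, stop, strand) taken from Table 3; seq_id is hypothetical.
genes = [("1_1", 3, 611, "-"), ("1_5", 5062, 6291, "+")]
lines = [coords_line(name, start, stop, strand, "scaffold1.1")
         for name, start, stop, strand in genes]
print("\n".join(lines))
```

The format documented for the terminator program in use should be checked before relying on this layout.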

Optional Task 9 determined whether we could find any homologous RNA secondary structures in our
assembled genome. As with any gene, the structure of an RNA such as a tRNA can provide valuable
information on the function and origin of the gene, which is useful when characterizing an unknown
genome. With our assembled genome in hand (team.fasta), we searched for matches to conserved RNA
structures in a handful of RFAM families: RF00005, RF00010, RF00023, RF00029, RF00059, RF00174,
RF00177, RF01693, RF01694, RF01726, RF01998, and RF02001. The data can be found as
TeamProj_RF*.txt. Our search found only one tRNA match, and information on the matched gene is
included in Table 3.
For Optional Task 14, we constructed an alignment of our scaffolds with the genome of the bacterial
strain with the most sequence matches, which we determined to be Bacteroides vulgatus str. 3975
RP4. On NCBI, we found 184 contigs from a whole-genome sequencing project for this strain. We
concatenated these contigs to create a single genome sequence, against which we compared our
scaffolds using blastn. With that blast report as a reference, we aligned the genomes using “act”
and saved a screenshot of part of the alignment as Figure 3.
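The concatenation step can be sketched in Python as below. This is an illustration rather than the procedure we ran: it joins contigs in input order, optionally separates them with a spacer of N bases (an assumption), and records where each contig starts so that positions in the joined sequence can be traced back. The contig names and sequences are toy values.

```python
def concatenate_contigs(records, spacer=""):
    """Join (name, sequence) records into one sequence.

    Returns the joined sequence and a dict mapping each contig name to its
    0-based start offset within that sequence.
    """
    offsets, parts, position = {}, [], 0
    for index, (name, seq) in enumerate(records):
        if index > 0 and spacer:
            parts.append(spacer)
            position += len(spacer)
        offsets[name] = position
        parts.append(seq)
        position += len(seq)
    return "".join(parts), offsets


# Toy contigs (hypothetical), joined with a two-base N spacer.
contigs = [("contig_1", "ACGT"), ("contig_2", "GG"), ("contig_3", "TTA")]
genome, starts = concatenate_contigs(contigs, spacer="NN")
print(genome)
print(starts)
```

Because contig order and orientation are arbitrary in a draft assembly, an alignment against such a concatenation shows which regions match but not necessarily their true genomic order.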

Discussion:
As noted in the results for Optional Task 3, we used blastp to determine the best match for each
predicted CDS within the database “NR.” These data, located in Table 5, clearly indicate that the
genus of the closest relative is Bacteroides: according to our blastp results, the best match for
every query belongs to the genus Bacteroides. We can further assert that the species is
Bacteroides vulgatus. 43 of the 104 best matches list Bacteroides vulgatus, and of the blast
matches that were specific to species, 43 of 49 (87.76%) list Bacteroides vulgatus. We can go
deeper still: among the 104 best matches, the strain Bacteroides vulgatus str. 3975 RP4 occurred
9 times, accounting for 9 of the 15 blast results specific enough to indicate strain. These data
led us to conclude that Bacteroides vulgatus str. 3975 is the most closely related strain.
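The percentages quoted in this discussion follow directly from the counts. The short Python sketch below recomputes them; the variable names are our own labels for the counts stated in the text.

```python
def percent(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return round(100.0 * part / whole, 2)


total_queries = 104        # best blastp matches examined
species_level_hits = 49    # matches specific enough to name a species
vulgatus_hits = 43         # of those, Bacteroides vulgatus
strain_level_hits = 15     # matches specific enough to name a strain
rp4_hits = 9               # of those, str. 3975 RP4

summary = {
    "vulgatus_of_all": percent(vulgatus_hits, total_queries),
    "vulgatus_of_species_level": percent(vulgatus_hits, species_level_hits),
    "rp4_of_all": percent(rp4_hits, total_queries),
    "rp4_of_strain_level": percent(rp4_hits, strain_level_hits),
}
print(summary)
```

The gap between the share of all matches and the share of species-level or strain-level matches reflects how many NR entries are annotated only to genus.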

Appendix
Contains 7 tables containing the raw data used to create our Results and
Discussion sections along with 1 figure showing our genome alignment

Table 1. Genome assembly statistics for Team 10
No. of read pairs              47893
No. of low quality reads       1763
No. of assembled reads         102640
No. of unassembled reads       2382
No. of contigs                 2
No. of scaffolds               2
Total nt length of scaffolds   126196

                Length    %G+C     No. of reads mapped   Coverage
Contig 100.0    119,977   40.61%   4851245               6065.0
Contig 100.1    6,219     37.58%   240956                5811.0

Table 2 Gene annotation summary for scaffolds
CDS/ORFs tRNAs other RNAs
scaffold1.1 95 0 0
scaffold2.1 9 1 0

Table 3. Predicted Gene Coordinates
Scaffold Name Type Start Stop Strand NT Length AA Length GC % Signal Peptide? SP Length (AA) Best Blast Hit Blast description
scaffold 1.1 1_1 CDS 3 611 - 609 202 0.406 N gi|496057719|ref|WP_008782226.1| transposase, partial
scaffold 1.1 1_2 CDS 845 3022 - 2178 725 0.405 Y 21 gi|649547948|gb|KDS54658.1| hypothetical protein M099_1756
scaffold 1.1 1_3 CDS 3539 3766 - 228 75 0.403 N gi|649547946|gb|KDS54656.1| glycoside hydrolase family 88 domain protein
scaffold 1.1 1_4 CDS 3949 4905 - 957 318 0.383 N gi|492435030|ref|WP_005843062.1| MULTISPECIES: transcriptional regulator
scaffold 1.1 1_5 CDS 5062 6291 + 1230 409 0.408 N gi|492435027|ref|WP_005843060.1| TonB-dependent receptor
scaffold 1.1 1_6 CDS 6311 7198 + 888 295 0.429 Y 18 gi|492435023|ref|WP_005843058.1| hypothetical protein
scaffold 1.1 1_7 CDS 7536 8942 + 1407 468 0.396 Y 21 gi|649547942|gb|KDS54652.1| ahpC/TSA family protein
scaffold 1.1 1_8 CDS 9027 9767 - 741 246 0.396 N gi|649547941|gb|KDS54651.1| ahpC/TSA family protein
scaffold 1.1 1_9 CDS 10111 12657 + 2547 848 0.421 N gi|495945682|ref|WP_008670261.1| MULTISPECIES: hypothetical protein
scaffold 1.1 1_10 CDS MULTISPECIES: hypothetical protein
scaffold 1.1 1_11 CDS 15884 16252 + 369 122 0.477 Y 19 gi|492458337|ref|WP_005851052.1| alpha-L-fucosidase
scaffold 1.1 1_12 CDS 16394 17275 - 882 293 0.468 N gi|492434987|ref|WP_005843035.1| tRNA dimethylallyltransferase 1
scaffold 1.1 1_13 CDS MULTISPECIES: hypothetical protein
scaffold 1.1 1_14 CDS MULTISPECIES: UDP-N-acetylglucosamine acyltransferase
scaffold 1.1 1_15 CDS MULTISPECIES: hydroxymyristoyl-ACP dehydratase
scaffold 1.1 1_16 CDS MULTISPECIES: UDP-3-O-acylglucosamine N-acyltransferase
scaffold 1.1 1_17 CDS 22035 22727 + 693 230 0.43 N gi|500644323|ref|WP_011964621.1| phosphohydrolase
scaffold 1.1 1_18 CDS MULTISPECIES: orotidine 5'-phosphate decarboxylase
scaffold 1.1 1_19 CDS MULTISPECIES: peptide chain release factor 1
scaffold 1.1 1_20 CDS MULTISPECIES: phosphoribosylformylglycinamidine cyclo-ligase
scaffold 1.1 1_21 CDS 24081 24527 + 447 148 0.31 N gi|492434963|ref|WP_005843021.1| hypothetical protein
scaffold 1.1 1_22 CDS 24636 24818 + 183 60 0.409 N gi|492434961|ref|WP_005843020.1| MULTISPECIES: toxin Fic

Table 5. Single best blast hit of annotated ORFs from Team 10
Name Gene Identifier Description Organism % identity E-value
1_1 gi|496057719|ref|WP_008782226.1| transposase, partial Bacteroides sp. 3_1_40A 100 8.00E-88
1_2 gi|649547948|gb|KDS54658.1| hypothetical protein M099_1756 Bacteroides vulgatus str. 3975 RP4 100 4.00E-62
1_3 gi|649547946|gb|KDS54656.1| glycoside hydrolase family 88 domain protein Bacteroides vulgatus str. 3975 RP4 100 6.00E-62
1_4 gi|492435030|ref|WP_005843062.1| MULTISPECIES: transcriptional regulator Bacteroides 100 5.00E-82
1_5 gi|492435027|ref|WP_005843060.1| TonB-dependent receptor Bacteroides vulgatus 100 0
1_6 gi|492435023|ref|WP_005843058.1| hypothetical protein Bacteroides vulgatus 100 0
1_7 gi|649547942|gb|KDS54652.1| ahpC/TSA family protein Bacteroides vulgatus str. 3975 RP4 100 0
1_8 gi|649547941|gb|KDS54651.1| ahpC/TSA family protein Bacteroides vulgatus str. 3975 RP4 100 0
1_9 gi|495945682|ref|WP_008670261.1| MULTISPECIES: hypothetical protein Bacteroides 99.61 0
1_10 gi|495945680|ref|WP_008670259.1| MULTISPECIES: hypothetical protein Bacteroides 97.22 2.00E-16
1_11 gi|492458337|ref|WP_005851052.1| alpha-L-fucosidase Bacteroides vulgatus 100 0
1_12 gi|492434987|ref|WP_005843035.1| tRNA dimethylallyltransferase 1 Bacteroides vulgatus 100 0
1_13 gi|492434984|ref|WP_005843033.1| MULTISPECIES: hypothetical protein Bacteroides 100 9.00E-131
1_14 gi|492434981|ref|WP_005843031.1| MULTISPECIES: UDP-N-acetylglucosamine acyltransferase Bacteroides 100 3.00E-180
1_15 gi|492458346|ref|WP_005851058.1| MULTISPECIES: hydroxymyristoyl-ACP dehydratase Bacteroides 100 0
1_16 gi|492458349|ref|WP_005851060.1| MULTISPECIES: UDP-3-O-acylglucosamine N-acyltransferase Bacteroides 100 0
1_17 gi|500644323|ref|WP_011964621.1| phosphohydrolase Bacteroides vulgatus 100 0
1_18 gi|492434969|ref|WP_005843024.1| MULTISPECIES: orotidine 5'-phosphate decarboxylase Bacteroides 100 0
1_19 gi|492434967|ref|WP_005843023.1| MULTISPECIES: peptide chain release factor 1 Bacteroides 100 0
1_20 gi|492458355|ref|WP_005851064.1| MULTISPECIES: phosphoribosylformylglycinamidine cyclo-ligase Bacteroides 100 0
1_21 gi|492434963|ref|WP_005843021.1| hypothetical protein Bacteroides vulgatus 100 6.00E-138
1_22 gi|492434961|ref|WP_005843020.1| MULTISPECIES: toxin Fic Bacteroides 100 0
1_24 gi|492434958|ref|WP_005843019.1| hypothetical protein Bacteroides vulgatus 99.64 0
1_25 gi|492458364|ref|WP_005851068.1| MULTISPECIES: hypothetical protein Bacteroides 100 0
1_26 gi|492458366|ref|WP_005851069.1| MULTISPECIES: membrane protein Bacteroides 100 2.00E-43
1_28 gi|492458370|ref|WP_005851071.1| MULTISPECIES: beta-N-acetylhexosaminidase Bacteroides 100 0
1_29 gi|492434942|ref|WP_005843009.1| MULTISPECIES: endonuclease Bacteroides 99.71 0
1_30 gi|511016443|ref|WP_016270813.1| excinuclease ABC subunit A Bacteroides vulgatus 100 0
1_31 gi|492434935|ref|WP_005843004.1| MULTISPECIES: hypothetical protein Bacteroides 100 0
1_32 gi|492434933|ref|WP_005843003.1| MULTISPECIES: chromate transporter Bacteroides 100 1.00E-131
1_33 gi|492434930|ref|WP_005843001.1| MULTISPECIES: chromate transporter Bacteroides 100 1.00E-105
1_35 gi|511016441|ref|WP_016270811.1| phosphoribosylformylglycinamidine synthase Bacteroides vulgatus 100 0
1_36 gi|492434921|ref|WP_005842995.1| MULTISPECIES: translocator protein, LysE family Bacteroides 100 4.00E-150
1_38 gi|492458387|ref|WP_005851079.1| MULTISPECIES: dTDP-4-dehydrorhamnose reductase Bacteroides 100 0
1_39 gi|492434911|ref|WP_005842989.1| MULTISPECIES: peptide chain release factor 3 Bacteroides 100 0
1_40 gi|492434907|ref|WP_005842987.1| MULTISPECIES: molecular chaperone DnaJ Bacteroides 100 0
1_41 gi|492434904|ref|WP_005842985.1| dihydrofolate reductase Bacteroides vulgatus 100 0
1_42 gi|548318542|ref|WP_022508241.1| hypothetical protein Bacteroides vulgatus CAG:6 100 1.00E-174
1_44 gi|492458409|ref|WP_005851092.1| transcriptional regulator Bacteroides vulgatus 99.7 0

Table 6. PFAM domain matches for annotated genes from Team 10
Name PFAM ID Description E value
scaffold1.1_1 PF01610.12 Transposase 2.90E-25
scaffold1.1_2 PF11396.3 Protein of unknown function (DUF2874) 7.80E-15
scaffold1.1_4 PF03965.11 Penicillinase repressor 2.40E-25
scaffold1.1_5 PF03544.9 Gram-negative bacterial TonB protein C-termi 2.50E-23
scaffold1.1_5 PF13715.1 Domain of unknown function (DUF4480) 1.50E-16
scaffold1.1_5 PF05569.6 BlaR1 peptidase M56 1.00E-11
scaffold1.1_5 PF13620.1 Carboxypeptidase regulatory-like domain 2.90E-10
scaffold1.1_5 PF07715.10 TonB-dependent Receptor Plug Domain 2.10E-06
scaffold1.1_6 PF14559.1 Tetratricopeptide repeat 6.20E-13
scaffold1.1_6 PF13414.1 TPR repeat 6.70E-12
scaffold1.1_6 PF12895.2 Anaphase-promoting complex, cyclosome, subun 1.30E-07
scaffold1.1_7 PF00578.16 AhpC/TSA family 1.30E-11
scaffold1.1_7 PF00255.14 Glutathione peroxidase 4.20E-08
scaffold1.1_7 PF14289.1 Domain of unknown function (DUF4369) 1.70E-06
scaffold1.1_8 PF13905.1 Thioredoxin-like 1.40E-14
scaffold1.1_8 PF13098.1 Thioredoxin-like domain 1.90E-14
scaffold1.1_8 PF00085.15 Thioredoxin 2.70E-11
scaffold1.1_8 PF08534.5 Redoxin 4.30E-11
scaffold1.1_8 PF00578.16 AhpC/TSA family 1.00E-07
scaffold1.1_11 PF01120.12 Alpha-L-fucosidase 2.60E-87
scaffold1.1_12 PF01715.12 IPP transferase 7.70E-64
scaffold1.1_12 PF01745.11 Isopentenyl transferase 3.00E-12
scaffold1.1_12 PF04851.10 Type III restriction enzyme, res subunit 0.00022
scaffold1.1_13 PF07929.6 Plasmid pRiA4b ORF-3-like protein 4.00E-11
scaffold1.1_14 PF13720.1 Udp N-acetylglucosamine O-acyltransferase; D 1.20E-28
scaffold1.1_14 PF00132.19 Bacterial transferase hexapeptide (six repea 1.10E-25
scaffold1.1_15 PF03331.8 UDP-3-O-acyl N-acetylglycosamine deacetylase 6.00E-74
scaffold1.1_15 PF07977.8 FabA-like domain 1.10E-35
scaffold1.1_16 PF00132.19 Bacterial transferase hexapeptide (six repea 1.10E-29
scaffold1.1_16 PF04613.9 UDP-3-O-[3-hydroxymyristoyl] glucosamine N-a 7.00E-17
scaffold1.1_16 PF14602.1 Hexapeptide repeat of succinyl-transferase 1.20E-10
scaffold1.1_17 PF01966.17 HD domain 2.90E-08
scaffold1.1_18 PF00215.19 Orotidine 5'-phosphate decarboxylase / HUMPS 9.20E-30
scaffold1.1_19 PF03462.13 PCRF domain 3.40E-39
scaffold1.1_19 PF00472.15 RF-1 domain 2.60E-33
scaffold1.1_20 PF02769.17 AIR synthase related protein, C-terminal dom 1.70E-12
scaffold1.1_22 PF13310.1 Virulence protein RhuM family 5.70E-110
scaffold1.1_24 PF02638.10 Glycosyl hydrolase like GH101 1.80E-53
scaffold1.1_24 PF13200.1 Putative glycosyl hydrolase domain 3.40E-07
scaffold1.1_25 PF02554.9 Carbon starvation protein CstA 8.90E-79
scaffold1.1_25 PF13722.1 C-terminal domain on CstA (DUF4161) 2.30E-24

Table 7. TIGRFAM domain matches for annotated genes from Team 10
Name TIGRFAM ID Description E value
scaffold1.1_5 TIGR04057 SusC_RagA_signa: TonB-dependent outer membrane receptor, SusC/RagA subfamily, signature region 2.70E-16
scaffold1.1_5 TIGR01352 tonB_Cterm: TonB family C-terminal domain 2.70E-12
scaffold1.1_12 TIGR00174 miaA: tRNA dimethylallyltransferase 5.90E-75
scaffold1.1_14 TIGR01852 lipid_A_lpxA: acyl-[acyl-carrier-protein]-UDP-N-acetylglucosamine O-acyltransferase 1.70E-92
scaffold1.1_15 TIGR00325 lpxC: UDP-3-O-[3-hydroxymyristoyl] N-acetylglucosamine deacetylase 2.50E-56
scaffold1.1_15 TIGR01750 fabZ: beta-hydroxyacyl-(acyl-carrier-protein) dehydratase FabZ 3.90E-49
scaffold1.1_16 TIGR01853 lipid_A_lpxD: UDP-3-O-[3-hydroxymyristoyl] glucosamine N-acyltransferase LpxD 3.60E-105
scaffold1.1_18 TIGR02127 pyrF_sub2: orotidine 5'-phosphate decarboxylase 3.60E-72
scaffold1.1_19 TIGR00019 prfA: peptide chain release factor 1 1.10E-137
scaffold1.1_30 TIGR00630 uvra: excinuclease ABC subunit A 0
scaffold1.1_38 TIGR01214 rmlD: dTDP-4-dehydrorhamnose reductase 1.90E-89
scaffold1.1_39 TIGR00503 prfC: peptide chain release factor 3 6.10E-207
scaffold1.1_39 TIGR00231 small_GTP: small GTP-binding protein domain 2.20E-25
scaffold1.1_49 TIGR02227 sigpep_I_bact: signal peptidase I 1.30E-19
scaffold1.1_52 TIGR01730 RND_mfp: efflux transporter, RND family, MFP subunit 8.80E-48
scaffold1.1_56 TIGR00221 nagA: N-acetylglucosamine-6-phosphate deacetylase 1.30E-81
scaffold1.1_57 TIGR00057 TIGR00057: tRNA threonylcarbamoyl adenosine modification protein, Sua5/YciO/YrdC/YwlC family 1.20E-44
scaffold1.1_59 TIGR00460 fmt: methionyl-tRNA formyltransferase 8.00E-81
scaffold1.1_61 TIGR02937 sigma70-ECF: RNA polymerase sigma factor, sigma-70 family 4.40E-29
scaffold1.1_63 TIGR01163 rpe: ribulose-phosphate 3-epimerase 1.00E-83
scaffold1.1_64 TIGR00360 ComEC_N-term: ComEC/Rec2-related protein 8.50E-27
scaffold1.1_67 TIGR03990 Arch_GlmM: phosphoglucosamine mutase 1.80E-160
scaffold1.1_69 TIGR00539 hemN_rel: putative oxygen-independent coproporphyrinogen III oxidase 4.50E-87
scaffold1.1_71 TIGR00231 small_GTP: small GTP-binding protein domain 1.10E-18
scaffold1.1_76 TIGR00166 S6: ribosomal protein S6 2.00E-25
scaffold1.1_77 TIGR00165 S18: ribosomal protein S18 1.90E-33
scaffold1.1_78 TIGR00158 L9: ribosomal protein L9 1.00E-35
scaffold1.1_82 TIGR01579 MiaB-like-C: MiaB-like tRNA modifying enzyme 3.00E-122
scaffold1.1_82 TIGR00089 TIGR00089: radical SAM methylthiotransferase, MiaB/RimO family 1.10E-113
scaffold1.1_85 TIGR00525 folB: dihydroneopterin aldolase 5.10E-30

Table 8. Phyre2 predicted best crystal structure matches for annotated genes from Team 10
Name PDB best match Pct_identity Confidence Aligned region Description
1.1_1 c3f9kV 22 61.1 89-115 two domain fragment of hiv-2 integrase in complex with ledgf ibd
1.1_4 d1sd4a 19 100 3-120 Penicillinase repressor
1.1_14 c3i3aC 39 100 2-255 transferase, structural basis for the sugar nucleotide and acyl chain selectivity of leptospira interrogans lpxa
1.1_19 c3d5cX 43 100 8-369 peptide chain release factor 1, structural basis for translation termination on the 70s ribosome
1.1_32 c3dboA 29 49.4 36-67 toxin/antitoxin, crystal structure of a member of the vapbc family of toxin-antitoxin systems, vapbc-5, from mycobacterium tuberculosis
1.1_54 c4mt4C 12 100 27-478 transport protein, crystal structure of the campylobacter jejuni cmec outer membrane channel
1.1_57 c2eqaA 23 100 6-191 rna binding protein, crystal structure of the hypothetical sua5 protein from sulfolobus tokodaii
1.1_60 c3k6oA 24 100 29-237 structural genomics, unknown function, crystal structure of protein of unknown function duf1344 (yp_001299214.1) from bacteroides vulgatus atcc 8482
1.1_68 c1upsB 16 100 21-262 glycosyl hydrolase, glcnac[alpha]1-4gal releasing endo-[beta]-galactosidase from clostridium perfringens

Figure 3 is a screenshot of the whole-genome alignment
of our scaffolds against the genome of Bacteroides
vulgatus str. 3975 RP4, which we determined to be the
strain with the most blastp matches against our contigs.
