EPA Chemicals Dashboard Provides Data for Non-Targeted Analysis

Non-targeted analysis supported by data
and cheminformatics delivered via the US
EPA CompTox Chemicals Dashboard
Antony Williams, Alex Chao, Tom Transue, Tommy Cathey,
Elin Ulrich and Jon Sobus
1) National Center for Computational Toxicology, U.S. Environmental Protection Agency, RTP, NC
2) Oak Ridge Institute of Science and Education (ORISE) Research Participant, RTP, NC
3) GDIT, Research Triangle Park, North Carolina, United State
4) National Exposure Research Laboratory, U.S. Environmental Protection Agency, RTP, NC
August 2019
ACS Fall Meeting, San Diego
http://www.orcid.org/0000-0002-2668-4821
The views expressed in this presentation are those of the author and do not necessarily reflect the views or policies of the U.S. EPA

An intro to the Dashboard
• Freely available web-based database from the
National Center for Computational Toxicology
• Providing data for 875,000 substances including
– Experimental and predicted physicochemical properties
– In vivo toxicity data harvested from dozens of public resources
– In vitro bioactivity data for thousands of chemicals and assays
– Exposure data including chemicals in consumer products
– Real time predictions for >20 physchem and toxicological endpoints
• Dashboard is used by mass spectrometrists for
chemical identification
• A quick view of general capabilities…
1

CompTox Chemicals Dashboard
https://comptox.epa.gov/dashboard
2
875k Chemical Substances

Access to Chemical Hazard Data
4

Sources of Exposure to Chemicals
5

Link Access
6
Links based on chemical
identifiers to dozens of
online resources –
including analytical data

MassBank of North America
https://mona.fiehnlab.ucdavis.edu
7

Overview of MS-Ready Structures
• All structure-based chemical substances are
algorithmically processed to
– Split multicomponent chemicals into individual structures
– Desalt and neutralize individual structures
– Remove stereochemical bonds from all chemicals
• MS-Ready structures are then mapped to
original substances to provide a path between
chemicals detected by mass spectrometry to
original substances
9

MS-Ready Mappings from
Details Page
11

MS-Ready Mappings Set
All substances containing component
13

Mass/Formula
Searching and
Metadata Ranking
14

Advanced Searches
Mass Search
15

Advanced Searches
Mass Search
16

MS-Ready Structures for
Formula Search
17

MS-Ready Mappings
• EXACT Formula: C10H16N2O8: 3 Hits
18

MS-Ready Mappings
• Same Input Formula: C10H16N2O8
• MS Ready Formula Search: 125 Chemicals
19

MS-Ready Mappings
• Exact Formula – 3 hits
• MS-Ready Formula – 125 hits!!
– ONLY 8 of the 125 are single component chemicals
– 3 are neutral compounds and 2 are charged
• How can we rank the candidates list?
20

Candidate ranking
using metadata
21

Data Source Ranking of
“known unknowns”
22
• A mass and/or formula search is
for an unknown chemical but it
is a known chemical contained
within a reference database
• Most likely candidate chemicals
have the most associated data
sources, most associated
literature articles or both
C14H22N2O3
266.16304
Chemical
Reference
Database
Sorted candidate
structures

The original ChemSpider work
23

Is a bigger database better?
24
• ChemSpider was 26 million chemicals for
the original work
• Much BIGGER today
• Is bigger better??
• Are there other metadata to use for ranking?

Using Metadata for Ranking
• Chosen dashboard metadata to rank candidates
– Associated data sources
• Lists in the underlying database (more about lists later)
• Associated data sources in PubChem
• Specific source types (e.g. water, surfactants, pesticides)
– Number of associated literature articles (Pubmed)
– Chemicals in the environment – the number of
products/categories containing the chemical is an
important source of data (from CPDat database)
25

Identification ranks for 1783 chemicals
using multiple data streams
26
DS: Data Sources
PC: PubChem
PM: PubMed
STOFF: DB
KEMI: DB
Data Sources alone
rank ~75% of the
chemicals as Top Hit

Comparing Search Performance
27
• When dashboard contained 720k chemicals
• Only 3% of ChemSpider size
• What was the comparison in performance?

SAME dataset for comparison
28

How did performance compare?
29
For the same 162 chemicals,
Dashboard outperforms
ChemSpider for both Mass and
Formula Ranking

How did performance compare?
30

Data Quality is important
• Data quality in free web-based databases!
31

Public Databases require curation
• There is significant bloating in the public
databases because of lack of curation
• The number of hits retrieved based on
mass or formula searching can explode
based on poorly represented chemicals –
especially stereochemistry issues
• MS-Ready structures will map back to
multiple versions of “the same chemical”.
32

Will the correct Microcystin LR Stand Up?
ChemSpider Skeleton Search
33

Comparing ChemSpider Structures
34

Comparing ChemSpider Structures
35

Batch Searching
mass and formula
37

Batch Searching
• Singleton searches are useful but we work
with thousands of masses and formulae!
• Typical questions
– What is the list of chemicals for the formula CxHyOz
– What is the list of chemicals for a mass +/- error
– Can I get chemical lists in Excel files? In SDF files?
– Can I include properties in the download file?
38

Batch Searching Formula/Mass
39

Searching batches using MS-Ready
Formula (or mass) searching
40

Mass
Spectrometry
Related Searches
41

Find me “related structures”
Formula-Based Search
42

Select Chemicals of Interest
43

Based on Structure Similarity
44

Based on Structure Similarity
45

Structure Similarity – sort on mass
46

EPAHFR: Hydraulic Fracturing
49

Predicted Mass Spectra
http://cfmid.wishartlab.com/
• MS/MS spectra prediction for ESI+, ESI-, and EI
• Predictions generated and stored for >800,000
structures, to be accessible via Dashboard
52

Search Expt. vs. Predicted Spectra

API services and Open Data
• Present API and web services available at
https://actorws.epa.gov/actorws/ but major
redevelopment is underway
• Downloadable data available via the
downloads page
58

Web Services
https://actorws.epa.gov/actorws/
• Data in UI, JSON and XML format
59

InChIKey to DTXCIDs
60
https://actorws.epa.gov/actorws/dsstox/v02/msready?identifier
=UVOFGKIRTCCNKG-UHFFFAOYSA-N

Data and Services
used by the
Community
61

NORMAN Suspect List Exchange
https://www.norman-network.com/?q=node/236
62

Integration to MetFrag in place
https://jcheminf.biomedcentral.com/articles/10.1186/s13321-018-0299-2
63

MassBank mapping to Dashboard
Based on Web Service lookup
64

Conclusion
• Dashboard access to data for ~875,000 chemicals
• MS-Ready data facilitates structure identification
• Related metadata facilitates candidate ranking
65
• Relationship mappings and
chemical lists of great utility
• Dashboard and contents
are one part of the solution
• New developments in
progress, especially API
development, will be very
enabling…

Acknowledgements
• IT Development team – especially Jeff
Edwards and Jeremy Dunne
• Chris Grulke for the ChemReg system
• Andrew McEachran (now at Agilent)
• Valery Tkachenko (working on new MS-Ready)
• NERL colleagues – Jon Sobus, Elin Ulrich,
Mark Strynar, Seth Newton, Alex Chao
• Emma Schymanski, LCSB, Luxembourg
• NORMAN Network and all contributors
66

Contact
Antony Williams
US EPA Office of Research and Development
National Center for Computational Toxicology
EMAIL: Williams.Antony@epa.gov
ORCID: https://orcid.org/0000-0002-2668-4821
67

EPA Chemicals Dashboard Provides Data for Non-Targeted Analysis

Empfohlen

Empfohlen

Weitere ähnliche Inhalte

Was ist angesagt?

Was ist angesagt? (20)

Ähnlich wie EPA Chemicals Dashboard Provides Data for Non-Targeted Analysis

Ähnlich wie EPA Chemicals Dashboard Provides Data for Non-Targeted Analysis (20)

Kürzlich hochgeladen

Kürzlich hochgeladen (20)

EPA Chemicals Dashboard Provides Data for Non-Targeted Analysis

Hinweis der Redaktion