Anzeige

Más contenido relacionado

Similar a Evolution of open chemical information(20)

Más de Valery Tkachenko(20)

Anzeige

Evolution of open chemical information

  1. Evolution of open chemical information Valery Tkachenko Royal Society of Chemistry ACS Fall 2016 Philadelphia, PA
  2. The Short History of Time Image credit: Rhys Taylor, Cardiff University ~1992
  3. Chemical database
  4. PubChem
  5. • 57 million chemicals and growing • Data sourced from >500 different sources • Crowdsourced curation and annotation • Ongoing deposition of data from our journals and our collaborators • A structure centric hub for web-searching
  6. ChemSpider
  7. ChemSpider
  8. ChemSpider real-time curation
  9. Article X-ray Compounds Reaction Analytical Data Text and References
  10. Reaction 1: NextMove reaction text-mined from RSC archive – original article
  11. Reaction 1: NextMove reaction text- mined from RSC archive – cml output <?xml version="1.0" encoding="UTF-8"?> <reactionList xmlns="http://www.xml-cml.org/schema" xmlns:cmlDict="http://www.xml-cml.org/dictionary/cml/" xmlns:nameDict="http://www.xml-cml.org/dictionary/cml/name/" xmlns:unit="http://www.xml-cml.org/unit/" xmlns:cml="http://www.xml-cml.org/schema" xmlns:dl="http://bitbucket.org/dan2097"> <reaction> <dl:source> <dl:documentId>c3ra45871g</dl:documentId> <dl:paragraphText>Diisobutylaluminium hydride (1.1 M in cyclohexane, 2.93 mL, 3.23 mmol) was added dropwise to the solution of 9 (500 mg, 1.29 mmol) and dichloromethane (20 mL) at −78 °C. The reaction mixture was stirred at −78 °C for another 2 h, warmed up to rt, quenched with methanol (3 mL) and citric acid(aq) (w/w, 10%, 5 mL), concentrated. The residue was added with water (10 mL) and extracted with dichloromethane (12 mL × 3). The organic layers were combined, dried over Na2SO4, filtered and concentrated. The crude product was further purified by column chromatography (SiO2, EtOAc–hexanes, 1 : 7; Rf 0.33) to give 10 (308 mg, 1.02 mmol, 79%) as a colourless liquid. [α]D20 −24.2 (c 1.1, CHCl3); 1H NMR (CDCl3, 300 MHz) δ 0.04 (s, 3H), 0.07 (s, 3H), 0.85 (s, 9H), 1.34 (s, 3H), 1.44 (s, 3H), 2.16 (br, 1H), 3.68–3.81 (m, 3H), 4.16 (t, J = 13.8 Hz, J = 13.8 Hz, 1H), 4.59 (t, J = 6.6 Hz, J = 6.6 Hz, 1H), 5.22 (d, J = 10.7 Hz, 1H), 5.34 (d, J = 17.1 Hz, 1H), 5.90 (ddd, J = 7.2 Hz, J = 10.2 Hz, J = 17.2 Hz, 1H); 13C NMR (CDCl3, 75 MHz) δ 134.1, 118.4, 108.5, 79.5, 78.8, 70.8, 65.0, 27.8, 25.9, 25.4, 18.1, −3.7, −4.4. HRMS (ESI) calcd for [M + Na]+ (C15H30O4SiNa) 325.1811, found 325.1807.</dl:paragraphText> </dl:source> <dl:reactionSmiles>[H- ].C([Al+]CC(C)C)C(C)C.C([O:17][CH2:18][C@@H:19]([O:29][Si:30]([C:33]([CH3:36])([CH3:35])[CH3:34])([CH3:32])[CH3:31])[C@@H:20]1[C@H:24]([CH:25]=[CH2:26])[O:23][C:22]([CH3 :28])([CH3:27])[O:21]1)(=O)C(C)(C)C&gt;ClCCl&gt;[C:33]([Si:30]([CH3:32])([CH3:31])[O:29][C@@H:19]([C@@H:20]1[C@H:24]([CH:25]=[CH2:26])[O:23][C:22]([CH3:28])([CH3:27])[O:21 ]1)[CH2:18][OH:17])([CH3:36])([CH3:35])[CH3:34] |f:0.1|</dl:reactionSmiles> <productList> <product role="product"> <molecule id="m0"> <name dictRef="nameDict:unknown">10</name> <dl:nameResolved>(R)-2-((tert-Butyldimethylsilyl)oxy)-2-((4S,5S)-2,2-dimethyl-5-vinyl-1,3-dioxolan-4-yl)ethanol</dl:nameResolved> </molecule> <amount dl:propertyType="AMOUNT" dl:normalizedValue="0.00102">1.02 mmol</amount> <amount dl:propertyType="MASS" dl:normalizedValue="0.308">308 mg</amount> <amount dl:propertyType="PERCENTYIELD" dl:normalizedValue="79">79%</amount> <amount dl:propertyType="CALCULATEDPERCENTYIELD" dl:normalizedValue="79.1" units="unit:percentYield">79.1</amount> <identifier dictRef="cml:smiles" value="C(C)(C)(C)[Si](O[C@H](CO)[C@H]1OC(O[C@H]1C=C)(C)C)(C)C"/> <identifier dictRef="cml:inchi" value="InChI=1S/C15H30O4Si/c1-9-11-13(18-15(5,6)17-11)12(10-16)19-20(7,8)14(2,3)4/h9,11-13,16H,1,10H2,2-8H3/t11-,12+,13-/m0/s1"/> <dl:entityType>definiteReference</dl:entityType> <dl:appearance>colourless</dl:appearance> <dl:state>liquid</dl:state> </product> </productList> <reactantList> <reactant role="reactant"> <molecule id="m1"> <name dictRef="nameDict:unknown">Diisobutylaluminium hydride</name> </molecule> <amount dl:propertyType="AMOUNT" dl:normalizedValue="0.00323">3.23 mmol</amount>
  12. Reaction 1: procedure steps Diisobutylaluminium hydride (1.1 M in cyclohexane, 2.93 mL, 3.23 mmol) was added dropwise to the solution of 9 (500 mg, 1.29 mmol) and dichloromethane (20 mL) at −78 °C. The reaction mixture was stirred at −78 °C for another 2 h, warmed up to rt, quenched with methanol (3 mL) and citric acid (aq) (w/w, 10%, 5 mL), concentrated. The residue was added with water (10 mL) and extracted with dichloromethane (12 mL × 3). The organic layers were combined, dried over Na2SO4, filtered and concentrated. The crude product was further purified by column chromatography (SiO2, EtOAc–hexanes, 1 : 7; Rf 0.33) to give 10 (308 mg, 1.02 mmol, 79%) as a colourless liquid. Text mining breaks down procedure summary into steps: <dl:reactionActionList/dl:reactionActions> dl:phraseTexts • action="Add“: Diisobutylaluminium hydride (1.1 M in cyclohexane, 2.93 mL, 3.23 mmol) was added dropwise to the solution of 9 (500 mg, 1.29 mmol) and dichloromethane (20 mL) at −78 °C • action=" Stir“: The reaction mixture was stirred at −78 °C for another 2 h • action="Heat“: warmed up to rt • action="Quench“: quenched with methanol (3 mL) and citric acid(aq) (w/w, 10%, 5 mL) • action="Concentrate“: concentrated • action="Add“: The residue was added with water (10 mL) • action="Extract“: extracted with dichloromethane (12 mL × 3) • action="Dry“: dried over Na2SO4 • action="Filter“: filtered • action="Concentrate“: concentrated • action="Purify“: The crude product was further purified by column chromatography (SiO2, EtOAc–hexanes, 1 : 7; Rf 0.33) • action="Yield“: to give 10 (308 mg, 1.02 mmol, 79%) as a colourless liquid
  13. http://www.wired.com/2014/04/google-project-ara/ http://www.wsj.com/articles/googles-modular-phones-to-go-on- sale-next-year-1463783371
  14. The World we are heading into http://www.gartner.com/newsroom/id/3143521
  15. Our World is hyperconnected
  16. Standards?
  17. Data quality issues Robochemistry Proliferation of errors in public and private databases Automated quality control system
  18. CVSP
  19. CVSP – submission details
  20. CVSP – issues review
  21. J. Brechner, IUPAC Graphical Representation of stereochem. configurations Section: ST-1.1.10 DB06287
  22. CVSP - mapping
  23. CVSP – rules
  24. Dimensions and complexity of science
  25. D2I2K2W
  26. info@openphactsfoundation.org @Open_PHACTS Open PHACTS Practical Semantics OpenPHACTS GlaxoSmithKline – Coordinator Universität Wien – Managing entity Technical University of Denmark University of Hamburg, Center for Bioinformatics BioSolveIT GmBH Consorci Mar Parc de Salut de Barcelona Leiden University Medical Centre Royal Society of Chemistry Vrije Universiteit Amsterdam Novartis Merck Serono H. Lundbeck A/S Eli Lilly Netherlands Bioinformatics Centre Swiss Institute of Bioinformatics ConnectedDiscovery EMBL-European Bioinformatics Institute Janssen Esteve Almirall OpenLink Scibite The Open PHACTS Foundation Spanish National Cancer Research Centre University of Manchester Maastricht University Aqnowledge University of Santiago de Compostela Rheinische Friedrich-Wilhelms-Universität Bonn AstraZeneca Pfizer
  27. Why is it so hard to…. Competitors? What’s the structure? Are they in our file? What’s similar? What’s the target?Pharmacology data? Known Pathways? Working On Now? Connections to disease? Expressed in right cell type? IP?
  28. @gray_alasdair Big Data Integration 30 Knowledge is federated
  29. Publishing – then…
  30. …and now?
  31. http://ec.europa.eu/research/press/2016/pdf/opendata-infographic_072016.pdf
  32. Data Market
  33. Publishers - the guardians of knowledge This is a poster for Guardians of the Galaxy. The poster art copyright is believed to belong to the distributor of the Film, Walt Disney Studios Motion Pictures, the publisher, Marvel Studios, or the graphic artist.
  34. Data Publishing Original artist: Joseph Ferdinand Keppler (1838-1894) Restoration: Adam Cuerden - http://www.loc.gov/pictures/item/2011661385/ by way ofhttp://adamcuerden.deviantart.com/gallery/#/d5onmxh
  35. The World we live in
  36. Moore’s Law
  37. "Internet host count history". Internet Systems Consortium. Retrieved May 16,2012.
  38. We are on a verge of a new technical revolution and it feels great to anticipate it and be ready to ride! Image from surfline.com by Mike Cianciulli
  39. Data Science @ RSC The team. From left to right: Valery Tkachenko and Alexey Pshenichnov, based in the United States; Aileen Day, based in Southampton; John Boyle, Peter Corbett, Colin Batchelor, Jeff White, Nicholas Bailey and Val the plant, based at TGH
  40. Thank you Email: tkachenkov@rsc.org Slides: http://www.slideshare.net/valerytkachenko16

Hinweis der Redaktion

  1. What about science and chemistry in particular?
  2. Remember this, some of these questions are easier to answer than others
  3. Open PHACTS was developed to support the key questions of drug discovery Business questions have been at the heart of Open PHACTS and have driven the development of the platform Mx/psa, how calculated who did it? Mash up. With your data too, - top layer join together but need them all commercial Data provided by many publishers Originally in many formats: relational, SD files and RDF Worked closely with publishers Data licensing was a major issue Over 5 billion triples – 14 datasets & growing Hosted on beefy hardware; data in memory (aim) Extensive memcaching Pose complex queries to extract data
Anzeige