HEPData Open Repositories 2016 Talk

  1. Eamonn Maguire, CERN. hepdata.net. Open Repositories (OR) 2016, Dublin, Ireland.
  2. What is it? High-energy physics (HEP) scattering experiments go back to the 1950s. Each group of scientists analyses particular signals by processing large numbers of collisions, and the resulting analysis is published as a paper. But where does the processed data go?
     Example table, RE: P P --> X
     SQRT(S) IN GEV: 7000
     SIG IN MB: 95.35 ± 0.38 (stat) ± 1.25 (sys,experimental) ± 0.37 (sys,extrapolation)
  3. What is it? (Diagram: a physics paper and its HEPData record, with Table 1 and Table 2 (F1), each pairing a table description with its data, e.g. RE: P P --> X at SQRT(S) = 7000 GeV.) HEPData is the go-to place for physicists to access the data underlying the plots and tables in a publication. It also links to the scripts and files used in the analysis, such as ROOT files, for reproducibility.
  4. What is it?
  5. What is new for you? Whether you're a data provider or a data consumer, the new HEPData has many new features.
     Data Providers: (1) a simplified submission process, (2) a standard entry data format, (3) a full review management system, (4) versioning, (5) DOI minting, (6) a sandbox.
     Data Consumers: (1) publication-driven search, (2) data-driven search, (3) semantic publishing, (4) data conversion, (5) access in analysis environments.
  6. Data Providers: (1) a simplified submission process, (2) a standard entry data format, (3) a full review management system, (4) versioning, (5) DOI minting, (6) a sandbox.
  7. Data Providers: (1) a simplified submission process, (2) a standard entry data format, (3) a full review management system, (4) versioning, (5) DOI minting, (6) a sandbox.
  8. Submission Process
  9. Data Providers: (1) a simplified submission process, (2) a standard entry data format, (3) a full review management system, (4) versioning, (5) DOI minting, (6) a sandbox.
  10. HEPData submission archive, made of three parts. submission.yaml links the submission together by detailing the data files to be loaded, their names and descriptions, and their associated analysis files and code. Data records are YAML (or JSON) representations of the underlying data files, including values and their errors in a verbose format. External data files and links cover analysis files (e.g. ROOT, Python, C++), code, links to code repositories, etc.
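The sketch below illustrates the "verbose" data-record format. It encodes the P P --> X cross-section from slide 2 as a Python dict and serializes it to JSON.

- The key names (`independent_variables`, `dependent_variables`, `header`, `qualifiers`, `symerror`, `label`) reflect our understanding of the HEPData format. Treat them as assumptions and check the official submission documentation.
- `make_data_record` is a hypothetical helper, not part of any HEPData tool.

```python
import json


def make_data_record(sqrt_s_gev, sigma_mb, errors):
    """Build one HEPData-style data table as a plain dict.

    Key names are assumed, not taken from the slides. `errors` is a list
    of (label, symmetric_error) pairs, one per uncertainty source.
    """
    return {
        "independent_variables": [
            {
                "header": {"name": "SQRT(S)", "units": "GEV"},
                "values": [{"value": sqrt_s_gev}],
            }
        ],
        "dependent_variables": [
            {
                "header": {"name": "SIG", "units": "MB"},
                "qualifiers": [{"name": "RE", "value": "P P --> X"}],
                "values": [
                    {
                        "value": sigma_mb,
                        # One entry per uncertainty source, as listed on the slide.
                        "errors": [
                            {"symerror": err, "label": label} for label, err in errors
                        ],
                    }
                ],
            }
        ],
    }


record = make_data_record(
    7000,
    95.35,
    [("stat", 0.38), ("sys,experimental", 1.25), ("sys,extrapolation", 0.37)],
)

# JSON is essentially a subset of YAML 1.2, so this text could also be saved as a .yaml file.
text = json.dumps(record, indent=2)
print(text)
```

Keeping each error source as a separate labelled entry preserves the breakdown shown on the slide, rather than a single combined uncertainty.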
  11. HEPData submission archive, continued. The web server processes the YAML file, inserts records into the database, and links the publication record with its data and files. Tables (e.g. Table 1 with its table description, like the RE: P P --> X example) are rendered from JSON, and plots are rendered automatically using a custom library built on D3.js. (Diagram labels: Tables and plots, Download, Scripts.)
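Slide 11 says the server processes the YAML file, inserts records into a database, and links the publication record with its data and files. A toy version of that flow, using an in-memory SQLite database, might look like this.

All of the following are hypothetical and do not describe HEPData's actual back end:

- the schema;
- the `ingest_submission` and `table_as_json` helpers;
- the placeholder INSPIRE ID.

```python
import json
import sqlite3

# Hypothetical minimal schema: one publication row, many linked data-table rows.
SCHEMA = """
CREATE TABLE publication (inspire_id TEXT PRIMARY KEY, title TEXT);
CREATE TABLE data_table (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    inspire_id TEXT REFERENCES publication(inspire_id),
    name TEXT,
    description TEXT,
    data_json TEXT
);
"""


def ingest_submission(conn, inspire_id, title, entries):
    """Insert a publication and link each submitted table to it.

    `entries` stands in for the per-table documents of submission.yaml
    after parsing: dicts with "name", "description" and the loaded "data".
    """
    with conn:  # commits on success, rolls back if any insert fails
        conn.execute(
            "INSERT INTO publication (inspire_id, title) VALUES (?, ?)",
            (inspire_id, title),
        )
        for entry in entries:
            conn.execute(
                "INSERT INTO data_table (inspire_id, name, description, data_json) "
                "VALUES (?, ?, ?, ?)",
                (inspire_id, entry["name"], entry["description"], json.dumps(entry["data"])),
            )


def table_as_json(conn, inspire_id, name):
    """Return one stored table as a JSON string (what a front end would render)."""
    row = conn.execute(
        "SELECT data_json FROM data_table WHERE inspire_id = ? AND name = ?",
        (inspire_id, name),
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
ingest_submission(
    conn,
    "ins0000000",  # placeholder INSPIRE ID
    "Example paper",
    [{"name": "Table 1", "description": "Example description", "data": {"sigma_mb": 95.35}}],
)
print(table_as_json(conn, "ins0000000", "Table 1"))
```

Wrapping the whole submission in one transaction means a malformed table leaves no half-linked record behind.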
  12. Data Providers: (1) a simplified submission process, (2) a standard entry data format, (3) a full review management system, (4) versioning, (5) DOI minting, (6) a sandbox.
  13. Comprehensive Review System
  14. Comprehensive Review System
  15. Dashboard for Submission Management
  16. Dashboard for Submission Management
  17. Interactive Plotting Library
  18. Data Providers: (1) a simplified submission process, (2) a standard entry data format, (3) a full review management system, (4) versioning, (5) DOI minting, (6) a sandbox.
  19. Versioning
  20. Data Providers: (1) a simplified submission process, (2) a standard entry data format, (3) a full review management system, (4) versioning, (5) DOI minting, (6) a sandbox.
  21. DOIs. All HEPData records get DOIs: each data table gets a versioned DOI, and the whole HEPData record is also given a DOI that encompasses the whole collection.
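To make the two DOI levels on slide 21 concrete, here is a sketch of one possible naming scheme: a DOI for the whole record plus a versioned DOI per table.

The following are all illustrative assumptions; the slides state none of them:

- the prefix `10.1000`;
- the `hepdata.<id>.v<version>/t<n>` pattern;
- the `Record` class.

```python
from dataclasses import dataclass, field

# Placeholder DOI prefix; the real registrant prefix is not given on the slides.
DOI_PREFIX = "10.1000"


@dataclass
class Record:
    record_id: int
    version: int = 1
    table_names: list = field(default_factory=list)

    def record_doi(self):
        """One DOI for the whole record, i.e. the collection of its tables."""
        return f"{DOI_PREFIX}/hepdata.{self.record_id}"

    def table_dois(self):
        """One DOI per table, tied to the record's current version."""
        return {
            name: f"{DOI_PREFIX}/hepdata.{self.record_id}.v{self.version}/t{i}"
            for i, name in enumerate(self.table_names, start=1)
        }

    def new_version(self):
        """Return a copy with the version bumped; earlier table DOIs remain citable."""
        return Record(self.record_id, self.version + 1, list(self.table_names))


r1 = Record(12345, table_names=["Table 1", "Table 2"])  # hypothetical record id
r2 = r1.new_version()
print(r1.record_doi())             # 10.1000/hepdata.12345
print(r2.table_dois()["Table 1"])  # 10.1000/hepdata.12345.v2/t1
```

Encoding the version in the table DOI means a citation always resolves to the exact numbers the citing author used, even after a resubmission.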
  22. Data Providers: (1) a simplified submission process, (2) a standard entry data format, (3) a full review management system, (4) versioning, (5) DOI minting, (6) a sandbox.
  23. Sandbox
  24. Sandbox
  25. Data Consumers: get access to the data in many environments. (1) Publication-driven search, (2) data-driven search, (3) semantic publishing, (4) data conversion, (5) access in analysis environments.
  26. Data Consumers: get access to the data in many environments. (1) Publication-driven search, (2) data-driven search, (3) semantic publishing, (4) data conversion, (5) access in analysis environments.
  27. The System: demo of hepdata.net
  28. Data Consumers: get access to the data in many environments. (1) Publication-driven search, (2) data-driven search, (3) semantic publishing, (4) data conversion, (5) access in analysis environments.
  29. Our current search system is classical: at the search level we do what every other system does. You (1) search on some publication metadata, then (2) check whether the data is what you need; if it is, you download it, otherwise you check another record. This process is long and can be tedious. Can we provide a better way, starting directly from the data?
  30. Lots of filtering options! Enter a search criterion, e.g. a variable name. We find all data with measurements of that variable and display an aggregated view.
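The data-driven search on slide 30 can be pictured as a scan over all tables that collects every measurement of one variable. Each hit keeps the provenance described on slide 35: the parent publication's INSPIRE ID and the table number.

This is only a sketch under stated assumptions:

- the corpus, its values and the INSPIRE IDs are made up;
- `search_by_variable` and its `keep` filter are hypothetical names;
- a real system would use a search index rather than a linear scan.

```python
# Toy corpus with made-up values and hypothetical INSPIRE IDs. Each table keeps
# its parent publication's INSPIRE ID and table number for provenance.
TABLES = [
    {"inspire_id": "ins111", "table": 1,
     "points": [{"SQRT(S)": 7000, "SIG": 1.0}]},
    {"inspire_id": "ins222", "table": 3,
     "points": [{"SQRT(S)": 8000, "SIG": 2.0}, {"PT": 25.0, "SIG": 3.0}]},
]


def search_by_variable(tables, variable, keep=lambda point: True):
    """Return every measurement of `variable` across all tables.

    Each hit is (value, (inspire_id, table_number)), so the aggregated view
    never loses track of where a data point came from. `keep` is an optional
    filter applied to each whole data point.
    """
    hits = []
    for table in tables:
        for point in table["points"]:
            if variable in point and keep(point):
                hits.append((point[variable], (table["inspire_id"], table["table"])))
    return hits


print(search_by_variable(TABLES, "SIG"))
print(search_by_variable(TABLES, "SIG", keep=lambda p: p.get("SQRT(S)") == 8000))
```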
  31. HEPData
  32. HEPData
  33. Or edit the plot to remove variables we're not interested in (Delete).
  34. Or build an entirely new plot from scratch!
  35. A new type of search: data-driven instead of publication-driven. Full data provenance is captured: every data point is recorded with its parent publication's INSPIRE ID and table number. Fast data rendering: with WebGL, the browser can now render hundreds of thousands of data points. API: query HEPData directly from Mathematica, ROOT, etc., and process our results as you need.
  36. Coming soon… Master's project of Juan Luis Boya Garcia, University of Salamanca, Spain.
  37. Data Consumers: get access to the data in many environments. (1) Publication-driven search, (2) data-driven search, (3) semantic publishing, (4) data conversion, (5) access in analysis environments.
  38. Semantic Publishing. Every article is tagged with schema.org vocabulary, which makes it possible for Google and other search engines to understand our content. (Panels: Google's view of https://hepdata.net/record/ins1397180 and of https://hepdata.net/search.)
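One common way to tag a page with schema.org vocabulary is a JSON-LD block embedded in the HTML. The sketch below builds such a block for a record page.

The slide only says that schema.org vocabulary is used, so the following are all our assumptions:

- the choice of the `Dataset` type;
- the selected properties;
- the placeholder name, description and keywords.

```python
import json


def dataset_jsonld(name, description, url, keywords):
    """Describe a record with schema.org vocabulary as a JSON-LD dict.

    Using the schema.org `Dataset` type here is an assumption.
    """
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "keywords": keywords,
    }


def as_script_tag(doc):
    """Wrap the JSON-LD in the <script> element that crawlers read from a page."""
    return '<script type="application/ld+json">' + json.dumps(doc) + "</script>"


doc = dataset_jsonld(
    "Example HEPData record",          # placeholder
    "Placeholder description",         # placeholder
    "https://hepdata.net/record/ins1397180",
    ["SQRT(S)", "SIG"],                # placeholder keywords
)
print(as_script_tag(doc))
```

Embedding structured data this way leaves the visible page unchanged while giving search engines typed metadata to index.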
  39. Data Consumers: get access to the data in many environments. (1) Publication-driven search, (2) data-driven search, (3) semantic publishing, (4) data conversion, (5) access in analysis environments.
  40. Converter: convert from YAML to ROOT, YODA and CSV. Install it via pip, use it as a web service, and contribute more conversions!
  41. Conversion to many formats
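As a rough idea of what a YAML-to-CSV conversion involves, here is a standard-library sketch that flattens one table into CSV. It is not the HEPData converter package or its API.

- The record layout reuses key names we assume for the HEPData format.
- It handles only one independent and one dependent variable with symmetric errors.

```python
import csv
import io

# A single-point table in the assumed HEPData record layout.
RECORD = {
    "independent_variables": [
        {"header": {"name": "SQRT(S)", "units": "GEV"}, "values": [{"value": 7000}]}
    ],
    "dependent_variables": [
        {
            "header": {"name": "SIG", "units": "MB"},
            "values": [
                {
                    "value": 95.35,
                    "errors": [
                        {"symerror": 0.38, "label": "stat"},
                        {"symerror": 1.25, "label": "sys,experimental"},
                        {"symerror": 0.37, "label": "sys,extrapolation"},
                    ],
                }
            ],
        }
    ],
}


def record_to_csv(record):
    """Flatten a one-x, one-y table into CSV text.

    Assumes every point carries the same symmetric error sources, in the
    same order, as the first point; each source becomes its own column.
    """
    indep = record["independent_variables"][0]
    dep = record["dependent_variables"][0]
    labels = [e["label"] for e in dep["values"][0].get("errors", [])]

    out = io.StringIO()
    writer = csv.writer(out)  # quotes fields that contain commas, e.g. "sys,experimental"
    writer.writerow(
        [f'{indep["header"]["name"]} [{indep["header"]["units"]}]',
         f'{dep["header"]["name"]} [{dep["header"]["units"]}]']
        + [f"error ({label})" for label in labels]
    )
    for x, y in zip(indep["values"], dep["values"]):
        writer.writerow([x["value"], y["value"]] + [e["symerror"] for e in y.get("errors", [])])
    return out.getvalue()


print(record_to_csv(RECORD))
```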
  42. Data Consumers: get access to the data in many environments. (1) Publication-driven search, (2) data-driven search, (3) semantic publishing, (4) data conversion, (5) access in analysis environments.
  43. All data available in JSON. Search results… https://hepdata.net/search/?collaboration=ATLAS&page=1&format=json
  44. All data available in JSON. …Individual records: https://hepdata.net/record/ins1426695?format=json
  45. All data available in JSON. …& Data: https://www.hepdata.net/record/data/73442/65837/1
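Because the endpoints above return JSON, any language with an HTTP client can use them. Here is a minimal Python sketch using only the standard library.

- The URL patterns are copied from slides 43 and 44.
- The helper names are ours.
- Nothing is assumed about the fields inside the returned JSON.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE = "https://hepdata.net"


def search_url(**params):
    """Build a search URL that requests JSON, following the slide's example."""
    return f"{BASE}/search/?" + urlencode({**params, "format": "json"})


def record_url(inspire_id):
    """Build the JSON URL for one record, following the slide's example."""
    return f"{BASE}/record/{inspire_id}?" + urlencode({"format": "json"})


def fetch_json(url, timeout=30):
    """Download and decode a JSON document (requires network access)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


print(search_url(collaboration="ATLAS", page=1))
# → https://hepdata.net/search/?collaboration=ATLAS&page=1&format=json
print(record_url("ins1426695"))
# → https://hepdata.net/record/ins1426695?format=json
```

Calling `fetch_json(record_url("ins1426695"))` would then return the decoded record as Python dicts and lists, ready for further processing in the analysis environment of your choice.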
  46. Use case: search, access and get data directly from Mathematica, with no need to leave the software environment you like. The same applies to ROOT or any analysis platform with a file parser :)
  47. Example file here
  48. Everything is on GitHub! http://www.github.com/hepdata
  49. Acknowledgements. HEPData @ CERN: Eamonn Maguire, Salvatore Mele. HEPData @ Durham: Graeme Watt, Michael Whalley, Frank Kraus. HEPData @ NYU: Lukas Heinrich, Kyle Cranmer. Alumni and summer students: Jan Stypka, Laura Rueda-Garcia, Michal Szoziak. HEPData @ Salamanca: Juan Luis Boya Garcia.
  50. Any questions?
