Research Knowledge Graphs at NFDI4DS & GESIS


  1. Research Knowledge Graphs at NFDI4DS & GESIS. RKG Workshop, Stefan Dietze, 23.06.2022
  2. NFDI4DS - Consortium
     Labels on the slide: FHI, Universität Leipzig
     • Several universities as centers of excellence in DS and AI.
     • Several non-university research institutes as key contributors.
     • Scientific information centers and infrastructure providers as providers of data and for access to domain user groups.
  3. Data Science & AI Challenges
     ● Paradigm shift in computer science towards data- and deep-learning-driven methods (DS = fourth scientific paradigm).
     ● Reproducibility crisis.
       ○ A study of neural recommender publications at top-tier venues finds only 7/18 reproducible and only 1/18 beating the state of the art. [Dacrema et al., RecSys2019]
     ● Bias and discrimination from learned, data-induced biases.
       ○ Neural gender classification performs significantly worse for darker-skinned women. [Buolamwini et al., FAccT2018]
       ○ Health risk of black people is consistently underestimated by predictive algorithms. [Obermeyer et al., Science2019]
  4. Provenance & Dependencies of Research Data, Resources, Knowledge
     Research Data Cycle. Relations between scientific resources, data, knowledge:
     ▪ Research Data
     ▪ Publications
     ▪ Code/Scripts
     ▪ ML Models
     ▪ Methods
     ▪ Claims
     ▪ Metrics
  5. Provenance & Dependencies of Research Data, Resources, Knowledge
     Common questions for researchers
     • Which top-tier publications cite which data/method? ("dataset authority")
     • Which data was used to train/evaluate which method? Which method was used to produce what data?
     • Which claims are supported/cited/rejected by what dataset or publication?
     Relations between scientific resources, data, knowledge: Research Data, Publications, Code/Scripts, ML Models, Methods, Claims, Metrics
  6. Provenance & Dependencies of Research Data, Resources, Knowledge
     Challenges
     • Data & metadata about resources and concepts are not represented in a structured, machine-interpretable, integrated manner (hidden in publications, web pages etc.)
     • Persistent identifiers (e.g. DOIs) are used inconsistently (e.g. on publications/datasets)
     • Relations and semantics are not explicit
     • Reproducibility crisis in CS/DS/AI
     Relations between scientific resources, data, knowledge: Research Data, Publications, Code/Scripts, ML Models, Methods, Claims, Metrics
  7. Knowledge Graphs for FAIR Research Data in NFDI4DS
     • Data interoperability and reuse through established W3C standards for data sharing (on the Web), e.g. RDF, JSON, shared vocabularies (e.g. schema.org, DCAT, DDI), APIs for data reuse and linking
     • Making links between resources and concepts explicit & machine-interpretable (e.g. which publications cite what dataset?)
     • Consistent use of persistent IDs (e.g. URIs, DOIs) across all data, e.g. concepts, resources etc. ("DOIs for all")
     • Partners in NFDI4DS/TA2 ("RKGs")
     Resources: ▪ Datasets ▪ Publications ▪ Code ▪ Software
     Concepts: ▪ Terms & Definitions ▪ Claims ▪ Methods ▪ Topics ▪ Entities
  8. GESIS & TIB RKGs in NFDI4DS
     • Community
     • Expert-curation/annotation workflow/tools
     • Focus point & hub
     • PIDs
     https://www.nfdi4datascience.de/
     • Large-scale knowledge graphs
     • Automated deep-learning-based methods for extracting KGs
     • Populating ORKG
     Resources: ▪ Datasets ▪ Publications ▪ Code ▪ Software
     Concepts: ▪ Terms & Definitions ▪ Claims ▪ Methods ▪ Topics ▪ Entities
  9. Research KGs in Practice: integrated search @ GESIS
     https://search.gesis.org/
     Screenshot labels: Dataset, Rel. Publications
  10. From publications to machine-interpretable metadata KGs
      Disambiguation of dataset & software/script citations
      https://data.gesis.org/softwarekg
      ▪ Training a deep-learning-based model for extracting software & data references from large-scale data (3.5 M publications)
      ▪ Data lifting into KG
      ▪ 300+ M triples / statements
      ▪ Search across data/software/publications (GESIS Search)
  11. From publications to machine-interpretable metadata KGs
      Understanding scientific software/data usage
      https://data.gesis.org/softwarekg (Schindler et al., CIKM2021)
      ▪ Understanding SW usage, citation habits and their evolution across disciplines
      ▪ Rise of data science = rise of software usage
  12. From publications to machine-interpretable metadata KGs
      Understanding scientific software/data usage
      https://data.gesis.org/softwarekg
      ▪ Top adopters of data science/AI/software…
  13. From publications to machine-interpretable metadata KGs
      Understanding scientific software/data usage
      https://data.gesis.org/softwarekg
      ▪ Top adopters of data science/AI/software…
      ▪ …follow the worst citation habits
  14. From social media to machine-interpretable research data KGs
      Building a public research knowledge graph from Twitter data
      https://data.gesis.org/tweetskb
      Labels from the example graph: http://dbpedia.org/resource/Tim_Berners-Lee, wna:positive-emotion, onyx:Intensity "0.75", onyx:Intensity "0.0", http://dbpedia.org/resource/Solid, wna:negative-emotion
  15. From social media to machine-interpretable research data KGs
      TweetsKB: a large-scale research KG of societal opinions
      https://data.gesis.org/tweetskb
      ▪ Harvesting & archiving of 10 billion tweets (permanent collection from the Twitter 1% sample since 2013)
      ▪ Information extraction pipeline to build a KG of entities, interactions & sentiments (distributed batch processing via Hadoop Map/Reduce)
        o Entity linking with knowledge graph/DBpedia ("president"/"potus"/"trump" => dbp:DonaldTrump)
        o Sentiment analysis/annotation
        o Geotagging
        o Lifting into knowledge graph schema
      ▪ Public, privacy-aware, large-scale research corpus of public opinions and their evolution => interdisciplinary research
      P. Fafalios, V. Iosifidis, E. Ntoutsi, and S. Dietze, TweetsKB: A Public and Large-Scale RDF Corpus of Annotated Tweets, ESWC'18.
  16. RKG-based social science research using TweetsKB
      Investigating Vaccine Hesitancy in DACH countries
      https://dd4p.gesis.org
  17. RKG-based social science research using TweetsKB
      Investigating Vaccine Hesitancy in DACH countries
      https://dd4p.gesis.org
      Twitter discourse on "Impfbereitschaft" (willingness to get vaccinated)
      Event marked in the chart: Germany suspends vaccinations with AstraZeneca
  18. RKG-based discourse analysis using TweetsKB
      Vaccine Hesitancy: key topics in "safety" category
      "Schwangerschaft" (pregnancy), "Kimmich", "Alter" (age), "Nebenwirkungen" (side effects), "Herzinfarkt" (heart attack), "Zulassung" (approval)
      https://dd4p.gesis.org
  19. Summary: Research KGs @ GESIS
      Tools for constructing scholarly knowledge graphs
      ● NLP and deep-learning-powered methods for extracting large-scale KGs about methods, claims, data, software involved in the scientific process
      Large-scale scholarly KGs, e.g.
      ● KGs about scholarly use of software & research data (e.g. SoftwareKG: 1.8 M disambiguated software mentions extracted from 3 M publications, https://data.gesis.org/softwarekg/)
      ● Web-mined KGs of social science research data, e.g. public opinions, claims and attitudes expressed on social media (e.g. TweetsKB: > 10 Bn semantically annotated tweets, sentiments, https://data.gesis.org/tweetskb)
      Semantic Search powered by KGs and related tools
      ● RKG-powered search across scholarly publications, datasets, methods and their relations (e.g. GESIS Search, https://search.gesis.org)
      https://gesis.org/en/kts
      https://search.gesis.org
  20. Outlook: a joint Research KG in NFDI4DS
      Resources: ▪ Datasets ▪ Publications ▪ Code ▪ Software
      Concepts: ▪ Terms & Definitions ▪ Claims ▪ Methods ▪ Topics ▪ Entities
      • Community
      • Expert-curation/annotation workflow/tools
      • Focus point & hub
      • PIDs
      https://www.nfdi4datascience.de/
      • Large-scale knowledge graphs
      • Automated deep-learning-based methods for extracting KGs
      • Populating ORKG
  21. @stefandietze, http://stefandietze.net
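
Slide 7 asks for links between resources to be explicit and machine-interpretable, with "which publications cite what dataset?" as its example. The following is a minimal sketch of that idea, not the NFDI4DS or GESIS implementation. It stores statements as subject/predicate/object triples in plain Python and answers the question by pattern matching. The `ex:` identifiers and the helper names `match` and `datasets_cited_by_publication` are hypothetical. The terms `schema:Dataset`, `schema:ScholarlyArticle` and `schema:citation` come from schema.org, which the slide names as a shared vocabulary, but using them for this purpose is an assumption here.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical example data: two publications and two datasets.
TRIPLES: List[Triple] = [
    ("ex:paper-1", "rdf:type", "schema:ScholarlyArticle"),
    ("ex:paper-2", "rdf:type", "schema:ScholarlyArticle"),
    ("ex:dataset-A", "rdf:type", "schema:Dataset"),
    ("ex:dataset-B", "rdf:type", "schema:Dataset"),
    ("ex:paper-1", "schema:citation", "ex:dataset-A"),
    ("ex:paper-2", "schema:citation", "ex:dataset-A"),
    ("ex:paper-2", "schema:citation", "ex:dataset-B"),
]


def match(
    triples: List[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> List[Triple]:
    """Return all triples matching the pattern; None acts as a wildcard."""
    return [
        t
        for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def datasets_cited_by_publication(triples: List[Triple]) -> Dict[str, List[str]]:
    """Map each citing resource to the datasets it cites."""
    datasets = {s for s, _, _ in match(triples, p="rdf:type", o="schema:Dataset")}
    cited: Dict[str, List[str]] = {}
    for publication, _, target in match(triples, p="schema:citation"):
        if target in datasets:  # ignore citations of non-dataset resources
            cited.setdefault(publication, []).append(target)
    return cited


citations = datasets_cited_by_publication(TRIPLES)
for publication in sorted(citations):
    print(publication, "cites", ", ".join(citations[publication]))
```

Because every link is an explicit statement with a stable identifier on both ends, the reverse question (which publications cite a given dataset) is the same pattern with the object fixed, e.g. `match(TRIPLES, p="schema:citation", o="ex:dataset-A")`.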

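Slide 15 lists entity linking ("president"/"potus"/"trump" => dbp:DonaldTrump) and lifting into a knowledge graph schema as pipeline steps. The sketch below illustrates only those two steps with a toy dictionary lookup. It is an assumption-laden illustration and not the TweetsKB pipeline, which the slide describes as distributed batch processing and which would need context-aware disambiguation. The surface-form table reuses the slide's example. The predicate `ex:mentions`, the tweet identifiers and the function names are hypothetical.

```python
import re
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Surface forms taken from the slide's example; a real linker would
# disambiguate by context instead of using a fixed lookup table.
SURFACE_FORMS: Dict[str, str] = {
    "president": "dbp:DonaldTrump",
    "potus": "dbp:DonaldTrump",
    "trump": "dbp:DonaldTrump",
}


def link_entities(text: str) -> List[str]:
    """Return the distinct entities mentioned in the text, in order of first mention."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    linked: List[str] = []
    for token in tokens:
        entity = SURFACE_FORMS.get(token)
        if entity is not None and entity not in linked:
            linked.append(entity)
    return linked


def lift(tweet_id: str, text: str) -> List[Triple]:
    """Turn the linked entities of one tweet into (subject, predicate, object) triples."""
    # "ex:mentions" is a hypothetical predicate used only for illustration.
    return [(tweet_id, "ex:mentions", entity) for entity in link_entities(text)]


triples = lift("ex:tweet-1", "POTUS spoke today; Trump tweeted later.")
for triple in triples:
    print(triple)
```

Both surface forms in the example tweet resolve to the same identifier, so the tweet yields a single statement. That collapsing of variant mentions onto one URI is what makes entity-level queries across the corpus possible.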