Scaling Out Federated Queries for Life Sciences Data in Production
1. SCALING OUT FEDERATED QUERIES
FOR LIFE SCIENCES DATA IN PRODUCTION
Dieter De Witte, Laurens De Vocht, et al.
dieter.dewitte@ugent.be
• IMEC – IDLAB – GHENT UNIVERSITY
• ONTOFORCE
2. Catch-22!?
A. No Semantic Web applications because there is no semantic data
B. No semantic data because there are no applications
3. A. The LOD Cloud for Life Sciences...
Ontoforce’s DISQOVER covers > 110 Life Sciences datasets
4. B. DISQOVER is an exploratory semantic search UI (faceted browsing)
To click = to SPARQL: every facet click triggers a SPARQL query
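The "To Click = To SPARQL" idea can be illustrated with a minimal sketch that turns a faceted-browsing state into a single SPARQL query. This is not DISQOVER's actual query generator: the function name `facets_to_sparql`, the AND-across-facets / OR-within-a-facet semantics, the assumption that every term is an IRI, and all `example.org` IRIs are hypothetical.

```python
from typing import Dict, List


def facets_to_sparql(rdf_type: str, facets: Dict[str, List[str]], limit: int = 50) -> str:
    """Translate a (hypothetical) faceted-browsing state into one SPARQL SELECT.

    Values ticked within the same facet are OR-ed via VALUES;
    different facets are AND-ed as separate triple patterns.
    All terms are assumed to be IRIs.
    """
    lines = ["SELECT DISTINCT ?s WHERE {", f"  ?s a <{rdf_type}> ."]
    # Sort predicates so the same UI state always yields the same query text.
    for i, (predicate, values) in enumerate(sorted(facets.items())):
        if not values:
            continue  # a facet with nothing ticked adds no constraint
        var = f"?f{i}"
        lines.append(f"  ?s <{predicate}> {var} .")
        value_list = " ".join(f"<{v}>" for v in values)
        lines.append(f"  VALUES {var} {{ {value_list} }}")
    lines.append("}")
    lines.append(f"LIMIT {limit}")
    return "\n".join(lines)


# Hypothetical IRIs, for illustration only.
state = {
    "http://example.org/indication": ["http://example.org/Asthma"],
    "http://example.org/phase": ["http://example.org/Phase2", "http://example.org/Phase3"],
}
print(facets_to_sparql("http://example.org/ClinicalTrial", state))
```

Sorting the facets keeps the generated text stable for identical UI states, which makes queries easier to cache and to compare across benchmark runs.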
5. The missing link in our Catch-22?
“How to run federated queries?”
Direct ETL
6. Scientific Benchmark = Reproducible Benchmark
• Cloud instances
• PAGO AMIs
Benchmark client:
• 1 single-threaded warm-up run (all 1,223 queries)
• 1 multi-threaded run (8 threads, 8 × randomized order)
Database node(s)
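The benchmark protocol above can be sketched as follows, assuming a user-supplied `run_query` callable that sends one query to the database node and fully consumes its results. Reading "8 × randomized order" as eight threads that each run the whole query set in their own shuffled order is an assumption, as are the names `timed_pass` and `run_benchmark` and the default seed; a fixed seed is one simple way to make the randomized orders reproducible.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence


def timed_pass(queries: Sequence[str], run_query: Callable[[str], object],
               order: List[int]) -> Dict[int, float]:
    """Run every query once, in the given order; return seconds per query index."""
    timings: Dict[int, float] = {}
    for idx in order:
        start = time.perf_counter()
        run_query(queries[idx])  # assumed to fully consume the result set
        timings[idx] = time.perf_counter() - start
    return timings


def run_benchmark(queries: Sequence[str], run_query: Callable[[str], object],
                  threads: int = 8, seed: int = 42) -> dict:
    # 1) Single-threaded warm-up over all queries, in file order.
    warmup = timed_pass(queries, run_query, list(range(len(queries))))

    # 2) Multi-threaded run: each thread gets its own shuffled order.
    #    A fixed seed makes the orders (not the timings) reproducible.
    rng = random.Random(seed)
    orders = []
    for _ in range(threads):
        order = list(range(len(queries)))
        rng.shuffle(order)
        orders.append(order)

    with ThreadPoolExecutor(max_workers=threads) as pool:
        per_thread = list(pool.map(lambda o: timed_pass(queries, run_query, o), orders))

    return {"warmup": warmup, "orders": orders, "per_thread": per_thread}


# Stand-in for a real SPARQL endpoint call (hypothetical).
result = run_benchmark(["q1", "q2", "q3"], run_query=lambda q: time.sleep(0.001))
print(len(result["per_thread"]))  # 8
```

Python threads are adequate here on the assumption that each call mostly waits on the database over the network rather than burning client CPU.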
7. How to evaluate an RDF database solution?
Performance(data store, dataset, configuration, number of nodes, hardware (RAM))
8. Performance(NoSQL triple stores, WatDiv 10M / 100M / 1000M, standard configs, single node, 32 GB RAM)
SIGMOD 2016: single-node state of the art on artificial data
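Slide 7 frames performance as a function of five parameters, and slide 8 fixes all of them except the data store and the dataset size. A minimal sketch of that parameter space as an explicit experiment grid, assuming hypothetical store names and dataset labels (the slides name neither the stores nor any file layout); `Experiment` and `experiment_grid` are illustrative names.

```python
from dataclasses import dataclass
from itertools import product
from typing import Iterable, List


@dataclass(frozen=True)
class Experiment:
    """One point in the space Performance(store, dataset, config, nodes, RAM)."""
    data_store: str
    dataset: str
    configuration: str
    nodes: int
    ram_gb: int


def experiment_grid(stores: Iterable[str], datasets: Iterable[str],
                    configurations: Iterable[str] = ("standard",),
                    nodes: Iterable[int] = (1,),
                    ram_gb: Iterable[int] = (32,)) -> List[Experiment]:
    """Cartesian product of all parameter values: one Experiment per combination."""
    return [Experiment(*combo)
            for combo in product(stores, datasets, configurations, nodes, ram_gb)]


# Slide-8 setting: single node, 32 GB RAM, standard configs, three WatDiv sizes.
# Store names are hypothetical placeholders.
grid = experiment_grid(["store_a", "store_b"],
                       ["watdiv-10M", "watdiv-100M", "watdiv-1000M"])
print(len(grid))  # 6
```

Making the grid explicit shows how quickly the space grows once "number of nodes" or "hardware" also vary, which is why single-node runs come first.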
20. FILTERs and UNIONs are challenging, but ORDER + GROUP + OPTIONAL dominate
[Figure labels: COUNT DISTINCT; 600 – 1,223; BGPs]
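One way to produce the kind of breakdown on slide 20 is to tag every query with the SPARQL features it uses and sum runtimes per tag. The sketch below is a hypothetical keyword heuristic, not the authors' analysis. Labelling a query "BGP" when none of the listed operators occur is an assumption, and a proper SPARQL parser would be more reliable.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Crude keyword detectors; they ignore SPARQL syntax, so a keyword inside
# a string literal or IRI would be a false positive.
FEATURES = {
    "COUNT DISTINCT": re.compile(r"COUNT\s*\(\s*DISTINCT\b", re.IGNORECASE),
    "FILTER": re.compile(r"\bFILTER\b", re.IGNORECASE),
    "UNION": re.compile(r"\bUNION\b", re.IGNORECASE),
    "OPTIONAL": re.compile(r"\bOPTIONAL\b", re.IGNORECASE),
    "ORDER": re.compile(r"\bORDER\s+BY\b", re.IGNORECASE),
    "GROUP": re.compile(r"\bGROUP\s+BY\b", re.IGNORECASE),
}


def features_of(query: str) -> Set[str]:
    """Label a query with the features it uses; 'BGP' if it uses none of them."""
    found = {name for name, pattern in FEATURES.items() if pattern.search(query)}
    return found or {"BGP"}


def runtime_by_feature(runs: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Sum runtimes per feature; a query using several features counts toward each."""
    totals: Dict[str, float] = defaultdict(float)
    for query, seconds in runs:
        for feature in features_of(query):
            totals[feature] += seconds
    return dict(totals)
```

Because a query with several operators contributes to every matching tag, the per-feature totals overlap and should be read as "time spent in queries that use X", not as an additive decomposition.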
21. Conclusions & Future Work
• Additional diagnostics for RDF solutions!
• Extend the benchmarking software with query correctness assessment!
• Multi-node RDF solutions???
• Towards a full paper:
– NoSQL for Ontoforce data
– Scale-out approaches for WatDiv + test LDF (Linked Data Fragments)
– Release reusable end-to-end benchmark software:
• Setup AND post-processing
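The query correctness assessment listed above could start from a comparison against a reference store. A minimal sketch, assuming results are already normalized to tuples of strings (so that equivalent literal forms compare equal) and that the caller knows which queries carry an ORDER BY; `results_agree`, `correctness_report` and the store names are hypothetical.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Sequence, Tuple

Row = Tuple[str, ...]  # one solution, terms already serialized to strings


def results_agree(reference: Sequence[Row], candidate: Sequence[Row],
                  ordered: bool) -> bool:
    """Compare two result sets as multisets, or as lists when ORDER BY applies."""
    if ordered:
        return list(reference) == list(candidate)
    return Counter(reference) == Counter(candidate)


def correctness_report(results: Dict[str, Dict[str, List[Row]]],
                       reference_store: str,
                       ordered_queries: FrozenSet[str] = frozenset()) -> Dict[str, List[str]]:
    """For every non-reference store, list the query ids whose results differ."""
    expected = results[reference_store]
    report: Dict[str, List[str]] = {}
    for store, answers in results.items():
        if store == reference_store:
            continue
        report[store] = [
            qid for qid, rows in expected.items()
            if qid not in answers
            or not results_agree(rows, answers[qid], qid in ordered_queries)
        ]
    return report
```

Comparing as multisets rather than sets keeps duplicate solutions significant, since SPARQL results are bags unless DISTINCT is used. A strict list comparison for ordered queries can still flag valid answers when ORDER BY leaves ties, and LIMIT without ORDER BY makes results nondeterministic, so such queries need special handling.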
22. Thanks for your attention!!
contact: dieter.dewitte@ugent.be
slideshare: