SlideShare a Scribd company logo
1 of 22
Download to read offline
Bringing the Semantic Web
closer to reality
PostgreSQL as RDF Graph Database
Jimmy Angelakos
EDINA, University of Edinburgh
FOSDEM
04-05/02/2017
or how to export your data to
someone who's expecting RDF
Jimmy Angelakos
EDINA, University of Edinburgh
FOSDEM
04-05/02/2017
Bringing the Semantic Web closer to reality
PostgreSQL as RDF Graph Database
Semantic Web? RDF?
● Resource Description Framework
– Designed to overcome the limitations of HTML
– Make the Web machine readable
– Metadata data model
– Multigraph (Labelled, Directed)
– Triples (Subject – Predicate – Object)
Bringing the Semantic Web closer to reality
PostgreSQL as RDF Graph Database
● Information addressable via URIs
● <http://example.org/person/Mark_Twain>
<http://example.org/relation/author>
<http://example.org/books/Huckleberry_Finn>
● <http://edina.ac.uk/ns/item/74445709>
<http://purl.org/dc/terms/title>
"The power of words: A model of honesty and
fairness" .
● Namespaces
@prefix dc:
<http://purl.org/dc/elements/1.1/> .
RDF Triples
Bringing the Semantic Web closer to reality
PostgreSQL as RDF Graph Database
Triplestores
● Offer persistence to our RDF graph
● RDFLib extended by RDFLib-SQLAlchemy
● Use PostgreSQL as storage backend!
● Querying
– SPARQL
Bringing the Semantic Web closer to reality
PostgreSQL as RDF Graph Database
{ +
"DOI": "10.1007/11757344_1", +
"URL": "http://dx.doi.org/10.1007/11757344_1", +
"type": "book-chapter", +
"score": 1.0, +
"title": [ +
"CrossRef Listing of Deleted DOIs" +
], +
"member": "http://id.crossref.org/member/297", +
"prefix": "http://id.crossref.org/prefix/10.1007", +
"source": "CrossRef", +
"created": { +
"date-time": "2006-10-19T13:32:01Z", +
"timestamp": 1161264721000, +
"date-parts": [ +
[ +
2006, +
10, +
19 +
] +
] +
}, +
"indexed": { +
"date-time": "2015-12-24T00:59:48Z", +
"timestamp": 1450918788746, +
"date-parts": [ +
Bringing the Semantic Web closer to reality
PostgreSQL as RDF Graph Database
import psycopg2
from rdflib import plugin, Graph, Literal, URIRef
from rdflib.namespace import Namespace
from rdflib.store import Store
import rdflib_sqlalchemy
EDINA = Namespace('http://edina.ac.uk/ns/')
PRISM =
Namespace('http://prismstandard.org/namespaces/basic/2.1/')
rdflib_sqlalchemy.registerplugins()
dburi =
URIRef('postgresql+psycopg2://myuser:mypass@localhost/rdfgraph'
)
ident = EDINA.rdfgraph
store = plugin.get('SQLAlchemy', Store)(identifier=ident)
gdb = Graph(store, ident)
gdb.open(dburi, create=True)
gdb.bind('edina', EDINA)
gdb.bind('prism', PRISM)
item = EDINA['item/' + str(1)]
triples = []
triples += (item, RDF.type, EDINA.Item),
triples += (item, PRISM.doi, Literal('10.1002/crossmark_policy'),
gdb.addN(t + (gdb,) for t in triples)
gdb.serialize(format='turtle')
Bringing the Semantic Web closer to reality
PostgreSQL as RDF Graph Database
BIG
DATA
Bringing the Semantic Web closer to reality
PostgreSQL as RDF Graph Database
Super size me!
● Loop over original database contents
● Create triples
● Add them to graph efficiently
● Serialise graph efficiently
Bringing the Semantic Web closer to reality
PostgreSQL as RDF Graph Database
But but… ?
● Big data without Java?
● Graphs without Java?
– Gremlin? BluePrints? TinkerPop? Jena? Asdasdfaf? XYZZY?
● Why not existing triplestores? “We are the only Graph DB that...”
● Python/Postgres run on desktop hardware
● Simplicity (few LOC and readable)
● Unoptimised potential for improvement→
Bringing the Semantic Web closer to reality
PostgreSQL as RDF Graph Database
conn = psycopg2.connect(database='mydb', user='myuser')
cur = conn.cursor(cursor_factory=psycopg2.extras.DictCursor)
seqcur = conn.cursor()
seqcur.execute("""
CREATE SEQUENCE IF NOT EXISTS item;
""")
conn.commit()
cur = conn.cursor('serverside_cur',
cursor_factory=psycopg2.extras.DictCursor)
cur.itersize = 50000
cur.arraysize = 10000
cur.execute("""
SELECT data
FROM mytable
""")
while True:
recs = cur.fetchmany(10000)
if not recs:
break
for r in recs:
if 'DOI' in r['data'].keys():
seqcur.execute("SELECT nextval('item')")
item = EDINA['item/' + str(seqcur.fetchone()[0])]
triples += (item, RDF.type, EDINA.Item),
triples += (item, PRISM.doi, Literal(r['data']['DOI'])),
Bringing the Semantic Web closer to reality
PostgreSQL as RDF Graph Database
1st
challenge
● rdflib-sqlalchemy
– No ORM, autocommit (!)
– Creates SQL statements, executes one at a time
– INSERT INTO … VALUES (…);
INSERT INTO … VALUES (…);
INSERT INTO … VALUES (…);
– We want INSERT INTO … VALUES (…),(…),
(…)
– Creates lots of indexes which must be dropped
Bringing the Semantic Web closer to reality
PostgreSQL as RDF Graph Database
2nd
challenge
● How to restart if interrupted?
● Solved with querying and caching.
Bringing the Semantic Web closer to reality
PostgreSQL as RDF Graph Database
from rdflib.plugins.sparql import prepareQuery
orgQ = prepareQuery("""
SELECT ?org ?pub
WHERE { ?org a foaf:Organization .
?org rdfs:label ?pub . }
""", initNs = { 'foaf': FOAF, 'rdfs': RDFS })
orgCache = {}
for o in gdb.query(orgQ):
orgCache[o[1].toPython()] = URIRef(o[0].toPython())
if 'publisher' in r['data'].keys():
publisherFound = False
if r['data']['publisher'] in orgCache.keys():
publisherFound = True
triples += (item, DCTERMS.publisher, orgCache[r['data']
['publisher']]),
if not publisherFound:
seqcur.execute("SELECT nextval('org')")
org = EDINA['org/' + str(seqcur.fetchone()[0])]
orgCache[r['data']['publisher']] = org
triples += (org, RDF.type, FOAF.Organization),
triples += (org, RDFS.label, Literal(r['data']
['publisher'])),
Bringing the Semantic Web closer to reality
PostgreSQL as RDF Graph Database
3rd
challenge
● rdflib-sqlalchemy (yup… guessed it)
– Selects whole graph into memory
● Server side cursor:
res = connection.execution_options(
stream_results=True).execute(q)
● Batching:
while True:
result = res.fetchmany(1000) …
yield …
– Inexplicably caches everything read in RAM!
Bringing the Semantic Web closer to reality
PostgreSQL as RDF Graph Database
4th
challenge
● Serialise efficiently! Multiprocessing→
– Processes, JoinableQueues
● Turtle: Unsuitable N-Triples→
– UNIX magic
python3 rdf2nt.py |
split -a4 -d -C4G
--additional-suffix=.nt
--filter='gzip > $FILE.gz' -
exported/rdfgraph_
Bringing the Semantic Web closer to reality
PostgreSQL as RDF Graph Database
Desktop hardware...
Bringing the Semantic Web closer to reality
PostgreSQL as RDF Graph Database
5th
challenge
● Serialisation outran HDD!
● Waits for:
– JoinableQueues to empty
– sys.stdout.flush()
Bringing the Semantic Web closer to reality
PostgreSQL as RDF Graph Database
rdfgraph=> dt
List of relations
Schema | Name | Type | Owner
--------+-----------------------------------+-------+----------
public | kb_a8f93b2ff6_asserted_statements | table | myuser
public | kb_a8f93b2ff6_literal_statements | table | myuser
public | kb_a8f93b2ff6_namespace_binds | table | myuser
public | kb_a8f93b2ff6_quoted_statements | table | myuser
public | kb_a8f93b2ff6_type_statements | table | myuser
(5 rows)
rdfgraph=> x
Expanded display is on.
rdfgraph=> select * from kb_a8f93b2ff6_asserted_statements limit
1;
-[ RECORD 1 ]-----------------------------------------------
id | 62516955
subject | http://edina.ac.uk/ns/creation/12993043
predicate | http://www.w3.org/ns/prov#wasAssociatedWith
object | http://edina.ac.uk/ns/agent/12887967
context | http://edina.ac.uk/ns/rdfgraph
termcomb | 0
Time: 0.531 ms
rdfgraph=>
Bringing the Semantic Web closer to reality
PostgreSQL as RDF Graph Database
More caveats!
● Make sure you are not entering literals in a URI field.
● Also make sure your URIs are valid (amazingly some
DOIs fail when urlencoded)
● rdflib-sqlalchemy unoptimized for FTS
– Your (BTree) indices will be YUGE.
– Don't use large records (e.g. 10+ MB cnt:bytes)
● you need to drop index; insert; create
index
– pg_dump is your friend
– Massive maintenance work memory (Postgres)
Bringing the Semantic Web closer to reality
PostgreSQL as RDF Graph Database
DROP INDEX public.kb_a8f93b2ff6_uri_index;
DROP INDEX public.kb_a8f93b2ff6_type_mkc_key;
DROP INDEX public.kb_a8f93b2ff6_quoted_spoc_key;
DROP INDEX public.kb_a8f93b2ff6_member_index;
DROP INDEX public.kb_a8f93b2ff6_literal_spoc_key;
DROP INDEX public.kb_a8f93b2ff6_klass_index;
DROP INDEX public.kb_a8f93b2ff6_c_index;
DROP INDEX public.kb_a8f93b2ff6_asserted_spoc_key;
DROP INDEX public."kb_a8f93b2ff6_T_termComb_index";
DROP INDEX public."kb_a8f93b2ff6_Q_termComb_index";
DROP INDEX public."kb_a8f93b2ff6_Q_s_index";
DROP INDEX public."kb_a8f93b2ff6_Q_p_index";
DROP INDEX public."kb_a8f93b2ff6_Q_o_index";
DROP INDEX public."kb_a8f93b2ff6_Q_c_index";
DROP INDEX public."kb_a8f93b2ff6_L_termComb_index";
DROP INDEX public."kb_a8f93b2ff6_L_s_index";
DROP INDEX public."kb_a8f93b2ff6_L_p_index";
DROP INDEX public."kb_a8f93b2ff6_L_c_index";
DROP INDEX public."kb_a8f93b2ff6_A_termComb_index";
DROP INDEX public."kb_a8f93b2ff6_A_s_index";
DROP INDEX public."kb_a8f93b2ff6_A_p_index";
DROP INDEX public."kb_a8f93b2ff6_A_o_index";
DROP INDEX public."kb_a8f93b2ff6_A_c_index";
ALTER TABLE ONLY public.kb_a8f93b2ff6_type_statements
DROP CONSTRAINT kb_a8f93b2ff6_type_statements_pkey;
ALTER TABLE ONLY public.kb_a8f93b2ff6_quoted_statements
DROP CONSTRAINT kb_a8f93b2ff6_quoted_statements_pkey;
ALTER TABLE ONLY public.kb_a8f93b2ff6_namespace_binds
DROP CONSTRAINT kb_a8f93b2ff6_namespace_binds_pkey;
ALTER TABLE ONLY public.kb_a8f93b2ff6_literal_statements
DROP CONSTRAINT kb_a8f93b2ff6_literal_statements_pkey;
ALTER TABLE ONLY public.kb_a8f93b2ff6_asserted_statements
DROP CONSTRAINT kb_a8f93b2ff6_asserted_statements_pkey;
Bringing the Semantic Web closer to reality
PostgreSQL as RDF Graph Database
Thank you =)
Twitter: @vyruss
EDINA Labs blog – http://labs.edina.ac.uk
Hack – http://github.com/vyruss/rdflib-sqlalchemy
Dataset – Stay tuned

More Related Content

What's hot

sslcompressionの設定方法および性能測定結果
sslcompressionの設定方法および性能測定結果sslcompressionの設定方法および性能測定結果
sslcompressionの設定方法および性能測定結果
kawarasho
 
Designing and developing vocabularies in RDF
Designing and developing vocabularies in RDFDesigning and developing vocabularies in RDF
Designing and developing vocabularies in RDF
Open Data Support
 
MongoDB概要:金融業界でのMongoDB
MongoDB概要:金融業界でのMongoDBMongoDB概要:金融業界でのMongoDB
MongoDB概要:金融業界でのMongoDB
ippei_suzuki
 

What's hot (20)

20190703 AWS Black Belt Online Seminar Amazon MQ
20190703 AWS Black Belt Online Seminar Amazon MQ20190703 AWS Black Belt Online Seminar Amazon MQ
20190703 AWS Black Belt Online Seminar Amazon MQ
 
sslcompressionの設定方法および性能測定結果
sslcompressionの設定方法および性能測定結果sslcompressionの設定方法および性能測定結果
sslcompressionの設定方法および性能測定結果
 
ODA Backup Restore Utility & ODA Rescue Live Disk
ODA Backup Restore Utility & ODA Rescue Live DiskODA Backup Restore Utility & ODA Rescue Live Disk
ODA Backup Restore Utility & ODA Rescue Live Disk
 
Designing and developing vocabularies in RDF
Designing and developing vocabularies in RDFDesigning and developing vocabularies in RDF
Designing and developing vocabularies in RDF
 
IAM Roles Anywhereのない世界とある世界(2022年のAWSアップデートを振り返ろう ~Season 4~ 発表資料)
IAM Roles Anywhereのない世界とある世界(2022年のAWSアップデートを振り返ろう ~Season 4~ 発表資料)IAM Roles Anywhereのない世界とある世界(2022年のAWSアップデートを振り返ろう ~Season 4~ 発表資料)
IAM Roles Anywhereのない世界とある世界(2022年のAWSアップデートを振り返ろう ~Season 4~ 発表資料)
 
Hiveを高速化するLLAP
Hiveを高速化するLLAPHiveを高速化するLLAP
Hiveを高速化するLLAP
 
Salvatore Sanfilippo – How Redis Cluster works, and why - NoSQL matters Barce...
Salvatore Sanfilippo – How Redis Cluster works, and why - NoSQL matters Barce...Salvatore Sanfilippo – How Redis Cluster works, and why - NoSQL matters Barce...
Salvatore Sanfilippo – How Redis Cluster works, and why - NoSQL matters Barce...
 
DockerでCKANを動かそう
DockerでCKANを動かそうDockerでCKANを動かそう
DockerでCKANを動かそう
 
Scaling Apache Pulsar to 10 Petabytes/Day
Scaling Apache Pulsar to 10 Petabytes/DayScaling Apache Pulsar to 10 Petabytes/Day
Scaling Apache Pulsar to 10 Petabytes/Day
 
コンテナを止めるな! PacemakerによるコンテナHAクラスタリングとKubernetesとの違いとは
コンテナを止めるな!  PacemakerによるコンテナHAクラスタリングとKubernetesとの違いとはコンテナを止めるな!  PacemakerによるコンテナHAクラスタリングとKubernetesとの違いとは
コンテナを止めるな! PacemakerによるコンテナHAクラスタリングとKubernetesとの違いとは
 
옵저버빌러티(Observability) 확보로 서버리스 마이크로서비스 들여다보기 - 김형일 AWS 솔루션즈 아키텍트 :: AWS Summi...
옵저버빌러티(Observability) 확보로 서버리스 마이크로서비스 들여다보기 - 김형일 AWS 솔루션즈 아키텍트 :: AWS Summi...옵저버빌러티(Observability) 확보로 서버리스 마이크로서비스 들여다보기 - 김형일 AWS 솔루션즈 아키텍트 :: AWS Summi...
옵저버빌러티(Observability) 확보로 서버리스 마이크로서비스 들여다보기 - 김형일 AWS 솔루션즈 아키텍트 :: AWS Summi...
 
Hadoop Security Today & Tomorrow with Apache Knox
Hadoop Security Today & Tomorrow with Apache KnoxHadoop Security Today & Tomorrow with Apache Knox
Hadoop Security Today & Tomorrow with Apache Knox
 
Db2をAWS上に構築する際のヒント&TIPS 2019年7月版
Db2をAWS上に構築する際のヒント&TIPS 2019年7月版Db2をAWS上に構築する際のヒント&TIPS 2019年7月版
Db2をAWS上に構築する際のヒント&TIPS 2019年7月版
 
MongoDB概要:金融業界でのMongoDB
MongoDB概要:金融業界でのMongoDBMongoDB概要:金融業界でのMongoDB
MongoDB概要:金融業界でのMongoDB
 
機密データとSaaSは共存しうるのか!?セキュリティー重視のユーザー層を取り込む為のネットワーク通信のアプローチ
機密データとSaaSは共存しうるのか!?セキュリティー重視のユーザー層を取り込む為のネットワーク通信のアプローチ機密データとSaaSは共存しうるのか!?セキュリティー重視のユーザー層を取り込む為のネットワーク通信のアプローチ
機密データとSaaSは共存しうるのか!?セキュリティー重視のユーザー層を取り込む為のネットワーク通信のアプローチ
 
SHACL Overview
SHACL OverviewSHACL Overview
SHACL Overview
 
AWS Black Belt Online Seminar 2017 Docker on AWS
AWS Black Belt Online Seminar 2017 Docker on AWSAWS Black Belt Online Seminar 2017 Docker on AWS
AWS Black Belt Online Seminar 2017 Docker on AWS
 
平成最後の1月ですし、Databricksでもやってみましょうか
平成最後の1月ですし、Databricksでもやってみましょうか平成最後の1月ですし、Databricksでもやってみましょうか
平成最後の1月ですし、Databricksでもやってみましょうか
 
Amazon Redshift 概要 (20分版)
Amazon Redshift 概要 (20分版)Amazon Redshift 概要 (20分版)
Amazon Redshift 概要 (20分版)
 
20180703 AWS Black Belt Online Seminar Amazon Neptune
20180703 AWS Black Belt Online Seminar Amazon Neptune20180703 AWS Black Belt Online Seminar Amazon Neptune
20180703 AWS Black Belt Online Seminar Amazon Neptune
 

Similar to Bringing the Semantic Web closer to reality: PostgreSQL as RDF Graph Database

SparkR: Enabling Interactive Data Science at Scale on Hadoop
SparkR: Enabling Interactive Data Science at Scale on HadoopSparkR: Enabling Interactive Data Science at Scale on Hadoop
SparkR: Enabling Interactive Data Science at Scale on Hadoop
DataWorks Summit
 
Apache spark sneha challa- google pittsburgh-aug 25th
Apache spark  sneha challa- google pittsburgh-aug 25thApache spark  sneha challa- google pittsburgh-aug 25th
Apache spark sneha challa- google pittsburgh-aug 25th
Sneha Challa
 

Similar to Bringing the Semantic Web closer to reality: PostgreSQL as RDF Graph Database (20)

A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...
A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...
A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...
 
A really really fast introduction to PySpark - lightning fast cluster computi...
A really really fast introduction to PySpark - lightning fast cluster computi...A really really fast introduction to PySpark - lightning fast cluster computi...
A really really fast introduction to PySpark - lightning fast cluster computi...
 
Introduction to Spark Datasets - Functional and relational together at last
Introduction to Spark Datasets - Functional and relational together at lastIntroduction to Spark Datasets - Functional and relational together at last
Introduction to Spark Datasets - Functional and relational together at last
 
Spark Summit East 2015 Advanced Devops Student Slides
Spark Summit East 2015 Advanced Devops Student SlidesSpark Summit East 2015 Advanced Devops Student Slides
Spark Summit East 2015 Advanced Devops Student Slides
 
Better {ML} Together: GraphLab Create + Spark
Better {ML} Together: GraphLab Create + Spark Better {ML} Together: GraphLab Create + Spark
Better {ML} Together: GraphLab Create + Spark
 
Apache spark-melbourne-april-2015-meetup
Apache spark-melbourne-april-2015-meetupApache spark-melbourne-april-2015-meetup
Apache spark-melbourne-april-2015-meetup
 
Graph databases & data integration v2
Graph databases & data integration v2Graph databases & data integration v2
Graph databases & data integration v2
 
Your Database Cannot Do this (well)
Your Database Cannot Do this (well)Your Database Cannot Do this (well)
Your Database Cannot Do this (well)
 
Paris Data Geek - Spark Streaming
Paris Data Geek - Spark Streaming Paris Data Geek - Spark Streaming
Paris Data Geek - Spark Streaming
 
New Developments in Spark
New Developments in SparkNew Developments in Spark
New Developments in Spark
 
A Tale of Three Apache Spark APIs: RDDs, DataFrames and Datasets by Jules Damji
A Tale of Three Apache Spark APIs: RDDs, DataFrames and Datasets by Jules DamjiA Tale of Three Apache Spark APIs: RDDs, DataFrames and Datasets by Jules Damji
A Tale of Three Apache Spark APIs: RDDs, DataFrames and Datasets by Jules Damji
 
A Little SPARQL in your Analytics
A Little SPARQL in your AnalyticsA Little SPARQL in your Analytics
A Little SPARQL in your Analytics
 
SparkR - Play Spark Using R (20160909 HadoopCon)
SparkR - Play Spark Using R (20160909 HadoopCon)SparkR - Play Spark Using R (20160909 HadoopCon)
SparkR - Play Spark Using R (20160909 HadoopCon)
 
Apache Spark and DataStax Enablement
Apache Spark and DataStax EnablementApache Spark and DataStax Enablement
Apache Spark and DataStax Enablement
 
HKOSCon18 - Chetan Khatri - Scaling TB's of Data with Apache Spark and Scala ...
HKOSCon18 - Chetan Khatri - Scaling TB's of Data with Apache Spark and Scala ...HKOSCon18 - Chetan Khatri - Scaling TB's of Data with Apache Spark and Scala ...
HKOSCon18 - Chetan Khatri - Scaling TB's of Data with Apache Spark and Scala ...
 
SparkR: Enabling Interactive Data Science at Scale on Hadoop
SparkR: Enabling Interactive Data Science at Scale on HadoopSparkR: Enabling Interactive Data Science at Scale on Hadoop
SparkR: Enabling Interactive Data Science at Scale on Hadoop
 
Introducing Apache Spark's Data Frames and Dataset APIs workshop series
Introducing Apache Spark's Data Frames and Dataset APIs workshop seriesIntroducing Apache Spark's Data Frames and Dataset APIs workshop series
Introducing Apache Spark's Data Frames and Dataset APIs workshop series
 
Apache spark sneha challa- google pittsburgh-aug 25th
Apache spark  sneha challa- google pittsburgh-aug 25thApache spark  sneha challa- google pittsburgh-aug 25th
Apache spark sneha challa- google pittsburgh-aug 25th
 
Beyond Wordcount with spark datasets (and scalaing) - Nide PDX Jan 2018
Beyond Wordcount  with spark datasets (and scalaing) - Nide PDX Jan 2018Beyond Wordcount  with spark datasets (and scalaing) - Nide PDX Jan 2018
Beyond Wordcount with spark datasets (and scalaing) - Nide PDX Jan 2018
 
A look under the hood at Apache Spark's API and engine evolutions
A look under the hood at Apache Spark's API and engine evolutionsA look under the hood at Apache Spark's API and engine evolutions
A look under the hood at Apache Spark's API and engine evolutions
 

More from Jimmy Angelakos

More from Jimmy Angelakos (9)

Don't Do This [FOSDEM 2023]
Don't Do This [FOSDEM 2023]Don't Do This [FOSDEM 2023]
Don't Do This [FOSDEM 2023]
 
Slow things down to make them go faster [FOSDEM 2022]
Slow things down to make them go faster [FOSDEM 2022]Slow things down to make them go faster [FOSDEM 2022]
Slow things down to make them go faster [FOSDEM 2022]
 
Practical Partitioning in Production with Postgres
Practical Partitioning in Production with PostgresPractical Partitioning in Production with Postgres
Practical Partitioning in Production with Postgres
 
Changing your huge table's data types in production
Changing your huge table's data types in productionChanging your huge table's data types in production
Changing your huge table's data types in production
 
The State of (Full) Text Search in PostgreSQL 12
The State of (Full) Text Search in PostgreSQL 12The State of (Full) Text Search in PostgreSQL 12
The State of (Full) Text Search in PostgreSQL 12
 
Deploying PostgreSQL on Kubernetes
Deploying PostgreSQL on KubernetesDeploying PostgreSQL on Kubernetes
Deploying PostgreSQL on Kubernetes
 
Using PostgreSQL with Bibliographic Data
Using PostgreSQL with Bibliographic DataUsing PostgreSQL with Bibliographic Data
Using PostgreSQL with Bibliographic Data
 
Eισαγωγή στην PostgreSQL - Χρήση σε επιχειρησιακό περιβάλλον
Eισαγωγή στην PostgreSQL - Χρήση σε επιχειρησιακό περιβάλλονEισαγωγή στην PostgreSQL - Χρήση σε επιχειρησιακό περιβάλλον
Eισαγωγή στην PostgreSQL - Χρήση σε επιχειρησιακό περιβάλλον
 
PostgreSQL: Mέθοδοι για Data Replication
PostgreSQL: Mέθοδοι για Data ReplicationPostgreSQL: Mέθοδοι για Data Replication
PostgreSQL: Mέθοδοι για Data Replication
 

Recently uploaded

%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
masabamasaba
 
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
VictoriaMetrics
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is inside
shinachiaurasa2
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
masabamasaba
 
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Medical / Health Care (+971588192166) Mifepristone and Misoprostol tablets 200mg
 
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
masabamasaba
 

Recently uploaded (20)

%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
 
%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto
 
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
 
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
 
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
 
What Goes Wrong with Language Definitions and How to Improve the Situation
What Goes Wrong with Language Definitions and How to Improve the SituationWhat Goes Wrong with Language Definitions and How to Improve the Situation
What Goes Wrong with Language Definitions and How to Improve the Situation
 
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdfPayment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
 
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is inside
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
 
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students
 
%in Benoni+277-882-255-28 abortion pills for sale in Benoni
%in Benoni+277-882-255-28 abortion pills for sale in Benoni%in Benoni+277-882-255-28 abortion pills for sale in Benoni
%in Benoni+277-882-255-28 abortion pills for sale in Benoni
 
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
 
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
 
Announcing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareAnnouncing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK Software
 
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
 
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 

Bringing the Semantic Web closer to reality: PostgreSQL as RDF Graph Database

  • 1. Bringing the Semantic Web closer to reality PostgreSQL as RDF Graph Database Jimmy Angelakos EDINA, University of Edinburgh FOSDEM 04-05/02/2017
  • 2. or how to export your data to someone who's expecting RDF Jimmy Angelakos EDINA, University of Edinburgh FOSDEM 04-05/02/2017
  • 3. Bringing the Semantic Web closer to reality PostgreSQL as RDF Graph Database Semantic Web? RDF? ● Resource Description Framework – Designed to overcome the limitations of HTML – Make the Web machine readable – Metadata data model – Multigraph (Labelled, Directed) – Triples (Subject – Predicate – Object)
  • 4. Bringing the Semantic Web closer to reality PostgreSQL as RDF Graph Database ● Information addressable via URIs ● <http://example.org/person/Mark_Twain> <http://example.org/relation/author> <http://example.org/books/Huckleberry_Finn> ● <http://edina.ac.uk/ns/item/74445709> <http://purl.org/dc/terms/title> "The power of words: A model of honesty and fairness" . ● Namespaces @prefix dc: <http://purl.org/dc/elements/1.1/> . RDF Triples
  • 5. Bringing the Semantic Web closer to reality PostgreSQL as RDF Graph Database Triplestores ● Offer persistence to our RDF graph ● RDFLib extended by RDFLib-SQLAlchemy ● Use PostgreSQL as storage backend! ● Querying – SPARQL
  • 6. Bringing the Semantic Web closer to reality PostgreSQL as RDF Graph Database { + "DOI": "10.1007/11757344_1", + "URL": "http://dx.doi.org/10.1007/11757344_1", + "type": "book-chapter", + "score": 1.0, + "title": [ + "CrossRef Listing of Deleted DOIs" + ], + "member": "http://id.crossref.org/member/297", + "prefix": "http://id.crossref.org/prefix/10.1007", + "source": "CrossRef", + "created": { + "date-time": "2006-10-19T13:32:01Z", + "timestamp": 1161264721000, + "date-parts": [ + [ + 2006, + 10, + 19 + ] + ] + }, + "indexed": { + "date-time": "2015-12-24T00:59:48Z", + "timestamp": 1450918788746, + "date-parts": [ +
  • 7. Bringing the Semantic Web closer to reality PostgreSQL as RDF Graph Database import psycopg2 from rdflib import plugin, Graph, Literal, URIRef from rdflib.namespace import Namespace from rdflib.store import Store import rdflib_sqlalchemy EDINA = Namespace('http://edina.ac.uk/ns/') PRISM = Namespace('http://prismstandard.org/namespaces/basic/2.1/') rdflib_sqlalchemy.registerplugins() dburi = URIRef('postgresql+psycopg2://myuser:mypass@localhost/rdfgraph' ) ident = EDINA.rdfgraph store = plugin.get('SQLAlchemy', Store)(identifier=ident) gdb = Graph(store, ident) gdb.open(dburi, create=True) gdb.bind('edina', EDINA) gdb.bind('prism', PRISM) item = EDINA['item/' + str(1)] triples = [] triples += (item, RDF.type, EDINA.Item), triples += (item, PRISM.doi, Literal('10.1002/crossmark_policy'), gdb.addN(t + (gdb,) for t in triples) gdb.serialize(format='turtle')
  • 8. Bringing the Semantic Web closer to reality PostgreSQL as RDF Graph Database BIG DATA
  • 9. Bringing the Semantic Web closer to reality PostgreSQL as RDF Graph Database Super size me! ● Loop over original database contents ● Create triples ● Add them to graph efficiently ● Serialise graph efficiently
  • 10. Bringing the Semantic Web closer to reality PostgreSQL as RDF Graph Database But but… ? ● Big data without Java? ● Graphs without Java? – Gremlin? BluePrints? TinkerPop? Jena? Asdasdfaf? XYZZY? ● Why not existing triplestores? “We are the only Graph DB that...” ● Python/Postgres run on desktop hardware ● Simplicity (few LOC and readable) ● Unoptimised potential for improvement→
  • 11. Bringing the Semantic Web closer to reality PostgreSQL as RDF Graph Database conn = psycopg2.connect(database='mydb', user='myuser') cur = conn.cursor(cursor_factory=psycopg2.extras.DictCursor) seqcur = conn.cursor() seqcur.execute(""" CREATE SEQUENCE IF NOT EXISTS item; """) conn.commit() cur = conn.cursor('serverside_cur', cursor_factory=psycopg2.extras.DictCursor) cur.itersize = 50000 cur.arraysize = 10000 cur.execute(""" SELECT data FROM mytable """) while True: recs = cur.fetchmany(10000) if not recs: break for r in recs: if 'DOI' in r['data'].keys(): seqcur.execute("SELECT nextval('item')") item = EDINA['item/' + str(seqcur.fetchone()[0])] triples += (item, RDF.type, EDINA.Item), triples += (item, PRISM.doi, Literal(r['data']['DOI'])),
  • 12. Bringing the Semantic Web closer to reality PostgreSQL as RDF Graph Database 1st challenge ● rdflib-sqlalchemy – No ORM, autocommit (!) – Creates SQL statements, executes one at a time – INSERT INTO … VALUES (…); INSERT INTO … VALUES (…); INSERT INTO … VALUES (…); – We want INSERT INTO … VALUES (…),(…), (…) – Creates lots of indexes which must be dropped
  • 13. Bringing the Semantic Web closer to reality PostgreSQL as RDF Graph Database 2nd challenge ● How to restart if interrupted? ● Solved with querying and caching.
  • 14. Bringing the Semantic Web closer to reality PostgreSQL as RDF Graph Database from rdflib.plugins.sparql import prepareQuery orgQ = prepareQuery(""" SELECT ?org ?pub WHERE { ?org a foaf:Organization . ?org rdfs:label ?pub . } """, initNs = { 'foaf': FOAF, 'rdfs': RDFS }) orgCache = {} for o in gdb.query(orgQ): orgCache[o[1].toPython()] = URIRef(o[0].toPython()) if 'publisher' in r['data'].keys(): publisherFound = False if r['data']['publisher'] in orgCache.keys(): publisherFound = True triples += (item, DCTERMS.publisher, orgCache[r['data'] ['publisher']]), if not publisherFound: seqcur.execute("SELECT nextval('org')") org = EDINA['org/' + str(seqcur.fetchone()[0])] orgCache[r['data']['publisher']] = org triples += (org, RDF.type, FOAF.Organization), triples += (org, RDFS.label, Literal(r['data'] ['publisher'])),
  • 15. Bringing the Semantic Web closer to reality PostgreSQL as RDF Graph Database 3rd challenge ● rdflib-sqlalchemy (yup… guessed it) – Selects whole graph into memory ● Server side cursor: res = connection.execution_options( stream_results=True).execute(q) ● Batching: while True: result = res.fetchmany(1000) … yield … – Inexplicably caches everything read in RAM!
  • 16. Bringing the Semantic Web closer to reality PostgreSQL as RDF Graph Database 4th challenge ● Serialise efficiently! Multiprocessing→ – Processes, JoinableQueues ● Turtle: Unsuitable N-Triples→ – UNIX magic python3 rdf2nt.py | split -a4 -d -C4G --additional-suffix=.nt --filter='gzip > $FILE.gz' - exported/rdfgraph_
  • 17. Bringing the Semantic Web closer to reality PostgreSQL as RDF Graph Database Desktop hardware...
  • 18. Bringing the Semantic Web closer to reality PostgreSQL as RDF Graph Database 5th challenge ● Serialisation outran HDD! ● Waits for: – JoinableQueues to empty – sys.stdout.flush()
  • 19. Bringing the Semantic Web closer to reality PostgreSQL as RDF Graph Database rdfgraph=> dt List of relations Schema | Name | Type | Owner --------+-----------------------------------+-------+---------- public | kb_a8f93b2ff6_asserted_statements | table | myuser public | kb_a8f93b2ff6_literal_statements | table | myuser public | kb_a8f93b2ff6_namespace_binds | table | myuser public | kb_a8f93b2ff6_quoted_statements | table | myuser public | kb_a8f93b2ff6_type_statements | table | myuser (5 rows) rdfgraph=> x Expanded display is on. rdfgraph=> select * from kb_a8f93b2ff6_asserted_statements limit 1; -[ RECORD 1 ]----------------------------------------------- id | 62516955 subject | http://edina.ac.uk/ns/creation/12993043 predicate | http://www.w3.org/ns/prov#wasAssociatedWith object | http://edina.ac.uk/ns/agent/12887967 context | http://edina.ac.uk/ns/rdfgraph termcomb | 0 Time: 0.531 ms rdfgraph=>
  • 20. Bringing the Semantic Web closer to reality PostgreSQL as RDF Graph Database More caveats! ● Make sure you are not entering literals in a URI field. ● Also make sure your URIs are valid (amazingly some DOIs fail when urlencoded) ● rdflib-sqlalchemy unoptimized for FTS – Your (BTree) indices will be YUGE. – Don't use large records (e.g. 10+ MB cnt:bytes) ● you need to drop index; insert; create index – pg_dump is your friend – Massive maintenance work memory (Postgres)
  • 21. Bringing the Semantic Web closer to reality PostgreSQL as RDF Graph Database DROP INDEX public.kb_a8f93b2ff6_uri_index; DROP INDEX public.kb_a8f93b2ff6_type_mkc_key; DROP INDEX public.kb_a8f93b2ff6_quoted_spoc_key; DROP INDEX public.kb_a8f93b2ff6_member_index; DROP INDEX public.kb_a8f93b2ff6_literal_spoc_key; DROP INDEX public.kb_a8f93b2ff6_klass_index; DROP INDEX public.kb_a8f93b2ff6_c_index; DROP INDEX public.kb_a8f93b2ff6_asserted_spoc_key; DROP INDEX public."kb_a8f93b2ff6_T_termComb_index"; DROP INDEX public."kb_a8f93b2ff6_Q_termComb_index"; DROP INDEX public."kb_a8f93b2ff6_Q_s_index"; DROP INDEX public."kb_a8f93b2ff6_Q_p_index"; DROP INDEX public."kb_a8f93b2ff6_Q_o_index"; DROP INDEX public."kb_a8f93b2ff6_Q_c_index"; DROP INDEX public."kb_a8f93b2ff6_L_termComb_index"; DROP INDEX public."kb_a8f93b2ff6_L_s_index"; DROP INDEX public."kb_a8f93b2ff6_L_p_index"; DROP INDEX public."kb_a8f93b2ff6_L_c_index"; DROP INDEX public."kb_a8f93b2ff6_A_termComb_index"; DROP INDEX public."kb_a8f93b2ff6_A_s_index"; DROP INDEX public."kb_a8f93b2ff6_A_p_index"; DROP INDEX public."kb_a8f93b2ff6_A_o_index"; DROP INDEX public."kb_a8f93b2ff6_A_c_index"; ALTER TABLE ONLY public.kb_a8f93b2ff6_type_statements DROP CONSTRAINT kb_a8f93b2ff6_type_statements_pkey; ALTER TABLE ONLY public.kb_a8f93b2ff6_quoted_statements DROP CONSTRAINT kb_a8f93b2ff6_quoted_statements_pkey; ALTER TABLE ONLY public.kb_a8f93b2ff6_namespace_binds DROP CONSTRAINT kb_a8f93b2ff6_namespace_binds_pkey; ALTER TABLE ONLY public.kb_a8f93b2ff6_literal_statements DROP CONSTRAINT kb_a8f93b2ff6_literal_statements_pkey; ALTER TABLE ONLY public.kb_a8f93b2ff6_asserted_statements DROP CONSTRAINT kb_a8f93b2ff6_asserted_statements_pkey;
  • 22. Bringing the Semantic Web closer to reality PostgreSQL as RDF Graph Database Thank you =) Twitter: @vyruss EDINA Labs blog – http://labs.edina.ac.uk Hack – http://github.com/vyruss/rdflib-sqlalchemy Dataset – Stay tuned