Making document search system
slightly friendlier to the power user.
Judgements search case study
Michał Łopuszyński
2017.11.29, London, UK
Search Solutions 2017
saos.org.pl
• Before: judgements scattered across many search systems
• Goal: unify access to Polish case law
• We provide unified search, a REST API, and a WCAG-compliant service
• Data volume: ~300k documents and growing
[Diagram: judgements from the Constitutional Tribunal, Supreme Court, Common Courts, and National Appeals Chamber go through import and metadata extraction into http://saos.org.pl, which offers API, Search, and Analysis]
• ~3k daily visits
saos.org.pl
• Side goal: provide some non-mainstream approaches to exploring document collections
• The analysis tool (the trender) – in production
• Creating maps of document collections – only in the lab
The trender
The trender – saos.org.pl/analysis
Maps of document collections
Maps of document collections – a caveat
• All low-dimensional "embeddings" are wrong
• Some are useful (perhaps)
The graph is from Matti Lyra, PyData Berlin 2017, https://www.youtube.com/watch?v=UkmIljRIG_M
For t-SNE, see also https://distill.pub/2016/misread-tsne/
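The caveat can be made concrete with a tiny sketch. It is an illustration added here, not taken from the talk: four points that are mutually equidistant in 3-D cannot stay equidistant on a flat map, whichever 2-D layout is chosen. The "map" used below (simply dropping the z axis) is a deliberately naive assumption standing in for any real embedding method.

```python
import itertools
import math

# Vertices of a regular tetrahedron: every pair is the same distance apart.
points_3d = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]


def pairwise_distances(points):
    """Return the distances between all unordered pairs of points."""
    return [math.dist(p, q) for p, q in itertools.combinations(points, 2)]


def drop_last_axis(points):
    """A naive 2-D 'map' (an assumption for illustration): keep x and y only."""
    return [p[:2] for p in points]


original = pairwise_distances(points_3d)
projected = pairwise_distances(drop_last_axis(points_3d))

print("3-D distances:", sorted(round(d, 3) for d in original))
print("2-D distances:", sorted(round(d, 3) for d in projected))
```

In 3-D all six distances equal √8; on the flat map four of them shrink to 2 while two stay at √8, so some neighbours look closer than others even though they were not. Methods such as PCA or t-SNE choose which distortions to accept, they do not avoid them.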
Maps of document collections – PCA vs t-SNE
[Figure panels: PCA, t-SNE]
• 2000 judgements from the National Appeal Chamber, common courts, the Supreme Court, and the Constitutional Tribunal, visualised
M. Jungiewicz, M. Łopuszyński, Towards Meaningful Maps of Polish Case Law, JURIX 2015, 185 (2015)
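A minimal sketch of how such a map could be produced, under stated assumptions: the slides do not say how judgements were turned into vectors, so the TF-IDF weighting, the toy documents, and all function names below are hypothetical. Only the PCA half is shown, implemented with plain power iteration so the example stays dependency-free; for the t-SNE panel a library implementation (for example scikit-learn's `TSNE`) would take the place of `pca_2d`.

```python
import math
import random
from collections import Counter


def tfidf_vectors(docs):
    """Turn whitespace-tokenised documents into dense TF-IDF rows."""
    tokenised = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenised for tok in toks})
    doc_freq = Counter(tok for toks in tokenised for tok in set(toks))
    n_docs = len(docs)
    rows = []
    for toks in tokenised:
        counts = Counter(toks)
        rows.append([
            counts[term] / len(toks) * math.log(n_docs / doc_freq[term])
            for term in vocab
        ])
    return rows


def centre(rows):
    """Subtract the column means so PCA measures variation, not offsets."""
    means = [sum(col) / len(rows) for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def leading_direction(rows, iterations=100, seed=0):
    """Power iteration for the top eigenvector of rows^T rows."""
    rng = random.Random(seed)
    v = [rng.random() for _ in rows[0]]
    for _ in range(iterations):
        scores = [sum(x * w for x, w in zip(row, v)) for row in rows]
        v = [sum(s * row[j] for s, row in zip(scores, rows)) for j in range(len(v))]
        norm = math.sqrt(sum(w * w for w in v))
        if norm == 0.0:  # no variance left to explain
            break
        v = [w / norm for w in v]
    return v


def pca_2d(rows):
    """Project rows onto their first two principal components."""
    rows = centre(rows)
    coords = []
    for _ in range(2):
        v = leading_direction(rows)
        scores = [sum(x * w for x, w in zip(row, v)) for row in rows]
        coords.append(scores)
        # Deflate: remove the direction just found before searching for the next.
        rows = [[x - s * w for x, w in zip(row, v)] for row, s in zip(rows, scores)]
    return list(zip(*coords))


# Toy stand-ins for judgements: three about pensions, three about contracts.
docs = [
    "pension pension increase",
    "pension increase increase",
    "pension increase",
    "contract contract breach",
    "contract breach breach",
    "contract breach",
]

for doc, (x, y) in zip(docs, pca_2d(tfidf_vectors(docs))):
    print(f"{x:+.3f} {y:+.3f}  {doc}")
```

On this toy input the first coordinate has one sign for the pension documents and the opposite sign for the contract documents, which is the kind of grouping the maps rely on. Real judgements would need proper tokenisation and far more than two informative directions, which is where nonlinear methods like t-SNE come in.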
Maps of document collections – PCA vs t-SNE
• The previous picture coloured by issuing court (note, however, that the issuing court was not used directly in the map generation process)
[Figure panels: PCA, t-SNE. Legend: National Appeal Chamber, common courts, Supreme Court, Constitutional Tribunal]
M. Jungiewicz, M. Łopuszyński, Towards Meaningful Maps of Polish Case Law, JURIX 2015, 185 (2015)
Maps of document collections – t-SNE example
• 2000 judgements from common courts tagged with different keywords
Figure legend (keywords):
granting
pensions
military
pensions
increase/recalculation
of pensions
pension
compensation
offence
agreement
personal rights
M. Jungiewicz, M. Łopuszyński, Towards Meaningful Maps of Polish Case Law, JURIX 2015, 185 (2015)
Maps of document collections – in the wild
• Demo by Andrej Karpathy – papers, t-SNE based
  http://cs.stanford.edu/people/karpathy/scholaroctopus/
• Paperscape – papers, based on citation networks
  http://paperscape.org
Acknowledgements
• The Team
  • Piotr Waglowski (the boss)
  • Data science team: Michał Jungiewicz, Michał Łopuszyński
  • Tech team: Łukasz Dumiszewski (tech lead), Aleksander Nowiński, Monika Maksymiuk, Krzysztof Mądry, Łukasz Pawełczak, Jan Pavtel
  • Network analysis team: Michał Bojanowski, Bartosz Chrol, Monika Pawluczuk
• The funding
  • Grant of the National Centre for Research and Development (PL), within the Social Innovations programme
Thank you for your attention!
Questions?
@lopusz
http://slideshare.net/lopusz
