Applied Data Analysis Lab – a profile 
Dr. Łukasz Bolikowski 
ICM, University of Warsaw 
December 2014
ADA Lab ⊂ ICM ⊂ UW
University of Warsaw (UW) is one of the top Polish higher education establishments. 
Interdisciplinary Centre for Mathematical and Computational Modelling (ICM) 
is a supercomputing and research data centre within the University of Warsaw. 
Applied Data Analysis Lab (ADA Lab) is a research group within the ICM.
ADA Lab’s Scope of Interest 
Scalable Text and Data Mining 
Informatics for Open Science 
Legal Text Mining 
Business Data Mining 
Training and Outreach 
Scholarly PDF Mining 
Map of Science 
Persistent IDs 
Data Anonymization
Legal Text Mining 
Building a judgment analysis system for Poland. 
Integrating data from common courts, the 
Supreme Administrative Court, the Supreme 
Court, and the Constitutional Tribunal. 
Planning a larger European project with similar 
goals (Horizon 2020; currently building a consortium 
and defining the scope).
Business Data Mining 
Leveraging high demand for data science skills. 
For-profit projects with business partners. 
Usually can’t discuss details due to NDAs. 
Our favourite toolset: 
R for data understanding and modelling 
Apache Spark for analysing larger data sets 
D3 for information visualization 
CRISP-DM for managing our projects 
(Cross-Industry Standard Process for Data Mining)
Training and Outreach 
“Web-Scale Data Mining and Processing” 
(Course at Polish Academy of Sciences) 
“Introduction to Text Mining” 
(Course at Warsaw School of Data Analysis organised by ICM) 
Internal trainings on Hadoop, Spark 
Presentations at Big Data conferences 
(Target audience: business partners) 
Workshops and internships for talented youth 
(In collaboration with Polish Children’s Fund)
Scholarly PDF Mining 
Extracting metadata, bibliographic references, and full text 
from scholarly PDFs. Research direction: semantic annotation 
of paragraphs, sentences, phrases. 
CERMINE is open-source software (AGPL license), with users 
worldwide: OpenAIRE.eu, Paperity.org, Public Knowledge 
Project. 
Interfaces for humans and for machines (RESTful API). 
Try CERMINE at: http://cermine.ceon.pl/
Map of Science 
A comprehensive map of academia. Mining available 
documents and data sets in order to reconstruct the 
graph of relations between: people, documents, institutions, 
topics, funding sources. 
Final result: a publicly available data set. 
Why? Better understanding of science. Cool features 
in digital libraries and research information systems. 
Elements of the map currently developed in OpenAIRE 
and OCEAN projects.
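
The graph of relations described above could be represented along these lines. This is only a minimal sketch, not the data model used in OpenAIRE or OCEAN: the `Node` and `RelationGraph` classes, the relation names, and the sample entities ("A. Author", "Paper X", "Institute Y", "Grant Z") are all hypothetical and chosen for illustration.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Node(NamedTuple):
    kind: str  # e.g. "person", "document", "institution", "topic", "funder"
    name: str


class RelationGraph:
    """Undirected graph of typed nodes joined by labelled relations."""

    def __init__(self) -> None:
        self._edges = defaultdict(set)

    def add_relation(self, source: Node, relation: str, target: Node) -> None:
        # Store the edge in both directions so either end can be queried.
        self._edges[source].add((relation, target))
        self._edges[target].add((relation, source))

    def neighbours(self, node: Node, kind: Optional[str] = None) -> list:
        # Collect linked nodes, optionally keeping only one kind of node.
        linked = {other for _, other in self._edges.get(node, ())}
        return sorted(n for n in linked if kind is None or n.kind == kind)


# Hypothetical sample entities, not taken from any real data set.
author = Node("person", "A. Author")
paper = Node("document", "Paper X")
institute = Node("institution", "Institute Y")
grant = Node("funder", "Grant Z")

graph = RelationGraph()
graph.add_relation(author, "wrote", paper)
graph.add_relation(author, "affiliated_with", institute)
graph.add_relation(paper, "funded_by", grant)

print(graph.neighbours(author))
print(graph.neighbours(paper, kind="funder"))
```

Storing every edge in both directions keeps lookups simple at the cost of memory; a data set at the scale of all academia would more likely need a dedicated graph store.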
Persistent IDs 
To achieve long-term preservation of research artifacts, 
we need an identifier minting and management 
scheme that can outlive the organization managing 
the scheme. 
We are developing a distributed scheme based on 
public-key cryptography and P2P networking (a lot 
in common with Bitcoin).
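
One way an identifier can be tied to a key pair rather than to an organization is to derive it from a hash of the public key, much as Bitcoin derives addresses. The sketch below illustrates only that general idea and is not the scheme under development at ADA Lab: the `pid:` prefix, the 40-character truncation, and the placeholder key bytes are assumptions, and real key generation and signing would need a cryptography library.

```python
import hashlib


def mint_identifier(public_key: bytes) -> str:
    """Derive an identifier from the raw bytes of a public key."""
    digest = hashlib.sha256(public_key).hexdigest()
    # Hypothetical format: a "pid:" prefix plus the first 40 hex digits.
    return "pid:" + digest[:40]


def is_bound_to(identifier: str, public_key: bytes) -> bool:
    """Check that an identifier was minted from the given public key."""
    return identifier == mint_identifier(public_key)


# Placeholder bytes standing in for a real public key (assumption).
example_key = bytes(range(32))
example_id = mint_identifier(example_key)

print(example_id)
print(is_bound_to(example_id, example_key))
```

Because anyone can recompute the hash, no central registry is needed to check which key an identifier belongs to; whoever holds the matching private key could then sign updates to the record.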
Data Anonymization 
Privacy-preserving research data publication is a 
cross-cutting issue that applies to various types of 
data analysed at ICM: legal judgments, medical 
records, social network activity.
Thank you for your attention. Let’s stay in touch! 
adalab.icm.edu.pl/blog 
twitter.com/adalab_icm 
linkedin.com/in/bolikowski 
twitter.com/bolikowski 
lukasz.bolikowski@icm.edu.pl
License 
© 2014 ICM, University of Warsaw. Some rights reserved. This presentation is available under a CC BY 3.0 license. Materials from the following 
sources were used: 
https://www.flickr.com/photos/86530412@N02/8213432552 (p. 4, CC BY 2.0) 
https://www.flickr.com/photos/124247024@N07/13903385550 (p. 5, CC BY-SA 2.0) 
https://www.flickr.com/photos/genista/228006200 (p. 6, CC BY-SA 2.0) 
https://www.flickr.com/photos/bohman/210977249 (p. 9, CC BY 2.0) 
https://www.flickr.com/photos/hyku/368912557 (p. 10, CC BY 2.0)
