"OPEN DATA: ECOSYSTEM, CURRENT AND FUTURE TRENDS, SUCCESS STORIES AND BARRIERS" set of slides was prepared for the Guest Lecture, which I has delivered to the students of the University of South-Eastern Norway (USN), October 2021
Comparative analysis of national open data portals or whether your portal is ...Anastasija Nikiforova
This file is supplementary material for the following article -> Nikiforova, A. (2020). Comparative analysis of national open data portals or whether your portal is ready to bring benefits from open data. In IADIS International Conference on ICT, Society and Human Beings (pp. 21-23).
This paper focuses on the usability of national open data portals. Open [government] data are considered one of the most influential tools for preventing and reducing corruption and for reaching innovative solutions that create added value for society. It is therefore important to ensure that the data are provided in a form that is useful and suitable for the original purpose of open data. Critical voices and discussions on whether open government data and national open data portals are of sufficient quality appear ever more frequently. This study therefore addresses the topic and aims to find the main challenges that can negatively affect user experience, through a usability analysis of 42 open data portals that applies a unified methodology allowing their comparative analysis to be carried out. The study highlights the weakest aspects of the 42 national open data portals, pointing out both the most common weaknesses and individual ones. The analysis also identifies portals that can be considered leaders and can serve as examples for less successful open data portals.
Analysis of open health data quality using data object-driven approach to dat...Anastasija Nikiforova
This presentation is supplementary material for the following article -> Nikiforova, A. (2019). Analysis of open health data quality using data object-driven approach to data quality evaluation: insights from a Latvian context. In IADIS International Conference e-Health (pp. 119-126).
This research focuses on the quality of open health data that are freely available and can be used by everyone for their own purposes. The quality of open data is crucial, as poor quality can lead to unreliable decision-making and financial losses; for open health data, quality plays an even more critical role. Despite its importance, this topic is rarely discussed. Therefore, the previously proposed data object-driven approach to data quality evaluation is applied to open health data in Latvia in order to (a) evaluate their quality, highlighting common quality issues that should be considered by both users and data publishers, and (b) demonstrate that the approach is suitable for this purpose, as it is simple enough and allows even users without IT or data quality knowledge (domain experts) to take part in data quality analysis, examining data for their own purposes. The proposed solution appears useful for establishing communication between data users and publishers, improving the overall quality of data.
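As a loose illustration of the kind of rule-based, per-record checks the abstract describes (the field names and rules below are hypothetical, not the approach's actual specification):

```python
# Hypothetical health-data records; the second one carries deliberate defects.
records = [
    {"patient_id": "P1", "age": 34, "visit_date": "2019-05-10"},
    {"patient_id": "",   "age": -2, "visit_date": "2019-13-01"},
]

def check(record):
    """Return a list of quality issues found in one data object (record)."""
    issues = []
    if not record["patient_id"]:
        issues.append("missing patient_id")
    if not (0 <= record["age"] <= 120):
        issues.append("age out of range")
    month = int(record["visit_date"].split("-")[1])
    if not (1 <= month <= 12):
        issues.append("invalid visit month")
    return issues

for r in records:
    print(r["patient_id"] or "<blank>", check(r))
```

The point of such a formulation is that a domain expert can read and adjust each rule without IT involvement.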
TIMELINESS OF OPEN DATA IN OPEN GOVERNMENT DATA PORTALS THROUGH PANDEMIC-RELA...Anastasija Nikiforova
This presentation is supplementary material for the following article -> Nikiforova, A. (2020, October). Timeliness of open data in open government data portals through pandemic-related data: a long data way from the publisher to the user. In 2020 Fourth International Conference on Multimedia Computing, Networking and Applications (MCNA) (pp. 131-138). IEEE.
The paper addresses the "timeliness" of data in open government data (OGD) portals. Timeliness is one of the primary principles of open data and is considered a success factor, while at the same time its absence is one of the biggest barriers, capable of disrupting users' trust in the data and even their desire to use the open data portal at all. However, assessing this aspect is a very difficult task that, in most cases, is impossible for open data users, and there is therefore a lack of comparative studies on the timeliness of data across national open data portals. Unfortunately, 2020 provided an opportunity to find this out: it became easy enough to compare how long the data path from the data holder to the OGD portal is by analysing the timeliness of Covid-19-related data sets relative to the first case observed in a country. The study thus fills the gap in comparative studies by addressing 60 countries and their OGD portals with respect to data timeliness, reporting how many and which countries provide open data as quickly as possible. It makes it possible to understand how quickly OGD portals react to emergencies by opening and updating data for further potential reuse, which is essential in the digital, data-driven world.
Read paper here -> Nikiforova, A. (2020, October). Timeliness of open data in open government data portals through pandemic-related data: a long data way from the publisher to the user. In 2020 Fourth International Conference on Multimedia Computing, Networking and Applications (MCNA) (pp. 131-138). IEEE. -> https://ieeexplore.ieee.org/abstract/document/9264298?casa_token=FtfC_6bqZnsAAAAA:TaSnKrE7ZCxLyq5hvxX-X8O2sK_vZYcodTBtxoWOvaOAIFmMmy65f5dIK-kKYxFAMiC5jyl7Eeg
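The timeliness metric described above reduces to a simple date difference per country. A minimal sketch, with made-up dates and country labels standing in for the study's actual data:

```python
from datetime import date

# Hypothetical inputs: date of a country's first confirmed case, and the date
# the first Covid-19 data set appeared on its OGD portal.
first_case = {"A": date(2020, 3, 1), "B": date(2020, 2, 20)}
first_dataset = {"A": date(2020, 3, 25), "B": date(2020, 3, 2)}

# Delay in days from the first case to the data set's publication on the portal.
delay_days = {c: (first_dataset[c] - first_case[c]).days for c in first_case}
print(delay_days)  # {'A': 24, 'B': 11}
```

Comparing such delays across portals is what allows the "data way from the publisher to the user" to be ranked.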
Towards enrichment of the open government data: a stakeholder-centered determ...Anastasija Nikiforova
This set of slides is part of the presentation prepared and delivered at the 14th International Conference on Theory and Practice of Electronic Governance (ICEGOV 2021), 6-8 October 2021, themed Smart Digital Governance for Global Sustainability.
It is based on the paper -> Nikiforova, A. (2021, October). Towards enrichment of the open government data: a stakeholder-centered determination of High-Value Data sets for Latvia. In 14th International Conference on Theory and Practice of Electronic Governance (pp. 367-372) -> https://dl.acm.org/doi/abs/10.1145/3494193.3494243?casa_token=bPeuwmFWwQwAAAAA:ls-xXIPK5uXDHyxtBxqsMJOCuV6ud_ip59BX8n78uJnqvql6e8H9urlDG9zzeNklRmGFwI4sCXU06w
Assessment of the usability of Latvia’s open data portal or how close are we ...Anastasija Nikiforova
This presentation is supplementary material for the following article -> Nikiforova, A. (2020). Assessment of the usability of Latvia’s open data portal or how close are we to gaining benefits from open data. In IADIS 14th International Conference on Interfaces and Human Computer Interaction (pp. 51-28).
Nowadays, more and more countries are launching their own open data portals, seeking to provide their citizens with open data in a form that is useful and suitable for the original purpose of open data, and Latvia is no exception. Although the Latvian open data portal was launched only in 2017, it is considered a fast-tracker. However, despite the overall high evaluations, critical voices and discussions about whether Latvia’s open data portal is of sufficient quality have appeared. Therefore, while previous studies deal with the quality of open data, this study focuses on the analysis of the Latvian open data portal and aims to find the key challenges that may have a negative impact on user experience. The paper assesses the current situation and recommends corrective actions, highlighting the aspects to be considered when developing and improving open data portals.
IoTSE-based Open Database Vulnerability inspection in three Baltic Countries:...Anastasija Nikiforova
This presentation is devoted to the "IoTSE-based Open Database Vulnerability inspection in three Baltic Countries: ShoBEVODSDT sees you" research paper by Artjoms Daskevics and Anastasija Nikiforova, presented during the International Conference on Internet of Things, Systems, Management and Security (IOTSMS2021), co-located with the 8th International Conference on Social Networks Analysis, Management and Security (SNAMS2021), December 6-9, 2021, Valencia, Spain (online).
Read paper here -> Daskevics, A., & Nikiforova, A. (2021, December). IoTSE-based open database vulnerability inspection in three Baltic countries: ShoBEVODSDT sees you. In 2021 8th International Conference on Internet of Things: Systems, Management and Security (IOTSMS) (pp. 1-8). IEEE -> https://ieeexplore.ieee.org/abstract/document/9704952?casa_token=NfEjYuud0wEAAAAA:6QxucVPuY762I3qzD6D_oWqa0B9eMUFRNMG-E7dyHKohSYIzI0bH1V9bLaAcly_Lp-Ll52ghO5Y
Stakeholder-centred Identification of Data Quality Issues: Knowledge that Can...Anastasija Nikiforova
This presentation is supplementary material for presenting the "Stakeholder-centred Identification of Data Quality Issues: Knowledge that Can Save Your Business" research paper (authored by Anastasija Nikiforova and Natalija Kozmina) during the International Conference on Intelligent Data Science Technologies and Applications (IDSTA2021), November 15-16, 2021, Tartu, Estonia (web-based).
Read paper here -> Nikiforova, A., & Kozmina, N. (2021, November). Stakeholder-centred Identification of Data Quality Issues: Knowledge that Can Save Your Business. In 2021 Second International Conference on Intelligent Data Science Technologies and Applications (IDSTA) (pp. 66-73). IEEE -> https://ieeexplore.ieee.org/abstract/document/9660802?casa_token=LFJa20LrXAwAAAAA:wVwhTcCPWqxdloAvDQ3-l98KkkLx70xzG3zNvIIkJbC6wvJ4VxwX_VGc3mmW_7c1T-QJlOtTiao
AN EXTENDED DATA OBJECT-DRIVEN APPROACH TO DATA QUALITY EVALUATION: CONTEXTUA...Anastasija Nikiforova
This presentation is supplementary material for the following article -> Nikiforova, A., & Bicevskis, J. (2019). An Extended Data Object-driven Approach to Data Quality Evaluation: Contextual Data Quality Analysis. In ICEIS (1) (pp. 274-281).
The research extends the data object-driven approach to data quality evaluation, allowing data object quality to be analysed in the scope of multiple data objects. The previously presented approach was used to analyse one particular data object, mainly focusing on syntactic analysis. The extension means that the quality of a primary data object can be analysed against an unlimited number of secondary data objects, allowing a more comprehensive, in-depth contextual analysis. The extended analysis was applied to open data, comparing previously obtained results with the results of the extended approach, underlining the importance and benefits of the extension.
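A rough sketch of what such a contextual (cross-object) check looks like: a primary data object is validated against a secondary one. The objects and field names below are hypothetical, not taken from the paper:

```python
# Secondary data object: a reference set of valid city names.
valid_cities = {"Riga", "Liepaja"}

# Primary data object: company records whose "city" field should match the
# secondary object. A purely syntactic check could not catch the typo below.
companies = [
    {"name": "A", "city": "Riga"},
    {"name": "B", "city": "Rigga"},  # contextual quality issue
]

invalid = [c["name"] for c in companies if c["city"] not in valid_cities]
print(invalid)  # ['B']
```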
Invited talk "Open Data as a driver of Society 5.0: how you and your scientif...Anastasija Nikiforova
This presentation was prepared as part of my talk on openness (open data and open science) in the context of Society 5.0 during the International Conference and Expo on Nanotechnology and Nanomaterials. It was a pleasure to receive an invitation to deliver a talk on my recently published article Smarter Open Government Data for Society 5.0: Are Your Open Data Smart Enough? (Sensors 2021, 21(15), 5204), which I entitled "Open Data as a driver of Society 5.0: how you and your scientific outputs can contribute to the development of the Super Smart Society and transformation into Smart Living?". The paper was briefly discussed in my previous post, so just a few words on this talk and the overall experience.
Slides of the presentation by Michael Martin (ULEI, INFAI) and Martin Kaltenböck (Semantic Web Company) at the OKCon2011 in Berlin on 30th of June 2011: The LOD2 Open Government Data Stakeholder Survey
The document discusses the NIC's initiative to develop semantic web capabilities to improve access to health innovation information. It outlines the NIC's progress in building an ontology, populating a triplestore with curated data, and developing demonstration applications. The next steps include further developing applications and widgets, deploying an NLP crawler, and semantically tagging sections of the NIC website.
The linked open government data and metadata lifecycleOpen Data Support
This document discusses the lifecycle of linked open government data and metadata. It begins by examining existing data and metadata lifecycles, noting that they primarily focus on the supply side. It then presents a hybrid lifecycle model that includes both supply and demand sides. The supply side covers the selection, modeling, publishing and linking of data and metadata by governments. The demand side involves finding, integrating, reusing and providing feedback on open data by consumers. The document also provides best practices for publishing data and metadata at various stages of the lifecycle.
Data privacy and security in ICT4D - Meeting Report UN Global Pulse
On May 8th, 2015 UN Global Pulse hosted a workshop on data privacy and security in technology-enabled development projects and programmes, as part of a series of events about the Nine Principles for Digital Development. This report summarizes the presentations and discussions from the workshop. http://unglobalpulse.org/blog/improving-privacy-and-data-security-ict4d-projects
Risks, Harms and Benefits Assessment Tool (Updated as of Jan 2019)UN Global Pulse
The Data Innovation Risk Assessment Tool is an initial assessment of potential risks for data use that includes seven guiding checkpoints to understand: the "Data Type" involved in the data analytics process, the "Risks and Harms" of data use, the mode and legitimacy of "Data Access", the "Data Use", the adequacy of "Data Security", the adequate level of "Communication and Transparency" and the due diligence on engagement of "Third Parties". The Assessment contains guiding comments for each checkpoint and its questions are grounded in the key international data privacy and data protection principles and concepts such as Purpose Specification, Purpose Compatibility, Data Minimization, Consent Legitimacy, Lawfulness and Fairness of data access and use.
This document discusses augmenting open government data with social media data. It presents a research agenda to integrate open government data and social media data. The agenda involves understanding both data sources, integrating them based on common elements, and using a proof of concept involving UK election data from 2010 to demonstrate objective and subjective views as well as an integrated view. The goal is to provide infrastructure to better exploit the potential of open data and social media data.
Open data barometer global report - 2nd edition yann le gigan
This document provides an introduction and overview of the Open Data Barometer report. The report analyzes global trends in open data by assessing countries' readiness, implementation, and impact of open data initiatives. It finds that while open data initiatives have spread rapidly, more work is needed to support data-enabled democracy worldwide and ensure data access, skills, and freedoms are distributed equitably. The report evaluates 86 countries across different clusters and provides recommendations for tailoring open data strategies based on countries' varying capacities and needs. It aims to contribute to understanding challenges and opportunities in realizing open data's potential to increase transparency, empower citizens, and inspire innovation.
1. The document discusses open data in Canada and argues that open civic data and information are important for informed citizen participation in decision-making and generating innovative solutions.
2. It provides examples of open data projects in various Canadian cities like child poverty mapping and Inuit land use atlases.
3. Key principles for opening government data are outlined, and challenges are noted around issues like interoperability, licensing, and cultural change needed for more open data.
Putting the L in front: from Open Data to Linked Open DataMartin Kaltenböck
Keynote presentation of Martin Kaltenböck (LOD2 project, Semantic Web Company) at the Government Linked Data Workshop in the course of the OGD Camp 2011 in Warsaw, Poland: Putting the L in front: from Open Data to Linked Open Data
A Customisable Pipeline for Continuously Harvesting Socially-Minded Twitter U...Paolo Missier
Talk for the paper published at ICWE 2019:
Primo F., Missier P., Romanovsky A., Mickael F., Cacho N. A customisable pipeline for continuously harvesting socially-minded Twitter users. In: Procs. ICWE'19. Daejeon, Korea; 2019.
This document discusses the current state and issues surrounding public sector data in Korea. It defines open government data as data produced by government entities that can be freely used, reused and redistributed. While Korea has built various data through e-government initiatives, most data is not truly "open" as it cannot be freely reused or used for commercial purposes without restrictions. The document also outlines some international and domestic open data portals that have been established to increase access to government data.
Accelerating biomedical discovery with an internet of FAIR data and services -...Michel Dumontier
With its focus on improving the health and well-being of people, biomedicine has always been a fertile, if challenging, domain for computational discovery science. Indeed, the existence of millions of scientific articles, thousands of databases, and hundreds of ontologies offers exciting opportunities to reuse our collective knowledge, were we not stymied by incompatible formats, overlapping and incomplete vocabularies, unclear licensing, and heterogeneous access points. In this talk, I will discuss our work to create computational standards, platforms, and methods to wrangle knowledge into simple but effective representations based on semantic web technologies that are maximally FAIR - Findable, Accessible, Interoperable, and Reusable - and to further use these for biomedical knowledge discovery. But only with additional crucial developments will this emerging Internet of FAIR data and services, built on Semantic Web technologies, be well positioned to support automated scientific discovery on a global scale.
A gigantic archive of terabytes of data is created every day by current information systems and digital technologies such as the Internet of Things and cloud computing. Analysing these massive data requires considerable effort at multiple levels to extract knowledge for decision-making; big data analytics is therefore a current area of research and development. The primary goal of this paper is to investigate the potential impact of big data challenges and the various tools associated with them. Accordingly, the article provides a platform for exploring big data at its various stages, and it opens a new horizon for researchers to develop solutions in light of the challenges and open research issues.
Multipleregression covidmobility and Covid-19 policy recommendationKan Yuenyong
Multiple regression analysis of Covid-19 policy is a contemporary agenda. The deck demonstrates how to use Python for data wrangling and R for statistical analysis, in a form suitable for publication in a standard academic journal. The model examines whether lockdown policy is relevant to controlling the Covid-19 outbreak.
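The core of such an analysis is an ordinary least squares fit of an outcome on a policy indicator plus controls. A minimal Python sketch on synthetic data (the variable names and coefficients are invented for illustration, not the study's actual model or data):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
mobility = rng.normal(size=n)                      # control variable
lockdown = (rng.random(n) < 0.5).astype(float)     # policy indicator (0/1)
# Simulated outcome: mobility raises case growth, lockdown lowers it.
case_growth = 1.0 + 0.8 * mobility - 0.5 * lockdown + rng.normal(scale=0.3, size=n)

# Design matrix with intercept; solve beta = argmin ||y - X b||^2.
X = np.column_stack([np.ones(n), mobility, lockdown])
beta, *_ = np.linalg.lstsq(X, case_growth, rcond=None)
print(beta)  # approximately [1.0, 0.8, -0.5]
```

A negative, statistically significant lockdown coefficient would be the kind of evidence the question "is lockdown policy relevant?" asks for; in R the same fit is `lm(case_growth ~ mobility + lockdown)`.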
ANALYSIS OF TOPIC MODELING WITH UNPOOLED AND POOLED TWEETS AND EXPLORATION OF...IJCSEA Journal
In this digital era, social media is an important tool for information dissemination, and Twitter is a popular social media platform. Social media analytics helps make informed decisions based on people's needs and opinions; this information, when properly perceived, provides valuable insights into different domains, such as public policymaking, marketing, sales, and healthcare. Topic modeling is an unsupervised algorithm for discovering hidden patterns in text documents. In this study, we explore the Latent Dirichlet Allocation (LDA) topic model algorithm. We collected tweets with hashtags related to coronavirus-related discussions. The study compares regular LDA and LDA based on collapsed Gibbs sampling (LDAMallet), with experiments using different data processing steps: with and without trigrams, and with and without hashtags. It provides a comprehensive analysis of LDA for short text messages using un-pooled and pooled tweets. The results suggest that a pooling scheme using hashtags helps improve topic inference, yielding a better coherence score.
International Journal of Data Mining & Knowledge Management Process ( IJDKP )IJDKP
Data mining and knowledge discovery in databases have been attracting a significant amount of research, industry, and media attention of late. There is an urgent need for a new generation of computational theories and tools to assist researchers in extracting useful information from the rapidly growing volumes of digital data.
Big data characteristics, value chain and challengesMusfiqur Rahman
Abstract: Recently the world has been experiencing a deluge of data from different domains such as telecom, healthcare and supply-chain systems. This growth of data has led to an explosion, coining the term Big Data. In addition to growth in volume, Big Data also exhibits other unique characteristics, such as velocity and variety. This large volume of rapidly increasing and varied data is becoming the key basis of competition, underpinning new waves of productivity growth, innovation and customer surplus. Big Data promises tremendous insight to organizations, but traditional data analysis architectures are not capable of handling it. It therefore calls for a sophisticated value chain and proper analytics to unearth the opportunities it holds. This research identifies the characteristics of Big Data and presents a sophisticated Big Data value chain as its finding. It also describes the typical challenges of Big Data that need to be solved. As part of the research, twenty experts from different industries and academia in Finland were interviewed.
Towards High-Value Datasets determination for data-driven development: a syst...Anastasija Nikiforova
Slides for the talk delivered as part of EGOV-CeDEM-ePart 2023 (EGOV2023) conference, aimed at examining how HVD determination has been reflected in the literature over the years and what has been found by these studies to date, incl. the indicators used in them, involved stakeholders, data-related aspects, and frameworks, which was done by conducting a Systematic Literature Review.
Read the paper here -> https://link.springer.com/chapter/10.1007/978-3-031-41138-0_14
Invited talk "Open Data as a driver of Society 5.0: how you and your scientif...Anastasija Nikiforova
This presentation was prepared as part of my talk on openness (open data and open science) in the context of Society 5.0 during the International Conference and Expo on Nanotechnology and Nanomaterials. It was a pleasure to receive an invitation to deliver a talk on my recently published article Smarter Open Government Data for Society 5.0: Are Your Open Data Smart Enough? (Sensors 2021, 21(15), 5204), which I entitled “Open Data as a driver of Society 5.0: how you and your scientific outputs can contribute to the development of the Super Smart Society and transformation into Smart Living?”. The paper has been briefly discussed in my previous post, thus, just a few words on this talk and the overall experience.
Slides of the presentation by Michael Martin (ULEI, INFAI) and Martin Kaltenböck (Semantic Web Company) at the OKCon2011 in Berlin on 30th of June 2011: The LOD2 Open Government Data Stakeholder Survey
The document discusses the NIC's initiative to develop semantic web capabilities to improve access to health innovation information. It outlines the NIC's progress in building an ontology, populating a triplestore with curated data, and developing demonstration applications. The next steps include further developing applications and widgets, deploying an NLP crawler, and semantically tagging sections of the NIC website.
The linked open government data and metadata lifecycleOpen Data Support
This document discusses the lifecycle of linked open government data and metadata. It begins by examining existing data and metadata lifecycles, noting that they primarily focus on the supply side. It then presents a hybrid lifecycle model that includes both supply and demand sides. The supply side covers the selection, modeling, publishing and linking of data and metadata by governments. The demand side involves finding, integrating, reusing and providing feedback on open data by consumers. The document also provides best practices for publishing data and metadata at various stages of the lifecycle.
Data privacy and security in ICT4D - Meeting Report UN Global Pulse
On May 8th, 2015 UN Global Pulse hosted a workshop on data privacy and security in technology-enabled development projects and programmes, as part of a series of events about the Nine Principles for Digital Development. This report summarizes the presentations and discussions from the workshop. http://unglobalpulse.org/blog/improving-privacy-and-data-security-ict4d-projects
Risks, Harms and Benefits Assessment Tool (Updated as of Jan 2019)UN Global Pulse
The Data Innovation Risk Assessment Tool is an initial assessment of potential risks for data use that includes seven guiding checkpoints to understand: the "Data Type" involved in the data analytics process, the "Risks and Harms" of data use, the mode and legitimacy of "Data Access", the "Data Use", the adequacy of "Data Security", the adequate level of "Communication and Transparency" and the due diligence on engagement of "Third Parties". The Assessment contains guiding comments for each checkpoint and its questions are grounded in the key international data privacy and data protection principles and concepts such as Purpose Specification, Purpose Compatibility, Data Minimization, Consent Legitimacy, Lawfulness and Fairness of data access and use.
This document discusses augmenting open government data with social media data. It presents a research agenda to integrate open government data and social media data. The agenda involves understanding both data sources, integrating them based on common elements, and using a proof of concept involving UK election data from 2010 to demonstrate objective and subjective views as well as an integrated view. The goal is to provide infrastructure to better exploit the potential of open data and social media data.
Open data barometer global report - 2nd edition yann le gigan
This document provides an introduction and overview of the Open Data Barometer report. The report analyzes global trends in open data by assessing countries' readiness, implementation, and impact of open data initiatives. It finds that while open data initiatives have spread rapidly, more work is needed to support data-enabled democracy worldwide and ensure data access, skills, and freedoms are distributed equitably. The report evaluates 86 countries across different clusters and provides recommendations for tailoring open data strategies based on countries' varying capacities and needs. It aims to contribute to understanding challenges and opportunities in realizing open data's potential to increase transparency, empower citizens, and inspire innovation.
1. The document discusses open data in Canada and argues that open civic data and information are important for informed citizen participation in decision-making and generating innovative solutions.
2. It provides examples of open data projects in various Canadian cities like child poverty mapping and Inuit land use atlases.
3. Key principles for opening government data are outlined, and challenges are noted around issues like interoperability, licensing, and cultural change needed for more open data.
Putting the L in front: from Open Data to Linked Open DataMartin Kaltenböck
Keynote presentation of Martin Kaltenböck (LOD2 project, Semantic Web Company) at the Government Linked Data Workshop in the course of the OGD Camp 2011 in Warsaw, Poland: Putting the L in front: from Open Data to Linked Open Data
A Customisable Pipeline for Continuously Harvesting Socially-Minded Twitter U...Paolo Missier
talk for paper published at ICWE2019:
Primo F, Missier P, Romanovsky A, Mickael F, Cacho N. A customisable pipeline for continuously harvesting socially-minded Twitter users. In: Procs. ICWE’19. Daedjeon, Korea; 2019.
This document discusses the current state and issues surrounding public sector data in Korea. It defines open government data as data produced by government entities that can be freely used, reused and redistributed. While Korea has built various data through e-government initiatives, most data is not truly "open" as it cannot be freely reused or used for commercial purposes without restrictions. The document also outlines some international and domestic open data portals that have been established to increase access to government data.
Acclerating biomedical discovery with an internet of FAIR data and services -...Michel Dumontier
With its focus on improving the health and well being of people, biomedicine has always been a fertile, if not challenging domain for computational discovery science. Indeed, the existence of millions of scientific articles, thousands of databases, and hundreds of ontologies, offer exciting opportunities to reuse our collective knowledge, were we not stymied by incompatible formats, overlapping and incomplete vocabularies, unclear licensing, and heterogeneous access points. In this talk, I will discuss our work to create computational standards, platforms, and methods to wrangle knowledge into simple, but effective representations based on semantic web technologies that are maximally FAIR - Findable, Accessible, Interoperable, and Reusable - and to further use these for biomedical knowledge discovery. But only with additional crucial developments will this emerging Internet of FAIR data and services, which is built on Semantic Web technologies, be well positioned to support automated scientific discovery on a global scale.
A gigantic archive of terabytes of data is created every day by current information systems and digital technologies, for example, the Internet of Things and cloud computing. Analysis of these massive data requires considerable effort at multiple levels to extract knowledge for decision-making. Hence, big data analytics is a current area of research and development. The primary goal of this paper is to investigate the potential impact of big data challenges and the various tools associated with them. Accordingly, this article provides a platform to explore big data at different stages. Moreover, it opens a new horizon for researchers to develop solutions in light of the challenges and open research issues.
Multipleregression covidmobility and Covid-19 policy recommendationKan Yuenyong
Multiple Regression Analysis and Covid-19 policy is the contemporary agenda. This work demonstrates how to use Python for data wrangling and R for statistical analysis, in a form suitable for publication in a standard academic journal. The model examines whether lockdown policy was relevant to controlling the Covid-19 outbreak.
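A minimal sketch of the multiple-regression step described above, using NumPy on synthetic data; the predictors (`mobility`, `stringency`) and all coefficients are invented for illustration and are not taken from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
# Hypothetical predictors: mobility change (%) and a policy stringency index.
mobility = rng.uniform(-50, 10, n)
stringency = rng.uniform(0, 100, n)
# Synthetic response: case growth driven by mobility, dampened by stringency.
growth = 0.05 * mobility - 0.02 * stringency + rng.normal(0, 0.5, n)

# Ordinary least squares: design matrix with an intercept column.
X = np.column_stack([np.ones(n), mobility, stringency])
beta, *_ = np.linalg.lstsq(X, growth, rcond=None)
intercept, b_mobility, b_stringency = beta
```

With enough observations, the fitted coefficients recover the generating values, which is the basic check one would run before interpreting such a policy model.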
International Journal of Data Mining & Knowledge Management Process ( IJDKP )IJDKP
This Journal provides a forum for researchers who address this issue to present their work in a peer-reviewed open access venue. Authors are solicited to contribute to the Journal by submitting articles that illustrate research results, projects, surveys and industrial experiences describing significant advances in the following areas, among others.
Topics of interest include, but are not limited to, the following:
Data Mining Foundations
Parallel and Distributed Data Mining Algorithms, Data Streams Mining, Graph Mining, Spatial Data Mining, Text, Video and Multimedia Data Mining, Web Mining, Pre-Processing Techniques, Visualization, Security and Information Hiding in Data Mining
Data Mining Applications
Databases, Bioinformatics, Biometrics, Image Analysis, Financial Modeling, Forecasting, Classification, Clustering, Social Networks, Educational Data Mining
Knowledge Processing
Data and Knowledge Representation, Knowledge Discovery Framework and Process, Including Pre- and Post-Processing, Integration of Data Warehousing, OLAP and Data Mining, Integrating Constraints and Knowledge in the KDD Process, Exploratory Data Analysis, Inference of Causes, Prediction, Evaluating, Consolidating and Explaining Discovered Knowledge, Statistical Techniques for Generating a Robust, Consistent Data Model, Interactive Data Exploration/Visualization and Discovery, Languages and Interfaces for Data Mining, Mining Trends, Opportunities and Risks, Mining from Low-Quality Information Sources
Important Dates
Submission Deadline : August 23, 2020
Notification : September 23, 2020
Final Manuscript Due : October 01, 2020
Publication Date : Determined by the Editor-in-Chief
A Comprehensive Overview of Advance Techniques, Applications and Challenges i...IRJTAE
The field of data science uses scientific methods, algorithms, processes, and systems to extract insights and knowledge from structured and unstructured data. It combines principles from mathematics, statistics, computer science, and domain expertise to analyse, interpret, and present data in meaningful ways. Its primary aim is to uncover patterns, trends, and correlations across various domains to aid in making informed decisions, predictions, and optimizations. Data science encompasses data collection, cleaning, analysis, interpretation, and communication of findings. Techniques such as machine learning, statistical analysis, data mining, and data visualization are commonly employed to derive valuable insights and solve complex problems. Data scientists use programming languages and tools to manage large volumes of data, transforming raw information into actionable intelligence, driving innovation, and enabling evidence-based decision-making in businesses, research, and various other applications. This review seeks to provide a valuable resource for researchers, practitioners, and enthusiasts who wish to gain in-depth knowledge and understanding of data science and its implications for the ever-evolving data-driven world.
International Journal of Data Mining & Knowledge Management Process ( IJDKP )albert ca
This document calls for papers for the International Journal of Data Mining & Knowledge Management Process. It discusses how data mining and knowledge discovery have grown in importance with large amounts of digital data. The journal provides a forum for peer-reviewed research in data mining foundations, applications, and knowledge processing. Authors are invited to submit original, unpublished papers by February 27, 2021 related to topics such as data streams mining, spatial data mining, bioinformatics, social networks, and data representation.
This document discusses challenges and outlooks related to big data. It begins with an introduction describing how big data is being collected and analyzed in various fields such as science, education, healthcare, urban planning, and more. It then outlines the key phases in big data analysis: data acquisition and recording, information extraction and cleaning, data integration and representation, query processing and analysis, and result interpretation. For each phase, it discusses challenges and how existing techniques can be applied or extended to address big data issues. Some of the major challenges discussed are data scale, heterogeneity, lack of structure, privacy, timeliness, provenance, and visualization across the entire big data analysis pipeline.
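The five phases the summary lists (acquisition, extraction/cleaning, integration, query processing/analysis, interpretation) can be sketched as a chain of functions; everything here, including the names and the toy data, is illustrative rather than taken from the document:

```python
def acquire():
    # Phase 1 - acquisition and recording: raw, heterogeneous records.
    return ['  42 ', 'n/a', '17', '5  ', '']

def extract_and_clean(raw):
    # Phase 2 - information extraction and cleaning: drop unusable rows.
    return [r.strip() for r in raw if r.strip().isdigit()]

def integrate(clean):
    # Phase 3 - integration and representation: a single typed collection.
    return [int(v) for v in clean]

def analyse(values):
    # Phase 4 - query processing and analysis: a simple aggregate.
    return {'count': len(values), 'mean': sum(values) / len(values)}

def interpret(result):
    # Phase 5 - result interpretation: a human-readable finding.
    return f"{result['count']} valid records, mean {result['mean']:.1f}"

report = interpret(analyse(integrate(extract_and_clean(acquire()))))
```

The point of the staged structure is that challenges such as heterogeneity or provenance attach to specific phases rather than to the pipeline as a whole.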
Big data promises to make the world more intelligent by identifying patterns and allowing decisions to be made based on large datasets rather than limited expertise. However, individual privacy must be protected and public interests considered. Economically, big data allows high-velocity analysis of large, diverse datasets to extract value. This changes business models and relationships as firms optimize operations and networks using data. Policymakers must ensure big data is used responsibly while creating opportunities for stakeholders.
Similar to OPEN DATA: ECOSYSTEM, CURRENT AND FUTURE TRENDS, SUCCESS STORIES AND BARRIERS (20)
Data Quality for AI or AI for Data quality: advances in Data Quality Manageme...Anastasija Nikiforova
“Data is the new oil” is only partly true, since according to Forbes, data is more than oil, while according to Ataccama, “Manual Data Quality Doesn’t Cut It in 2023” – this was the main driver behind my guest lecture entitled “Data Quality for AI or AI for Data quality: advances in Data Quality Management for the success and sustainability of emerging technologies, business and society”, as part of which we discussed what is the role of artificial intelligence in data quality management and what is the role of data quality for AI, concluding that it is not about “data quality for AI” OR “AI for data quality” but rather about AND.
We also looked at what is the current market offer regarding AI-driven data quality management, what are the pros and cons of these solutions and what are the prerequisites that we have to take into account when using them (e.g., metadata and their quality for those, which derive DQ rules based on metadata analysis), and how possibly more promising solution could be built.
We also looked at what are those data quality specificities we should consider depending on the artifact – a data object (dataset), whose owner is known / is unknown (open data), Information System, Data Warehouse, Data Lake, Data Lakehouse, Data Mesh – where, when and how DQ takes place in them? What are the current trends? And are these indeed trends or rather hype?
Public data ecosystems in and for smart cities: how to make open / Big / smar...Anastasija Nikiforova
This is a set of slides used as part of my keynote "Public data ecosystems in and for smart cities: how to make open / Big / smart / geo data ecosystems value-adding for SDG-compliant Smart Living and Society 5.0" delivered at the 5th International Conference on Advanced Research Methods and Analytics (CARMA 2023) -> https://carmaconf2023.wordpress.com/keynote-speakers/. read more here -> https://anastasijanikiforova.com/2023/06/30/keynote-at-the-5th-international-conference-on-advanced-research-methods-and-analytics-carma-2023/
Artificial Intelligence for open data or open data for artificial intelligence?Anastasija Nikiforova
This is a presentation used to deliver an invited talk for Babu Banarasi Das University (BBDU, Department of Computer Science and Engineering) Development Program «Artificial Intelligence for Sustainable Development» organized by AI Research Centre, Department of Computer Science & Engineering, ShodhGuru Research Labs, Soft Computing Research Society, IEEE UP Section, Computational Intelligence Society Chapter in 2022. Read more here -> https://anastasijanikiforova.com/2022/09/24/ai-for-open-data-or-open-data-for-ai-an-invited-talk-for-bbdu-development-program-artificial-intelligence-for-sustainable-development%f0%9f%8e%a4/
Overlooked aspects of data governance: workflow framework for enterprise data...Anastasija Nikiforova
This presentation is a supplementary material for the article "Overlooked aspects of data governance: workflow framework for enterprise data deduplication" (Azeroual, Nikiforova, Shei) presented at The International Conference on Intelligent Computing, Communication, Networking and Services (ICCNS2023).
Abstract of the paper: Data quality in companies is decisive and critical to the benefits their products and services can provide. However, in heterogeneous IT infrastructures where, e.g., different applications for Enterprise Resource Planning (ERP), Customer Relationship Management (CRM), product management, manufacturing, and marketing are used, duplicates, e.g., multiple entries for the same customer or product in a database or information system, occur. There can be several reasons for this, but the result of non-unique or duplicate records is a degraded data quality. This ultimately leads to poorer, inefficient, and inaccurate data-driven decisions. For this reason, in this paper, we develop a conceptual data governance framework for effective and efficient management of duplicate data, and improvement of data accuracy and consistency in large data ecosystems. We present methods and recommendations for companies to deal with duplicate data in a meaningful way.
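The paper presents a conceptual governance framework rather than code, but the core duplicate-detection idea (multiple entries for the same customer or product) can be illustrated with a minimal sketch using normalised keys and fuzzy matching; all names here are hypothetical:

```python
from difflib import SequenceMatcher

def normalise(record):
    # Canonical key: lower-case, drop punctuation, collapse whitespace.
    kept = "".join(ch for ch in record.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())

def find_duplicates(records, threshold=0.9):
    """Pair up records whose normalised forms are near-identical."""
    keys = [normalise(r) for r in records]
    pairs = []
    for i in range(len(keys)):
        for j in range(i + 1, len(keys)):
            if SequenceMatcher(None, keys[i], keys[j]).ratio() >= threshold:
                pairs.append((records[i], records[j]))
    return pairs

customers = ["ACME Corp.", "acme corp", "Acme Corporation", "Globex Ltd"]
dupes = find_duplicates(customers)
```

In a real deduplication workflow the threshold, the normalisation rules, and the merge/survivorship policy would all be governance decisions, which is precisely the gap the paper's framework addresses.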
Data Quality as a prerequisite for you business success: when should I start ...Anastasija Nikiforova
These are slides for my talk "Data Quality as a prerequisite for you business success: when should I start taking care of it?" I delivered as an invited keynote for HackCodeX Forum that gathered international experts to share their experience and knowledge on the emerging technologies and areas such as Artificial Intelligence, Security, Data Quality, Quantum Computing, Sustainability, Open Data, Privacy etc.
Framework for understanding quantum computing use cases from a multidisciplin...Anastasija Nikiforova
This presentation is a supplementary material for the article "Framework for understanding quantum computing use cases from a multidisciplinary perspective and future research directions" (Ukpabi, D.C., Karjaluoto, H., Botticher, A., Nikiforova, A., Petrescu, D.I., Schindler, P., Valtenbergs, V., Lehmann, L., & Yakaryılmaz, A) available at https://arxiv.org/ftp/arxiv/papers/2212/2212.13909.pdf. THe presentation, however, was delivered for QWorld Quantum Science Days 2023 | May 29-31.
Data Lake or Data Warehouse? Data Cleaning or Data Wrangling? How to Ensure t...Anastasija Nikiforova
This presentation was delivered as part of the Data Science Seminar titled “When, Why and How? The Importance of Business Intelligence“ organized by the Institute of Computer Science (University of Tartu) in cooperation with Swedbank.
In this presentation I talked about:
*“Data warehouse vs. data lake – what are they and what is the difference between them?” (structured vs unstructured, static vs dynamic (real-time data), schema-on-write vs schema on-read, ETL vs ELT) with further elaboration on What are their goals and purposes? What is their target audience? What are their pros and cons?
*“Is the Data warehouse the only data repository suitable for BI?” – no, (today) data lakes can also be suitable. And even more, both are considered the key to “a single version of the truth”. Although, if descriptive BI is the only purpose, it might still be better to stay within a data warehouse. But, if you want to either have predictive BI or use your data for ML (or do not have a specific idea on how you want to use the data, but want to be able to explore your data effectively and efficiently), then a data warehouse might not be the best option.
*“So, the data lake will save my resources a lot, because I do not have to worry about how to store/allocate the data – just put it in one storage and voila?!” – no, in this case your data lake will turn into a data swamp! And you are forgetting about the data quality you should (must!) be thinking of!
*“But how do you prevent the data lake from becoming a data swamp?” – in short and simple terms – proper data governance & metadata management is the answer (but not as easy as it sounds – do not forget about your data engineer and be friendly with him (always… literally always :D) and also think about the culture in your organization.
*“So, the use of a data warehouse is the key to high quality data?” – no, it is not! Having ETL does not guarantee the quality of your data (transform&load is not data quality management). Think about data quality regardless of the repository!
*“Are data warehouses and data lakes the only options to consider, or are we missing something?” – we are: the data lakehouse!
*“If a data lakehouse is a combination of the benefits of a data warehouse and a data lake, is it a silver bullet?” – no, it is not! It is another, relatively immature, option to consider that may be the best fit for you, but it is not a panacea. Dealing with data is (still) not easy…
In addition, in this talk I also briefly introduced the ongoing research into the integration of the data lake as a data repository with data wrangling, seeking increased data quality in information systems. In short, this is somewhat like an improved data lakehouse, where we emphasize the need for data governance and data wrangling to be integrated to really get the benefits that data lakehouses promise (although we still call it a data lake, since the data lakehouse is not a sufficiently mature concept and has differing definitions).
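The warehouse-vs-lake contrast above (schema-on-write vs. schema-on-read, ETL vs. ELT) can be sketched in a few lines of Python. This is a minimal illustration, not from the talk itself; the record layout and function names are invented:

```python
# Toy records as they might arrive from upstream sources.
RAW_EVENTS = [
    {"user": "u1", "amount": "42.5"},   # amount arrives as text
    {"user": "u2"},                     # field missing entirely
    {"user": "u3", "amount": 7},
]

def load_schema_on_write(events):
    """Warehouse-style ETL: validate/transform BEFORE storing;
    records that violate the schema are rejected at load time."""
    stored = []
    for e in events:
        if "amount" not in e:
            continue  # reject: schema violation
        stored.append({"user": e["user"], "amount": float(e["amount"])})
    return stored

def query_schema_on_read(raw_events):
    """Lake-style ELT: store everything as-is; the schema is applied
    only at read time, so each consumer decides how to interpret
    (or skip) malformed records."""
    for e in raw_events:
        try:
            yield e["user"], float(e["amount"])
        except (KeyError, ValueError, TypeError):
            continue  # interpretation decided at read time

warehouse = load_schema_on_write(RAW_EVENTS)
lake_view = list(query_schema_on_read(RAW_EVENTS))
```

The design trade-off is visible even at this scale: schema-on-write keeps the stored data clean but discards information at load time, while schema-on-read keeps everything and defers (and duplicates) the interpretation effort across consumers.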
Putting FAIR Principles in the Context of Research Information: FAIRness for ...Anastasija Nikiforova
This presentation is a supplementary material for the paper "Putting FAIR Principles in the Context of Research Information: FAIRness for CRIS and CRIS for FAIRness" (Otmane Azeroual, Joachim Schöpfel, Janne Pölönen, and Anastasija Nikiforova), presented at the 14th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K), where it received the Best Paper Award. In this presentation we raise a discussion on this topic, showing that the improvement of FAIRness is a dual or bidirectional process, where CRIS promotes and contributes to the FAIRness of data and infrastructures, and FAIR principles push for further improvement in the underlying CRIS data model and format, positively affecting the sustainability of these systems and underlying artifacts. CRIS are beneficial for FAIR, and FAIR is beneficial for CRIS.
See the text here -> https://www.scitepress.org/Link.aspx?doi=10.5220/0011548700003335
Cite as -> Azeroual, O.; Schöpfel, J.; Pölönen, J. and Nikiforova, A. (2022). Putting FAIR Principles in the Context of Research Information: FAIRness for CRIS and CRIS for FAIRness. In Proceedings of the 14th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - KMIS, ISBN 978-989-758-614-9; ISSN 2184-3228, pages 63-71. DOI: 10.5220/0011548700003335
Open data hackathon as a tool for increased engagement of Generation Z: to h...Anastasija Nikiforova
This is the presentation for the paper "Open data hackathon as a tool for increased engagement of Generation Z: to hack or not to hack?", presented at EGETC2022.
A hackathon is known as a form of civic innovation in which participants representing citizens can point out existing problems or social needs and propose a solution. Given the high social, technical, and economic potential of open government data (OGD), the concept of open data hackathons is becoming popular around the world. This concept has become popular in Latvia, with annual hackathons organised for a specific cluster of citizens – Generation Z. This study presents the latest findings on the role of open data hackathons and the benefits that they can bring to society, participants, and government alike. First, a systematic literature review is carried out to establish a knowledge base. Then, empirical research of 4 case studies of open data hackathons for Generation Z participants held between 2018 and 2021 in Latvia is conducted to understand which ideas dominated and what the main results of these events were for the OGD initiative. It demonstrates that, despite the widespread belief that young people are indifferent to current societal and natural problems, the ideas developed correspond to the current situation and are aimed at solving it, revealing aspects for improvement in the provision of data, infrastructure, culture, and government-related areas.
Barriers to Openly Sharing Government Data: Towards an Open Data-adapted Inno...Anastasija Nikiforova
This is the presentation for our ongoing study "Barriers to Openly Sharing Government Data: Towards an Open Data-adapted Innovation Resistance Theory" (Anastasija Nikiforova, Anneke Zuiderwijk) presented at ICEGOV2022 conference – 15th International Conference on Theory and Practice of Electronic Governance (nominated to the Best Paper Awards).
In short, the study aims to develop an Open Government Data-adapted Innovation Resistance Theory model to empirically identify predictors affecting public agencies’ resistance to openly sharing government data. Here we want to understand:
💡 What are the functional and behavioural factors that facilitate or hamper the opening of government data by public organizations?
💡 Does IRT provide a new and more complete insight compared to the more traditional UTAUT and TAM? IRT has not been applied in this domain yet, so we are checking whether it should be considered, or whether the models we are so familiar with remain the best ones.
💡 And additionally: did the COVID-19 pandemic have an [obvious/significant] effect on public agencies in terms of their readiness or resistance to openly share government data?
Based on a review of the literature on both IRT research and barriers associated with open data sharing by public agencies, we developed an initial version of the model. Once the model is refined in a qualitative study (interviews with public agencies), we will validate it to study the resistance of public authorities to openly sharing government data in a quantitative study.
Read the paper and cite as -> Nikiforova A., Zuiderwijk A. (2022) Barriers to openly sharing government data: towards an open data-adapted innovation resistance theory, In 15th International Conference on Theory and Practice of Electronic Governance (ICEGOV 2022). Association for Computing Machinery, New York, NY, USA, 215–220, https://doi.org/10.1145/3560107.3560143 – best paper award nominee
Combining Data Lake and Data Wrangling for Ensuring Data Quality in CRISAnastasija Nikiforova
This presentation is a supplementary material for "Combining Data Lake and Data Wrangling for Ensuring Data Quality in CRIS", presented at the 15th International Conference on Current Research Information Systems (CRIS2022) - Linking Research Information across Data Spaces. It provides insight into the ongoing study of combining the data lake as a data repository with data wrangling, seeking increased data quality in CRIS systems, although the proposed approach is domain-agnostic and can be used beyond CRIS.
Read the article here -> Azeroual, O., Schöpfel, J., Ivanovic, D., & Nikiforova, A. (2022, May). Combining Data Lake and Data Wrangling for Ensuring Data Quality in CRIS. In CRIS2022: 15th International Conference on Current Research Information Systems --> https://hal.archives-ouvertes.fr/hal-03694519/
The role of open data in the development of sustainable smart cities and smar...Anastasija Nikiforova
This presentation is a supplementary material for the guest lecture "The role of open data in the development of sustainable smart cities and smart society" I delivered for the Federal University of Technology – Paraná (Universidade Tecnológica Federal do Paraná (UTFPR)) (Brazil, May 2022).
Data security as a top priority in the digital world: preserve data value by ...Anastasija Nikiforova
Today, in the age of information and Industry 4.0, billions of data sources, including but not limited to interconnected devices (sensors, monitoring devices) forming Cyber-Physical Systems (CPS) and the Internet of Things (IoT) ecosystem, continuously generate, collect, process, and exchange data. With the rapid increase in the number of devices and information systems in use, the amount of data is increasing. Moreover, due to digitization and the variety of data being continuously produced and processed with a reference to Big Data, their value is also growing. As a result, the risk of security breaches and data leaks grows as well. The value of data, however, depends on several factors, of which data quality and data security (which can affect data quality if the data are accessed and corrupted) are the most vital. Data serve as the basis for decision-making and as input for models, forecasts, simulations etc., which can be of high strategic and commercial/business value. This has become even more relevant during the COVID-19 pandemic, which, in addition to affecting the health, lives, and lifestyle of billions of citizens globally and making life even more digitized, has had a significant impact on business, especially because of the challenges companies have faced in maintaining business continuity in the so-called “new normal”. However, in addition to the cybersecurity threats caused by changes directly related to the pandemic and its consequences, many previously known threats have become even more desirable targets for intruders and hackers. Every year millions of personal records become available online. Moreover, the popularity of IoT search engines (IoTSE) has lowered the complexity of searching for connected devices on the internet, giving easy access even to novices thanks to the widespread popularity of step-by-step guides on how to use an IoT search engine to find and, if they are insufficiently protected, gain access to webcams, routers, databases and other artifacts.
Recent research has demonstrated that weak data protection, and weak database protection in particular, is one of the key security threats. Various measures can be taken to address the issue. The aim of the study to which this presentation refers is to examine whether “traditional” vulnerability registries provide a sufficiently comprehensive view of DBMS security, or whether DBMS holders should intensively and dynamically inspect their systems by referring to Internet of Things search engines, moving towards a sustainable and resilient digitized environment. The paper brings attention to this problem and makes the reader think about data security before looking for and introducing more advanced security and protection mechanisms, which, in the absence of the above, may bring no value.
ShoBeVODSDT: Shodan and Binary Edge based vulnerable open data sources detect...Anastasija Nikiforova
This presentation is devoted to the research paper "ShoBeVODSDT: Shodan and Binary Edge based vulnerable open data sources detection tool or what Internet of Things Search Engines know about you", developed by Artjoms Daskevics and Anastasija Nikiforova and presented during the International Conference on Intelligent Data Science Technologies and Applications (IDSTA2021), November 15-16, 2021, Tartu, Estonia (web-based).
Read paper here -> Daskevics, A., & Nikiforova, A. (2021, November). ShoBeVODSDT: Shodan and Binary Edge based vulnerable open data sources detection tool or what Internet of Things Search Engines know about you. In 2021 Second International Conference on Intelligent Data Science Technologies and Applications (IDSTA) (pp. 38-45). IEEE.
The open lecture “The Potential of Open Data” (“Atvērto datu potenciāls”) took place within the University of Latvia Faculty of Social Sciences master's course “Data Society Management” (“Datu sabiedrības vadība”), delivered by Dr.sc.comp. Anastasija Ņikiforova, assistant professor and researcher at the UL Faculty of Computing.
Open data are considered a valuable resource whose use can potentially deliver significant economic, technological and social benefits. Achieving them, however, requires a number of preconditions concerning the data, the infrastructure and the users; in other words, the success factor of an open data initiative is the creation and maintenance of a sustainable open government data ecosystem. The aim of the lecture is to provide insight into the popularity of open data and their potential for the development of technological and economic processes, paying attention to practical applications both in Latvia and beyond, where data are transformed into (innovative) solutions and services. It is also planned to give an insight into the most important aspects that can potentially foster the creation of a sustainable open data ecosystem, enabling anyone interested to transform open data into value.
Dr.sc.comp. Anastasija Ņikiforova is an assistant professor at the University of Latvia Faculty of Computing and a researcher at its Innovative Information Technologies Laboratory. Her research interests relate to data management, in particular data quality, and open data. In addition to the other courses she teaches at the Faculty of Computing, she has developed the special seminar “Open Data and Data Quality” and the master's programme course “Open Government Data in a Data-driven World”. Dr. Ņikiforova is an expert of the Latvian Council of Science in Engineering and Technology (Electrical Engineering, Electronics, Information and Communication Technologies) and Natural Sciences (Computer Science and Informatics), as well as an associate member of LATA (the Latvian Open Technology Association). She has (co-)authored more than 25 scientific papers, 4 of which are published in top-ranked Q1 journals.
This presentation is a supplementary material for the following article -> Nikiforova, A., Bicevskis, J., & Karnitis, G. (2020, December). Towards a Concurrence Analysis in Business Processes. In 2020 Seventh International Conference on Social Networks Analysis, Management and Security (SNAMS) (pp. 1-6). IEEE.
This paper presents first steps towards a concurrent business process analysis methodology for predicting the probability of incorrect business process execution. The aim of the paper is to (a) look at approaches to describing and dealing with the execution of concurrent processes, mainly focusing on the transaction mechanisms in database management systems, and (b) present an idea and a preliminary version of an algorithm that detects the possibility of incorrect execution of concurrent business processes. Analyzing business processes according to the proposed procedure allows transaction processing to be configured optimally.
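The kind of check such an algorithm performs can be illustrated with the classic transaction-conflict condition: two concurrent processes may interleave incorrectly if one writes data that the other reads or writes. The sketch below is a conceptual illustration only; the process definitions are invented and this is not the paper's algorithm:

```python
def conflicts(p1, p2):
    """Each process is modelled as read/write sets over shared data.
    A pair can interleave incorrectly if one writes data that the
    other reads or writes (write-write or read-write conflict)."""
    return bool(
        p1["writes"] & (p2["reads"] | p2["writes"]) or
        p2["writes"] & p1["reads"]
    )

# Invented example processes touching shared business data.
invoice = {"reads": {"stock"},  "writes": {"stock", "ledger"}}
report  = {"reads": {"ledger"}, "writes": set()}
audit   = {"reads": {"orders"}, "writes": set()}

risky = conflicts(invoice, report)  # invoice writes the ledger the report reads
safe  = conflicts(report, audit)    # disjoint data, any interleaving is fine
```

Pairs flagged as conflicting are the ones where a transaction mechanism (locking, serializable isolation) is needed; non-conflicting pairs can run concurrently without it.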
DATA QUALITY MODEL-BASED TESTING OF INFORMATION SYSTEMS: THE USE-CASE OF E-SC...Anastasija Nikiforova
This presentation is a supplementary material for the following article -> Nikiforova, A., Bicevskis, J., Bicevska, Z., & Oditis, I. (2020, December). Data quality model-based testing of information systems: the use-case of E-scooters. In 2020 7th International Conference on Internet of Things: Systems, Management and Security (IOTSMS) (pp. 1-8). IEEE.
The paper proposes a data quality model-based testing methodology aimed at improving the testing of information systems (IS) using a previously proposed data quality model. The solution involves creating a description of the data to be processed by the IS and of the data quality requirements used for the development of the tests, followed by an automated run of the system on the generated tests, verifying the correctness of the data to be entered and stored in the database. Generating tests for all possible data quality conditions creates a complete set of tests that verify the operation of the IS under all possible data quality conditions. The proposed solution is demonstrated on the real example of a system dealing with e-scooters. Although the proposed solution is demonstrated by applying it to a system that is already in use, it can also be used when developing a new system.
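The core idea (enumerate every data quality condition and derive one test per condition, so the system is exercised under all of them) can be shown with a small, purely hypothetical sketch. The field, its quality conditions and the validator below are invented for illustration and are not taken from the paper:

```python
# Quality model: per field, a list of (condition name, test value,
# whether the system should accept it).
QUALITY_MODEL = {
    "battery_level": [
        ("valid",        50,    True),
        ("below_range",  -5,    False),
        ("above_range",  150,   False),
        ("wrong_type",   "low", False),
    ],
}

def validate_battery_level(value):
    """Stand-in for the system under test: accepts integers 0..100."""
    return isinstance(value, int) and 0 <= value <= 100

def generate_and_run(model, validators):
    """Derive one test case per quality condition and check that the
    system's verdict matches the expected one."""
    results = []
    for field, cases in model.items():
        for name, value, expected_ok in cases:
            actual_ok = validators[field](value)
            results.append((field, name, actual_ok == expected_ok))
    return results

test_report = generate_and_run(
    QUALITY_MODEL, {"battery_level": validate_battery_level}
)
```

Because the test set is derived mechanically from the model, adding a new quality condition to the model automatically yields a new test, which is what makes the set "complete" with respect to the described conditions.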
This paper is a supplementary material for the following article -> Bicevskis, J., Nikiforova, A., Bicevska, Z., Oditis, I., & Karnitis, G. (2019, October). A step towards a data quality theory. In 2019 Sixth International Conference on Social Networks Analysis, Management and Security (SNAMS) (pp. 303-308). IEEE.
Data quality issues have been topical for many decades. However, a unified data quality theory has not been proposed yet, since many concepts associated with the term “data quality” are not straightforward enough. The paper proposes a user-oriented data quality theory based on clearly defined concepts. The concepts are defined by using three groups of domain-specific languages (DSLs): (1) the first group uses the concept of a data object to describe the data to be analysed, (2) the second group describes the data quality requirements, and (3) the third group describes the process of data quality evaluation. The proposed idea proved to be simple enough, but at the same time very effective in identifying data defects, despite the different structures of data sets and the complexity of data. Approbation of the approach demonstrated several advantages: (a) a graphical data quality model allows defining of data quality even by non-IT and non-data quality professionals, (b) data quality model is not related to the information system that has accumulated data, i.e., this approach lets users analyse the “third-party” data, and (c) data quality can be described at least at two levels of abstraction - informally, using natural language, or formally, including executable program routines or SQL statements.
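Advantage (c) above, the same requirement existing at an informal and a formal level of abstraction, can be illustrated with a toy example. The table, column and the requirement itself are invented for this sketch and are not from the paper:

```python
import sqlite3

# Informal level: the requirement in natural language.
informal_requirement = "Every patient record must have a non-empty national ID."

# A throwaway in-memory database with invented sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patient (id INTEGER, national_id TEXT)")
conn.executemany(
    "INSERT INTO patient VALUES (?, ?)",
    [(1, "120190-11111"), (2, None), (3, "")],
)

# Formal level: the same requirement as an executable defect query.
defects = conn.execute(
    "SELECT id FROM patient WHERE national_id IS NULL OR national_id = ''"
).fetchall()
conn.close()
```

The informal statement is what a non-IT domain expert writes down; the SQL statement is its executable counterpart, and running it lists exactly the records that violate the requirement.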
Sudheer Mechineni, Head of Application Frameworks, Standard Chartered Bank
Discover how Standard Chartered Bank harnessed the power of Neo4j to transform complex data access challenges into a dynamic, scalable graph database solution. This keynote will cover their journey from initial adoption to deploying a fully automated, enterprise-grade causal cluster, highlighting key strategies for modelling organisational changes and ensuring robust disaster recovery. Learn how these innovations have not only enhanced Standard Chartered Bank’s data infrastructure but also positioned them as pioneers in the banking sector’s adoption of graph technology.
Removing Uninteresting Bytes in Software FuzzingAftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speed up fuzzing campaigns by pinpointing and eliminating uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux tools: Libxml's xmllint, a tool for parsing XML documents, and Binutils' readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format) files. Our preliminary results show that AFL+DIAR not only discovers new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns, and DIAR helps you find such seeds.
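The general idea of eliminating uninteresting seed bytes can be sketched as: drop a byte, and keep the smaller seed whenever the program's observed behaviour is unchanged. The snippet below is only a conceptual illustration with a stand-in coverage function; it is not the actual DIAR technique, which pinpoints such bytes differently:

```python
def coverage(data: bytes) -> frozenset:
    """Stand-in for instrumented execution: here, 'behaviour' depends
    only on which distinct byte values appear in the input."""
    return frozenset(data)

def trim_seed(seed: bytes) -> bytes:
    """Greedily remove bytes whose deletion leaves coverage unchanged."""
    baseline = coverage(seed)
    i = 0
    while i < len(seed):
        candidate = seed[:i] + seed[i + 1:]
        if coverage(candidate) == baseline:
            seed = candidate   # byte was uninteresting: drop it
        else:
            i += 1             # byte matters: keep it, move on
    return seed

trimmed = trim_seed(b"aaabbbccc")
```

Every mutation the fuzzer later spends on the trimmed seed targets a byte that actually influences behaviour, which is why lean seeds make campaigns faster.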
These are the slides of the talk given at the IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW), 2022.
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...SOFTTECHHUB
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
“An Outlook of the Ongoing and Future Relationship between Blockchain Technologies and Process-aware Information Systems.” Invited talk at the joint workshop on Blockchain for Information Systems (BC4IS) and Blockchain for Trusted Data Sharing (B4TDS), co-located with with the 36th International Conference on Advanced Information Systems Engineering (CAiSE), 3 June 2024, Limassol, Cyprus.
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...Zilliz
Join us to introduce Milvus Lite, a vector database that can run on notebooks and laptops, share the same API with Milvus, and integrate with every popular GenAI framework. This webinar is perfect for developers seeking easy-to-use, well-integrated vector databases for their GenAI apps.
UiPath Test Automation using UiPath Test Suite series, part 6DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 6. In this session, we will cover test automation with generative AI and OpenAI.
The UiPath Test Automation with generative AI and OpenAI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into the integration of generative AI, as a test automation solution, with OpenAI's advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the seamless integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI?
Test automation with generative AI and OpenAI
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfPaige Cruz
Monitoring and observability aren’t traditionally found in software curriculums and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is a part of your current company’s observability stack.
While the dev and ops silo continues to crumble, many organizations still relegate monitoring & observability to the purview of ops, infra and SRE teams. This is a mistake: achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party, and will share these foundational concepts to build on.
20 Comprehensive Checklist of Designing and Developing a WebsitePixlogix Infotech
Dive into the world of Website Designing and Developing with Pixlogix! Looking to create a stunning online presence? Look no further! Our comprehensive checklist covers everything you need to know to craft a website that stands out. From user-friendly design to seamless functionality, we've got you covered. Don't miss out on this invaluable resource! Check out our checklist now at Pixlogix and start your journey towards a captivating online presence today.
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slackshyamraj55
Discover the seamless integration of RPA (Robotic Process Automation), COMPOSER, and APM with AWS IDP enhanced with Slack notifications. Explore how these technologies converge to streamline workflows, optimize performance, and ensure secure access, all while leveraging the power of AWS IDP and real-time communication via Slack notifications.
Maruthi Prithivirajan, Head of ASEAN & IN Solution Architecture, Neo4j
Get an inside look at the latest Neo4j innovations that enable relationship-driven intelligence at scale. Learn more about the newest cloud integrations and product enhancements that make Neo4j an essential choice for developers building apps with interconnected data and generative AI.
UiPath Test Automation using UiPath Test Suite series, part 5DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 5. In this session, we will cover CI/CD with DevOps.
Topics covered:
CI/CD within UiPath
End-to-end overview of a CI/CD pipeline with Azure DevOps
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...Neo4j
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIVladimir Iglovikov, Ph.D.
Presented by Vladimir Iglovikov:
- https://www.linkedin.com/in/iglovikov/
- https://x.com/viglovikov
- https://www.instagram.com/ternaus/
This presentation delves into the journey of Albumentations.ai, a highly successful open-source library for data augmentation.
Created out of a necessity for superior performance in Kaggle competitions, Albumentations has grown to become a widely used tool among data scientists and machine learning practitioners.
This case study covers various aspects, including:
People: The contributors and community that have supported Albumentations.
Metrics: The success indicators such as downloads, daily active users, GitHub stars, and financial contributions.
Challenges: The hurdles in monetizing open-source projects and measuring user engagement.
Development Practices: Best practices for creating, maintaining, and scaling open-source libraries, including code hygiene, CI/CD, and fast iteration.
Community Building: Strategies for making adoption easy, iterating quickly, and fostering a vibrant, engaged community.
Marketing: Both online and offline marketing tactics, focusing on real, impactful interactions and collaborations.
Mental Health: Maintaining balance and not feeling pressured by user demands.
Key insights include the importance of automation, making the adoption process seamless, and leveraging offline interactions for marketing. The presentation also emphasizes the need for continuous small improvements and building a friendly, inclusive community that contributes to the project's growth.
Vladimir Iglovikov brings his extensive experience as a Kaggle Grandmaster, ex-Staff ML Engineer at Lyft, sharing valuable lessons and practical advice for anyone looking to enhance the adoption of their open-source projects.
Explore more about Albumentations and join the community at:
GitHub: https://github.com/albumentations-team/albumentations
Website: https://albumentations.ai/
LinkedIn: https://www.linkedin.com/company/100504475
Twitter: https://x.com/albumentations
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024Neo4j
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
OPEN DATA: ECOSYSTEM, CURRENT AND FUTURE TRENDS, SUCCESS STORIES AND BARRIERS
1. Anastasija Nikiforova, PhD
Assistant professor and researcher, University of Latvia, Faculty of Computing (“Innovative Information Technologies” Lab, Programming Department)
IT-expert and researcher, Latvian Biomedical Research and Study Centre (BMC, BBMRI-LV), Latvian Biobank (LGDB)
Expert of the Latvian Council of Sciences (1) Computer Science and Informatics and (2) Electrical Engineering, Electronics, ICT
Associate member of the Latvian Open Technology Association
Anastasija.Nikiforova@lu.lv, https://anastasijanikiforova.com/
OPEN DATA: ECOSYSTEM, CURRENT AND FUTURE TRENDS, SUCCESS STORIES AND BARRIERS
Guest Lecture for the University of South-Eastern Norway (USN), October 2021
4. Source: Lokers, R., Knapen, R., Janssen, S., van Randen, Y., & Jansen, J. (2016). Analysis of Big Data technologies for use in agro-environmental science. Environmental Modelling & Software, 84, 494-504.
5. Image source: https://arcticanthropology.org/
Image source: https://globalbusinesscoalition.org/global-governance-news/b20-tokyo-summit-joint-recommendations-society-5-0-for-sdgs/attachment/figure-2-society-5-0-for-sdgs/
SOCIETY 5.0
6. a way by which to guide and mobilize action in science, technology and innovation (STI) to achieve a prosperous, sustainable, and inclusive future
-- Japan, 5th Science and Technology Basic Plan
SOCIETY 5.0
Source: https://www.japanhouselondon.uk/whats-on/2020/society-5-0-a-new-model-for-an-ageing-society-a-talk-by-professor-harayama-yuko/
Society 5.0 aims to resolve various modern social challenges by incorporating game-changing innovations such as the Internet of Things (IoT), robotics, AI and big data into all industries and social activities. Rather than a future controlled and monitored by AI and robots, technology is harnessed to achieve a human-centred society in which each and every person can lead an active and enjoyable life. Following the hunting society (Society 1.0), the agricultural society (Society 2.0), the industrial society (Society 3.0), and the information society (Society 4.0), the far-reaching policies of Society 5.0 propose a new transformation of contemporary ways of life.
8. DIGITAL TRANSFORMATION × IMAGINATION AND CREATIVITY OF INDIVIDUALS → PROBLEM SOLVING, VALUE CREATION
https://digital.thecatcompanyinc.com/b20magazine/tokyo-2019/society-5-0-updates-on-japanese-business-and-economy/
10. OPENNESS
Open data, i.e., freely accessible, shareable, and usable data;
Open science, i.e., making scientific research and its dissemination accessible to all levels of society;
Open standards, i.e., technology-neutral specifications for hardware, software, or data developed through an open process;
Open source software, i.e., free and open collaborative software development;
Open hardware, i.e., physical products, machines and systems designed and offered by means of publicly shared information;
Open education, i.e., learning and teaching without barriers
11. “Open data” are data that anyone can access, use and share.
The McKinsey Global Institute report estimated that open data could add over $3 trillion annually in total value to the global economy.
The aggregate economic impact from applications based on open data across the EU27 economy is estimated at €140 billion annually.
Open data improve economic growth.
14. tracking events, identifying the situation, tracking the spread of the disease, etc.;
data-driven decision-making;
planning, forecasting;
better understanding of government decisions, including tracking the decisions and restrictions introduced;
promoting transparency, accountability and trust in government decisions;
analysis of impacts, cause-effect relationships;
development of data-driven solutions;
updating and/or enriching other datasets;
and many other purposes...
Image source: https://4.bp.blogspot.com/-VA_zM5jXPvk/VxbRtA0U2JI/AAAAAAAACnY/WKvyQnAwvBIQOsjjLF0-TchL3EtD-q-hQCLcB/s1600/actual%2Becosystem.png
15. Zuiderwijk, A., Janssen, M., & Davis, C. (2014). Innovation with open data: Essential elements of open data ecosystems. Information polity, 19(1, 2), 17-33.
18. OGD: the quality aspect takes only the 4th place by popularity, after policy, benefit and risk, although quality can impact all of these aspects (Klein et al., 2018).
Data quality appears as one of the most problematic dimensions for open data portals.
Def. II: «Quality» is a desirable goal to be achieved through management of the production process.
Def. III: «Data quality» is a relative concept, largely dependent on specific requirements resulting from the data use.
(Sunlight Foundation, 2007), (European Data Portal, 2018)
AND WHAT ABOUT DATA QUALITY???
Complete
Primary
Accessible
Machine-processable
Timely
Non-discriminatory
Non-proprietary
Licence-free
OPEN DATA PRINCIPLES AND DATA QUALITY
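These eight principles lend themselves to simple automated checks. A minimal, hypothetical Python sketch: the metadata field names (fmt, license, last_updated, etc.) and the thresholds are assumptions for illustration, not part of the Sunlight Foundation framework.

```python
# Illustrative sketch: phrasing each of the eight open data principles as a
# testable predicate over hypothetical dataset metadata. All field names and
# thresholds are invented assumptions.
from datetime import date

MACHINE_READABLE = {"csv", "json", "xml", "rdf"}
OPEN_FORMATS = MACHINE_READABLE | {"txt", "ods"}

def check_principles(meta: dict) -> dict:
    """Return a pass/fail verdict per principle for one dataset."""
    fmt = meta.get("fmt", "").lower()
    age_days = (date.today() - meta.get("last_updated", date.today())).days
    return {
        "complete": meta.get("rows_published", 0) >= meta.get("rows_collected", 0),
        "primary": not meta.get("aggregated", True),
        "timely": age_days <= meta.get("update_period_days", 30),
        "accessible": bool(meta.get("download_url")),
        "machine_processable": fmt in MACHINE_READABLE,
        "non_discriminatory": not meta.get("registration_required", True),
        "non_proprietary": fmt in OPEN_FORMATS,
        "license_free": meta.get("license", "").lower() in {"cc0", "public domain"},
    }

meta = {
    "fmt": "CSV", "download_url": "https://example.org/data.csv",
    "rows_published": 1000, "rows_collected": 1000, "aggregated": False,
    "last_updated": date.today(), "update_period_days": 30,
    "registration_required": False, "license": "CC0",
}
print(check_principles(meta))  # all eight checks pass for this metadata
```

In practice each check would need portal-specific logic; the point is only that every principle can be phrased as an executable test.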
20. Data quality is only included from 2018!
(2018, 2019 and 2020 editions)
OPEN DATA QUALITY. EUROPEAN DATA PORTAL
21. DATA QUALITY vs. EDP OPEN DATA QUALITY
https://iso25000.com/index.php/en/news/140-information-about-iso-iec-25012-is-now-available-in-iso25000-com
ISO/IEC 25012
DAMA UK
https://studylib.net/doc/25344494/dimension-dama-uk-working-group
22. !!! The same data may be of sufficient quality in one case BUT completely useless under other circumstances.
Open data are usually used by a wide audience that may not have deep knowledge of IT or data quality;
a solution should be simple enough, giving such users the possibility to take part in the analysis of «third-party» open data for their own purposes.
OPEN DATA QUALITY
«… This state of affairs has led to much confusion within the data quality community and is even more bewildering for those who are new to the discipline and more importantly to business stakeholders…» (DAMA UK, 2018)
** In different proposals, dimensions of the same name can have different semantics and vice versa. (Batini, 2016)
23. DATA QUALITY
TDQM data quality lifecycle: data quality definition, data quality measurement, data quality analysis, data quality improvement (a continuous cycle).
General studies on data and information quality define different dimensions of quality and their groupings.
✘ The key data quality dimensions are not universally defined*;
✘ There is no agreement on their meanings and usability**;
✘ Each dimension can be supplied with one or more metrics that vary from one solution to another;
✘ The number of data quality dimensions, their definitions and groupings are often useful only for a particular solution.
Question: How to relate a particular dimension (and which one?) to a particular use-case???
Problem: the necessity to involve data quality experts at every stage of the data quality analysis process.
Solution (not a silver bullet, but just an attempt): data object-driven approach to data quality evaluation.
Nikiforova, A. (2020). Definition and Evaluation of Data Quality: User-Oriented Data Object-Driven Approach to Data Quality Assessment. Baltic Journal of Modern Computing, 8(3), 391-432.
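As an illustration only, the data object-driven idea can be sketched in a few lines of Python: the user (1) names the data object and its relevant fields, (2) states quality requirements as executable conditions, and (3) evaluates the records against them. The field names and rules below are invented for a hospital-register-like example and are not taken from the cited paper.

```python
# Sketch of a data object-driven quality evaluation. Fields and rules are
# hypothetical; the structure (object -> requirements -> evaluation) is the point.
from typing import Callable

# Step 1: the data object - only the fields relevant to the use case.
FIELDS = ("name", "reg_date", "bed_count")

# Step 2: quality requirements stated as executable conditions per field.
RULES: dict[str, Callable] = {
    "name": lambda v: isinstance(v, str) and v.strip() != "",
    "reg_date": lambda v: isinstance(v, str) and len(v) == 10,  # e.g. YYYY-MM-DD
    "bed_count": lambda v: isinstance(v, int) and v >= 0,
}

def evaluate(records: list[dict]) -> dict[str, int]:
    """Step 3: count rule violations per field across all records."""
    violations = {f: 0 for f in FIELDS}
    for rec in records:
        for field in FIELDS:
            if not RULES[field](rec.get(field)):
                violations[field] += 1
    return violations

records = [
    {"name": "Hospital A", "reg_date": "2019-01-15", "bed_count": 120},
    {"name": "", "reg_date": "2019-1-5", "bed_count": -3},  # three defects
]
print(evaluate(records))  # {'name': 1, 'reg_date': 1, 'bed_count': 1}
```

Because the requirements are plain conditions supplied by the user, no data quality expert is needed at the evaluation stage, which is the motivation stated above.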
24. Q: Are open data generally of high quality?
A: NO, open data have a high number of different data quality problems; however, data publishers (who provide the data used in their information systems) are probably not even aware of them.
The most frequently occurring are:
✘ contextual data quality issues;
✘ empty values, even for primary data;
✘ multiple denotations for the same object within one data object, and even within one parameter;
✘ issues with interrelated parameters.
OPEN DATA QUALITY...
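Two of the problem classes listed above (empty values, and multiple denotations of the same object) can be spotted mechanically. A toy Python sketch on invented data:

```python
# Toy detection of two common open data defects. The rows, the field name
# "municipality" and the normalisation rule are invented for illustration.
from collections import defaultdict

rows = [
    {"id": "1", "municipality": "Riga"},
    {"id": "2", "municipality": "RIGA"},
    {"id": "3", "municipality": "Rīga"},
    {"id": "4", "municipality": ""},
]

# Defect 1: empty values in a primary field.
empty = [r["id"] for r in rows if not r["municipality"].strip()]

# Defect 2: multiple denotations - group raw spellings by a crude normalised key.
variants = defaultdict(set)
for r in rows:
    key = r["municipality"].strip().lower().replace("ī", "i")
    if key:
        variants[key].add(r["municipality"])
multi = {k: v for k, v in variants.items() if len(v) > 1}

print(empty)  # ['4']
print(multi)  # {'riga': {'Riga', 'RIGA', 'Rīga'}}
```

Contextual issues and interrelated-parameter issues need use-case-specific rules and cannot be detected this generically.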
28. A USER-CENTRED USABILITY ANALYSIS OF OPEN GOVERNMENT DATA PORTALS
41 portals, 40 experts, 1 framework, 14 criteria (3 categories)
For more details see: Nikiforova, A., & McBride, K. (2021). Open government data portal usability: A user-centred usability analysis of 41 open government data portals. Telematics and Informatics, 58, 101539.
[Chart: average scores for the 14 usability criteria, ranging from 1.0 to 2.975]
31. Supporting growing economies
To support the emergence of new data-driven
businesses and the growth of existing ones,
governments need to publish key datasets.
Governments also need to support data infrastructure
that connects data with those who use it.
In return, governments are reaping the benefits of a
growing data economy, such as in Finland where
SMEs with access to open data grew 15% faster than
those without.
Take me to the Finnish case study
Improved service delivery
Governments need to balance the demands of
growing populations with the need to tackle
small-scale, local issues.
The availability of detailed open data is
essential to improving delivery of services at
the local level.
Some of these new services are available now:
Take me to mySociety
Take me to the Hungarian 'right to know' portal
Take me to Fix my Street Norway
Cost savings
Open data allows governments to make
savings in key areas such as healthcare,
education and utilities.
In the UK, open data helped reveal £200
million of savings in the health service.
In France, energy data is being used to drive
more efficient energy generation practices.
Show me the France energy data.
Open data can also bring transparency and
accountability to budgets.
Source: https://data.europa.eu/elearning/en/module2/#/id/co-01
OPEN DATA USE. GOVERNMENT (source: data.europa.eu)
32. Improving the way we move
Open data has the power to revolutionise the way
we travel.
Within the Dutch transport industry, open data is
helping a growing number of small companies to
develop new services.
French app Tranquilien improves passenger comfort and promotes efficient use of public transport by providing relevant information about empty seats and departure times.
A new Dutch app, winner of the prestigious
Apps4Europe competition, helps disabled people
to book travel assistance for their journeys using
open data.
Open transport data saves commuters time, makes
journeys more accessible and helps tourists to
travel in unfamiliar cities.
Improving the way we work
Open data is changing the way we work.
Open data reduces the time needed to find
information and allows professionals to focus
more of their time on productive activities.
OpenCorporates offers an open database of
companies around the world, showing their
networks, financial stability and environmental
impact. This helps organisations learn more about
prospective clients, providers and partners.
Take me to OpenCorporates
The Finnish Kannattaako kauppa service
provides insights on the price development of
real estate in the future, making it easy to
compare houses and neighborhoods by price and
population.
Improving the way we govern
Open data is becoming a key source of evidence for governments in the
policymaking process.
Public administration will gain the most from opening up data, with a value of EUR 22 billion in 2020. For agriculture and the arts and entertainment sector, the expected benefits are smaller, at EUR 379 million each. These sectors still have a lot of potential, but it will take more time to reach it.
They are also making the development of public policy more
transparent and supporting dialogue between governments and
citizens.
Data on key issues such as immigration, trade and budget cuts can be
used to inform important policy decisions.
CityScale is a Ukrainian platform that provides Ukrainian citizens with
relevant open data, such as on crime rates, health care, and air
pollution.
Take me to London fire station analysis
OPEN DATA USE. COMMUNITY AND PUBLIC TRANSFORMATION
33. Environment
Open data helps farmers to improve yields and support a
growing population without the need to destroy valuable
habitats.
Plantwise are collecting open data to produce valuable
information packs for farmers about plant health and threats
from diseases. Take me to Plantwise
CIARD has produced a central repository of more than 1,500
open agricultural research collections worldwide,
highlighting new research opportunities. Take me to CIARD
Saving lives
Open data is helping to save lives. Open
geographic data and aid statistics are being used
by humanitarian groups to deliver targeted
supplies in disaster zones.
Open mapping data helped disaster response
teams target aid delivery during the 2010 Haiti
earthquake. Haiti Open Street Map.
Open data was also used for responses to the
Philippines typhoon in 2014.
Culture
Open data is connecting people with important cultural issues and
helping to shape a more informed debate around them.
OpenGLAM is helping to capture the heritage and cultural
memories of groups in Germany, Switzerland and Finland. Take
me to OpenGLAM.
The Open Data Institute is leading a global Data as Culture
programme, with artists in residence re-examining the
fundamental ways in which data is perceived. Take me to ODI
Data as Culture
OPEN DATA USE. CULTURE AND ENVIRONMENT
34. OPEN DATA IN THE SCIENCE
*López, V.; Čukić, M. A dynamical model of SARS-CoV-2 based on people flow networks. Saf. Sci. 2021, 134, 105034
** Stieb, D.M.; Evans, G.J.; To, T.M.; Brook, J.R.; Burnett, R.T. An ecological analysis of long-term exposure to PM2.5 and incidence of COVID-19 in Canadian health regions. Environ. Res. 2020,
191, 110052
***Yacchirema, D.C.; Sarabia, D.; Palau, C.E.; Esteve, M. A Smart System for Sleep Monitoring by Integrating IoT With Big Data Analytics. IEEE Access 2018, 6, 35988–36001
****Chen, L.J.; Ho, Y.H.; Lee, H.C.; Wu, H.C.; Liu, H.M.; Hsieh, H.H.; Lung, S.C.C. An open framework for participatory PM2.5 monitoring in smart cities. IEEE Access 2017, 5, 14441–14454.
COVID-19 OGD: a SARS-CoV-2 virus transmission model based on human flow networks, offering new perspectives, modeling of different scenarios, and illustration of the evolution of and trends in the pandemic*.
Relationship between COVID-19 open data and PM2.5: a positive relationship between long-term PM2.5 exposure and the incidence of COVID-19**.
Air pollution open data catalog: detection and treatment of one of the most important sleep disorders, obstructive sleep apnea (OSA)***.
Real-time (!!!) open data urban-sensing framework for fine particulate matter PM2.5 (Taiwan + 29 countries): one of the largest deployment projects for PM2.5 monitoring in the world; the collected data are released in real time and in an open data manner, which has contributed to the development of other products and services using the opened data, thereby creating a chain of valuable open data-based solutions and services****.
https://www.mdpi.com/1424-8220/21/15/5204/htm
35. OPEN DATA IN THE SCIENCE. TOOL OR RESOURCE?
INPUT DATA (RESOURCE): new services, solutions etc. Examples: medicine, transport, environment, Smart City etc.
TOOL: improvement and optimization of existing algorithms, development of new algorithms (using open data as training data or supplementary data, etc.).
44. SMARTER OPEN GOVERNMENT DATA FOR SOCIETY 5.0: ARE YOUR OPEN DATA SMART ENOUGH?
40 out of 51 OGD portals provide open data related to COVID-19,
32 portals provide real-time data,
29 provide sensor data.
Many countries are trying to follow the latest trends and provide data that could be important for their users in transforming them into innovative solutions and services that create value for both the economy and the society, including moving towards the Smart City and the Smart Society, BUT some countries have not yet opened these data.
Although, in general, "smarter" data and higher-quality data are often typical for highly developed countries that follow Smart City trends at the economic, political and social levels, for many countries this relationship is less obvious:
developed countries can demonstrate weak results in terms of data provision and usability, while less developed countries can show relatively competitive results;
among the countries that have already opened these data, the majority of portals have gaps in the usability of these data in terms of machine-readability, the unavailability of an API, and the timeliness and frequency of updates.
Partial compliance with Society 5.0 trends (users' needs and intentions)
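A hedged sketch of the kind of per-dataset check behind such findings, assuming a hypothetical metadata dictionary per dataset (the keys formats, api_url and last_modified, and the 7-day timeliness window, are assumptions, not the method of the cited study):

```python
# Illustrative "smartness" check for one dataset's metadata. Keys and the
# timeliness threshold are invented assumptions.
from datetime import date, timedelta

MACHINE_READABLE = {"csv", "json", "geojson", "xml", "rdf"}

def smartness(ds: dict) -> dict:
    """Flag the three gaps discussed above for one dataset."""
    return {
        "machine_readable": any(f.lower() in MACHINE_READABLE
                                for f in ds.get("formats", [])),
        "api_available": bool(ds.get("api_url")),
        "timely": date.today() - ds.get("last_modified", date.min)
                  <= timedelta(days=7),
    }

ds = {"formats": ["PDF", "CSV"], "api_url": None, "last_modified": date.today()}
print(smartness(ds))  # {'machine_readable': True, 'api_available': False, 'timely': True}
```

Aggregating such flags over all datasets of a portal gives counts like the "32 portals provide real-time data" figures above.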
45. SMARTER OPEN GOVERNMENT DATA FOR SOCIETY 5.0. SOME RESULTS
[Maps: COVID-19 data, real-time data and sensor data availability by country]
For more details see: Nikiforova, A. (2021). Smarter Open Government Data for Society 5.0: Are Your Open Data Smart Enough?. Sensors, 21(15), 5204.
46. CONCLUSIONS
Open data have become a daily phenomenon.
Society 5.0 requires some more advanced guidelines to be defined and followed.
Openness (open data + open science) is:
one of the crucial drivers of a sustainable economy and of a real transformation of the society, science and government;
a creativity bridge in developing a new ecosystem in Industry 4.0 and Society 5.0;
it allows solving problems that were not central research objects for the original data holders, improving previous results, and establishing cooperation to tackle challenges together.
Let's transform the world together!
BUT! In order to get these benefits:
the data must be of high quality,
the data should be valuable and "smart" (in line with the "high-value data" term in both the PSI Directive and a country-specific sense),
the portal must be usable and user-friendly,
the service must be supportive,
the policy must be active, effective and efficient.
47. THANK YOU FOR YOUR ATTENTION!
QUESTIONS?
For more information, see ResearchGate.
See also https://anastasijanikiforova.com/
For questions or any queries, contact me via anastasija.nikiforova@lu.lv