This document is confidential and contains proprietary information, including trade secrets of CitiusTech. Neither the document nor any of the information
contained in it may be reproduced or disclosed to any unauthorized person under any circumstances without the express written permission of CitiusTech.
De-duplicated Refined Zone in Healthcare Data Lake Using Big Data Processing Frameworks
3rd May, 2018 | Author: Sagar Engineer | Technical Lead
CitiusTech Thought Leadership
Objective
 Data preparation is a costly and complex process. Even a small error may lead to inconsistent records and incorrect insights, and rectifying data errors often involves significant time and effort.
 Veracity plays an important role in data quality. Veracity covers issues such as inconsistency, incompleteness, duplication and ambiguity of data; one of the most significant of these is data duplication.
 Duplicate records can cause:
• Incorrect / unwanted / ambiguous reports and skewed decisions
• Difficulty in creating a 360-degree view of a patient
• Problems in providing prompt issue resolution to customers
• Inefficiency and loss of productivity
• A large number of duplicate records may consume unnecessary processing power / time
 Moreover, data duplication is difficult to handle in Big Data because:
• The Hadoop / Big Data ecosystem largely supports only appending data; record-level updates are not supported (a small illustration follows this list)
• Updates are only possible by rewriting the entire dataset with merged records
 The objective of this document is to provide an effective approach to create a de-duplicated zone in a Data Lake using Big Data frameworks.
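To make this constraint concrete, the following is a minimal, hypothetical sketch of how an "update" is typically expressed on an HDFS-backed Hive table: the affected partition is rewritten wholesale with the merged result. The table, column and partition names are illustrative only and not part of the original deck.

```python
# Hypothetical sketch: "updating" an HDFS-backed Hive table by rewriting a whole
# partition, since row-level UPDATEs are not available out of the box.
# Table, column and partition names are illustrative only.
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

spark.sql("""
    INSERT OVERWRITE TABLE refined.patient PARTITION (facility_id = 'F001')
    SELECT patient_id, first_name, last_name, updated_ts
    FROM refined.patient_merged_view
    WHERE facility_id = 'F001'
""")
```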
Agenda
 Addressing the Data Duplication Challenge
 High Level Architecture
 Implementing the Solution
 References
Addressing the Data Duplication Challenge (1/2)
Approach 1: Keep duplicate records in the Data Lake and query using the maximum timestamp to get unique records
 Users need to provide the maximum timestamp as a predicate in each data retrieval query (a query sketch follows the cons below)
 This option can cause performance issues once data grows beyond a few terabytes, depending on the cluster size
 To get better performance, this option needs a powerful cluster, which increases RAM / memory cost
Pros
 Eliminates the additional batch-processing step for de-duplication
 Leverages in-memory processing logic to retrieve the latest records
 Works for datasets up to a few hundred terabytes, depending on the cluster size
Cons
 Not feasible for hundreds of petabytes of data
 High infrastructure cost for the RAM / memory needed to hold hundreds of terabytes of data
 Response time for retrieval queries will be high if table joins are involved
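As a concrete illustration of Approach 1, here is a minimal sketch of the read-time filtering it implies. The slide describes supplying the maximum timestamp as a predicate; one common way to express that is a window function in Spark SQL. The encounters table, encounter_id key and load_ts column are hypothetical placeholders.

```python
# Hypothetical sketch of Approach 1: duplicates remain in the lake, so every read
# must filter down to the latest version of each record. Names are illustrative.
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

latest = spark.sql("""
    SELECT * FROM (
        SELECT e.*,
               ROW_NUMBER() OVER (PARTITION BY encounter_id
                                  ORDER BY load_ts DESC) AS rn
        FROM raw.encounters e
    ) versioned
    WHERE rn = 1
""")
latest.createOrReplaceTempView("encounters_latest")  # consumers query this view
```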
Addressing the Data Duplication Challenge (2/2)
Approach 2: Match and rewrite records to create a golden copy (Preferred Option)
 Implement complex logic for identifying and rewriting records
 The time taken by the process varies with the dataset and cluster size
 Creates a non-ambiguous golden copy of the dataset for further analysis
Pros
 Heavy processing for de-duplication is part of batch processing
 Faster query response, and scalable when joining tables
 Data is stored on HDFS (Hadoop Distributed File System)
 No RegionServer instances (as in HBase) are needed, which makes it cost-effective
 Partitioning helps in segregating data
 Support for file formats like Parquet enables faster query response
 Support for append and overwrite operations on tables and partitions
 Apache Hive is well suited for heavy batch processing and analytical queries
Cons
 Batch processing may take some time to complete
 One-time coding effort
Approach 2: High Level Architecture (1/2)
[Architecture diagram] Data Sources (Relational Sources, MDM, Unstructured Data) → ETL → Hadoop Big Data Lake (Landing Zone → Raw Zone → Refined Zone with de-duplicated Golden Record → Data Mart) → Data Analysis (Ad-hoc Querying, Applications, Data Visualization, Self-Service Tool)
Approach 2: High Level Architecture (2/2)
Component descriptions:
Landing Zone
 Data from the source systems is loaded into the Landing zone and then compared with the Raw zone during processing, for example to identify the changed dataset or to perform data quality checks
Raw Zone
 The Raw zone holds the relational data from the Landing zone and may be stored in partitions. All incremental data is appended to the Raw zone. The Raw zone also stores unstructured / semi-structured data from the respective sources. Users can perform raw analytics on the Raw zone
ETL
 The ETL framework picks up data from the Raw zone and applies transformations, for example mapping to the target model / reconciliation, parsing unstructured / semi-structured data, extracting specified elements and storing them in tabular format
Refined Zone
 Data from the Raw zone is reconciled / standardized / cleansed and de-duplicated in the Refined zone
 A simple, proven 3-step approach creates the refined, de-duplicated dataset in Hive using Spark / HiveQL (a sketch of the core step follows this table)
 This is a good fit for Spark jobs / Hive queries, depending on the complexity
 Comparing records on keys and surviving the record with the latest timestamp is usually the most effective de-duplication approach
 Hadoop / HDFS is efficient at appending data; handling record updates in Hadoop is challenging and there is no bulletproof solution for it
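As referenced above, here is a minimal sketch of the key-plus-latest-timestamp survivorship rule using the Spark DataFrame API. The patients table, patient_id key and updated_ts column are hypothetical placeholders, not names from the original deck.

```python
# Hypothetical sketch of the survivorship rule: keep only the record with the
# latest timestamp for each business key. Names are illustrative only.
from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

incoming = spark.table("raw.patients")   # raw-zone data, may contain duplicates
w = Window.partitionBy("patient_id").orderBy(F.col("updated_ts").desc())

deduped = (incoming
           .withColumn("rn", F.row_number().over(w))
           .filter("rn = 1")
           .drop("rn"))

# Assumes refined.patients already exists with a matching schema
deduped.write.insertInto("refined.patients", overwrite=True)
```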
Implementing the Solution: Technology Options (1/2)
Option 1: Use Hive as the processing engine
 Hive uses the MapReduce engine for SQL processing
 Leverage the MapReduce jobs spawned by Hive SQL to identify updates and rewrite updated datasets
 Use Hive queries to find incremental updates and write new files
 Compare incremental data with existing data using a Where clause and get the list of all affected partitions
 Use HQL to find the latest records and rewrite the affected partitions
Option 2: Use HBase as the data store for the de-duplication zone
 HBase handles updates efficiently on a predefined row key, which acts as the primary key of the table
 This approach helps build the reconciled table without having to explicitly write code for de-duplicating the data
Option 3: Use a Spark-based processing engine
 Use the Spark engine to implement complex logic for identifying and rewriting records
 Spark APIs are available in Java, Scala and Python; Spark SQL is also included for easy data transformation operations
 Use the Hive context in Spark to find incremental updates and write new files
 Compare incremental data with existing data using a Where clause and get the list of all affected partitions (a sketch of this step follows)
 Use Spark to find the latest records and rewrite the affected partitions
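As referenced in the Spark option above, here is a minimal sketch of the shared "find affected partitions" step. The slide phrases it as a Where-clause comparison; a join on the business key achieves the same lookup. The claims tables, claim_id key and part_dt partition column are hypothetical.

```python
# Hypothetical sketch: derive the list of partitions touched by the incremental load
# by joining incoming keys against the existing refined table. Names are illustrative.
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

affected = spark.sql("""
    SELECT DISTINCT r.part_dt
    FROM refined.claims r
    JOIN landing.claims_incremental i
      ON r.claim_id = i.claim_id
""")
partitions = [row.part_dt for row in affected.collect()]
# Each value in `partitions` is later rewritten with INSERT OVERWRITE ... PARTITION
```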
Implementing the Solution: Technology Options (2/2)
Option 1: Use Hive as the processing engine
Pros
 The distributed MapReduce engine can handle huge volumes of data
 SQL makes it easy to express the logic instead of writing complex MapReduce code
Cons
 MapReduce processing is very slow
Option 2: Use HBase as the data store for the de-duplication zone
Pros
 Records can be retrieved in a fraction of a second when searched by row key
 HBase handles updates efficiently on a predefined row key, which acts as the primary key
 Transactional processing and real-time querying
Cons
 NoSQL makes it difficult to join tables
 High-volume data ingestion can be as slow as 5,000 records/second
 Data is stored in memory on HBase RegionServer instances, which requires more memory and in turn increases cost
 Ad hoc querying performs full table scans, which is not feasible
Option 3: Use a Spark-based processing engine
Pros
 Up to 100x faster than MapReduce for in-memory workloads
 Relatively simpler to code compared to MapReduce
 Spark SQL, DataFrame and Dataset APIs are readily available
 Processing happens in-memory, with spill-over to disk supported
Cons
 Infrastructure cost may go up due to higher memory (RAM) requirements for in-memory analytics
Recommended Option: Spark-Based Processing Engine
Spark provides a complete processing stack for batch processing, standard SQL-based processing, machine learning and stream processing. Although memory requirements grow with the workload, infrastructure cost may not rise drastically because memory prices continue to decline.
Solution Overview (an end-to-end sketch in Spark follows this list)
 Tables requiring de-duplication need to be partitioned by appropriate attributes so that data is evenly distributed
 Depending on the use case, de-duplicated tables may or may not host semi-structured or unstructured data with unique key identifiers
 Identify the attributes that uniquely identify a record in a given table; these attributes are used during the de-duplication process
 The incremental dataset must carry a key that identifies the affected partitions
 Identify new records (records previously not present in the data lake) in the incremental dataset
 Insert the new records into a temp table
 Identify the affected partitions containing records to be updated
 Apply de-duplication logic to select only the latest data from the incremental data and the refined-zone data
 Overwrite only the affected partitions in the de-duplicated zone with the latest data for the updated records
 Append the new records from the temp table to the refined, de-duplicated zone
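Below is a hedged end-to-end sketch of the steps above on the Spark engine, as referenced in the Solution Overview. All table, column and view names (claims, claim_id, part_dt, updated_ts, claims_new_tmp) are illustrative assumptions, and the dynamic-partition-overwrite settings assume Spark 2.3+ with Hive support.

```python
# Hypothetical end-to-end sketch of the Solution Overview steps in PySpark.
# All table, column and view names are illustrative, not from the original deck.
from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = SparkSession.builder.enableHiveSupport().getOrCreate()
# Needed so that the overwrite touches only the partitions present in the data
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")
spark.conf.set("hive.exec.dynamic.partition.mode", "nonstrict")

incr    = spark.table("landing.claims_incremental")   # incremental dataset
refined = spark.table("refined.claims")               # de-duplicated refined zone

# Step 1: split the increment into brand-new records and updates to existing keys
new_records = incr.join(refined.select("claim_id"), "claim_id", "left_anti")
updates     = incr.join(refined.select("claim_id"), "claim_id", "left_semi")

# Step 2: stage new records (stand-in for the temp table on the slide)
new_records.createOrReplaceTempView("claims_new_tmp")

# Step 3: find affected partitions, then keep only the latest version per key
parts = [r.part_dt for r in updates.select("part_dt").distinct().collect()]
w = Window.partitionBy("claim_id").orderBy(F.col("updated_ts").desc())
merged = (refined.filter(F.col("part_dt").isin(parts))
          .unionByName(updates)                  # assumes matching schemas
          .withColumn("rn", F.row_number().over(w))
          .filter("rn = 1")
          .drop("rn"))

# Step 4: overwrite only the affected partitions with the merged, latest data
merged.write.insertInto("refined.claims", overwrite=True)

# Step 5: append the staged new records to the refined, de-duplicated zone
# (assumes the staged view has the same column order as the target table)
spark.sql("INSERT INTO TABLE refined.claims SELECT * FROM claims_new_tmp")
```

In this sketch the overwrite in step 4 rewrites only the partitions present in the merged DataFrame, which mirrors the "overwrite only affected partitions" step on the slide.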
References
Data Lake
http://www.pentaho.com/blog/5-keys-creating-killer-data-lake
https://www.searchtechnologies.com/blog/search-data-lake-with-big-data
https://knowledgent.com/whitepaper/design-successful-data-lake/
Hive Transaction Management
https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions#HiveTransactions-ConfigurationValuestoSetforINSERT,UPDATE,DELETE
Keywords
 Data Lake
 Data Lake Strategies
 Refined Zone
 Big Accurate Data
 Golden Record
Thank You
Author:
Sagar Engineer
Technical Lead
thoughtleaders@citiustech.com
About CitiusTech
2,900+
Healthcare IT professionals worldwide
1,200+
Healthcare software engineering
700+
HL7 certified professionals
30%+
CAGR over last 5 years
80+
Healthcare customers
 Healthcare technology companies
 Hospitals, IDNs & medical groups
 Payers and health plans
 ACO, MCO, HIE, HIX, NHIN and RHIO
 Pharma & Life Sciences companies