Hadoop Integration with MicroStrategy
Hadoop
• Hadoop is a free, Java-based programming framework that supports the processing of large data sets in a distributed computing environment.
• It makes it possible to run applications on systems with thousands of nodes and thousands of terabytes of data.
• Its distributed file system facilitates rapid data transfer among nodes and allows the system to continue operating uninterrupted when a node fails.
• This approach lowers the risk of catastrophic system failure, even if a significant number of nodes become inoperative.
Why Hadoop?
• Scalability
  - Scales simply by adding nodes.
  - Local processing to avoid network bottlenecks.
• Flexibility
  - All kinds of data (blobs, documents, records, etc.).
  - In all forms (structured, semi-structured, unstructured).
  - Store anything now and analyze what you need later.
• Efficiency
  - Cost efficiency (<$1k/TB) on commodity hardware.
  - Unified storage, metadata, and security (no duplication or synchronization).
Core parts of Hadoop
• Hadoop Distributed File System (HDFS)
  - HDFS is the primary storage system used by Hadoop applications.
  - It is a distributed file system that provides high-performance access to data across Hadoop clusters. Like other Hadoop-related technologies, HDFS has become a key tool for managing pools of big data and supporting big data analytics applications.
  - When HDFS takes in data, it breaks the information down into separate pieces and distributes them to different nodes in a cluster, allowing for parallel processing. The file system also copies each piece of data multiple times and distributes the copies to individual nodes, placing at least one copy on a different server rack than the others. As a result, the data on nodes that crash can be found elsewhere within the cluster, which allows processing to continue while the failure is resolved.
  - HDFS is built to support applications with large data sets, including individual files that reach into the terabytes. It uses a master/slave architecture, with each cluster consisting of a single NameNode that manages file system operations and supporting DataNodes that manage data storage on individual compute nodes.
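The splitting and rack-aware copying described above can be sketched in a few lines of Python. This is a toy, single-process illustration under stated assumptions, not the actual HDFS placement policy: the function names, the four-node `cluster` topology, and the tiny block size are all made up for the example.

```python
def split_into_blocks(data: bytes, block_size: int) -> list:
    """Cut the input into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes_by_rack: dict, replication: int = 3) -> dict:
    """Toy placement: first replica on one rack, the rest starting on another rack."""
    racks = sorted(nodes_by_rack)
    all_nodes = [node for rack in racks for node in nodes_by_rack[rack]]
    if len(racks) < 2 or len(all_nodes) < replication:
        raise ValueError("need at least two racks and `replication` nodes")
    placement = {}
    for block_id in range(num_blocks):
        local_rack = racks[block_id % len(racks)]
        remote_rack = racks[(block_id + 1) % len(racks)]
        local_nodes = nodes_by_rack[local_rack]
        chosen = [local_nodes[block_id % len(local_nodes)]]
        # Prefer nodes on the other rack, then fall back to any unused node.
        for node in nodes_by_rack[remote_rack] + all_nodes:
            if len(chosen) == replication:
                break
            if node not in chosen:
                chosen.append(node)
        placement[block_id] = chosen
    return placement


def readable_blocks(placement: dict, failed_nodes: set) -> list:
    """Blocks that still have at least one replica on a healthy node."""
    return [b for b, nodes in placement.items() if set(nodes) - failed_nodes]


cluster = {"rack1": ["n1", "n2"], "rack2": ["n3", "n4"]}  # hypothetical topology
blocks = split_into_blocks(b"x" * 1000, block_size=256)
placement = place_replicas(len(blocks), cluster)
print(placement)
print(readable_blocks(placement, failed_nodes={"n1"}))
```

Because every block has copies on two racks, losing node `n1` still leaves all four blocks readable, which is the property the slide describes.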
• MapReduce
  - A MapReduce program is composed of a Map() procedure that performs filtering and sorting (such as sorting students by first name into queues, one queue for each name) and a Reduce() procedure that performs a summary operation (such as counting the number of students in each queue, yielding name frequencies).
  - The "MapReduce System" (also called "infrastructure" or "framework") orchestrates the processing by marshalling the distributed servers, running the various tasks in parallel, managing all communications and data transfers between the various parts of the system, and providing for redundancy and fault tolerance.
  - HDFS and MapReduce are robust: servers in a Hadoop cluster can fail without aborting the computation. HDFS ensures data is replicated with redundancy across the cluster. On completion of a calculation, a node writes its results back into HDFS.
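The student example above can be written out as a minimal single-process sketch in Python. It only mimics the shape of a MapReduce job: in Hadoop the framework performs the grouping step between map and reduce across many machines. The function names and the student list are invented for illustration.

```python
from collections import defaultdict


def map_phase(students):
    """Map(): emit one (first_name, 1) pair per student record."""
    for full_name in students:
        first_name = full_name.split()[0]
        yield first_name, 1


def shuffle(pairs):
    """Group emitted values by key: one 'queue' per first name."""
    queues = defaultdict(list)
    for key, value in pairs:
        queues[key].append(value)
    return queues


def reduce_phase(queues):
    """Reduce(): summarize each queue, here by counting its entries."""
    return {name: sum(values) for name, values in queues.items()}


# Made-up input records.
students = ["Ana Lopez", "Ben Okoro", "Ana Silva", "Chen Wei", "Ben Hall", "Ana Kim"]
name_counts = reduce_phase(shuffle(map_phase(students)))
print(name_counts)  # {'Ana': 3, 'Ben': 2, 'Chen': 1}
```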
MicroStrategy Integration
• Cloudera and MicroStrategy have collaborated to develop a powerful and easy-to-use BI framework for Apache Hadoop by creating a connection between MicroStrategy 9 and CDH. This connection is established via an Open Database Connectivity (ODBC) driver for Apache Hive and is available as the Cloudera Connector for MicroStrategy.
• The connector allows business users to perform sophisticated point-and-click analytics on data stored in Hadoop directly from MicroStrategy applications, just as they do on data stored in data warehouses, data marts, and operational databases. MicroStrategy has developed Very Large Database (VLDB) drivers specifically for Cloudera that generate optimized queries for Cloudera's Distribution including Apache Hadoop (CDH).
• The Cloudera Connector for MicroStrategy enables enterprise users to access Hadoop data through the business intelligence application MicroStrategy 9.3.1. The driver achieves this by translating ODBC calls from MicroStrategy into SQL and passing the SQL queries to the underlying Impala or Hive engines.
• Together, MicroStrategy and Cloudera offer a connector that empowers organizations to extract and deliver valuable insights from massive volumes of structured and unstructured data. With sophisticated yet familiar reporting and analysis tools on top of Apache Hadoop, business users can quickly and easily unlock the potential of their data to make better business decisions.
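To make the "point and click becomes SQL" idea concrete, here is a minimal Python sketch of turning a report definition into a query string. It is not MicroStrategy's SQL engine or the Cloudera connector: `build_report_sql`, the `enrollments` table, and its column names are hypothetical, identifiers are not validated or escaped, and `?` is used only as the usual ODBC parameter-marker convention.

```python
def build_report_sql(table, attributes, metrics, filters=None):
    """Turn a simple report definition into (sql, params).

    attributes: columns to group by, e.g. ["state"]
    metrics:    {alias: (aggregate_function, column)}
    filters:    {column: value}, equality conditions only
    """
    select_parts = list(attributes)
    for alias, (func, column) in metrics.items():
        select_parts.append(f"{func}({column}) AS {alias}")
    sql = f"SELECT {', '.join(select_parts)} FROM {table}"
    params = []
    if filters:
        conditions = [f"{column} = ?" for column in filters]
        sql += " WHERE " + " AND ".join(conditions)
        params = list(filters.values())
    if attributes:
        sql += " GROUP BY " + ", ".join(attributes)
    return sql, params


# Hypothetical report: total enrollments per state for one issuer.
sql, params = build_report_sql(
    table="enrollments",
    attributes=["state"],
    metrics={"total_enrollments": ("COUNT", "enrollment_id")},
    filters={"issuer_name": "Acme Health"},
)
print(sql)
print(params)
```

In a real deployment the generated statement would be handed to the ODBC driver, which forwards it to Impala or Hive; the business user never sees the SQL.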
What is Impala?
• Interactive SQL
  - Typically 100x faster than Hive.
  - Sub-second responses.
• Nearly ANSI-92 standard SQL queries with Hive SQL
  - Compatible SQL interfaces for existing Hadoop/CDH applications.
  - Based on industry-standard SQL.
• Runs natively on Hadoop/HBase storage and metadata
  - Flexibility, scale, and cost advantages of Hadoop.
  - No duplication or synchronization of data and metadata.
  - Local processing to avoid network bottlenecks.
• Separate runtime from MapReduce
  - MapReduce is designed for, and works well for, batch processing.
  - Impala is purpose-built for low-latency SQL queries on Hadoop.
Benefits of Impala
• More and faster value from “Big Data”
  - BI tools were impractical on Hadoop before Impala.
  - Move from tens of Hadoop users per cluster to hundreds of SQL users.
  - No delays from data migration.
• Flexibility
  - Query across existing data.
  - Select best-fit file formats.
  - Run multiple frameworks on the same data at the same time.
• Cost analysis
  - Reduce data movement, duplicate storage, and compute.
  - 1% to 10% of the cost of an analytic DBMS.
• Full-fidelity analysis
  - No loss from aggregations or fixed schemas.
Project
• By integrating Hadoop and Impala with MicroStrategy's reporting capabilities, we developed healthcare management software.
• We used data stored in HDFS and Impala as the native MPP query engine within Hadoop, accessed via the connector.
• Based on our requirements, we built Intelligent Cubes and exported them directly to MicroStrategy.
• Using its data visualization capabilities, we are able to display visually appealing dashboards and insightful reports.
• We developed 3 dashboards that visualize healthcare management data in different ways.
Ecosystem
• The Key Performance Indicator section displays the total number of issuers, employees, employers, brokers, and enrollments.
• It also displays aggregated calculations of employee income, premium per month, and percentage.
• The Service Area section displays state-wise totals for the US using an image layout widget.
• The Enrollment section displays a heat map of the total enrollment count for each US state.
• The Employee Segmentation section displays a grid and graph view of the number of employees per segment.
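The totals and per-state counts behind such a dashboard can be sketched as plain aggregations. The snippet below is an illustration only: the record layout (`issuer`, `employer`, `employee`, `broker`, `state`) and the sample rows are assumptions, not the project's actual schema or data.

```python
from collections import Counter


def kpi_totals(rows):
    """Distinct counts per entity plus the total number of enrollment rows."""
    return {
        "issuers": len({r["issuer"] for r in rows}),
        "employers": len({r["employer"] for r in rows}),
        "employees": len({r["employee"] for r in rows}),
        "brokers": len({r["broker"] for r in rows}),
        "enrollments": len(rows),
    }


def enrollments_by_state(rows):
    """Enrollment count per US state, the values a heat map would color by."""
    return Counter(r["state"] for r in rows)


# Made-up enrollment records, one row per enrollment.
rows = [
    {"issuer": "I1", "employer": "E1", "employee": "P1", "broker": "B1", "state": "CA"},
    {"issuer": "I1", "employer": "E1", "employee": "P2", "broker": "B1", "state": "CA"},
    {"issuer": "I2", "employer": "E2", "employee": "P3", "broker": "B2", "state": "TX"},
    {"issuer": "I2", "employer": "E3", "employee": "P4", "broker": "B1", "state": "CA"},
]
print(kpi_totals(rows))
print(enrollments_by_state(rows))
```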
Ticketing trends
• In the Ticketing dashboard, the Overall Ticket Workload section displays the total count of support persons, open tickets, average response days, and backlog percentage.
• The Open Tickets section uses a waterfall widget to show total open ticket counts by issuer type.
• It contains a heat map of average closure time by ticket issuer type.
• It contains gauge widgets of closure time in days by year, quarter, month, and week.
• It also displays microcharts of the count of each current status by issuer type. In the microcharts we used sparkline and bar modes to analyse the data in different ways.
Exchange-Interactive dashboard
• It is an interactive dashboard.
• The Key Performance Indicator section displays the total service area and enrollment counts for each issuer name.
• With issuer name as the selector, it targets the enrollment heat map, which displays total enrollments for each state.
• With issuer name as the selector, it also targets the US map image layout widget, which displays the total service area count for each state.
Stock Analysis
• Here we took raw real-time stock data from NASDAQ and NYSE and analysed it according to our requirements.
• In the above screenshot there are 4 selectors: Sector, Industry, Symbol, and Year.
• Industry is filtered by the Sector selector, and Symbol is filtered by the Sector and Industry selectors.
• All 4 selectors filter the data in the panel below, which displays stock volatility by year, quarter, month, and week.
• The panel provides grid and graph views, limited to 50 data items at a time, as shown in the screenshot below.
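The cascading behaviour of the selectors (Sector narrows Industry, both narrow Symbol, and the result is capped at 50 items) can be sketched as follows. This is a hedged illustration in plain Python, not MicroStrategy's selector mechanism; the field names, ticker symbols, and rows are invented.

```python
def options(rows, field, **selected):
    """Values still available for `field` given the selections made so far."""
    return sorted({r[field] for r in rows
                   if all(r[k] == v for k, v in selected.items())})


def filter_rows(rows, limit=50, **selected):
    """Rows matching every active selector, capped at `limit` items."""
    matches = [r for r in rows if all(r[k] == v for k, v in selected.items())]
    return matches[:limit]


# Made-up stock records.
stocks = [
    {"sector": "Technology", "industry": "Software", "symbol": "AAA", "year": 2013},
    {"sector": "Technology", "industry": "Software", "symbol": "BBB", "year": 2013},
    {"sector": "Technology", "industry": "Hardware", "symbol": "CCC", "year": 2014},
    {"sector": "Finance", "industry": "Banking", "symbol": "DDD", "year": 2013},
]

print(options(stocks, "industry", sector="Technology"))
print(options(stocks, "symbol", sector="Technology", industry="Software"))
print(filter_rows(stocks, sector="Technology", year=2013))
```

Choosing a sector first shrinks the industry list, and choosing an industry then shrinks the symbol list, which is the dependency the slide describes.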
Conclusion
• Users can run queries via MicroStrategy’s visual interface without needing to write unfamiliar HiveQL or MapReduce scripts. In essence, any user, even without Hadoop programming skills, can ask questions against vast volumes of structured and unstructured data to gain valuable business insights.
• It is very fast, scalable, cost-effective, and resilient to failure.
• Hadoop is inefficient at handling small files and lacks transparent compression. HDFS is optimized for streaming access to large files, so it is not designed to work well with random reads over small files.
• It suits batch-based architectures, not real-time data access.
• It follows a shared-nothing architecture, so tasks that require global synchronization or sharing of mutable data do not fit.
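The small-files limitation can be made tangible with a little arithmetic. The NameNode has to track every file and every block, so the same volume of data costs far more bookkeeping when it is spread over many small files. The sketch below assumes a 128 MiB block size (a common default, configurable per cluster); the function name and file sizes are illustrative.

```python
import math

BLOCK_SIZE = 128 * 1024 * 1024  # assumed block size in bytes


def namespace_entries(file_sizes, block_size=BLOCK_SIZE):
    """Count files plus blocks, i.e. the entries the NameNode has to track."""
    blocks = sum(math.ceil(size / block_size) for size in file_sizes)
    return len(file_sizes) + blocks


one_large_file = [1024 ** 3]            # a single 1 GiB file
many_small_files = [128 * 1024] * 8192  # 8192 files of 128 KiB, also 1 GiB in total

print(namespace_entries(one_large_file))    # 9  (1 file + 8 blocks)
print(namespace_entries(many_small_files))  # 16384  (8192 files + 8192 blocks)
```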

VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICS
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 

Hadoop Integration with MicroStrategy

  • 2. Hadoop
      ◦ Hadoop is a free, Java-based programming framework that supports the processing of large data sets in a distributed computing environment.
      ◦ It makes it possible to run applications on systems with thousands of nodes involving thousands of terabytes.
      ◦ Its distributed file system facilitates rapid data transfer among nodes and allows the system to continue operating uninterrupted when a node fails.
      ◦ This approach lowers the risk of catastrophic system failure, even if a significant number of nodes become inoperative.
  • 3. Why Hadoop?
      ◦ Scalability: scales simply by adding nodes; local processing avoids network bottlenecks.
      ◦ Flexibility: stores all kinds of data (blobs, documents, records, etc.) in all forms (structured, semi-structured, unstructured); store anything now and analyze what you need later.
      ◦ Efficiency: cost-efficient (under $1k per TB) on commodity hardware; unified storage, metadata, and security, with no duplication or synchronization.
  • 4. Core parts of Hadoop: Hadoop Distributed File System (HDFS)
      ◦ HDFS is the primary storage system used by Hadoop applications.
      ◦ HDFS is a distributed file system that provides high-performance access to data across Hadoop clusters. Like other Hadoop-related technologies, HDFS has become a key tool for managing pools of big data and supporting big data analytics applications.
      ◦ When HDFS takes in data, it breaks the information down into separate pieces and distributes them to different nodes in a cluster, allowing for parallel processing. The file system also copies each piece of data multiple times and distributes the copies to individual nodes, placing at least one copy on a different server rack than the others. As a result, the data on nodes that crash can be found elsewhere within the cluster, which allows processing to continue while the failure is resolved.
      ◦ HDFS is built to support applications with large data sets, including individual files that reach into the terabytes. It uses a master/slave architecture, with each cluster consisting of a single NameNode that manages file system operations and supporting DataNodes that manage data storage on individual compute nodes.
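The block splitting and rack-aware replication described in slide 4 can be illustrated with a small Python sketch. This is a simplified model, not HDFS's actual placement code: the function names, the node and rack names, and the round-robin choice of racks are all assumptions made for illustration.

```python
def split_into_blocks(file_size, block_size):
    """Return the size of each block a file of file_size units is cut into."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def place_replicas(num_blocks, nodes_by_rack, replication=3):
    """Assign each block to `replication` distinct nodes on two different racks.

    nodes_by_rack maps a (hypothetical) rack name to its list of node names.
    One replica goes to a node on one rack, the rest to nodes on another rack,
    so losing a whole rack never loses every copy of a block.
    """
    racks = sorted(nodes_by_rack)
    if len(racks) < 2:
        raise ValueError("need at least two racks for rack-aware placement")
    placement = {}
    for block in range(num_blocks):
        local = nodes_by_rack[racks[block % len(racks)]]
        remote = nodes_by_rack[racks[(block + 1) % len(racks)]]
        if len(remote) < replication - 1:
            raise ValueError("not enough nodes on the remote rack")
        chosen = [local[block % len(local)]]
        chosen += [remote[(block + i) % len(remote)] for i in range(replication - 1)]
        placement[block] = chosen
    return placement


# Hypothetical two-rack cluster and a 300-unit file cut into 128-unit blocks.
cluster = {"rack1": ["n1", "n2"], "rack2": ["n3", "n4"]}
blocks = split_into_blocks(300, 128)
placement = place_replicas(len(blocks), cluster)
```

If node `n1` fails in this toy cluster, every block it held still has two copies on the other rack, which is the property that lets processing continue while the failure is resolved.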
  • 5. MapReduce
      ◦ A MapReduce program is composed of a Map() procedure that performs filtering and sorting (such as sorting students by first name into queues, one queue for each name) and a Reduce() procedure that performs a summary operation (such as counting the number of students in each queue, yielding name frequencies).
      ◦ The "MapReduce system" (also called "infrastructure" or "framework") orchestrates the processing by marshalling the distributed servers, running the various tasks in parallel, managing all communications and data transfers between the various parts of the system, and providing for redundancy and fault tolerance.
      ◦ HDFS and MapReduce are robust. Servers in a Hadoop cluster can fail without aborting the computation. HDFS ensures that data is replicated with redundancy across the cluster. On completion of a calculation, a node writes its results back into HDFS.
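The student-name example from slide 5 can be sketched in plain Python. This runs in a single process and only mimics the map, shuffle, and reduce stages; it does not use Hadoop's API, and the function names and sample data are illustrative assumptions.

```python
from collections import defaultdict


def map_phase(students):
    """Emit one (first_name, 1) pair per student record."""
    for full_name in students:
        first_name = full_name.split()[0]
        yield first_name, 1


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the values in each group to get a count per first name."""
    return {key: sum(values) for key, values in groups.items()}


# Made-up sample records.
students = ["Ana Lopez", "Ben Katz", "Ana Silva", "Cy Young", "Ben Wu", "Ana Roy"]
counts = reduce_phase(shuffle(map_phase(students)))
print(counts)  # {'Ana': 3, 'Ben': 2, 'Cy': 1}
```

In a real cluster the map and reduce calls would run on different servers, and the framework would handle the grouping step and any retries after a node failure.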
  • 6. MicroStrategy Integration
      ◦ Cloudera and MicroStrategy have collaborated to develop a powerful and easy-to-use BI framework for Apache Hadoop by creating a connection between MicroStrategy 9 and CDH. This connection is established via an Open Database Connectivity (ODBC) driver for Apache Hive and is available as the Cloudera Connector for MicroStrategy.
      ◦ The connector allows business users to perform sophisticated point-and-click analytics on data stored in Hadoop directly from MicroStrategy applications, just as they do on data stored in data warehouses, data marts, and operational databases. MicroStrategy has developed Very Large Database Drivers (VLDB) specifically for Cloudera that generate optimized queries for Cloudera's Distribution including Apache Hadoop.
  • 7.
      ◦ The Cloudera Connector for MicroStrategy enables enterprise users to access Hadoop data through the business intelligence application MicroStrategy 9.3.1. The driver achieves this by translating Open Database Connectivity (ODBC) calls from MicroStrategy into SQL and passing the SQL queries to the underlying Impala or Hive engines.
      ◦ Together, MicroStrategy and Cloudera offer a connector that empowers organizations to extract and deliver valuable insights from massive volumes of structured and unstructured data. Because it provides sophisticated yet familiar reporting and analysis tools on top of Apache Hadoop, business users can quickly and easily unlock the potential of their data to make better business decisions.
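To make the idea of turning a point-and-click report into SQL concrete, here is a minimal Python sketch. It is not the connector's code: `build_report_sql` is a hypothetical helper, the `enrollments` table and its columns are invented, and SQLite (from the standard library) stands in for Impala or Hive only so that the example runs anywhere.

```python
import sqlite3


def build_report_sql(table, attributes, metrics):
    """Build a GROUP BY query from report attributes and (function, column) metrics.

    Identifiers are interpolated directly, so this is only safe with trusted names.
    """
    select_parts = list(attributes)
    select_parts += [f"{func}({col}) AS {func.lower()}_{col}" for func, col in metrics]
    sql = f"SELECT {', '.join(select_parts)} FROM {table}"
    if attributes:
        grouping = ", ".join(attributes)
        sql += f" GROUP BY {grouping} ORDER BY {grouping}"
    return sql


# SQLite is a stand-in engine here; a real deployment would send the SQL over ODBC.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE enrollments (state TEXT, premium REAL)")
conn.executemany(
    "INSERT INTO enrollments VALUES (?, ?)",
    [("CA", 100.0), ("CA", 300.0), ("NY", 250.0)],
)

sql = build_report_sql("enrollments", ["state"], [("COUNT", "premium"), ("AVG", "premium")])
rows = conn.execute(sql).fetchall()
```

The user only picks an attribute (`state`) and two metrics; the generated statement is what the query engine actually receives.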
  • 8. What is Impala?
      ◦ Interactive SQL: typically 100x faster than Hive, with sub-second responses.
      ◦ Nearly ANSI-92 standard SQL, compatible with Hive SQL: compatible SQL interfaces for existing Hadoop/CDH applications; based on industry-standard SQL.
      ◦ Runs natively on Hadoop/HBase storage and metadata: keeps the flexibility, scale, and cost advantages of Hadoop; no duplication or synchronization of data and metadata; local processing avoids network bottlenecks.
      ◦ Separate runtime from MapReduce: MapReduce is designed for, and well suited to, batch processing; Impala is purpose-built for low-latency SQL queries on Hadoop.
  • 9. Benefits of Impala
      ◦ More and faster value from "Big Data": BI tools were impractical on Hadoop before Impala; move from tens of Hadoop users per cluster to hundreds of SQL users; no delays from data migration.
      ◦ Flexibility: query across existing data; select best-fit file formats; run multiple frameworks on the same data at the same time.
      ◦ Cost: reduce data movement, duplicate storage, and compute; 1% to 10% of the cost of an analytic DBMS.
      ◦ Full-fidelity analysis: no loss from aggregations or fixed schemas.
  • 10. Project
      ◦ We developed healthcare management software by integrating Hadoop and Impala with MicroStrategy's reporting capabilities.
      ◦ The data is stored in HDFS, and Impala serves as the native MPP query engine, integrated with Hadoop via the connector.
      ◦ Based on our requirements, we built Intelligent Cubes and exported them directly to MicroStrategy.
      ◦ Using MicroStrategy's data visualization capabilities, we display visually appealing dashboards and insightful reports.
      ◦ We developed three dashboards that visualize the healthcare management data in different ways.
  • 12.
      ◦ The Key Performance Indicator section displays the total number of issuers, employees, employers, brokers, and enrollments.
      ◦ It also displays aggregated calculations of employee income, premium per month, and percentage.
      ◦ The Service Area section displays the total count for each US state using an image layout widget.
      ◦ The Enrollment section displays a heat map of the total enrollment count for each US state.
      ◦ The Employee Segmentation section displays a grid and graph of the number of employees per segment.
  • 14.
      ◦ In the Ticketing dashboard, the Overall Ticket Workload section displays the total count of support persons, the number of open tickets, the average response time in days, and the backlog percentage.
      ◦ The Open Tickets section uses a waterfall widget to show the total open ticket count by issuer type.
      ◦ It contains a heat map of average closure time by ticket issuer type.
      ◦ It contains gauge widgets showing closure time in days by year, quarter, month, and week.
      ◦ It also displays microcharts showing the count of each current status by issuer type. In the microcharts we used sparkline and bar modes to analyse the data in different ways.
  • 16.
      ◦ This is an interactive dashboard.
      ◦ The Key Performance Indicator section displays the total service area and enrollment counts for each issuer name.
      ◦ Used as a selector, the issuer name targets the enrollment heat map, which displays the total enrollments for each state.
      ◦ The same selector also targets the US map image layout widget, which displays the total service area count for each state.
  • 18.
      ◦ Here we took raw real-time stock data from NASDAQ and NYSE and analysed it according to our requirements.
      ◦ The screenshot shows four selectors: Sector, Industry, Symbol, and Year.
      ◦ Industry is filtered by the Sector selector, and Symbol is filtered by the Sector and Industry selectors.
      ◦ All four selectors filter the data in the panel below them, which displays stock volatility by year, quarter, month, and week.
      ◦ The panel offers grid and graph views, limited to 50 rows at a time, as shown in the screenshot.
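The cascading selectors and the 50-row limit from slide 18 can be sketched in Python. This is an illustration of the filtering logic only, not MicroStrategy's implementation; the field names, sample rows, and function names are assumptions.

```python
def cascade_options(rows, sector=None, industry=None):
    """Return the industry and symbol choices left after the upstream selectors."""
    in_sector = [r for r in rows if sector is None or r["sector"] == sector]
    industries = sorted({r["industry"] for r in in_sector})
    in_industry = [r for r in in_sector if industry is None or r["industry"] == industry]
    symbols = sorted({r["symbol"] for r in in_industry})
    return industries, symbols


def page(rows, page_number, page_size=50):
    """Return one page of rows, mirroring the 50-rows-at-a-time panel."""
    start = page_number * page_size
    return rows[start:start + page_size]


# Made-up rows; real data would come from the NASDAQ and NYSE feeds.
rows = [
    {"sector": "Tech", "industry": "Software", "symbol": "AAA"},
    {"sector": "Tech", "industry": "Hardware", "symbol": "BBB"},
    {"sector": "Energy", "industry": "Oil", "symbol": "CCC"},
]

industries, symbols = cascade_options(rows, sector="Tech", industry="Software")
```

Choosing a sector narrows the industry list, and choosing an industry narrows the symbol list, which is the dependency between selectors that the slide describes.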
  • 19. Conclusion
      ◦ Users can run queries via MicroStrategy's visual interface without needing to write unfamiliar HiveQL or MapReduce scripts. In essence, any user without Hadoop programming skills can ask questions against vast volumes of structured and unstructured data to gain valuable business insights.
      ◦ The solution is fast, scalable, cost-effective, and resilient to failure.
      ◦ Hadoop is inefficient at handling small files, and it lacks transparent compression. HDFS is optimized for large sequential reads, so it does not work well with random reads over small files.
      ◦ Hadoop is suited to batch-oriented architectures rather than real-time data access.
      ◦ Because Hadoop follows a shared-nothing architecture, tasks that require global synchronization or sharing of mutable data are a poor fit.
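One way to see why small files are costly is to count the metadata objects the NameNode must track. The sketch below is back-of-the-envelope arithmetic under stated assumptions: a 128 MiB block size, and an often-quoted rule of thumb of roughly 150 bytes of NameNode memory per file or block object. Neither figure comes from the slides.

```python
BLOCK_SIZE = 128 * 1024**2   # assumption: 128 MiB block size
BYTES_PER_OBJECT = 150       # assumption: rule-of-thumb NameNode memory per object


def ceil_div(a, b):
    """Integer ceiling division."""
    return -(-a // b)


def namenode_objects(total_bytes, file_size, block_size=BLOCK_SIZE):
    """Count file objects plus block objects needed to store total_bytes."""
    files = ceil_div(total_bytes, file_size)
    blocks_per_file = max(1, ceil_div(file_size, block_size))
    return files + files * blocks_per_file


one_tib = 1024**4
large = namenode_objects(one_tib, 128 * 1024**2)  # 1 TiB stored as 128 MiB files
small = namenode_objects(one_tib, 1024**2)        # 1 TiB stored as 1 MiB files

print(large, large * BYTES_PER_OBJECT)  # 16384 objects, 2457600 bytes
print(small, small * BYTES_PER_OBJECT)  # 2097152 objects, 314572800 bytes
```

Under these assumptions the same terabyte needs 128 times as many metadata objects when it is stored as 1 MiB files, before counting the extra seeks and task start-up overhead that small files also cause.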
