Big Data
Technologies for Enterprise Analytics
Big Data
Technologies
Classification of Big Data technologies
Apache Hadoop
Pentaho & Big Data
Enterprise Analytics
About StrateBI
Big Data
We understand Big Data as the result of the following changes
that are taking place in the data managed by organizations
The increased Volume of the data available in companies
From Terabytes (10^3 GB) to Petabytes (10^6 GB)
The significant increase in the Variety or heterogeneity of data
sources available
Structured, Semi structured and Unstructured data must be processed
Increased Velocity of generation and distribution of data sources
These three changes are the main criteria for determining whether we have a Big
Data scenario
Big Data technologies
Traditional business intelligence (BI) tools and processes have
been overtaken by the nature of Big Data
This situation has led to the rise and development of a wide
range of technologies for Big Data management
Most current Big Data technologies are Open Source
Know-How: A major problem
Which technologies should be used in each Big Data scenario?
How should they be combined to succeed and to monetize Big Data
management?
Classification of Big Data technologies
Big Data technologies fall into 3 groups
Apache Hadoop:
A framework that allows for the distributed processing of Big
Data
Commodity cluster computing: It is designed to scale up
from single servers to thousands of machines
More general approach than the other Big Data
technologies:
Simple programming models for supporting a wide range of
applications: MapReduce, Tez, Hive, Pig, Spark...
Applications: Ingestion, Processing (Batch & Real Time), ETL,
SQL, Machine Learning, NoSQL, Reporting, OLAP…
Apache Hadoop in its most basic form consists of:
HDFS: A distributed file system
YARN: A framework for job scheduling and cluster resource
management
MapReduce: A YARN-based system for parallel processing of
large data sets
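A minimal sketch of the MapReduce programming model listed above, written in plain Python rather than against the Hadoop API. The function names (map_phase, shuffle_phase, reduce_phase, word_count) and the sample lines are illustrative assumptions; on a real cluster the framework runs the map and reduce tasks in parallel across nodes and performs the shuffle itself.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by key between map and reduce."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Run the three phases sequentially on an in-memory data set."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data big cluster", "data lake"]
    print(word_count(sample))
```

Because each map call sees only one line and each reduce call sees only one key, both phases can be spread over many machines, which is what lets the model scale.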
NoSQL databases
Storage and querying, especially of semi-structured data
Usually they implement distributed storage and processing
Aimed at replacing operational databases in Big Data scenarios:
Less general approach than Hadoop
Some form of support for transaction management
Optimized for random reads and writes
Extended RDBMS
Add features to traditional databases for storing and processing
huge volumes of relational information (mainly structured data)
Include libraries of advanced analytical functions and support User
Defined Functions (UDF)
Usually they allow for distributed storage or processing
Some of them implement columnar storage: Optimized for analytical
workloads (sums, counts, averages, maximums,…)
One important subtype is MPP (Massively Parallel Processing)
databases
HP Vertica, Pivotal Greenplum
Well suited for OLAP applications
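A small sketch of why columnar storage suits analytical workloads, assuming a toy in-memory table. The table contents and the helper names (to_columns, total_row_store, total_column_store) are hypothetical and do not reflect how Vertica or Greenplum store data internally; the point is only that an aggregate over one column touches a single contiguous list instead of every full row.

```python
from __future__ import annotations

# Row-oriented layout: each record keeps all of its fields together.
ROWS = [
    {"order_id": 1, "region": "north", "amount": 120.0},
    {"order_id": 2, "region": "south", "amount": 80.0},
    {"order_id": 3, "region": "north", "amount": 50.0},
]


def to_columns(rows: list[dict]) -> dict[str, list]:
    """Pivot row records into one list per column (column-oriented layout)."""
    columns: dict[str, list] = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def total_row_store(rows: list[dict], field: str) -> float:
    """Aggregate in the row layout: every full record must be visited."""
    return sum(row[field] for row in rows)


def total_column_store(columns: dict[str, list], field: str) -> float:
    """Aggregate in the column layout: only the requested column is read."""
    return sum(columns[field])


if __name__ == "__main__":
    cols = to_columns(ROWS)
    print(total_row_store(ROWS, "amount"), total_column_store(cols, "amount"))
```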
An alternative classification: based on their role in a Big Data
architecture
Ingestion, Storage, Processing, Orchestration, Analysis, Visualization
We provide the best technology for each application
1. Enterprise Data Warehouse Extension:
Big Data scenarios where we want to implement low-latency
analytics such as OLAP, dashboards, reporting,…
2. Website clickstream analysis:
2. Website clickstream analysis – Visualization Technologies
Apache Zeppelin
http://zeppelin-project.org/demo.html
3. Real Time analytics
Processing of data streams, instead of the static data sets used in batch
processing
Diagram: Apache HTTP Servers 1 to N feed a Syslog Source; events pass through a Kafka Channel to the sinks (Avro Sink, HDFS Sink, HBase Sink, others) for real-time processing and persistence, ending in visualizations for analysis
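To illustrate the difference from batch processing, the sketch below handles events one at a time as they arrive, adding a field to each record and flagging alerts immediately. It is not tied to any particular engine; the event fields (server, status), the SERVER_LOCATION lookup table and the alert rule (status of 500 or higher) are all assumptions made for the example.

```python
from __future__ import annotations

from typing import Iterable, Iterator

# Hypothetical reference data used to enrich each incoming record.
SERVER_LOCATION = {"web1": "Madrid", "web2": "Barcelona"}


def enrich_stream(events: Iterable[dict]) -> Iterator[dict]:
    """Process events lazily, one by one, without waiting for the full data set."""
    for event in events:
        enriched = dict(event)  # copy so the source record is not mutated
        enriched["location"] = SERVER_LOCATION.get(event["server"], "unknown")
        # Assumed alert rule: any server-side error status raises an alert.
        enriched["alert"] = event["status"] >= 500
        yield enriched


if __name__ == "__main__":
    incoming = [
        {"server": "web1", "status": 200},
        {"server": "web2", "status": 503},
        {"server": "web9", "status": 404},
    ]
    for record in enrich_stream(incoming):
        print(record)
```

Because enrich_stream is a generator, each record is available downstream as soon as it has been read, which is the property that distinguishes stream processing from a batch job over a static data set.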
3. Real Time analytics – Processing Technologies
Feature | Storm | Interceptor | Trident API | [fourth technology, shown only as a logo]
Processing latency | 0.05 to 0.5 sec | 0.05 to 0.5 sec | 0.5 to 30 sec | 0.5 to 30 sec
Aggregations and windowing averages | Yes, but not fault-tolerant | Not supported | Yes, fault-tolerant | Yes, fault-tolerant
Record level enrichment and alerts | Yes | Yes | Yes | Yes
Persistence of transient data | Yes, but poor performance | Yes, high performance with HDFS, HBase… | Yes, high performance with HDFS, HBase… | Yes, high performance with HDFS, HBase…
High-level functions | No. It requires a lot of code | Yes. Very simple, configuration-based tool | Yes. Joins, aggregations, .... Easier programming than Storm | Yes, a lot of libraries of functions. Easier programming than Storm and Trident
Reliability | Duplicates and data loss | More reliable than Storm and Trident | More reliable than Storm | More reliable than Storm and Trident
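The "aggregations and windowing averages" row can be illustrated with a tumbling-window average. This is a minimal sketch of the windowing arithmetic only, assuming events arrive as (timestamp in seconds, value) pairs; the function name and sample data are hypothetical, and it has none of the fault tolerance or distribution that the compared engines provide.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable


def tumbling_window_averages(
    events: Iterable[tuple[int, float]], window_seconds: int
) -> dict[int, float]:
    """Average the values falling into each fixed, non-overlapping time window.

    Each window is identified by its start time, computed by rounding the
    event timestamp down to a multiple of window_seconds.
    """
    sums: dict[int, float] = defaultdict(float)
    counts: dict[int, int] = defaultdict(int)
    for timestamp, value in events:
        window_start = (timestamp // window_seconds) * window_seconds
        sums[window_start] += value
        counts[window_start] += 1
    return {start: sums[start] / counts[start] for start in sorted(sums)}


if __name__ == "__main__":
    readings = [(0, 10.0), (3, 20.0), (5, 30.0), (11, 50.0)]
    print(tumbling_window_averages(readings, window_seconds=5))
```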
3. Real Time analytics – Visualization Technologies
JavaScript chart libraries (D3, Highcharts…) using socket connections
3. Real Time analytics – A StrateBI case study
Wikipedia updates – Demo StrateBI
http://bigdata.stratebi.com/
3. Real Time analytics – More Technologies
Apache Hue + Solr
Diagram: Apache HTTP Servers 1 to N feed a Syslog Source; events pass through a Kafka Channel to a Solr Sink; Solr provides real-time indexing and Hue provides visualizations for analysis
4. Fraud detection system:
Hadoop Distributions
Installing and maintaining Hadoop tools separately may
become a serious issue
Hadoop Distributions: Software packages that include the basic
Hadoop components, along with other common and useful tools
of the current Hadoop Stack
In some cases distributions add improvements or even non-Open
Source tools (e.g. Cloudera Manager)
Main benefits
Packages or installers: Easy to install Hadoop on different operating
systems such as Ubuntu, CentOS, Debian, Windows Server ...
Easy patch management
Hadoop distributions recommended by StrateBI
Hortonworks HDP: http://hortonworks.com/
The only 100% Open Source Hadoop Distribution
Only includes the latest stable versions of Hadoop stack tools
Cloudera: http://www.cloudera.com
Express (free) and Enterprise (commercial) versions
They include tool improvements that have not yet been
incorporated into Apache open source projects
Cloudera Manager: A proprietary tool for Hadoop cluster
management and monitoring
Quite good and very reliable tool
In its free version it does not support some features that Apache
Ambari does support for cluster management in Hortonworks
Users and roles definition, LDAP integration, management of
some Hadoop services (Impala, Spark, etc ...), hot updates of
cluster tools...
Pentaho & Big Data
The Pentaho Business Intelligence suite has added improved
support for Big Data management, processing and visualization
Pentaho Data Integration
Visual and powerful ETL design and execution tool
Pentaho Reporting Designer
For creating static and parametrized reports
Pentaho Metadata Editor
To define metadata for Ad-Hoc reporting applications (e.g. STReport)
Pentaho BI Server
For developing and sharing reports, dashboards (e.g. STDashboard) and
OLAP Analysis (e.g. STPivot)
Pentaho Data Integration 6.X
Full integration with the most common Hadoop Distributions
Cloudera 5.X, Hortonworks 2.X, MapR
Functionalities
ETL in-cluster execution: Pentaho automatically generates and launches
MapReduce code in the cluster
Reading, processing and writing data and files from and to HDFS
Process Orchestration: MapReduce, Pig, Sqoop, Spark, Oozie
JDBC Connection with Apache Hive and Apache Impala
PDI also supports NoSQL databases
HBase, MongoDB, Cassandra (up to version 2.1)
Hadoop cluster connections management
Transformation Steps for data movement and transformations
Job Entries for Orchestration
Some Big Data success stories:
Democratic Party presidential campaigns (Barack Obama)
Data integration from surveys, social networks, member databases...
High accuracy in forecasting results per geographic area (> 99%)
Better management of campaign events, advertising placement ...
They won presidential elections in 2008 and 2012
Amazon recommendation system
Banks and insurance companies such as Morgan Stanley and ING
Direct have adopted Big Data:
Fraud detection, risk analysis in loans and insurance, customer churn
prevention, ...
The UPS package delivery company invests $1 million a year in
Big Data
Uses the data generated by the sensors installed in its vehicles to optimize
routes, fuel consumption, maintenance, CO2 emissions ...
UPS saves 50 million dollars in gasoline a year through its management of
Big Data
T-Mobile USA uses Big Data to reduce churn rate
By integrating data from billing, calls and social networks
All raw data is stored in a Hadoop Data Lake
Generates a 360-degree view of each customer, used to tackle
customer dissatisfaction
“Tribal” customer model
Identifying people who have high influence on others due to their large
social network: if such a customer switches telecom provider, it could
cause a domino effect
Customer Lifetime Value is calculated for each of these customers
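One simple way to approximate "high influence due to a large social network" is to count each customer's distinct contacts in the call records. The sketch below is only an illustration of that idea under stated assumptions: the call-record shape (caller, callee), the function names and the sample data are hypothetical, and the source does not describe how T-Mobile actually scores influence.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable


def contact_counts(calls: Iterable[tuple[str, str]]) -> dict[str, int]:
    """Count the distinct people each customer has exchanged calls with."""
    contacts: dict[str, set[str]] = defaultdict(set)
    for caller, callee in calls:
        contacts[caller].add(callee)
        contacts[callee].add(caller)
    return {person: len(peers) for person, peers in contacts.items()}


def top_influencers(calls: Iterable[tuple[str, str]], n: int) -> list[str]:
    """Return the n customers with the most distinct contacts.

    Ties are broken alphabetically so the result is deterministic.
    """
    counts = contact_counts(calls)
    return sorted(counts, key=lambda person: (-counts[person], person))[:n]


if __name__ == "__main__":
    call_log = [("ana", "luis"), ("ana", "eva"), ("ana", "raul"), ("luis", "eva")]
    print(top_influencers(call_log, n=2))
```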
Churn expectancy of a customer is based on different analyses
Billing analysis: Where, for how long and with whom a user calls or texts.
Calls going to a different provider could indicate that the customer's
social network is switching
Dropped-call analysis: For example, proactively detecting whether the user has
limited coverage in their usual geographical area, in order to offer
solutions such as a new phone or a femtocell to extend indoor
coverage
Sentiment analysis: Social network data combined with other data
collected from customers, such as surveys or previous complaints
As a result, T-Mobile cut churn rates by 50% in just one
quarter
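The billing-analysis signal, calls going to a different provider, can be expressed as the share of a customer's calls that leave the home network. This is a hedged sketch: the record field (callee_provider), the function name and the sample values are assumptions for illustration, not the model T-Mobile used.

```python
from __future__ import annotations


def off_net_share(calls: list[dict], home_provider: str) -> float:
    """Fraction of a customer's calls placed to numbers on other providers.

    A rising share may suggest that the customer's contacts are switching
    provider, which the slide lists as a possible churn indicator.
    """
    if not calls:
        return 0.0
    off_net = sum(1 for call in calls if call["callee_provider"] != home_provider)
    return off_net / len(calls)


if __name__ == "__main__":
    customer_calls = [
        {"callee_provider": "home"},
        {"callee_provider": "rival_a"},
        {"callee_provider": "rival_a"},
        {"callee_provider": "rival_b"},
    ]
    print(off_net_share(customer_calls, home_provider="home"))
```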
StrateBI & Big Data success stories:
StrateBI has successfully applied the previously discussed Big Data
technologies:
Big Data analysis for decision making in agriculture
Real time data generated by sensors installed in farms is ingested and
integrated with weather data sources, in order to generate alerts and
obtain predictions
Social Network analysis
Technological surveillance for a security company
Detection and prevention of attacks or dangerous scenarios, by
analyzing data from social networks combined with customer data
Detecting trends in social networking for business digital content
management
Intelligent content publishing
Real time analysis of Big Data for decision making in agriculture
Analysis of data generated by a field of solar panels
Detecting trends in social networking
Why StrateBI for Big Data projects?
Recognized Big Data specialists in Spain (Hadoop, Spark, Hive,
Flume, Hortonworks, Cloudera, Cassandra, HP Vertica…)
Backed by our projects and training carried out with companies
such as Boeing, Telefónica Educación Digital (TED), Gobierno de
España, Schibsted Group, Prosegur, INCIBE (National Institute of
Cybersecurity)…
Spanish leaders in Open Source BI (Pentaho, Talend,
Mondrian, Ctools, Saiku…)
StrateBI has brought hundreds of Business Intelligence
systems with Pentaho into production for large companies such as
BBVA, Telefónica, Globalia, Prosegur, ALD, Gobiernos de La
Rioja, Extremadura, Baleares, Eroski, Equifax, Unilever, Amnistía
Internacional, Caixa De Enginyers, Schibsted, etc…
About Us
Private Sector
Public Sector
www.TodoBI.com
info@stratebi.com
www.stratebi.com
More Info
Tel: 91.788.34.10
Madrid: Avenida de Brasil, 17, Planta 16
Barcelona: C/ Valencia, 63
Brasil: Av. Paulista, 37 4 andar
About Us

Weitere ähnliche Inhalte

Was ist angesagt?

BigData Analytics with Hadoop and BIRT
BigData Analytics with Hadoop and BIRTBigData Analytics with Hadoop and BIRT
BigData Analytics with Hadoop and BIRTAmrit Chhetri
 
Big data unit 2
Big data unit 2Big data unit 2
Big data unit 2RojaT4
 
Big Data Analytics for Real Time Systems
Big Data Analytics for Real Time SystemsBig Data Analytics for Real Time Systems
Big Data Analytics for Real Time SystemsKamalika Dutta
 
Hadoop and big data
Hadoop and big dataHadoop and big data
Hadoop and big dataYukti Kaura
 
Big data introduction, Hadoop in details
Big data introduction, Hadoop in detailsBig data introduction, Hadoop in details
Big data introduction, Hadoop in detailsMahmoud Yassin
 
Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...
Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...
Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...Simplilearn
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big DataJoey Li
 
Big Data Final Presentation
Big Data Final PresentationBig Data Final Presentation
Big Data Final Presentation17aroumougamh
 
Big data analytics, survey r.nabati
Big data analytics, survey r.nabatiBig data analytics, survey r.nabati
Big data analytics, survey r.nabatinabati
 
Big Data Course - BigData HUB
Big Data Course - BigData HUBBig Data Course - BigData HUB
Big Data Course - BigData HUBAhmed Salman
 
Dev Lakhani, Data Scientist at Batch Insights "Real Time Big Data Applicatio...
Dev Lakhani, Data Scientist at Batch Insights  "Real Time Big Data Applicatio...Dev Lakhani, Data Scientist at Batch Insights  "Real Time Big Data Applicatio...
Dev Lakhani, Data Scientist at Batch Insights "Real Time Big Data Applicatio...Dataconomy Media
 
Application of Data Warehousing & Data Mining to Exploitation for Supporting ...
Application of Data Warehousing & Data Mining to Exploitation for Supporting ...Application of Data Warehousing & Data Mining to Exploitation for Supporting ...
Application of Data Warehousing & Data Mining to Exploitation for Supporting ...Gihan Wikramanayake
 
Open source stak of big data techs open suse asia
Open source stak of big data techs   open suse asiaOpen source stak of big data techs   open suse asia
Open source stak of big data techs open suse asiaMuhammad Rifqi
 

Was ist angesagt? (20)

BigData Analytics with Hadoop and BIRT
BigData Analytics with Hadoop and BIRTBigData Analytics with Hadoop and BIRT
BigData Analytics with Hadoop and BIRT
 
Big data unit 2
Big data unit 2Big data unit 2
Big data unit 2
 
Big Data Analytics for Real Time Systems
Big Data Analytics for Real Time SystemsBig Data Analytics for Real Time Systems
Big Data Analytics for Real Time Systems
 
Big Data: an introduction
Big Data: an introductionBig Data: an introduction
Big Data: an introduction
 
Hadoop and big data
Hadoop and big dataHadoop and big data
Hadoop and big data
 
Big data introduction, Hadoop in details
Big data introduction, Hadoop in detailsBig data introduction, Hadoop in details
Big data introduction, Hadoop in details
 
Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...
Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...
Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
 
Big Data Ecosystem
Big Data EcosystemBig Data Ecosystem
Big Data Ecosystem
 
Big Data Final Presentation
Big Data Final PresentationBig Data Final Presentation
Big Data Final Presentation
 
Big data analytics, survey r.nabati
Big data analytics, survey r.nabatiBig data analytics, survey r.nabati
Big data analytics, survey r.nabati
 
Big Data Course - BigData HUB
Big Data Course - BigData HUBBig Data Course - BigData HUB
Big Data Course - BigData HUB
 
BDaas- BigData as a service
BDaas- BigData as a service  BDaas- BigData as a service
BDaas- BigData as a service
 
Bigdata
Bigdata Bigdata
Bigdata
 
Dev Lakhani, Data Scientist at Batch Insights "Real Time Big Data Applicatio...
Dev Lakhani, Data Scientist at Batch Insights  "Real Time Big Data Applicatio...Dev Lakhani, Data Scientist at Batch Insights  "Real Time Big Data Applicatio...
Dev Lakhani, Data Scientist at Batch Insights "Real Time Big Data Applicatio...
 
Big Data: hype or necessity?
Big Data: hype or necessity?Big Data: hype or necessity?
Big Data: hype or necessity?
 
Big Data simplified
Big Data simplifiedBig Data simplified
Big Data simplified
 
Application of Data Warehousing & Data Mining to Exploitation for Supporting ...
Application of Data Warehousing & Data Mining to Exploitation for Supporting ...Application of Data Warehousing & Data Mining to Exploitation for Supporting ...
Application of Data Warehousing & Data Mining to Exploitation for Supporting ...
 
Big data analytics - hadoop
Big data analytics - hadoopBig data analytics - hadoop
Big data analytics - hadoop
 
Open source stak of big data techs open suse asia
Open source stak of big data techs   open suse asiaOpen source stak of big data techs   open suse asia
Open source stak of big data techs open suse asia
 

Andere mochten auch

Cursos de Big Data y Machine Learning
Cursos de Big Data y Machine LearningCursos de Big Data y Machine Learning
Cursos de Big Data y Machine LearningStratebi
 
Referencias Stratebi
Referencias StratebiReferencias Stratebi
Referencias StratebiStratebi
 
53 Claves para conocer Machine Learning
53 Claves para conocer Machine Learning53 Claves para conocer Machine Learning
53 Claves para conocer Machine LearningStratebi
 
Introduccion a Machine Learning
Introduccion a Machine LearningIntroduccion a Machine Learning
Introduccion a Machine LearningStratebi
 
Big Data para Dummies
Big Data para DummiesBig Data para Dummies
Big Data para DummiesStratebi
 
69 claves para conocer Big Data
69 claves para conocer Big Data69 claves para conocer Big Data
69 claves para conocer Big DataStratebi
 

Andere mochten auch (6)

Cursos de Big Data y Machine Learning
Cursos de Big Data y Machine LearningCursos de Big Data y Machine Learning
Cursos de Big Data y Machine Learning
 
Referencias Stratebi
Referencias StratebiReferencias Stratebi
Referencias Stratebi
 
53 Claves para conocer Machine Learning
53 Claves para conocer Machine Learning53 Claves para conocer Machine Learning
53 Claves para conocer Machine Learning
 
Introduccion a Machine Learning
Introduccion a Machine LearningIntroduccion a Machine Learning
Introduccion a Machine Learning
 
Big Data para Dummies
Big Data para DummiesBig Data para Dummies
Big Data para Dummies
 
69 claves para conocer Big Data
69 claves para conocer Big Data69 claves para conocer Big Data
69 claves para conocer Big Data
 

Ähnlich wie Stratebi Big Data

Deutsche Telekom on Big Data
Deutsche Telekom on Big DataDeutsche Telekom on Big Data
Deutsche Telekom on Big DataDataWorks Summit
 
Architecting the Future of Big Data and Search
Architecting the Future of Big Data and SearchArchitecting the Future of Big Data and Search
Architecting the Future of Big Data and SearchHortonworks
 
Eric Baldeschwieler Keynote from Storage Developers Conference
Eric Baldeschwieler Keynote from Storage Developers ConferenceEric Baldeschwieler Keynote from Storage Developers Conference
Eric Baldeschwieler Keynote from Storage Developers ConferenceHortonworks
 
A Glimpse of Bigdata - Introduction
A Glimpse of Bigdata - IntroductionA Glimpse of Bigdata - Introduction
A Glimpse of Bigdata - Introductionsaisreealekhya
 
Pervasive DataRush
Pervasive DataRushPervasive DataRush
Pervasive DataRushtempledf
 
Hadoop and Big Data Analytics | Sysfore
Hadoop and Big Data Analytics | SysforeHadoop and Big Data Analytics | Sysfore
Hadoop and Big Data Analytics | SysforeSysfore Technologies
 
Big data presentation (2014)
Big data presentation (2014)Big data presentation (2014)
Big data presentation (2014)Xavier Constant
 
Big data Hadoop presentation
Big data  Hadoop  presentation Big data  Hadoop  presentation
Big data Hadoop presentation Shivanee garg
 
Bigdata and Hadoop Bootcamp
Bigdata and Hadoop BootcampBigdata and Hadoop Bootcamp
Bigdata and Hadoop BootcampSpotle.ai
 
Big data: Descoberta de conhecimento em ambientes de big data e computação na...
Big data: Descoberta de conhecimento em ambientes de big data e computação na...Big data: Descoberta de conhecimento em ambientes de big data e computação na...
Big data: Descoberta de conhecimento em ambientes de big data e computação na...Rio Info
 
Big Data Practice_Planning_steps_RK
Big Data Practice_Planning_steps_RKBig Data Practice_Planning_steps_RK
Big Data Practice_Planning_steps_RKRajesh Jayarman
 
Bridging the Big Data Gap in the Software-Driven World
Bridging the Big Data Gap in the Software-Driven WorldBridging the Big Data Gap in the Software-Driven World
Bridging the Big Data Gap in the Software-Driven WorldCA Technologies
 

Ähnlich wie Stratebi Big Data (20)

Deutsche Telekom on Big Data
Deutsche Telekom on Big DataDeutsche Telekom on Big Data
Deutsche Telekom on Big Data
 
Architecting the Future of Big Data and Search
Architecting the Future of Big Data and SearchArchitecting the Future of Big Data and Search
Architecting the Future of Big Data and Search
 
Eric Baldeschwieler Keynote from Storage Developers Conference
Eric Baldeschwieler Keynote from Storage Developers ConferenceEric Baldeschwieler Keynote from Storage Developers Conference
Eric Baldeschwieler Keynote from Storage Developers Conference
 
A Glimpse of Bigdata - Introduction
A Glimpse of Bigdata - IntroductionA Glimpse of Bigdata - Introduction
A Glimpse of Bigdata - Introduction
 
Pervasive DataRush
Pervasive DataRushPervasive DataRush
Pervasive DataRush
 
Hadoop and Big Data Analytics | Sysfore
Hadoop and Big Data Analytics | SysforeHadoop and Big Data Analytics | Sysfore
Hadoop and Big Data Analytics | Sysfore
 
Big data presentation (2014)
Big data presentation (2014)Big data presentation (2014)
Big data presentation (2014)
 
paper
paperpaper
paper
 
Big data Question bank.pdf
Big data Question bank.pdfBig data Question bank.pdf
Big data Question bank.pdf
 
Big data Hadoop presentation
Big data  Hadoop  presentation Big data  Hadoop  presentation
Big data Hadoop presentation
 
Hadoop
HadoopHadoop
Hadoop
 
Bigdata and Hadoop Bootcamp
Bigdata and Hadoop BootcampBigdata and Hadoop Bootcamp
Bigdata and Hadoop Bootcamp
 
Big data: Descoberta de conhecimento em ambientes de big data e computação na...
Big data: Descoberta de conhecimento em ambientes de big data e computação na...Big data: Descoberta de conhecimento em ambientes de big data e computação na...
Big data: Descoberta de conhecimento em ambientes de big data e computação na...
 
Big data
Big dataBig data
Big data
 
Big data
Big dataBig data
Big data
 
Big Data Practice_Planning_steps_RK
Big Data Practice_Planning_steps_RKBig Data Practice_Planning_steps_RK
Big Data Practice_Planning_steps_RK
 
Big Data
Big DataBig Data
Big Data
 
Big data
Big dataBig data
Big data
 
Bridging the Big Data Gap in the Software-Driven World
Bridging the Big Data Gap in the Software-Driven WorldBridging the Big Data Gap in the Software-Driven World
Bridging the Big Data Gap in the Software-Driven World
 
Hadoop
HadoopHadoop
Hadoop
 

Mehr von Stratebi

Destinos turisticos inteligentes
Destinos turisticos inteligentesDestinos turisticos inteligentes
Destinos turisticos inteligentesStratebi
 
Azure Synapse
Azure SynapseAzure Synapse
Azure SynapseStratebi
 
Options for Dashboards with Python
Options for Dashboards with PythonOptions for Dashboards with Python
Options for Dashboards with PythonStratebi
 
Dashboards with Python
Dashboards with PythonDashboards with Python
Dashboards with PythonStratebi
 
PowerBI Tips y buenas practicas
PowerBI Tips y buenas practicasPowerBI Tips y buenas practicas
PowerBI Tips y buenas practicasStratebi
 
Machine Learning Meetup Spain
Machine Learning Meetup SpainMachine Learning Meetup Spain
Machine Learning Meetup SpainStratebi
 
LinceBI IIoT (Industrial Internet of Things)
LinceBI IIoT (Industrial Internet of Things)LinceBI IIoT (Industrial Internet of Things)
LinceBI IIoT (Industrial Internet of Things)Stratebi
 
SAP - PowerBI integration
SAP - PowerBI integrationSAP - PowerBI integration
SAP - PowerBI integrationStratebi
 
Aplicaciones Big Data Marketing
Aplicaciones Big Data MarketingAplicaciones Big Data Marketing
Aplicaciones Big Data MarketingStratebi
 
A federated information infrastructure that works
A federated information infrastructure that works A federated information infrastructure that works
A federated information infrastructure that works Stratebi
 
9 problemas en proyectos Data Analytics
9 problemas en proyectos Data Analytics9 problemas en proyectos Data Analytics
9 problemas en proyectos Data AnalyticsStratebi
 
PowerBI: Soluciones, Aplicaciones y Cursos
PowerBI: Soluciones, Aplicaciones y CursosPowerBI: Soluciones, Aplicaciones y Cursos
PowerBI: Soluciones, Aplicaciones y CursosStratebi
 
Sports Analytics
Sports AnalyticsSports Analytics
Sports AnalyticsStratebi
 
Vertica Extreme Analysis
Vertica Extreme AnalysisVertica Extreme Analysis
Vertica Extreme AnalysisStratebi
 
Businesss Intelligence con Vertica y PowerBI
Businesss Intelligence con Vertica y PowerBIBusinesss Intelligence con Vertica y PowerBI
Businesss Intelligence con Vertica y PowerBIStratebi
 
Vertica Analytics Database general overview
Vertica Analytics Database general overviewVertica Analytics Database general overview
Vertica Analytics Database general overviewStratebi
 
Talend Cloud en detalle
Talend Cloud en detalleTalend Cloud en detalle
Talend Cloud en detalleStratebi
 
Master Data Management (MDM) con Talend
Master Data Management (MDM) con TalendMaster Data Management (MDM) con Talend
Master Data Management (MDM) con TalendStratebi
 
Talend Introducion
Talend IntroducionTalend Introducion
Talend IntroducionStratebi
 
Talent Analytics
Talent AnalyticsTalent Analytics
Talent AnalyticsStratebi
 

Mehr von Stratebi (20)

Destinos turisticos inteligentes
Destinos turisticos inteligentesDestinos turisticos inteligentes
Destinos turisticos inteligentes
 
Azure Synapse
Azure SynapseAzure Synapse
Azure Synapse
 
Options for Dashboards with Python
Options for Dashboards with PythonOptions for Dashboards with Python
Options for Dashboards with Python
 
Dashboards with Python
Dashboards with PythonDashboards with Python
Dashboards with Python
 
PowerBI Tips y buenas practicas
PowerBI Tips y buenas practicasPowerBI Tips y buenas practicas
PowerBI Tips y buenas practicas
 
Machine Learning Meetup Spain
Machine Learning Meetup SpainMachine Learning Meetup Spain
Machine Learning Meetup Spain
 
LinceBI IIoT (Industrial Internet of Things)
LinceBI IIoT (Industrial Internet of Things)LinceBI IIoT (Industrial Internet of Things)
LinceBI IIoT (Industrial Internet of Things)
 
SAP - PowerBI integration
SAP - PowerBI integrationSAP - PowerBI integration
SAP - PowerBI integration
 
Aplicaciones Big Data Marketing
Aplicaciones Big Data MarketingAplicaciones Big Data Marketing
Aplicaciones Big Data Marketing
 
A federated information infrastructure that works
A federated information infrastructure that works A federated information infrastructure that works
A federated information infrastructure that works
 
9 problemas en proyectos Data Analytics
9 problemas en proyectos Data Analytics9 problemas en proyectos Data Analytics
9 problemas en proyectos Data Analytics
 
PowerBI: Soluciones, Aplicaciones y Cursos
PowerBI: Soluciones, Aplicaciones y CursosPowerBI: Soluciones, Aplicaciones y Cursos
PowerBI: Soluciones, Aplicaciones y Cursos
 
Sports Analytics
Sports AnalyticsSports Analytics
Sports Analytics
 
Vertica Extreme Analysis
Vertica Extreme AnalysisVertica Extreme Analysis
Vertica Extreme Analysis
 
Businesss Intelligence con Vertica y PowerBI
Businesss Intelligence con Vertica y PowerBIBusinesss Intelligence con Vertica y PowerBI
Businesss Intelligence con Vertica y PowerBI
 
Vertica Analytics Database general overview
Vertica Analytics Database general overviewVertica Analytics Database general overview
Vertica Analytics Database general overview
 
Talend Cloud en detalle
Talend Cloud en detalleTalend Cloud en detalle
Talend Cloud en detalle
 
Master Data Management (MDM) con Talend
Master Data Management (MDM) con TalendMaster Data Management (MDM) con Talend
Master Data Management (MDM) con Talend
 
Talend Introducion
Talend IntroducionTalend Introducion
Talend Introducion
 
Talent Analytics
Talent AnalyticsTalent Analytics
Talent Analytics
 

Kürzlich hochgeladen

Master's Thesis - Data Science - Presentation
Master's Thesis - Data Science - PresentationMaster's Thesis - Data Science - Presentation
Master's Thesis - Data Science - PresentationGiorgio Carbone
 
5 Ds to Define Data Archiving Best Practices
5 Ds to Define Data Archiving Best Practices5 Ds to Define Data Archiving Best Practices
5 Ds to Define Data Archiving Best PracticesDataArchiva
 
Virtuosoft SmartSync Product Introduction
Virtuosoft SmartSync Product IntroductionVirtuosoft SmartSync Product Introduction
Virtuosoft SmartSync Product Introductionsanjaymuralee1
 
Stratebi Big Data

  • 1. Big Data Technologies for Enterprise Analytics
  • 2. Big Data Technologies Classification of Big Data technologies Apache Hadoop Pentaho & Big Data Enterprise Analytics About StrateBI Big Data
  • 3. Big Data We understand Big Data as the result of the following changes taking place in the data managed by organizations: the increased Volume of data available in companies, from terabytes (10^3 GB) to petabytes (10^6 GB); the significant increase in the Variety or heterogeneity of available data sources (structured, semi-structured and unstructured data must all be processed); and the increased Velocity at which data is generated and distributed. These are the main questions for determining whether we have a Big Data scenario. Big Data
  • 4. Big Data technologies Traditional business intelligence (BI) tools and processes have been overtaken by the nature of Big Data. This situation has led to the rise and development of a wide range of technologies for Big Data management. Most current Big Data technologies are Open Source. Know-how is a major problem: Which technologies should be used in each Big Data scenario? How should they be combined to succeed at, and monetize, Big Data management? Big Data
  • 6. Classification of Big Data technologies Big Data technologies fall into 3 groups Big Data
  • 7. Classification of Big Data technologies Apache Hadoop: A framework that allows for the distributed processing of Big Data Commodity cluster computing: It is designed to scale up from single servers to thousands of machines More general approach than the other Big Data technologies: Simple programming models for supporting a wide range of applications: MapReduce, Tez, Hive, Pig, Spark... Applications: Ingestion, Processing (Batch & Real Time), ETL, SQL, Machine Learning, NoSQL, Reporting, OLAP… Big Data
  • 8. Classification of Big Data technologies Apache Hadoop in its most basic form consists of: HDFS: A distributed file system YARN: A framework for job scheduling and cluster resource management MapReduce: A YARN-based system for parallel processing of large data sets Big Data
  • 9. Classification of Big Data technologies NoSQL databases: Storage and querying, especially of semi-structured data. They usually implement distributed storage and processing. They aim to replace operational databases in Big Data scenarios: a less general approach than Hadoop, with some form of support for transaction management, optimized for random reads and writes. Big Data
  • 10. Classification of Big Data technologies Extended RDBMS: Add features to traditional databases for storing and processing huge volumes of relational information (mainly structured data), including libraries of advanced analytical functions and support for User Defined Functions (UDF). They usually allow for distributed storage or processing. Some of them implement columnar storage, which is optimized for analytical workloads (sums, counts, averages, maximums,…). One important subtype is MPP (Massively Parallel Processing) databases: HP Vertica, Pivotal Greenplum. Well suited for OLAP applications. Big Data
  • 11. Classification of Big Data technologies An alternative classification: based on their role in a Big Data architecture Big Data Ingestion Storage Processing Orchestration Analysis Visualization
  • 12. We provide the best technology for each application 1. Enterprise Data Warehouse Extension: Big Data scenarios in which we want to implement low-latency analytics such as OLAP, dashboards, reporting,… Big Data
  • 13. We provide the best technology for each application 2. Website clickstream analysis : Big Data
  • 14. We provide the best technology for each application 2. Website clickstream analysis – Visualization Technologies Apache Zeppelin http://zeppelin-project.org/demo.html Big Data
  • 15. We provide the best technology for each application 3. Real Time analytics Data streams processing, instead of static data sets, as in the batch processing Big Data Syslog Source Avro Sink Kafka Channel HDFS Sink HBase Sink Others Sinks Real Time Processing Persistence Visualizations for analysis Apache HTTP Server 1 Apache HTTP Server 2 Apache HTTP Server N
  • 16. We provide the best technology for each application 3. Real Time analytics – Processing Technologies Big Data. Comparison of four processing technologies. Only two column labels, Interceptor and Trident API, survive from the slide; values are listed in the slide's column order:
      Processing latency: 0.05 to 0.5 s | 0.05 to 0.5 s | 0.5 to 30 s | 0.5 to 30 s
      Aggregations and windowed averages: Yes, but not fault-tolerant | Not supported | Yes, fault-tolerant | Yes, fault-tolerant
      Record-level enrichment and alerts: Yes | Yes | Yes | Yes
      Persistence of transient data: Yes, but poor performance | Yes, high performance with HDFS, HBase… | Yes, high performance with HDFS, HBase… | Yes, high performance with HDFS, HBase…
      High-level functions: No, it requires a lot of code | Yes, a very simple, configuration-based tool | Yes (joins, aggregations, ...), easier programming than Storm | Yes, many function libraries, easier programming than Storm and Trident
      Reliability: Duplicates and data loss | More reliable than Storm and Trident | More reliable than Storm | More reliable than Storm and Trident
  • 17. We provide the best technology for each application 3. Real Time analytics – Visualization Technologies JavaScript Charts libraries (D3, Highcharts…) using Sockets connections Big Data
  • 18. We provide the best technology for each application 3. Real Time analytics – Visualization Technologies JavaScript Charts libraries (D3, Highcharts…) using Sockets connections Big Data
  • 19. We provide the best technology for each application 3. Real Time analytics – A StrateBI case study Wikipedia updates – Demo StrateBI http://bigdata.stratebi.com/ Big Data
  • 20. We provide the best technology for each application 3. Real Time analytics – More Technologies Apache Hue + Solr Big Data Syslog Source Solr Sink Kafka Channel Solr Real Time Indexing Hue Visualizations for analysis Apache HTTP Server 1 Apache HTTP Server 2 Apache HTTP Server N
  • 21. We provide the best technology for each application 3. Real Time analytics – More Technologies Apache Hue + Solr Big Data
  • 22. We provide the best technology for each application 4. Fraud detection system: Big Data
  • 23. Hadoop Distributions Installing and maintaining Hadoop tools separately may become a serious issue. Hadoop distributions: software packages that include the basic Hadoop components, along with other common and useful tools of the current Hadoop stack. In some cases distributions add improvements or even non-Open Source tools (e.g. Cloudera Manager). Main benefits: packages or installers make it easy to install Hadoop on different operating systems such as Ubuntu, CentOS, Debian, Windows Server ...; easy patch management. Big Data
  • 24. Hadoop distributions recommended by StrateBI Hortonworks HDP: http://hortonworks.com/ The only 100% Open Source Hadoop Distribution Only includes the latest stable versions of Hadoop stack tools Big Data
  • 25. Hadoop distributions recommended by StrateBI Cloudera: http://www.cloudera.com Express (free) and Enterprise (commercial) versions. They include tool improvements that have not yet been incorporated into the Apache open source projects. Cloudera Manager: a proprietary tool for Hadoop cluster management and monitoring. Quite a good and very reliable tool. In its free version it does not support some features that Apache Ambari does support for cluster management in Hortonworks: users and roles definition, LDAP integration, management of some Hadoop services (Impala, Spark, etc ...), hot updates of cluster tools... Big Data
  • 26. Pentaho & Big Data The Pentaho Business Intelligence suite has added improved support for Big Data management, processing and visualization. Pentaho Data Integration: a visual and powerful ETL design and execution tool. Pentaho Reporting Designer: for creating static and parameterized reports. Pentaho Metadata Editor: to define metadata for ad-hoc reporting applications (e.g. STReport). Pentaho BI Server: for developing and sharing reports, dashboards (e.g. STDashboard) and OLAP analysis (e.g. STPivot). Big Data
  • 28. Pentaho & Big Data Pentaho Data Integration 6.X Full integration with the most common Hadoop distributions: Cloudera 5.X, Hortonworks 2.X, MapR. Functionalities: in-cluster ETL execution (Pentaho automatically generates and launches MapReduce code in the cluster); reading, processing and writing data and files from and to HDFS; process orchestration (MapReduce, Pig, Sqoop, Spark, Oozie); JDBC connection with Apache Hive and Impala. PDI also supports NoSQL databases: HBase, MongoDB, Cassandra (up to version 2.1). Big Data
  • 29. Big Data Hadoop cluster connections management Transformations Steps for data movement and transformations Jobs Entries for Orchestration
  • 31. Some Big Data success stories: Democratic Party presidential campaigns (Barack Obama) Data integration from surveys, social networks, members database.. High accuracy in forecasting results per geographic area (> 99%) Better management of campaign events, advertising placement ... They won presidential elections in 2008 and 2012 Amazon recommendation system Big Data
  • 32. Some Big Data success stories: Banks and insurance companies such as Morgan Stanley and ING Direct have adopted Big Data: fraud detection, risk analysis in loans and insurance, customer churn prevention, ... The package delivery company UPS invests $1 million a year in Big Data. It uses the data generated by the sensors installed in its vehicles to optimize routes and fuel consumption, maintenance, CO2 emissions ... UPS saves 50 million dollars in gasoline a year through its management of Big Data. Big Data
  • 33. Some Big Data success stories: T-Mobile USA uses Big Data to reduce its churn rate by integrating data from billing, calls and social networks. All raw data is stored in a Hadoop Data Lake. It generates a 360-degree view of each customer that is used to tackle customer dissatisfaction. “Tribal” customer model: identifying people who have high influence on others due to their large social network. If such a client switches telecom provider, it could cause a domino effect. Customer Lifetime Value is calculated for each of these customers. Big Data
  • 34. Some Big Data success stories: T-Mobile USA uses Big Data to reduce its churn rate. The churn expectancy of a customer is based on different analyses. Billing analysis: where, for how long and with whom a user calls or texts. Calls going to a different provider could indicate that the customer's social network is switching. Dropped call analysis: for example, proactively detecting whether the user has limited coverage in their usual geographical area of movement, in order to offer solutions such as a new phone or a femtocell to extend coverage in indoor locations. Sentiment analysis: social network data combined with other data collected from the customer, such as surveys or previous complaints. As a result, T-Mobile cut churn rates by 50% in just one quarter. Big Data
  • 35. StrateBI & Big Data success stories: StrateBI has successfully applied the previously discussed Big Data technologies. Big Data analysis for decision making in agriculture: real-time data generated by sensors installed on farms is ingested and integrated with weather data sources in order to generate alerts and obtain predictions. Social network analysis. Technological surveillance for a security company: detection and prevention of attacks or dangerous scenarios by analyzing data from social networks combined with customer data. Detecting trends in social networks for managing business digital content: intelligent content publishing. Big Data
  • 36. Real time analysis of Big Data for decision making in agriculture Big Data
  • 37. Analysis of data generated by a field of solar panels Big Data
  • 38. Detecting trends in social networking Big Data
  • 39. Why StrateBI for Big Data projects? Recognized Big Data specialists in Spain (Hadoop, Spark, Hive, Flume, Hortonworks, Cloudera, Cassandra, HP Vertica…). Backed by our projects and training delivered for companies such as Boeing, Telefónica Educación Digital (TED), Gobierno de España, Schibsted Group, Prosegur, INCIBE (National Institute of Cybersecurity)… Spanish leaders in Open Source BI (Pentaho, Talend, Mondrian, Ctools, Saiku…). StrateBI has brought hundreds of Business Intelligence systems with Pentaho into production for large companies such as BBVA, Telefónica, Globalia, Prosegur, ALD, Gobiernos de La Rioja, Extremadura, Baleares, Eroski, Equifax, Unilever, Amnistía Internacional, Caixa De Enginyers, Schibsted, etc… About Us
  • 42. www.TodoBI.com info@stratebi.com www.stratebi.com More Info Tel: 91.788.34.10 Madrid: Avenida de Brasil, 17, Planta 16 Barcelona: C/ Valencia, 63 Brasil: Av. Paulista, 37 4 andar About Us
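
Slide 8 describes MapReduce as a system for parallel processing of large data sets. The model can be illustrated with a minimal single-process sketch in plain Python. It is not Hadoop code and does not use any Hadoop API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and a real job would run the map and reduce steps on many machines over data stored in HDFS.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: combine the values of one key into a single result."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence (illustration only)."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big Data technologies"]))
```

Because each line is mapped independently and each key is reduced independently, both steps can in principle be spread across a cluster, which is the property the slide refers to.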

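Slide 16 compares real-time processing technologies on their support for aggregations and windowed averages. As a rough illustration of what a windowed average is, the following plain-Python sketch groups timestamped values into fixed, non-overlapping (tumbling) windows. It is an assumption-laden sketch, not the API of Storm, Trident, Flume or any other tool named in the deck; the function name and the `(timestamp, value)` event format are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def tumbling_window_averages(
    events: Iterable[Tuple[float, float]], window_seconds: int
) -> Dict[int, float]:
    """Average event values per fixed window, keyed by window start time.

    Each event is a (timestamp_in_seconds, value) pair. An event belongs
    to the window whose start is its timestamp rounded down to a multiple
    of window_seconds.
    """
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for timestamp, value in events:
        start = int(timestamp // window_seconds) * window_seconds
        sums[start] += value
        counts[start] += 1
    return {start: sums[start] / counts[start] for start in sorted(sums)}


if __name__ == "__main__":
    # Hypothetical response times (ms) from web server log events.
    sample = [(0, 10.0), (5, 20.0), (12, 30.0)]
    print(tumbling_window_averages(sample, window_seconds=10))
```

A streaming engine would compute the same result incrementally as events arrive and would also have to handle late data and failures, which is where the fault-tolerance row of the comparison becomes relevant.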
Editor's notes

  1. NoSQL: Databases for storing and querying data, mainly semi-structured. Support for transactions, optimized for random reads and writes. Operational applications.
  2. http://5.196.203.197:8080
  3. References and contact details