Big Trends in Big Data
2013 AITP Region-5 Technical Conference

Naresh Chintalcheru
Agenda - Big Data Trends
● Batch to Real Time
● SQL, SQL, SQL …
● Cloud Platform Support
● Apache Hadoop 2.0
  ○ Improved Performance
  ○ Improved Scalability
  ○ Improved Security
● Applications
  ○ Pattern Discovery Analytics
  ○ Sophisticated Visualization
● BI & Data Warehouse
● Big Data Vision
Batch to Real Time

The changing image of Big Data: from batch to real time
Hadoop + MapReduce = batch processing
Batch to Real Time
● Companies need real-time processing of Big Data for applications such as online fraud detection, CEP (Complex Event Processing) and more.
● New frameworks, architectures and tools are making real-time processing a reality.
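To make the fraud-detection use case concrete, here is a minimal sketch of one CEP-style rule evaluated event by event: flag a card when it makes more than a threshold number of transactions inside a sliding time window. The event schema (`Txn`), the `VelocityRule` class, the threshold and the window length are all hypothetical illustrations, not taken from any particular CEP product.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Txn:
    """Hypothetical card-transaction event."""
    card_id: str
    timestamp: float  # seconds since an arbitrary epoch
    amount: float


class VelocityRule:
    """Flag a card that exceeds max_txns transactions within window_secs.

    Hypothetical rule for illustration; real CEP engines express such
    rules in their own query languages.
    """

    def __init__(self, max_txns: int = 3, window_secs: float = 60.0):
        self.max_txns = max_txns
        self.window_secs = window_secs
        self._recent = defaultdict(deque)  # card_id -> timestamps still in window

    def on_event(self, txn: Txn) -> bool:
        window = self._recent[txn.card_id]
        window.append(txn.timestamp)
        # Evict timestamps that have fallen out of the sliding window.
        while window and txn.timestamp - window[0] > self.window_secs:
            window.popleft()
        return len(window) > self.max_txns


if __name__ == "__main__":
    rule = VelocityRule(max_txns=3, window_secs=60)
    events = [Txn("card-1", t, 10.0) for t in (0, 10, 20, 30)]
    events.append(Txn("card-2", 35, 99.0))
    for e in events:
        if rule.on_event(e):
            print(f"ALERT: {e.card_id} at t={e.timestamp}")  # ALERT: card-1 at t=30
```

The rule keeps only a small per-key window in memory and decides on each event as it arrives, which is the property that separates this style of processing from a nightly batch job.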
Big Data Real-Time Computing Systems
● Twitter’s Storm is an open source, distributed, fault-tolerant, real-time computation system.
○ Storm is a stream processing system.
○ Unlike Hadoop jobs, Storm jobs (topologies) never stop; they keep processing data as it arrives.
● Other real-time systems include StreamBase, HStreaming, Apache S4, Dempsy and Esper.
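The batch-versus-stream distinction can be sketched in plain Python. A batch job computes word counts once over a complete, finite dataset and then stops; a stream job, in the spirit of a Storm topology where a spout emits tuples and bolts process them, updates its counts as each record arrives and has no natural end. The function names below are hypothetical and do not use Storm's actual Java API.

```python
import itertools
from collections import Counter
from typing import Iterable, Iterator


def batch_word_count(lines: list[str]) -> Counter:
    """Batch style: the whole dataset is available up front; run once, then stop."""
    return Counter(word for line in lines for word in line.split())


def sentence_spout(source: Iterable[str]) -> Iterator[str]:
    """Spout-like source: emits one record at a time (may be unbounded)."""
    for line in source:
        yield line


def split_bolt(lines: Iterable[str]) -> Iterator[str]:
    """Bolt-like step: splits each sentence into words."""
    for line in lines:
        yield from line.split()


def count_bolt(words: Iterable[str]) -> Iterator[tuple[str, int]]:
    """Bolt-like step: keeps running counts and emits each update immediately."""
    counts: Counter = Counter()
    for word in words:
        counts[word] += 1
        yield word, counts[word]


if __name__ == "__main__":
    data = ["big data", "real time data"]
    print(batch_word_count(data))  # one answer, after all input is read

    # Stream style: itertools.cycle makes the source endless, like a live feed;
    # islice only stops this demo after a few updates.
    live = itertools.cycle(data)
    for update in itertools.islice(count_bolt(split_bolt(sentence_spout(live))), 7):
        print(update)
```

Because the stream pipeline is built from generators, each word flows through all stages before the next line is read, so results are available continuously rather than at job completion.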
Big Data SQL Tools
Big Data processing has typically meant ...
● Writing complex Java MapReduce jobs
● Apache Pig Latin scripting
● Slow SQL processing with Apache Hive
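To show why SQL tools are attractive, here is the same aggregation written twice: once as map, shuffle and reduce phases (simulated in-process in Python, not on Hadoop) and once as a single SQL `GROUP BY`, using Python's built-in `sqlite3` as a stand-in for a SQL-on-Hadoop engine. The `page_views` table and its columns are hypothetical.

```python
import sqlite3
from collections import defaultdict

# Hypothetical clickstream records: (user_id, page)
page_views = [("u1", "/home"), ("u2", "/home"), ("u1", "/cart"), ("u3", "/home")]


def map_phase(records):
    """Mapper: emit (key, 1) for each page view."""
    for _user, page in records:
        yield page, 1


def shuffle_phase(pairs):
    """Shuffle: group all values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reducer: sum the values for each key."""
    return {key: sum(values) for key, values in groups.items()}


def mapreduce_counts(records):
    return reduce_phase(shuffle_phase(map_phase(records)))


def sql_counts(records):
    """The same aggregation as one declarative query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE page_views (user_id TEXT, page TEXT)")
    conn.executemany("INSERT INTO page_views VALUES (?, ?)", records)
    rows = conn.execute(
        "SELECT page, COUNT(*) FROM page_views GROUP BY page"
    ).fetchall()
    conn.close()
    return dict(rows)


if __name__ == "__main__":
    print(mapreduce_counts(page_views))  # {'/home': 3, '/cart': 1}
    print(sql_counts(page_views))
```

The three hand-written phases collapse into one query; in a real cluster, a SQL engine also plans the distribution and shuffling that the mapper and reducer code would otherwise have to be written around.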
Big Data SQL Tools
Inspired by Google’s Dremel paper, many vendors now offer faster SQL-based tools:
● Google BigQuery
● Cloudera Impala
● IBM Big SQL
● Greenplum HAWQ
● Hortonworks Stinger (aims to make Hive SQL up to 100x faster)
● Apache Drill
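One idea from the Dremel paper that these engines draw on is columnar storage: a query that touches two columns reads only those columns rather than every full row. The toy comparison below uses a hypothetical three-column table and counts how many values each layout has to read. It is a simplification that ignores Dremel's nested-record encoding and tree-based query serving, and it is not any listed product's actual storage format.

```python
from typing import Any

# Hypothetical table: (user_id, country, revenue)
rows = [
    ("u1", "US", 12.5),
    ("u2", "DE", 7.0),
    ("u3", "US", 3.5),
]
COLUMNS = ("user_id", "country", "revenue")


def to_columnar(table: list[tuple]) -> dict[str, list[Any]]:
    """Pivot row-oriented records into one list per column."""
    return {name: [row[i] for row in table] for i, name in enumerate(COLUMNS)}


def revenue_by_country_rows(table: list[tuple]) -> tuple[dict[str, float], int]:
    """Row scan: every field of every row is read; returns (totals, values_read)."""
    totals: dict[str, float] = {}
    values_read = 0
    for row in table:
        values_read += len(row)  # the whole row is touched
        _user, country, revenue = row
        totals[country] = totals.get(country, 0.0) + revenue
    return totals, values_read


def revenue_by_country_columns(cols: dict[str, list[Any]]) -> tuple[dict[str, float], int]:
    """Column scan: only the two referenced columns are read."""
    countries, revenues = cols["country"], cols["revenue"]
    totals: dict[str, float] = {}
    for country, revenue in zip(countries, revenues):
        totals[country] = totals.get(country, 0.0) + revenue
    return totals, len(countries) + len(revenues)


if __name__ == "__main__":
    print(revenue_by_country_rows(rows))                  # ({'US': 16.0, 'DE': 7.0}, 9)
    print(revenue_by_country_columns(to_columnar(rows)))  # ({'US': 16.0, 'DE': 7.0}, 6)
```

On a wide table with hundreds of columns, the gap between "values read" in the two layouts grows with the number of columns the query ignores.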
Big Data and Cloud
Big Data needs many computing nodes for data storage and data processing, and that demand is elastic in nature …
● Cloud VM-based computing is a natural fit for Big Data infrastructure
● Public cloud leader Amazon AWS announced support for Hadoop, which means you can spin up VMs with Hadoop installed and a basic configuration in about 10 minutes
Hadoop 2.0
New in Hadoop 2.x
● Improved Performance with YARN aka MapReduce 2.0
● Improved Scalability with HDFS Federation
● Support for Microsoft Windows
● Improved Security
● HDFS Snapshots
Hadoop 2.0 - Performance
Improved Performance with YARN aka MapReduce 2.0
● Before YARN, the MapReduce JobTracker handled both resource management and job life-cycle management.
● YARN splits these two functions into separate components: a global ResourceManager and a per-application ApplicationMaster.
● Each ApplicationMaster negotiates with the global ResourceManager for the resources its job requests.
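The split can be illustrated with a toy model: a single ResourceManager tracks cluster capacity and grants containers, while each ApplicationMaster manages only its own job and asks for containers. The class and method names are hypothetical and do not mirror YARN's real Java API; memory-only, first-fit allocation is a simplifying assumption that stands in for YARN's pluggable schedulers.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Container:
    app_id: str
    node: str
    memory_mb: int


@dataclass
class ResourceManager:
    """Global view of cluster resources (toy model: memory only)."""
    free_mb: dict[str, int]  # node name -> free memory in MB

    def allocate(self, app_id: str, memory_mb: int) -> Optional[Container]:
        # First-fit over nodes; real YARN schedulers are far more involved.
        for node, free in self.free_mb.items():
            if free >= memory_mb:
                self.free_mb[node] = free - memory_mb
                return Container(app_id, node, memory_mb)
        return None  # request cannot be satisfied right now

    def release(self, container: Container) -> None:
        self.free_mb[container.node] += container.memory_mb


@dataclass
class ApplicationMaster:
    """Per-job coordinator: knows its own tasks, not the whole cluster."""
    app_id: str
    rm: ResourceManager
    containers: list[Container] = field(default_factory=list)

    def request(self, num_tasks: int, memory_mb: int) -> int:
        """Ask the RM for one container per task; return how many were granted."""
        granted = 0
        for _ in range(num_tasks):
            c = self.rm.allocate(self.app_id, memory_mb)
            if c is None:
                break  # cluster is full; a real AM would retry later
            self.containers.append(c)
            granted += 1
        return granted

    def finish(self) -> None:
        """Hand every container back to the RM when the job completes."""
        for c in self.containers:
            self.rm.release(c)
        self.containers.clear()


if __name__ == "__main__":
    rm = ResourceManager(free_mb={"node1": 4096, "node2": 2048})
    am1 = ApplicationMaster("app-1", rm)
    am2 = ApplicationMaster("app-2", rm)
    print(am1.request(num_tasks=3, memory_mb=1024))  # 3
    print(am2.request(num_tasks=4, memory_mb=1024))  # 3: the cluster runs out
    am1.finish()
    print(rm.free_mb)  # {'node1': 3072, 'node2': 0}
```

Keeping job-specific logic in the ApplicationMaster is what lets the ResourceManager stay small and serve many applications at once, which is the scalability argument behind the split.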
Hadoop 2.0 - Scalability
HDFS Federation
● No longer limited to a single NameNode (NN) and its Secondary NameNode (SNN).
● HDFS Federation supports multiple independent NameNodes and namespaces.
● Each DataNode (DN) registers with all the NameNodes in the cluster, sends them periodic heartbeats and block reports, and handles commands from all of them.
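A toy model of the federation layout: each NameNode owns an independent namespace and its own block pool, and every DataNode reports to all NameNodes but tells each one only about the blocks in that NameNode's pool. The class names, the pool ids and the report shape are hypothetical simplifications, not HDFS's actual RPC protocol.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class NameNode:
    """Owns one namespace (path -> block ids) and one block pool."""
    pool_id: str
    namespace: dict[str, list[int]] = field(default_factory=dict)
    # DataNode name -> block ids it last reported for this pool
    block_locations: dict[str, set[int]] = field(default_factory=dict)
    heartbeats: dict[str, int] = field(default_factory=dict)

    def receive_heartbeat(self, dn_name: str) -> None:
        self.heartbeats[dn_name] = self.heartbeats.get(dn_name, 0) + 1

    def receive_block_report(self, dn_name: str, block_ids: set[int]) -> None:
        self.block_locations[dn_name] = set(block_ids)


@dataclass
class DataNode:
    """Stores blocks for many pools and reports to every NameNode."""
    name: str
    namenodes: list[NameNode]
    blocks: dict[str, set[int]] = field(default_factory=lambda: defaultdict(set))

    def store(self, pool_id: str, block_id: int) -> None:
        self.blocks[pool_id].add(block_id)

    def send_reports(self) -> None:
        for nn in self.namenodes:  # heartbeat and block report go to *all* NameNodes
            nn.receive_heartbeat(self.name)
            nn.receive_block_report(self.name, self.blocks.get(nn.pool_id, set()))


if __name__ == "__main__":
    nn_users = NameNode("BP-users")  # e.g. serves /user
    nn_logs = NameNode("BP-logs")    # e.g. serves /logs
    nn_users.namespace["/user/alice/a.txt"] = [1]
    nn_logs.namespace["/logs/day1.log"] = [7, 8]

    dn = DataNode("dn1", [nn_users, nn_logs])
    dn.store("BP-users", 1)
    dn.store("BP-logs", 7)
    dn.store("BP-logs", 8)
    dn.send_reports()
    print(sorted(nn_users.block_locations["dn1"]))
    print(sorted(nn_logs.block_locations["dn1"]))
```

Because each NameNode only tracks its own pool, namespaces can be added without one NameNode having to hold the metadata for the entire cluster, while the DataNodes remain shared storage.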
Hadoop 2.0 - Security
Improved Security
● HDFS file permissions enforced by the NN, plus Access Control Lists (ACLs) for users and groups
● Block Access Tokens for access control to data blocks
● Job Tokens to enforce task authorization
● Network encryption and Kerberos RPC; HDFS file transfers can now be configured to use encryption
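Block Access Tokens let a DataNode check that a client was authorized by the NameNode without contacting the NameNode on every block read. A minimal sketch of that idea, assuming a secret key shared by the NameNode and the DataNodes and an HMAC-SHA256 signature over (user, block id, access mode, expiry). The `BlockToken` layout and the `issue_token` / `verify_token` functions are hypothetical and differ from Hadoop's actual token format.

```python
import hashlib
import hmac
import time
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BlockToken:
    user: str
    block_id: int
    mode: str        # e.g. "READ" or "WRITE"
    expiry: float    # Unix time after which the token is invalid
    signature: bytes


def _payload(user: str, block_id: int, mode: str, expiry: float) -> bytes:
    return f"{user}|{block_id}|{mode}|{expiry}".encode()


def issue_token(key: bytes, user: str, block_id: int, mode: str,
                ttl_secs: float = 600.0, now: Optional[float] = None) -> BlockToken:
    """NameNode side: sign the access grant with the shared secret key."""
    now = time.time() if now is None else now
    expiry = now + ttl_secs
    sig = hmac.new(key, _payload(user, block_id, mode, expiry), hashlib.sha256).digest()
    return BlockToken(user, block_id, mode, expiry, sig)


def verify_token(key: bytes, token: BlockToken, block_id: int, mode: str,
                 now: Optional[float] = None) -> bool:
    """DataNode side: recompute the HMAC, then check block, mode and expiry."""
    now = time.time() if now is None else now
    expected = hmac.new(
        key, _payload(token.user, token.block_id, token.mode, token.expiry), hashlib.sha256
    ).digest()
    return (
        hmac.compare_digest(expected, token.signature)  # constant-time comparison
        and token.block_id == block_id
        and token.mode == mode
        and now <= token.expiry
    )


if __name__ == "__main__":
    shared_key = b"demo-secret-shared-by-NN-and-DNs"  # hypothetical key
    tok = issue_token(shared_key, "alice", block_id=42, mode="READ", now=1000.0)
    print(verify_token(shared_key, tok, block_id=42, mode="READ", now=1100.0))   # True
    print(verify_token(shared_key, tok, block_id=43, mode="READ", now=1100.0))   # False: wrong block
    print(verify_token(shared_key, tok, block_id=42, mode="WRITE", now=1100.0))  # False: wrong mode
    print(verify_token(shared_key, tok, block_id=42, mode="READ", now=2000.0))   # False: expired
```

Any change to a signed field, such as swapping in another user name, invalidates the signature, so a client cannot forge or widen its own grant.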
Hadoop 2.0 - HDFS Snapshots
Improved Backup & Disaster Recovery
● HDFS Snapshots are read-only point-in-time copies of
the file system.
● Snapshots can be taken on a subtree or on the entire file system.
● Useful for data backup, protection against user errors
and disaster recovery
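A toy in-memory model of the read-only, point-in-time behaviour, and of recovering from a user error. It assumes a flat path-to-content mapping and copies that mapping when a snapshot is taken, whereas HDFS copies no data blocks and records only subsequent modifications; the `SnapshottableDir` class is hypothetical. On a real cluster, an administrator first enables snapshots on a directory with `hdfs dfsadmin -allowSnapshot <path>`, and users then create them with `hdfs dfs -createSnapshot <path> [<snapshotName>]`.

```python
from collections.abc import Mapping
from types import MappingProxyType


class SnapshottableDir:
    """Toy directory whose snapshots are frozen, read-only views."""

    def __init__(self) -> None:
        self.files: dict[str, bytes] = {}
        self._snapshots: dict[str, Mapping[str, bytes]] = {}

    def create_snapshot(self, name: str) -> None:
        if name in self._snapshots:
            raise ValueError(f"snapshot {name!r} already exists")
        # Copy the current path -> content mapping and wrap it read-only.
        self._snapshots[name] = MappingProxyType(dict(self.files))

    def snapshot(self, name: str) -> Mapping[str, bytes]:
        return self._snapshots[name]

    def restore_file(self, snapshot_name: str, path: str) -> None:
        """Recover a file that was deleted or overwritten by mistake."""
        self.files[path] = self._snapshots[snapshot_name][path]


if __name__ == "__main__":
    d = SnapshottableDir()
    d.files["report.csv"] = b"v1"
    d.create_snapshot("s0")
    del d.files["report.csv"]       # simulated user error
    print("report.csv" in d.files)  # False
    d.restore_file("s0", "report.csv")
    print(d.files["report.csv"])    # b'v1'
```

The snapshot view rejects writes (a `MappingProxyType` raises `TypeError` on assignment), which models the read-only guarantee that makes snapshots safe to use for backup and recovery.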
Big Data Applications
● The infrastructure layer of Big Data is largely solved (.........
secret Hadoop)
● Future innovation is now focused on applications and analytics
Big Data Analytic Applications
Pattern discovery and sense-making analytic applications:
● WibiData: lessons learned and predictive apps
● Recorded Future: web intelligence for business decisions
● Nutonian: uncovers relationships hidden within complex data
● RStudio: data analysis tool
Big Data - Visualization Applications
Sophisticated Big Data Visualization tools.
● IBM BigSheets
● D3.js
● Fathom
● Processing.org
Big Data & Business Intelligence
BI vendors such as IBM Cognos, SAP BusinessObjects and Oracle Hyperion now support connecting directly to Hadoop data using Apache Hive connectors.
Big Data & Data Warehouse
Multiple new unstructured data sources, such as clickstreams, social media, mobile, sensors and web logs, require massive processing, and scaling a traditional data warehouse to handle them is costly.
The big question: will the data warehouse survive Big Data?
More on this in my next presentation :)
Big Data Vision

Big Data requires a Big Vision
● Unlike Business Intelligence, Big Data is an innovation that originated on the IT side.
● Business departments, which should define the Big Data usage requirements, need constant coaching on the potential of Big Data intelligence and on its success stories.
Thank You
Feedback appreciated
Nash Chintalcheru
Chintal75@gmail.com
309-242-1615
Presentation PDF: www.slideshare.net/chintal75

Ensuring Technical Readiness For Copilot in Microsoft 365
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 

Big Trends in Big Data

  • 1. Big Trends in Big Data. 2013 AITP Region-5 Technical Conference. Naresh Chintalcheru
  • 2. Agenda - Big Data Trends ● Batch to Real Time ● SQL, SQL, SQL … ● Cloud Platform Support ● Apache Hadoop 2.0 ○ Improved Performance ○ Improved Scalability ○ Improved Security ● Applications ○ Pattern Discovery Analytics ○ Sophisticated Visualization ● BI & Data Warehouse ● Big Data Vision
  • 4. Batch to Real Time: The image of Big Data is changing from batch to real time. Hadoop + MapReduce = batch processing.
  • 5. Batch to Real Time ● Companies need real-time processing of Big Data for applications such as online fraud detection, CEP (Complex Event Processing) and more. ● New frameworks, architectures and tools are making real-time processing a reality.
  • 6. Big Data Real-Time Computing Systems ● Twitter’s Storm is an open-source, distributed, fault-tolerant, real-time computation system. ○ Storm is a stream processing system. ○ Unlike Hadoop jobs, which finish once their input is processed, Storm jobs never stop; they continue to process data as it arrives. ● Other real-time systems include StreamBase, HStreaming, Apache S4, Dempsy and Esper.
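The batch-versus-stream contrast above can be sketched in a few lines of Python. This is a toy illustration of the two processing styles, not Storm's API; the function names `batch_count` and `stream_count` are hypothetical. The batch function needs the complete, finite input before it answers once, while the streaming function updates its state per event and emits a new answer immediately.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List


def batch_count(events: List[str]) -> Counter:
    """Batch style: wait until the whole (finite) input exists, then compute once."""
    return Counter(events)


def stream_count(events: Iterable[str]) -> Iterator[Dict[str, int]]:
    """Stream style: update state per event and emit a fresh result right away.

    The loop has no fixed end; it keeps running for as long as the source
    (e.g. a socket or message queue in a real system) keeps yielding events.
    """
    counts: Counter = Counter()
    for event in events:
        counts[event] += 1
        yield dict(counts)  # copy of the running totals after this event


if __name__ == "__main__":
    clicks = ["login", "buy", "login", "logout"]
    print(batch_count(clicks))  # one answer, only after all input has arrived
    for snapshot in stream_count(iter(clicks)):
        print(snapshot)  # an updated answer after every event
```

Because `stream_count` is a generator, it works the same whether the source is a short list or an unbounded feed, which is the property that lets a stream job run indefinitely.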
  • 8. Big Data SQL Tools Big Data processing includes ... ● Writing complex Java MapReduce jobs ● Scripting in Apache Pig Latin ● Slow SQL processing with Apache Hive
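To see why hand-written MapReduce jobs are considered laborious, here is a single-process Python simulation of the map, shuffle and reduce phases for a word count. It does not use the Hadoop API; the phase functions are hypothetical stand-ins for what a Java mapper, the framework's shuffle/sort, and a Java reducer would each do across many nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle/sort: group every emitted value by its key, sorted by key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(sorted(grouped.items()))


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: combine all values seen for one key."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Chain the three phases; Hadoop would run each phase across many nodes."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["Big Data", "big trends in big data"]))
```

Even this trivial aggregation needs three cooperating pieces of code; in real Hadoop Java code, job configuration and type declarations add more.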
  • 9. Big Data SQL Tools Inspired by Google’s Dremel paper, many vendors now offer faster SQL-based tools ● Google BigQuery ● Cloudera Impala ● IBM Big SQL ● Greenplum HAWQ ● Hortonworks Stinger (aims to make Hive SQL up to 100x faster) ● Apache Drill
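The appeal of these SQL tools is that an aggregation requiring a mapper, a reducer and job wiring in MapReduce becomes one declarative query. The sketch below uses Python's built-in `sqlite3` purely as a stand-in engine; it is not Impala, Drill, BigQuery or any other listed product, and the `words` table is hypothetical.

```python
import sqlite3
from typing import Dict, List


def word_count_sql(words: List[str]) -> Dict[str, int]:
    """Count words with a single GROUP BY query instead of map/reduce code."""
    conn = sqlite3.connect(":memory:")  # in-memory stand-in for a SQL-on-Hadoop engine
    try:
        conn.execute("CREATE TABLE words (word TEXT)")  # hypothetical table
        conn.executemany("INSERT INTO words (word) VALUES (?)", [(w,) for w in words])
        rows = conn.execute(
            "SELECT word, COUNT(*) FROM words GROUP BY word ORDER BY word"
        ).fetchall()
        return {word: count for word, count in rows}
    finally:
        conn.close()


if __name__ == "__main__":
    print(word_count_sql(["big", "data", "big"]))
```

The engine, not the analyst, decides how to parallelize the `GROUP BY`; that division of labor is what the Dremel-style tools aim to deliver at cluster scale.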
  • 11. Big Data and Cloud Big Data needs many computing nodes for data storage and data processing, and that demand is elastic … ● Cloud VM-based computing is a natural fit for Big Data infrastructure ● Public cloud megastar Amazon AWS announced support for Hadoop, which means a VM with Hadoop installed and a basic configuration can be spun up in about 10 minutes
  • 13. Hadoop 2.0 New in Hadoop 2.x ● Improved performance with YARN (aka MapReduce 2.0) ● Improved scalability with HDFS Federation ● Support for Microsoft Windows ● Improved security ● HDFS snapshots
  • 14. Hadoop 2.0 - Performance Improved performance with YARN (aka MapReduce 2.0) ● Previously, the MapReduce JobTracker handled both resource management and the application (job) life cycle. ● YARN divides these two functions into separate components: a global ResourceManager and a per-application ApplicationMaster. ● The ApplicationMaster negotiates with the global ResourceManager for the resources its jobs request.
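The split of responsibilities described above can be modeled as a toy in Python. This is a hedged sketch, not YARN's Java API: the classes, the `allocate`/`release`/`negotiate` methods and the memory-only resource model are all hypothetical simplifications (real YARN containers also carry CPU vcores, and scheduling policies are far richer). The point is that the ResourceManager only tracks capacity, while each ApplicationMaster tracks its own tasks.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ResourceManager:
    """Global scheduler: tracks free cluster memory and grants containers.

    It knows nothing about what the tasks do; that is the ApplicationMaster's job.
    """
    free_mb: int
    granted_mb: Dict[str, int] = field(default_factory=dict)

    def allocate(self, app_id: str, requested_mb: int) -> bool:
        """Grant one container if enough memory is free."""
        if requested_mb > self.free_mb:
            return False
        self.free_mb -= requested_mb
        self.granted_mb[app_id] = self.granted_mb.get(app_id, 0) + requested_mb
        return True

    def release(self, app_id: str, mb: int) -> None:
        """Take a finished container's memory back into the free pool."""
        self.free_mb += mb
        self.granted_mb[app_id] -= mb


@dataclass
class ApplicationMaster:
    """Per-application manager: owns the life cycle of one job's tasks."""
    app_id: str
    pending_mb: List[int]  # memory each not-yet-started task needs
    running_mb: List[int] = field(default_factory=list)

    def negotiate(self, rm: ResourceManager) -> None:
        """Request a container per pending task; keep whatever is refused."""
        still_pending = []
        for mb in self.pending_mb:
            if rm.allocate(self.app_id, mb):
                self.running_mb.append(mb)
            else:
                still_pending.append(mb)
        self.pending_mb = still_pending

    def finish_running(self, rm: ResourceManager) -> None:
        """All running tasks completed: hand their containers back."""
        for mb in self.running_mb:
            rm.release(self.app_id, mb)
        self.running_mb = []
```

Usage: with `rm = ResourceManager(free_mb=4096)` and an ApplicationMaster asking for tasks of 1024, 1024 and 3072 MB, the first negotiation starts the two small tasks; the large one waits until `finish_running` returns capacity and the AM negotiates again.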
  • 15. Hadoop 2.0 - Scalability HDFS Federation ● No longer limited to a single NameNode (NN) with its Secondary NameNode (SNN). ● HDFS Federation supports multiple independent NameNodes and namespaces. ● Each DataNode (DN) registers with all the NameNodes in the cluster; it sends periodic heartbeats and block reports to, and handles commands from, every NN.
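The DataNode-to-NameNode relationship under federation can be sketched as a toy Python model. The classes and methods below are hypothetical, keyed by a namespace string where real HDFS uses block pool IDs, and NameNode commands are omitted. Each DataNode stores blocks for every namespace but reports to each NameNode only the blocks belonging to that NameNode's pool, so the NameNodes stay independent while sharing the same storage.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class NameNode:
    """Owns one independent namespace and the block pool that backs it."""
    namespace: str
    block_reports: Dict[str, Set[str]] = field(default_factory=dict)  # DN id -> block ids
    last_heartbeat: Dict[str, int] = field(default_factory=dict)      # DN id -> tick

    def receive_heartbeat(self, dn_id: str, tick: int, blocks: Set[str]) -> None:
        """Record that the DataNode is alive and which of our blocks it holds."""
        self.last_heartbeat[dn_id] = tick
        self.block_reports[dn_id] = set(blocks)


@dataclass
class DataNode:
    """Stores blocks for every namespace, grouped by block pool."""
    dn_id: str
    pools: Dict[str, Set[str]] = field(default_factory=dict)  # namespace -> block ids

    def store(self, namespace: str, block_id: str) -> None:
        self.pools.setdefault(namespace, set()).add(block_id)

    def heartbeat_all(self, namenodes: List[NameNode], tick: int) -> None:
        """Report to every NameNode; each receives only its own pool's blocks."""
        for nn in namenodes:
            nn.receive_heartbeat(self.dn_id, tick, self.pools.get(nn.namespace, set()))
```

Adding capacity for a new namespace means adding a NameNode; existing DataNodes simply start reporting to it as well.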
  • 16. Hadoop 2.0 - Security Improved Security ● Enforcement of HDFS file permissions by the NN, with Access Control Lists (ACLs) for users and groups ● Block Access Tokens for access control to data blocks ● Job Tokens to enforce task authorization ● Network encryption and Kerberos RPC; HDFS data transfer can now be configured for encryption
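The idea behind a block access token can be sketched with Python's standard `hmac` module: the NameNode signs an expiring grant with a key it shares with the DataNodes, and a DataNode checks the signature, block, access mode and expiry before serving data. This is a simplified illustration under stated assumptions, not Hadoop's actual token format or key-management scheme; `SECRET_KEY`, the `|`-separated layout, and both function names are hypothetical.

```python
import hashlib
import hmac
import time
from typing import Optional

# Hypothetical shared secret. In a real cluster, key material would be generated
# and rotated by the NameNode and shared with DataNodes, never hard-coded.
SECRET_KEY = b"demo-shared-secret"


def _sign(payload: str) -> str:
    return hmac.new(SECRET_KEY, payload.encode("utf-8"), hashlib.sha256).hexdigest()


def issue_token(user: str, block_id: str, mode: str, ttl_s: int,
                now: Optional[float] = None) -> str:
    """NameNode side: grant `user` `mode` access to one block until expiry."""
    issued_at = time.time() if now is None else now
    payload = f"{user}|{block_id}|{mode}|{int(issued_at + ttl_s)}"
    return f"{payload}|{_sign(payload)}"


def verify_token(token: str, block_id: str, mode: str,
                 now: Optional[float] = None) -> bool:
    """DataNode side: accept only an untampered, unexpired token for this block and mode."""
    try:
        user, tok_block, tok_mode, expiry, signature = token.split("|")
        expires_at = int(expiry)
    except ValueError:
        return False  # malformed token
    payload = f"{user}|{tok_block}|{tok_mode}|{expiry}"
    current = time.time() if now is None else now
    return (
        hmac.compare_digest(signature, _sign(payload))
        and tok_block == block_id
        and tok_mode == mode
        and current < expires_at
    )
```

The DataNode never has to ask the NameNode whether a client is authorized; possession of a valid, correctly signed token is the proof.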
  • 17. Hadoop 2.0 - HDFS Snapshots Improved Backup & Disaster Recovery ● HDFS snapshots are read-only, point-in-time copies of the file system. ● Snapshots can be taken of a subtree or of the entire file system. ● They are useful for data backup, protection against user errors and disaster recovery.
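The read-only, point-in-time behavior can be illustrated with a toy in-memory file system in Python. The `ToyFileSystem` class and its methods are hypothetical. For clarity it copies the path-to-contents mapping at snapshot time, whereas HDFS snapshots avoid copying data blocks and only record what changes afterwards.

```python
from types import MappingProxyType
from typing import Dict, Mapping


class ToyFileSystem:
    """In-memory namespace (path -> contents) with named, read-only snapshots."""

    def __init__(self) -> None:
        self.files: Dict[str, str] = {}
        self._snapshots: Dict[str, Dict[str, str]] = {}

    def write(self, path: str, data: str) -> None:
        self.files[path] = data

    def delete(self, path: str) -> None:
        del self.files[path]

    def create_snapshot(self, name: str, subtree: str = "/") -> None:
        """Record the files under `subtree` exactly as they are at this moment."""
        prefix = subtree.rstrip("/") + "/"
        self._snapshots[name] = {
            path: data for path, data in self.files.items() if path.startswith(prefix)
        }

    def snapshot_view(self, name: str) -> Mapping[str, str]:
        """Read-only view: later writes or deletes in the live tree do not affect it."""
        return MappingProxyType(self._snapshots[name])
```

Usage: after `create_snapshot("s1", "/data")`, accidentally deleting `/data/a.csv` from the live tree still leaves its contents readable through `snapshot_view("s1")`, which is the "protection against user errors" case from the slide.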
  • 19. Big Data Applications ● The infrastructure layer of Big Data is largely solved (......... secret Hadoop) ● Future innovation is now focused on applications and analytics
  • 20. Big Data Analytic Applications Analytic applications based on pattern discovery and sense-making ● WibiData: lessons learned and predictive apps ● Recorded Future: web intelligence for business decisions ● Nutonian: uncovers relationships hidden within complex data ● RStudio: data analysis tool
  • 21. Big Data - Visualization Applications Sophisticated Big Data Visualization tools. ● IBM BigSheets ● D3.js ● Fathom ● Processing.org
  • 23. Big Data & Business Intelligence BI vendors such as IBM Cognos, SAP BusinessObjects and Oracle Hyperion support connecting directly to Hadoop data through Apache Hive connectors.
  • 24. Big Data & Data Warehouse Multiple new unstructured data sources, such as clickstreams, social media, mobile, sensors and web logs, require massive processing, and traditional data warehouses are costly to scale. The big question: will the data warehouse survive Big Data? More on this in my next presentation :)
  • 26. Big Data Vision Big Data requires a Big Vision
  • 27. Big Data Requires Big Vision ● Unlike Business Intelligence, Big Data is an innovation that originated on the IT side. ● Business departments, which should come up with Big Data usage requirements, need constant coaching on the potential of Big Data intelligence and on its success stories.
  • 28. Thank You Feedback appreciated Nash Chintalcheru Chintal75@gmail.com 309-242-1615 Presentation pdf : www.slideshare.net/chintal75