Demo
BIGDATA Hadoop
eConvergence Inc.
www.econvergenceinc.com
Rajesh Kalasapati PMP
Data Architect
raj@econvergenceinc.com
What you will learn from today's BIGDATA demo
 Evolution of Business Analytics
 Evolution of data processing and traditional RDBMS systems
 Company data volumes and storage
 Architecture of traditional RDBMS
 What is Hadoop?
 Why use Hadoop?
 Hadoop Architecture
 Hadoop use cases
Analytics Evolution
 Questions
   Is Hadoop a replacement or a complement for RDBMS today?
   How will this change in the future?
   How does this differ from current RDBMS and MPP systems?
Analytics…
Company Average Data volumes..
Data size table
 Each unit is 1,024 times the previous one (e.g. one kilobyte = 1,024 bytes).
 Bit = 1 bit
 Byte = 8 bits
 Kilobyte = 1024 bytes
 Megabyte = 1024 kilobytes
 Gigabyte = 1024 megabytes
 Terabyte = 1024 gigabytes
 Petabyte = 1,048,576 gigabytes
 Exabyte = 1,073,741,824 gigabytes
 Zettabyte = 1,099,511,627,776 gigabytes
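The table above follows one rule: each unit is 1,024 times the one before it. As a small sketch of that arithmetic (the function names are illustrative, not from the slides), the gigabyte figures can be recomputed like this:

```python
# Binary storage units: each unit is 1,024 times the previous one.
UNITS = ["kilobyte", "megabyte", "gigabyte", "terabyte",
         "petabyte", "exabyte", "zettabyte"]


def size_in_bytes(unit: str) -> int:
    """Return the number of bytes in one `unit` (binary convention)."""
    return 1024 ** (UNITS.index(unit) + 1)


def size_in_gigabytes(unit: str) -> int:
    """Return how many gigabytes make up one `unit` (gigabyte or larger)."""
    return size_in_bytes(unit) // size_in_bytes("gigabyte")


# Print the terabyte-and-up rows of the table in gigabytes.
for unit in UNITS[3:]:
    print(f"1 {unit} = {size_in_gigabytes(unit):,} gigabytes")
```

The printed values should match the terabyte, petabyte, exabyte and zettabyte rows of the table.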
DW Architecture
Analytics…
RDBMS Products Architecture
[Diagram: Node 1, Node 2, ... Node N, each with a process area and background processes, attached to sets of data files]
Logical and Physical ..
[Diagram: Oracle database storage structures]
Logical: Tablespace, Segment, Extent, Oracle data block; Schema
Physical: Data file, OS block
 Yes
 Relational databases (RDBMSs) have been around for ages
 MySQL is one of the most popular among them
 Data stored in tables
 Schema-based, i.e., structured tables
 Queried using SQL
SQL query: SELECT user_id FROM users WHERE username = 'jbellis';
Example’s Source
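To make the "schema-based table queried with SQL" idea concrete, here is a minimal sketch that runs the query from the slide. It uses Python's built-in sqlite3 module with an in-memory database rather than MySQL, purely so it runs without a server. The table layout and the sample rows are assumptions for illustration.

```python
import sqlite3

# In-memory SQLite database (stand-in for MySQL; an assumption for this sketch).
conn = sqlite3.connect(":memory:")

# The schema is declared up front: every row must fit these columns.
conn.execute(
    "CREATE TABLE users (user_id INTEGER PRIMARY KEY, username TEXT NOT NULL UNIQUE)"
)
conn.executemany(
    "INSERT INTO users (user_id, username) VALUES (?, ?)",
    [(1, "jbellis"), (2, "asmith")],  # hypothetical sample rows
)


def find_user_id(connection, username):
    """Run the slide's query with a bound parameter; return None if no match."""
    row = connection.execute(
        "SELECT user_id FROM users WHERE username = ?", (username,)
    ).fetchone()
    return row[0] if row else None


print(find_user_id(conn, "jbellis"))
```

Binding the username as a parameter instead of pasting it into the SQL string avoids quoting problems and SQL injection.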
Data Growth
What is Hadoop?
 Hadoop is a data storage and processing system
 Scalable, fault-tolerant and distributed
 Able to store structured or unstructured data
 Stores terabytes, petabytes of data inexpensively
 Robust and reliable; handles hardware and system failures automatically WITHOUT losing data or interrupting data analysis
 Hadoop runs on clusters of commodity servers. Each of those servers has local CPU and storage
Why Hadoop?
Hadoop Cluster
What is Commodity Hardware?
 In general:
   Hardware with an average amount of computing resources.
   Does not imply low quality, but rather affordability
   A feature of commodity hardware is that, over time, it comes to be widely used in roles for which it was not specifically designed, as opposed to purpose-built hardware
 In context of Hadoop
   Hadoop clusters are run on servers
   Most commodity servers used in Hadoop clusters have an average ratio of disk space to memory, as opposed to being specialized servers with massively high memory or CPU
 Examples of commodity hardware
   1 TB hard disks in a JBOD (Just a Bunch Of Disks) configuration
   Two quad-core CPUs, running at 2-2.5 GHz or faster
   16-24 GB of RAM (24-32 GB if you are using HBase)
   1 Gb Ethernet
 Powerful cluster
   Six 2 TB hard disks with RAID 1 across two of the disks
   Two quad-core CPUs
   32-64 GB of ECC (error-correcting code) RAM
   2-4 Gb Ethernet
Goals of Hadoop-HDFS
 Very large distributed file system
   10K nodes, 100 million files, 10 PB size
 Assumes commodity hardware
   Files are replicated to handle hardware failures
   Detects failures and recovers from them
 Optimized for batch processing
   Computation moves to where the data is stored, rather than the data moving to the computation as in traditional DBs
   Provides very high aggregate bandwidth
HDFS Architecture
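The replication goal listed for HDFS can be sketched in a few lines. This is a toy model, not HDFS code: the 128 MB block size, the replication factor of 3, the node names and the round-robin placement are all assumptions for illustration (real clusters configure these values and use rack-aware placement).

```python
import math

BLOCK_SIZE_MB = 128  # assumed block size for this sketch
REPLICATION = 3      # assumed replication factor for this sketch


def place_blocks(file_size_mb, nodes,
                 block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Map each block index to the nodes holding one of its replicas.

    Replicas are placed round-robin on distinct nodes, a simplification
    of real placement policies.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    num_blocks = math.ceil(file_size_mb / block_size_mb)
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


def readable_after_failure(placement, failed_nodes):
    """A file stays readable if every block keeps a replica on a live node."""
    failed = set(failed_nodes)
    return all(
        any(node not in failed for node in replicas)
        for replicas in placement.values()
    )


nodes = ["node1", "node2", "node3", "node4", "node5"]  # hypothetical cluster
placement = place_blocks(file_size_mb=1000, nodes=nodes)
print(len(placement), "blocks")
print(readable_after_failure(placement, ["node2"]))
print(readable_after_failure(placement, ["node1", "node2", "node3"]))
```

With three replicas on distinct nodes, losing any one or two nodes leaves every block readable. Only losing all three holders of some block makes the file unreadable.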
MapReduce
[Diagram: input words (Cat, Bat, Dog, other words; size: TByte) are split, mapped, combined and reduced into output files part0, part1, part2]
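The split, map, combine and reduce stages can be imitated in plain Python with a word count. This is a single-process sketch of the idea, not the Hadoop API. The function names, the toy partitioner and the sample words are assumptions for illustration.

```python
from collections import Counter, defaultdict


def split_input(words, num_splits):
    """Split the input into chunks, one per map task."""
    return [words[i::num_splits] for i in range(num_splits)]


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in the chunk."""
    return [(word, 1) for word in chunk]


def combine_phase(pairs):
    """Combine: pre-aggregate counts locally on one mapper's output."""
    combined = Counter()
    for word, count in pairs:
        combined[word] += count
    return list(combined.items())


def partition(word, num_reducers):
    """Pick the reducer for a word (deterministic toy partitioner)."""
    return sum(word.encode("utf-8")) % num_reducers


def reduce_phase(combined_outputs, num_reducers):
    """Reduce: each reducer sums the counts for the words routed to it."""
    parts = [defaultdict(int) for _ in range(num_reducers)]
    for pairs in combined_outputs:
        for word, count in pairs:
            parts[partition(word, num_reducers)][word] += count
    return [dict(part) for part in parts]


words = ["Cat", "Bat", "Dog", "Cat", "Dog", "Cat", "Bat", "Cat"]
splits = split_input(words, num_splits=4)
combined = [combine_phase(map_phase(chunk)) for chunk in splits]
parts = reduce_phase(combined, num_reducers=3)
for i, part in enumerate(parts):
    print(f"part{i}: {part}")
```

Each word is routed to exactly one reducer, so the part outputs can be concatenated without further merging.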
09/29/13
Comparisons
Strengths…
Traditional Data Flows
With Hadoop..
Analytics Life Cycle
Traditional DWH Eco System
[Diagram: traditional data warehouse flow]
Information sources: HR, Sales, etc.
Operational DBs (RDBMS)
Tools for extraction: Informatica, DataStage, etc.
EDW: Oracle, Teradata, SQL Server, etc.
Data marts
OLAP servers: support OLAP tools for queries and data mining
Client tools: MicroStrategy, SAS, SAP BO, OBIEE, Hyperion, etc.
Hadoop Eco System
[Diagram: data warehouse flow with Hadoop]
Information sources: HR, Sales, etc.
Operational DBs (RDBMS)
EDW: Oracle, Teradata, SQL Server, etc.
Tools for extraction: Informatica, DataStage, Pig, Sqoop, etc.
Other tools shown: Datameer, Karmasphere, MicroStrategy, Platfora, QlikView, Tableau, Tresata
BigData: Future Value
Analytics…
BigData: Use Cases 1 – RISK MODELING
 How can banks better understand customers and markets?
 The Summary
 A large bank took separate data warehouses from multiple departments and combined them into a single
global repository in Hadoop for analysis. The bank used the Hadoop cluster to construct a new and more
accurate score of the risk in its customer portfolios. The more accurate score allowed the bank to manage its
exposure better and to offer each customer better products and advice. Hadoop increased revenue and
improved customer satisfaction.
 The Challenge
 A very large bank with several consumer lines of business needed to analyze customer activity across
multiple products to predict credit risk with greater accuracy. Over the years, the bank had acquired a
number of regional banks. Each of those banks had a checking and savings business, a home mortgage
business, credit card offerings and other financial products. Those applications generally ran in separate
silos—each used its own database and application software. As a result, over the years the bank had built up
a large number of independent systems that could not share data easily. With the economic downturn of
2008, the bank had significant exposure in its mortgage business to defaults by its borrowers. Understanding
that risk required the bank to build a comprehensive picture of its customers. A customer whose direct
deposits to checking had stopped, and who was buying more on credit cards, was likely to have lost a job
recently. That customer was at higher risk of default on outstanding loans as a result.
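The job-loss signal described above (direct deposits stop while credit card spending rises) can be written as a simple rule over combined account data. This is a toy sketch only: the field names and the rule are assumptions for illustration, not the bank's actual risk model.

```python
from dataclasses import dataclass


@dataclass
class CustomerActivity:
    """Per-customer activity joined across checking and card systems (hypothetical fields)."""
    customer_id: str
    direct_deposits_prev: int    # direct deposits in the earlier period
    direct_deposits_recent: int  # direct deposits in the most recent period
    card_spend_prev: float       # credit card spend in the earlier period
    card_spend_recent: float     # credit card spend in the most recent period


def possible_job_loss(activity: CustomerActivity) -> bool:
    """Flag customers whose deposits stopped while card spending went up."""
    deposits_stopped = (
        activity.direct_deposits_prev > 0 and activity.direct_deposits_recent == 0
    )
    card_spend_up = activity.card_spend_recent > activity.card_spend_prev
    return deposits_stopped and card_spend_up


customers = [
    CustomerActivity("c1", 2, 0, 400.0, 950.0),
    CustomerActivity("c2", 2, 2, 400.0, 950.0),
    CustomerActivity("c3", 2, 0, 400.0, 300.0),
]
flagged = [c.customer_id for c in customers if possible_job_loss(c)]
print(flagged)
```

The rule needs both signals at once, which is only possible after the checking and credit card data have been brought into one repository.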
BigData: Use Cases 2 – CUSTOMER CHURN ANALYSIS
 Why do companies really lose customers?
 The Summary
 A large telecommunications provider analyzed call logs and complex data from multiple sources. It used
sophisticated predictive models across that data to predict the likelihood that any particular customer would
leave. Hadoop helped the telecommunications company build more valuable customer relationships and
reduce churn.
 The Challenge
 A large mobile carrier needed to analyze multiple data sources to understand how and why customers
decided to terminate their service contracts. Were customers actually leaving, or were they merely trading
one service plan for another? Were they leaving the company entirely and moving to a competitor? Were
pricing, coverage gaps, or device issues a factor? What other issues were important, and how could the
provider improve satisfaction and retain customers?
BigData: Use Cases 3 – RECOMMENDATION ENGINE
 How can companies predict customer preferences?
 The Summary
 A leading online dating service uses sophisticated analyses to measure the compatibility between individual
members, so that it can suggest good matches for a potential relationship. Hadoop helped customers find
romance.
 The Challenge
 When users sign up for the dating service, they fill in surveys that describe themselves and what they look
for in a romantic partner. The company combined that information with demographic and web activity to build
a comprehensive picture of its customers. The data included a mix of complex and structured information,
and the scoring and matching algorithms that used it were complex. Customers naturally wanted better
recommendations over time, so the analytical system had to evolve continually with new techniques for
assessing romantic fit. As the company added new subscribers, the amount of data it managed grew, and
the difficulty of comparing every possible pair of romantic partners in the network grew even faster.
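The last point above is simple arithmetic: with n members there are n(n-1)/2 possible pairs, so the number of comparisons grows roughly with the square of the membership. A short sketch (the member counts are made-up examples):

```python
def possible_pairs(members: int) -> int:
    """Number of distinct unordered pairs among `members` people: n(n-1)/2."""
    return members * (members - 1) // 2


for members in (1_000, 10_000, 100_000):
    print(f"{members:>7,} members -> {possible_pairs(members):,} pairs")
```

Ten times as many members means roughly a hundred times as many pairs to score.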
BigData: Use Cases 4 – AD TARGETING
 How can companies increase campaign efficiency?
 The Summary
 Two leading advertising networks use Hadoop to choose the best ad to show to any given user.
 The Challenge
 Advertisement targeting is a special kind of recommendation engine. It selects ads best suited to a particular
visitor. There is, though, an additional twist: each advertiser is willing to pay a certain amount to have its ad
seen. Advertising networks auction ad space, and advertisers want their ads shown to the people most likely
to buy their products. This creates a complex optimization challenge.
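One common way to frame this trade-off is to rank ads by expected revenue, i.e. the advertiser's bid multiplied by the predicted chance that this visitor clicks. The sketch below assumes that rule and uses made-up bids and probabilities. It is an illustration of the optimization, not the method the advertising networks in the case study used.

```python
def pick_ad(candidates):
    """Return the candidate with the highest expected revenue (bid * p_click).

    Each candidate is a dict with hypothetical keys:
    'ad' (name), 'bid' (price paid per click), 'p_click' (predicted click probability).
    """
    return max(candidates, key=lambda c: c["bid"] * c["p_click"])


ads = [
    {"ad": "A", "bid": 2.00, "p_click": 0.01},  # high bid, poor fit for this visitor
    {"ad": "B", "bid": 0.50, "p_click": 0.08},  # low bid, good fit
    {"ad": "C", "bid": 1.00, "p_click": 0.03},
]
print(pick_ad(ads)["ad"])
```

Here the lowest bidder wins because the visitor is far more likely to click its ad, which is why the match between ad and visitor matters as much as the bid.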
BigData: Use Cases 5 – POINT-OF-SALE TRANSACTION ANALYSIS
 How do retailers target promotions guaranteed to make you buy?
 The Summary
 A large retailer needed to combine larger quantities of point-of-sale (PoS) transaction
data with new and interesting data sources to forecast demand and improve the return on its
promotional campaigns. The retailer built a Hadoop cluster to understand its customers
better and increased its revenues.
 The Challenge
 Retail analytics has been a core part of the data warehousing industry and has helped to
drive its growth. Today, retailers are able to collect much more data about their customers,
both in stores and online. Many want to combine this new information with recent and
historical sales data from PoS systems to increase sales and improve margins. Legacy data
warehousing systems are an expensive place to store complex data from new sources. They
do not, generally, support the kind of sophisticated analyses (sentiment, language
processing and others) that apply to this new data.
BigData: Use Cases 6 – ANALYZING NETWORK DATA TO PREDICT FAILURE
 How can organizations use machine generated data to identify potential trouble?
 The Summary
 A very large public power company combined sensor data from the smart grid with a map of
the network to predict which generators in the grid were likely to fail, and how that failure
would affect the network as a whole.
 The Challenge
 Utilities run big, expensive and complicated systems to generate power. Each of the
generators includes sophisticated sensors that monitor voltage, current, frequency and other
important operating characteristics. Operating a single generator means paying careful
attention to all of the data streaming off of the sensors attached to it.
 Utilities operate many of these generators spread across multiple locations. The locations are
connected to one another, and then each utility is connected to the public power grid.
Monitoring the health of the entire grid requires capture and analysis of data from every
utility, and even from every generator, in the grid.
 The volume of data is enormous. A clear picture of the health of the grid depends on both
real-time and after-the-fact forensic analysis of all of it. Spotting facilities at risk of failure
early, and doing preventive maintenance or separating them from the grid, is critical to
preventing costly outages.
BigData: Use Cases 7 – THREAT ANALYSIS
 How can companies detect threats and fraudulent activity?
 The Summary
 Businesses have struggled with theft, fraud and abuse since long before computers existed.
Computers and on-line systems create new opportunities for criminals to act swiftly,
efficiently and anonymously. On-line businesses use Hadoop to monitor and combat criminal
behavior.
 The Challenge
 Online criminals write viruses and malware to take over individual computers and steal
valuable data. They buy and sell using fraudulent identities and use scams to steal money or
goods. They lure victims into scams by sending email or other spam over networks. In “pay-
per-click” systems like online advertising, they use networks of compromised computers to
automate fraudulent activity, bilking money from advertisers or ad networks.
 Online businesses must capture, store and analyze both the content and the pattern of
messages that flow through the network to tell the difference between a legitimate
transaction and fraudulent activity by criminals.
BigData: Use Cases 8 – TRADE SURVEILLANCE
 How can a bank spot the rogue trader?
 The Summary
 A large investment bank combines data about the parties that participate in a trade with the
complex data that describes relationships among those parties and how they interact with
one another. The combination allows the bank to recognize unusual trading activity and to
flag it for human review. Hadoop allows the bank to spot and prevent suspect trading activity.
 The Challenge
 The bank already captured trading activity and used that data to assess, predict, and
manage risk for both regulatory and non-regulatory purposes. The very large volume of data,
however, made it difficult to monitor trades for compliance, and virtually impossible to catch
“rogue” traders, who engage in trades that violate policies or expose the bank to too much
risk.
 The risk is enormous. At Barings Bank in 1995, a single trader named Nick Leeson made an
escalating series of money-losing trades in an attempt to cover losses from earlier ones. The
final cost to the bank, at nearly $1.3Bn, forced Barings out of business.
BigData: Use Cases 9 – SEARCH QUALITY
 What’s in your search?
 The Summary
 A leading online commerce company uses Hadoop to analyze and index its data and to
deliver more relevant, useful search results to its customers.
 The Challenge
 Good search tools have been a boon to web users. As the amount of data available online
has grown, organizing it has become increasingly difficult. Users today are more likely to
search for information with keywords than to browse through folders looking for what they
need.
 Good search tools are hard to build. They must store massive amounts of information, much
of it complex text or multimedia files. They must be able to process those files to extract
keywords or other attributes for searches. The amount of data and its complexity demand a
scalable and flexible platform for indexing.
 Besides the difficulty of handling the data, a good search engine must be able to assess the
intent and interests of the user when a search query arrives. The word “chips” in a query may
refer to fried food or to electronic components. Delivering meaningful results requires that the
system make a good guess between the two. Looking at the user’s recent activity and history
can help.
BigData: Use Cases 10 – DATA SANDBOX
 What can you do with new data?
 The Summary
 Companies, even those with established enterprise data warehouses, often need a
cost-effective, flexible way to store, explore and analyze new types of complex data. Many
companies have created “data sandboxes” using Hadoop, where users can play with data,
decide what to do with it and determine whether it should be added to the data warehouse.
Analysts can look for new relationships in the data, mine it and use techniques like machine
learning and natural language processing on it to get new insights.
 The Challenge
 The variety, complexity and volume of data available to enterprises today are changing the
way that those enterprises think. There is value and insight locked up in complex data. The
best companies in the world are unlocking that value by building new analytical capacity.
 With shifting and complex data and emerging analytics, enterprises need a new platform to
store and explore the information they collect. Cost, scalability and flexibility are all forcing a
move away from the single-architecture data warehouse of the past, toward a more flexible
and comprehensive data management infrastructure.
When to use Hadoop?
Projects handled
eConvergence Inc. 44 09/29/13
eConvergence Inc.
eConvergence Inc. is a software development company that provides
cost-effective, customer-centric IT consulting services to customers
across the globe.
Strengths @ eConvergence
 Proven and experienced management team.
 Expertise in business transformation
outsourcing.
 Extensive industry expertise.
 Broad and evolving service offerings.
Key Domains / Technology Expertise
 Data Warehousing
 Teradata
 Web Development
 Business Intelligence
 Mobile Development (iOS and Android)
 Manual & Automation Testing
 Business Analysis
 DataStage/Informatica
 BIGDATA/Hadoop
 PMP
Training Mode
 Onsite
 Online
 Batch size is limited
 Sessions are live
 Sessions are interactive
Batch Timings
Generally, the trainer covers 6 hrs/week
Weekdays
 9:00-11:00pm EST (6:00-8:00pm PST)
 Three days a week
Weekends
 3 hours each on Saturday and Sunday
Depending on the course, training runs 6-8 weeks
Technologies to Assist
We use the latest training methods to make learning as easy and effective as possible:
 GoToMeeting/WebEx conferencing software.
 Daily PPTs/exercises forwarded after the session.
 Recording of the sessions.
 Queries to be posted in the Yahoo group.
Installation/Software
Our technical team will assist you in installing the software required for training:
 Installation on desktop/laptop.
 Server support for certain courses.
 Hands-on assistance.
 Technical assistance.
Post training
 Resume Preparation
 Interview Preparation
 Marketing Support
 Post Placement Support
Immigration Support
We provide assistance in all immigration
activities such as:
 H1 Transfers
 New H1s
 GC Processing
 eVerify
 F1 OPTs
Course Contents
 Session 1
   DW definition, DW architecture
   Operational Databases (ODB)
   Data modeling: 3NF and dimensional modeling
   OLTP and OLAP, ETL concepts
   Top-down/bottom-up approaches
   Bill Inmon approach – advantages and disadvantages
   Ralph Kimball approach – advantages and disadvantages
   Star and snowflake schemas
   Dimension modeling design considerations
   Normalization techniques with live examples
   Data mart project examples
   Customer, Products, Geo dimensional concepts
   Hierarchy structures
   Master Data Management systems
   Information Management systems
   NoSQL – Big Data, ACID Model, CAP Model
 Session 2
   Hadoop architecture
   HDFS architecture
   HDFS features
   Intro to NameNode and DataNode
   File storage and replication
   Build a Hadoop cluster on EC2/AWS
   Hadoop configuration
 Session 3
   MapReduce
   MapReduce features
   MapReduce job recovery
   MapReduce job check
   Cluster rebalancing
   Secondary NameNode features
   Practice Hadoop commands
 Session 4
   Introduction to Hive
   Installation of Hive
   Hive SQL
   Hive internal and external tables
   Hive partitions
   Introduction to Sqoop
   Installation of Sqoop
   Sqoop practice with Hadoop and HBase
 Session 5
   Introduction to Pig
   Installation of Pig
   Pig relations, bags, tuples, fields
   Pig expressions
   Pig schemas
   Pig join and split optimization
   Pig JSON
   Pig COGROUP
 Session 6
   Introduction to HBase
   Architecture
   Install HBase
   Region servers, master
   HBase with Hive
   HBase with Sqoop
   HBase with Pig
   HBase practice
 Session 7
   Installation on VMware
   Installation of CDH4
   Introduction to the R language and Revolution Analytics
 Session 8
   Performance tuning
   Certification discussion – CCA-410
   Practical example

Weitere ähnliche Inhalte

Was ist angesagt?

Telco Big Data Workshop Sample
Telco Big Data Workshop SampleTelco Big Data Workshop Sample
Telco Big Data Workshop SampleAlan Quayle
 
Unit i big data introduction
Unit  i big data introductionUnit  i big data introduction
Unit i big data introductionSujaMaryD
 
Business case for Big Data Analytics
Business case for Big Data AnalyticsBusiness case for Big Data Analytics
Business case for Big Data AnalyticsVijay Rao
 
Fight Fraud with Big Data Analytics
Fight Fraud with Big Data AnalyticsFight Fraud with Big Data Analytics
Fight Fraud with Big Data AnalyticsDatameer
 
Monitizing Big Data at Telecom Service Providers
Monitizing Big Data at Telecom Service ProvidersMonitizing Big Data at Telecom Service Providers
Monitizing Big Data at Telecom Service ProvidersDataWorks Summit
 
Big Data Analytics MIS presentation
Big Data Analytics MIS presentationBig Data Analytics MIS presentation
Big Data Analytics MIS presentationAASTHA PANDEY
 
Use cases for Hadoop and Big Data Analytics - InfoSphere BigInsights
Use cases for Hadoop and Big Data Analytics - InfoSphere BigInsightsUse cases for Hadoop and Big Data Analytics - InfoSphere BigInsights
Use cases for Hadoop and Big Data Analytics - InfoSphere BigInsightsGord Sissons
 
Big data ibm keynote d advani presentation
Big data ibm keynote d advani presentationBig data ibm keynote d advani presentation
Big data ibm keynote d advani presentationMassTLC
 
Big Data: Its Characteristics And Architecture Capabilities
Big Data: Its Characteristics And Architecture CapabilitiesBig Data: Its Characteristics And Architecture Capabilities
Big Data: Its Characteristics And Architecture CapabilitiesAshraf Uddin
 
Banalytics - Monetizing corporate big data | Instarea
Banalytics - Monetizing corporate big data | InstareaBanalytics - Monetizing corporate big data | Instarea
Banalytics - Monetizing corporate big data | InstareaMatej Misik
 
Big Data Commercialization and associated IoT Platform Implications by Ramnik...
Big Data Commercialization and associated IoT Platform Implications by Ramnik...Big Data Commercialization and associated IoT Platform Implications by Ramnik...
Big Data Commercialization and associated IoT Platform Implications by Ramnik...Data Con LA
 
Introduction to Data Mining, Business Intelligence and Data Science
Introduction to Data Mining, Business Intelligence and Data ScienceIntroduction to Data Mining, Business Intelligence and Data Science
Introduction to Data Mining, Business Intelligence and Data ScienceIMC Institute
 
Analyst View of Data Virtualization: Conversations with Boulder Business Inte...
Analyst View of Data Virtualization: Conversations with Boulder Business Inte...Analyst View of Data Virtualization: Conversations with Boulder Business Inte...
Analyst View of Data Virtualization: Conversations with Boulder Business Inte...Denodo
 
Big data analytics, research report
Big data analytics, research reportBig data analytics, research report
Big data analytics, research reportJULIO GONZALEZ SANZ
 

Was ist angesagt? (20)

Telco Big Data Workshop Sample
Telco Big Data Workshop SampleTelco Big Data Workshop Sample
Telco Big Data Workshop Sample
 
Unit i big data introduction
Unit  i big data introductionUnit  i big data introduction
Unit i big data introduction
 
Business case for Big Data Analytics
Business case for Big Data AnalyticsBusiness case for Big Data Analytics
Business case for Big Data Analytics
 
Big Data use cases in telcos
Big Data use cases in telcosBig Data use cases in telcos
Big Data use cases in telcos
 
BIG DATA and USE CASES
BIG DATA and USE CASESBIG DATA and USE CASES
BIG DATA and USE CASES
 
Fight Fraud with Big Data Analytics
Fight Fraud with Big Data AnalyticsFight Fraud with Big Data Analytics
Fight Fraud with Big Data Analytics
 
Monitizing Big Data at Telecom Service Providers
Monitizing Big Data at Telecom Service ProvidersMonitizing Big Data at Telecom Service Providers
Monitizing Big Data at Telecom Service Providers
 
Big Data Analytics MIS presentation
Big Data Analytics MIS presentationBig Data Analytics MIS presentation
Big Data Analytics MIS presentation
 
Use cases for Hadoop and Big Data Analytics - InfoSphere BigInsights
Use cases for Hadoop and Big Data Analytics - InfoSphere BigInsightsUse cases for Hadoop and Big Data Analytics - InfoSphere BigInsights
Use cases for Hadoop and Big Data Analytics - InfoSphere BigInsights
 
Fraud and Risk in Big Data
Fraud and Risk in Big DataFraud and Risk in Big Data
Fraud and Risk in Big Data
 
Big data ibm keynote d advani presentation
Big data ibm keynote d advani presentationBig data ibm keynote d advani presentation
Big data ibm keynote d advani presentation
 
Big Data: Its Characteristics And Architecture Capabilities
Big Data: Its Characteristics And Architecture CapabilitiesBig Data: Its Characteristics And Architecture Capabilities
Big Data: Its Characteristics And Architecture Capabilities
 
5 Big Data Use Cases for 2013
5 Big Data Use Cases for 20135 Big Data Use Cases for 2013
5 Big Data Use Cases for 2013
 
Banalytics - Monetizing corporate big data | Instarea
Banalytics - Monetizing corporate big data | InstareaBanalytics - Monetizing corporate big data | Instarea
Banalytics - Monetizing corporate big data | Instarea
 
Big Data Commercialization and associated IoT Platform Implications by Ramnik...
Big Data Commercialization and associated IoT Platform Implications by Ramnik...Big Data Commercialization and associated IoT Platform Implications by Ramnik...
Big Data Commercialization and associated IoT Platform Implications by Ramnik...
 
National Conference - Big Data - 31 Jan 2015
National Conference - Big Data - 31 Jan 2015National Conference - Big Data - 31 Jan 2015
National Conference - Big Data - 31 Jan 2015
 
Introduction to Data Mining, Business Intelligence and Data Science
Introduction to Data Mining, Business Intelligence and Data ScienceIntroduction to Data Mining, Business Intelligence and Data Science
Introduction to Data Mining, Business Intelligence and Data Science
 
Analyst View of Data Virtualization: Conversations with Boulder Business Inte...
Analyst View of Data Virtualization: Conversations with Boulder Business Inte...Analyst View of Data Virtualization: Conversations with Boulder Business Inte...
Analyst View of Data Virtualization: Conversations with Boulder Business Inte...
 
Big data
Big dataBig data
Big data
 
Big data analytics, research report
Big data analytics, research reportBig data analytics, research report
Big data analytics, research report
 

Andere mochten auch

Big Data - An Overview
Big Data -  An OverviewBig Data -  An Overview
Big Data - An OverviewArvind Kalyan
 
Overview of big data & hadoop version 1 - Tony Nguyen
Overview of big data & hadoop   version 1 - Tony NguyenOverview of big data & hadoop   version 1 - Tony Nguyen
Overview of big data & hadoop version 1 - Tony NguyenThanh Nguyen
 
Big data – a brief overview
Big data – a brief overviewBig data – a brief overview
Big data – a brief overviewDorai Thodla
 
BigData Overview
BigData OverviewBigData Overview
BigData OverviewHoryun Lee
 
Overview of Big Data, Data Science and Statistics, along with Digitalisation,...
Overview of Big Data, Data Science and Statistics, along with Digitalisation,...Overview of Big Data, Data Science and Statistics, along with Digitalisation,...
Overview of Big Data, Data Science and Statistics, along with Digitalisation,...Prof. Dr. Diego Kuonen
 
Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...
Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...
Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...Mahantesh Angadi
 
BigData - Hadoop -by 侯圣文@secooler
BigData - Hadoop -by 侯圣文@secooler BigData - Hadoop -by 侯圣文@secooler
BigData - Hadoop -by 侯圣文@secooler Shengwen HOU(侯圣文)
 
Intro to Bits, Bytes, and Storage
Intro to Bits, Bytes, and StorageIntro to Bits, Bytes, and Storage
Intro to Bits, Bytes, and StorageJohn Goldsworthy
 
Intro to Data Science for Enterprise Big Data
Intro to Data Science for Enterprise Big DataIntro to Data Science for Enterprise Big Data
Intro to Data Science for Enterprise Big DataPaco Nathan
 
Myths and Mathemagical Superpowers of Data Scientists
Myths and Mathemagical Superpowers of Data ScientistsMyths and Mathemagical Superpowers of Data Scientists
Myths and Mathemagical Superpowers of Data ScientistsDavid Pittman
 
Titan: The Rise of Big Graph Data
Titan: The Rise of Big Graph DataTitan: The Rise of Big Graph Data
Titan: The Rise of Big Graph DataMarko Rodriguez
 
How to Interview a Data Scientist
How to Interview a Data ScientistHow to Interview a Data Scientist
How to Interview a Data ScientistDaniel Tunkelang
 
Titan: Big Graph Data with Cassandra
Titan: Big Graph Data with CassandraTitan: Big Graph Data with Cassandra
Titan: Big Graph Data with CassandraMatthias Broecheler
 
A Statistician's View on Big Data and Data Science (Version 1)
A Statistician's View on Big Data and Data Science (Version 1)A Statistician's View on Big Data and Data Science (Version 1)
A Statistician's View on Big Data and Data Science (Version 1)Prof. Dr. Diego Kuonen
 
Big Data: An Overview
Big Data: An OverviewBig Data: An Overview
Big Data: An OverviewC. Scyphers
 
Hadoop Overview & Architecture
Hadoop Overview & Architecture  Hadoop Overview & Architecture
Hadoop Overview & Architecture EMC
 

Andere mochten auch (20)

Big Data - An Overview
Big Data -  An OverviewBig Data -  An Overview
Big Data - An Overview
 
Big data overview
Big data overviewBig data overview
Big data overview
 
Overview of big data & hadoop version 1 - Tony Nguyen
Overview of big data & hadoop   version 1 - Tony NguyenOverview of big data & hadoop   version 1 - Tony Nguyen
Overview of big data & hadoop version 1 - Tony Nguyen
 
Big data – a brief overview
Big data – a brief overviewBig data – a brief overview
Big data – a brief overview
 
BigData Overview
BigData OverviewBigData Overview
BigData Overview
 
Overview of Big Data, Data Science and Statistics, along with Digitalisation,...
Overview of Big Data, Data Science and Statistics, along with Digitalisation,...Overview of Big Data, Data Science and Statistics, along with Digitalisation,...
Overview of Big Data, Data Science and Statistics, along with Digitalisation,...
 
Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...
Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...
Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...
 
BigData - Hadoop -by 侯圣文@secooler
BigData - Hadoop -by 侯圣文@secooler BigData - Hadoop -by 侯圣文@secooler
BigData - Hadoop -by 侯圣文@secooler
 
Intro to Bits, Bytes, and Storage
Intro to Bits, Bytes, and StorageIntro to Bits, Bytes, and Storage
Intro to Bits, Bytes, and Storage
 
Intro to Data Science for Enterprise Big Data
Intro to Data Science for Enterprise Big DataIntro to Data Science for Enterprise Big Data
Intro to Data Science for Enterprise Big Data
 
Myths and Mathemagical Superpowers of Data Scientists
Myths and Mathemagical Superpowers of Data ScientistsMyths and Mathemagical Superpowers of Data Scientists
Myths and Mathemagical Superpowers of Data Scientists
 
Big data ppt
Big data pptBig data ppt
Big data ppt
 
Titan: The Rise of Big Graph Data
Titan: The Rise of Big Graph DataTitan: The Rise of Big Graph Data
Titan: The Rise of Big Graph Data
 
How to Interview a Data Scientist
How to Interview a Data ScientistHow to Interview a Data Scientist
How to Interview a Data Scientist
 
Titan: Big Graph Data with Cassandra
Titan: Big Graph Data with CassandraTitan: Big Graph Data with Cassandra
Titan: Big Graph Data with Cassandra
 
A Statistician's View on Big Data and Data Science (Version 1)
A Statistician's View on Big Data and Data Science (Version 1)A Statistician's View on Big Data and Data Science (Version 1)
A Statistician's View on Big Data and Data Science (Version 1)
 
Introduction to R for Data Mining
Introduction to R for Data MiningIntroduction to R for Data Mining
Introduction to R for Data Mining
 
Big Data: An Overview
Big Data: An OverviewBig Data: An Overview
Big Data: An Overview
 
Hadoop Overview & Architecture
Hadoop Overview & Architecture  Hadoop Overview & Architecture
Hadoop Overview & Architecture
 
Big data and Hadoop
Big data and HadoopBig data and Hadoop
Big data and Hadoop
 

Hadoop Demo eConvergence

  • 1. Demo: BIGDATA Hadoop
      eConvergence Inc., www.econvergenceinc.com
      Rajesh Kalasapati, PMP, Data Architect, raj@econvergenceinc.com
  • 2. What you will learn from today's BIGDATA demo
      • Evolution of business analytics
      • Evolution of data processing and traditional RDBMS systems
      • Company data volumes and storage
      • Architecture of traditional RDBMS
      • What is Hadoop?
      • Why use Hadoop?
      • Hadoop architecture
      • Hadoop use cases
  • 3. Analytics Evolution
      • Questions
          • Is Hadoop a replacement for, or a complement to, the RDBMS today?
          • How will this change in the future?
          • How does Hadoop differ from current RDBMS and MPP systems?
  • 7. Data size table
      • Each unit is 1,024 times the previous one (e.g. one kilobyte = 1,024 bytes).
      • Bit = 1 bit
      • Byte = 8 bits
      • Kilobyte = 1,024 bytes
      • Megabyte = 1,024 kilobytes
      • Gigabyte = 1,024 megabytes
      • Terabyte = 1,024 gigabytes
      • Petabyte = 1,048,576 gigabytes
      • Exabyte = 1,073,741,824 gigabytes
      • Zettabyte = 1,099,511,627,776 gigabytes
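The multiples in the data size table can be checked with a few lines of Python. This is only a minimal sketch: the `UNITS` list and the `in_bytes` and `in_gigabytes` helpers are names invented here for illustration, and the units are the 1,024-based ones the slide uses.

```python
# Binary storage units, each 1,024 times the previous one (as in the slide).
UNITS = ["byte", "kilobyte", "megabyte", "gigabyte",
         "terabyte", "petabyte", "exabyte", "zettabyte"]


def in_bytes(unit: str) -> int:
    """Return the size of one `unit` in bytes, using 1,024-based multiples."""
    return 1024 ** UNITS.index(unit)


def in_gigabytes(unit: str) -> int:
    """Return the size of one `unit` in gigabytes (for gigabyte and larger)."""
    return in_bytes(unit) // in_bytes("gigabyte")


for unit in ("terabyte", "petabyte", "exabyte", "zettabyte"):
    print(f"1 {unit} = {in_gigabytes(unit):,} gigabytes")
```

Running the loop reproduces the gigabyte figures listed for terabyte through zettabyte.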
  • 10. RDBMS Products Architecture
      [Diagram: Node 1, Node 2, ... Node N, each with a process area and background processes, shown alongside sets of data files]
  • 11. Logical and Physical ..
      [Diagram labels: Database; Logical (tablespace, segment, extent, Oracle data block); Physical (data file, OS block); Schema]
  • 12. Yes
      • Relational databases (RDBMSs) have been around for ages
      • MySQL is one of the most popular among them
      • Data stored in tables
      • Schema-based, i.e., structured tables
      • Queried using SQL
      SQL queries: SELECT user_id FROM users WHERE username = 'jbellis'
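To make the query on this slide concrete, here is a small runnable sketch. It uses Python's built-in `sqlite3` module as a stand-in for MySQL, and the table layout and sample rows are assumptions made up for the example; only the `users` table name, the `user_id` and `username` columns, and the value `jbellis` come from the slide.

```python
import sqlite3

# In-memory SQLite database standing in for a MySQL server.
conn = sqlite3.connect(":memory:")

# Schema-based storage: the table structure is declared before any data is loaded.
conn.execute(
    "CREATE TABLE users (user_id INTEGER PRIMARY KEY, username TEXT NOT NULL)"
)

# Hypothetical sample rows, for illustration only.
conn.executemany(
    "INSERT INTO users (user_id, username) VALUES (?, ?)",
    [(1, "jbellis"), (2, "alice")],
)

# The query from the slide, with the username passed as a bound parameter.
rows = conn.execute(
    "SELECT user_id FROM users WHERE username = ?", ("jbellis",)
).fetchall()
print(rows)

conn.close()
```

Passing the username as a parameter rather than pasting it into the SQL string is the usual way to avoid quoting problems.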
  • 14. What is Hadoop?
      • Hadoop is a data storage and processing system
      • Scalable, fault-tolerant and distributed
      • Able to store structured or unstructured data
      • Stores terabytes or petabytes of data inexpensively
      • Robust and reliable: handles hardware and system failures automatically, WITHOUT losing data or interrupting data analysis
      • Hadoop runs on clusters of commodity servers. Each of those servers has local CPU and storage
  • 17. What is Commodity Hardware?
      • In general:
          • An average amount of computing resources.
          • Does not imply low quality, but rather affordability
          • A feature of commodity hardware is that, over time, it is widely used in roles for which it was not specifically designed, as opposed to purpose-built hardware
      • In the context of Hadoop
          • Hadoop clusters are run on servers
          • Most commodity servers used in Hadoop clusters have an average ratio of disk space to memory, as opposed to being specialized servers with massively high memory or CPU
      • Examples of commodity hardware
          • 1 TB hard disks in a JBOD (Just a Bunch Of Disks) configuration
          • Two quad-core CPUs, running at least 2-2.5 GHz
          • 16-24 GB of RAM (24-32 GB if you are using HBase)
          • 1 Gb Ethernet
      • Powerful cluster
          • Six 2 TB hard disks with RAID 1 across two of the disks
          • Two quad-core CPUs
          • 32-64 GB of ECC (Error-Correcting Code) RAM
          • 2-4 Gb Ethernet
  • 18. Goals of Hadoop-HDFS
      • Very large distributed file system
          • 10K nodes, 100 million files, 10 PB size
      • Assumes commodity hardware
          • Files are replicated to handle hardware failures
          • Detects failures and recovers from them
      • Optimized for batch processing
          • Processing moves to where the data is stored, rather than data moving to the process as in traditional DBs
          • Provides very high aggregate bandwidth
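Two of these goals, replication to survive hardware failures and moving processing to the data, can be illustrated with a toy model. The sketch below is not HDFS's real placement algorithm (which is rack-aware); it spreads replicas round-robin, and every function name and node name in it is hypothetical. The replication factor of 3 matches HDFS's default setting.

```python
REPLICATION = 3  # HDFS's default replication factor


def place_blocks(num_blocks, nodes, replication=REPLICATION):
    """Toy placement: put each block on `replication` distinct nodes, round-robin."""
    return {
        block: {nodes[(block + i) % len(nodes)] for i in range(replication)}
        for block in range(num_blocks)
    }


def readable_after_failure(placement, failed):
    """True if every block still has at least one replica on a surviving node."""
    return all(replicas - failed for replicas in placement.values())


def pick_worker(placement, block, live_nodes):
    """Move compute to data: choose a live node that already stores the block."""
    local = sorted(placement[block] & live_nodes)
    return local[0] if local else None


nodes = ["node1", "node2", "node3", "node4", "node5"]
placement = place_blocks(10, nodes)

# Losing one or two nodes leaves every block readable with three replicas.
print(readable_after_failure(placement, {"node2"}))
print(readable_after_failure(placement, {"node2", "node4"}))

# Losing all three nodes that hold block 0 makes that block unreadable.
print(readable_after_failure(placement, {"node1", "node2", "node3"}))

# With node2 down, work on block 0 is scheduled on a node holding a replica.
print(pick_worker(placement, 0, set(nodes) - {"node2"}))
```

The point of the toy is only that a block stays available as long as one replica survives, and that the scheduler prefers a node where the block already lives.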
  • 26. Traditional DWH Eco System
      • Information sources: HR, Sales, etc.; operational DBs (RDBMS)
      • Tools for extraction: Informatica, DataStage, etc.
      • EDW: Oracle, Teradata, SQL Server, etc.
      • Data marts
      • OLAP servers (support OLAP)
      • Tools for queries, data mining and client tools: MicroStrategy, SAS, SAP BO, OBIEE, Hyperion, etc.
  • 27. Hadoop Eco System
      • Information sources: HR, Sales, etc.; operational DBs (RDBMS)
      • EDW: Oracle, Teradata, SQL Server, etc.
      • Tools for extraction: Informatica, DataStage, Pig, Sqoop, etc.
      • Other tools shown: Datameer, Karmasphere, MicroStrategy, Platfora, QlikView, Tableau, Tresata
  • 32. BigData: Use Cases 1 – Risk Modeling
      How can banks better understand customers and markets?
      • The Summary
          A large bank took separate data warehouses from multiple departments and combined them into a single global repository in Hadoop for analysis. The bank used the Hadoop cluster to construct a new and more accurate score of the risk in its customer portfolios. The more accurate score allowed the bank to manage its exposure better and to offer each customer better products and advice. Hadoop increased revenue and improved customer satisfaction.
      • The Challenge
          A very large bank with several consumer lines of business needed to analyze customer activity across multiple products to predict credit risk with greater accuracy. Over the years, the bank had acquired a number of regional banks. Each of those banks had a checking and savings business, a home mortgage business, credit card offerings and other financial products. Those applications generally ran in separate silos: each used its own database and application software. As a result, over the years the bank had built up a large number of independent systems that could not share data easily.
          With the economic downturn of 2008, the bank had significant exposure in its mortgage business to defaults by its borrowers. Understanding that risk required the bank to build a comprehensive picture of its customers. A customer whose direct deposits to checking had stopped, and who was buying more on credit cards, was likely to have lost a job recently. That customer was at higher risk of default on outstanding loans as a result.
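The risk signal described above (direct deposits stop while credit card spending rises) can be written as a simple rule once checking and card data sit in one place. The sketch below is a hypothetical illustration, not the bank's actual model: the `MonthlyActivity` record, the `elevated_default_risk` function and the 20% spending-growth threshold are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class MonthlyActivity:
    direct_deposits: float  # total direct deposits into checking that month
    card_spend: float       # total credit card purchases that month


def elevated_default_risk(history, card_growth=1.2):
    """Flag a customer whose deposits stopped while card spending rose.

    `history` is a list of MonthlyActivity, oldest first. The growth
    threshold of 1.2 (20% more card spending) is an arbitrary assumption.
    """
    if len(history) < 2:
        return False  # not enough data to compare two months
    previous, latest = history[-2], history[-1]
    deposits_stopped = previous.direct_deposits > 0 and latest.direct_deposits == 0
    spending_up = latest.card_spend >= previous.card_spend * card_growth
    return deposits_stopped and spending_up


# Deposits stopped and card spending jumped: flagged.
at_risk = elevated_default_risk(
    [MonthlyActivity(3000.0, 500.0), MonthlyActivity(0.0, 700.0)]
)

# Deposits continue: not flagged, even though card spending rose.
stable = elevated_default_risk(
    [MonthlyActivity(3000.0, 500.0), MonthlyActivity(3000.0, 700.0)]
)
print(at_risk, stable)
```

The rule needs both the checking record and the card record for the same customer, which is exactly what the siloed systems described in the challenge could not provide.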
  • 33. BigData: Use Cases 2 – CUSTOMER CHURN ANALYSIS
      Why do companies really lose customers?
      • The Summary
          A large telecommunications provider analyzed call logs and complex data from multiple sources. It used sophisticated predictive models across that data to predict the likelihood that any particular customer would leave. Hadoop helped the telecommunications company build more valuable customer relationships and reduce churn.
      • The Challenge
          A large mobile carrier needed to analyze multiple data sources to understand how and why customers decided to terminate their service contracts. Were customers actually leaving, or were they merely trading one service plan for another? Were they leaving the company entirely and moving to a competitor? Were pricing, coverage gaps, or device issues a factor? What other issues were important, and how could the provider improve satisfaction and retain customers?
  • 34. BigData: Use Cases 3 – RECOMMENDATION ENGINE
      How can companies predict customer preferences?
      • The Summary
          A leading online dating service uses sophisticated analyses to measure the compatibility between individual members, so that it can suggest good matches for a potential relationship. Hadoop helped customers find romance.
      • The Challenge
          When users sign up for the dating service, they fill in surveys that describe themselves and what they look for in a romantic partner. The company combined that information with demographic and web activity to build a comprehensive picture of its customers. The data included a mix of complex and structured information, and the scoring and matching algorithms that used it were complex. Customers naturally wanted better recommendations over time, so the analytical system had to evolve continually with new techniques for assessing romantic fit. As the company added new subscribers, the amount of data it managed grew, and the difficulty of comparing every possible pair of romantic partners in the network grew even faster.
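The last point, that comparing every possible pair grows faster than the membership itself, follows from the pair count n(n-1)/2. A short sketch makes the growth visible; the member counts are made-up round numbers and `candidate_pairs` is a name chosen for this example.

```python
from math import comb


def candidate_pairs(members: int) -> int:
    """Number of distinct pairs among `members` people: n * (n - 1) / 2."""
    return comb(members, 2)


for n in (1_000, 10_000, 100_000):
    print(f"{n:>7,} members -> {candidate_pairs(n):,} possible pairs")
```

Each tenfold increase in members multiplies the number of pairs by roughly a hundred, which is why the matching workload outgrows a single machine quickly.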
  • 35. BigData: Use Cases 4 – AD TARGETING
      How can companies increase campaign efficiency?
      • The Summary
          Two leading advertising networks use Hadoop to choose the best ad to show to any given user.
      • The Challenge
          Advertisement targeting is a special kind of recommendation engine. It selects ads best suited to a particular visitor. There is, though, an additional twist: each advertiser is willing to pay a certain amount to have its ad seen. Advertising networks auction ad space, and advertisers want their ads shown to the people most likely to buy their products. This creates a complex optimization challenge.
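One common simplification of this optimization is to rank ads by expected revenue: what the advertiser bids per click multiplied by the predicted chance that this visitor clicks. The sketch below assumes that simplification; it is not necessarily how the networks in the slide work, and the `Ad` record, the `best_ad` function and all the numbers are hypothetical.

```python
from typing import NamedTuple


class Ad(NamedTuple):
    name: str
    bid_per_click: float      # what the advertiser pays for one click
    click_probability: float  # predicted for this particular visitor


def best_ad(ads):
    """Pick the ad with the highest expected revenue for this visitor."""
    return max(ads, key=lambda ad: ad.bid_per_click * ad.click_probability)


# Hypothetical candidates for one visitor.
ads = [
    Ad("shoes", 2.00, 0.010),   # expected revenue 0.02
    Ad("phones", 0.50, 0.080),  # expected revenue 0.04
    Ad("cars", 5.00, 0.002),    # expected revenue 0.01
]
winner = best_ad(ads)
print(winner.name)
```

In this example the highest bid does not win; the cheaper ad with the much higher predicted click rate does, which is the "twist" the slide describes.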
  • 36. BigData: Use Cases 5 – POINT-OF-SALE TRANSACTION ANALYSIS
      How do retailers target promotions guaranteed to make you buy?
      • The Summary
          A large retailer doing point-of-sale (PoS) transactional analysis needed to combine larger quantities of PoS transaction data with new and interesting data sources to forecast demand and improve the return that it got on its promotional campaigns. The retailer built a Hadoop cluster to understand its customers better and increased its revenues.
      • The Challenge
          Retail analytics has been a core part of the data warehousing industry and has helped to drive its growth. Today, retailers are able to collect much more data about their customers, both in stores and online. Many want to combine this new information with recent and historical sales data from PoS systems to increase sales and improve margins. Legacy data warehousing systems are an expensive place to store complex data from new sources. They do not, generally, support the kind of sophisticated analyses (sentiment, language processing and others) that apply to this new data.
  • 37. BigData: Use Cases 6 – ANALYZING NETWORK DATA TO PREDICT FAILURE
      How can organizations use machine-generated data to identify potential trouble?
      • The Summary
          A very large public power company combined sensor data from the smart grid with a map of the network to predict which generators in the grid were likely to fail, and how that failure would affect the network as a whole.
      • The Challenge
          • Utilities run big, expensive and complicated systems to generate power. Each of the generators includes sophisticated sensors that monitor voltage, current, frequency and other important operating characteristics. Operating a single generator means paying careful attention to all of the data streaming off of the sensors attached to it.
          • Utilities operate many of these generators spread across multiple locations. The locations are connected to one another, and then each utility is connected to the public power grid. Monitoring the health of the entire grid requires capture and analysis of data from every utility, and even from every generator, in the grid.
          • The volume of data is enormous. A clear picture of the health of the grid depends on both real-time and after-the-fact forensic analysis of all of it. Spotting facilities at risk of failure early, and doing preventive maintenance or separating them from the grid, is critical to preventing costly outages.
  • 38. BigData: Use Cases 7 – THREAT ANALYSIS
      How can companies detect threats and fraudulent activity?
      • The Summary
          Businesses have struggled with theft, fraud and abuse since long before computers existed. Computers and online systems create new opportunities for criminals to act swiftly, efficiently and anonymously. Online businesses use Hadoop to monitor and combat criminal behavior.
      • The Challenge
          • Online criminals write viruses and malware to take over individual computers and steal valuable data. They buy and sell using fraudulent identities and use scams to steal money or goods. They lure victims into scams by sending email or other spam over networks. In "pay-per-click" systems like online advertising, they use networks of compromised computers to automate fraudulent activity, bilking money from advertisers or ad networks.
          • Online businesses must capture, store and analyze both the content and the pattern of messages that flow through the network to tell the difference between a legitimate transaction and fraudulent activity by criminals.
  • 39. BigData: Use Cases 8 – TRADE SURVEILLANCE
      How can a bank spot the rogue trader?
      • The Summary
          A large investment bank combines data about the parties that participate in a trade with the complex data that describes relationships among those parties and how they interact with one another. The combination allows the bank to recognize unusual trading activity and to flag it for human review. Hadoop allows the bank to spot and prevent suspect trading activity.
      • The Challenge
          • The bank already captured trading activity and used that data to assess, predict, and manage risk for both regulatory and non-regulatory purposes. The very large volume of data, however, made it difficult to monitor trades for compliance, and virtually impossible to catch "rogue" traders, who engage in trades that violate policies or expose the bank to too much risk.
          • The risk is enormous. At Barings Bank in 1995, a single trader named Nick Leeson made an escalating series of money-losing trades in an attempt to cover losses from earlier ones. The final cost to the bank, at nearly $1.3Bn, forced Barings out of business.
  • 40. BigData: Use Cases 9 – SEARCH QUALITY
      What's in your search?
      • The Summary
          A leading online commerce company uses Hadoop to analyze and index its data and to deliver more relevant, useful search results to its customers.
      • The Challenge
          • Good search tools have been a boon to web users. As the amount of data available online has grown, organizing it has become increasingly difficult. Users today are more likely to search for information with keywords than to browse through folders looking for what they need.
          • Good search tools are hard to build. They must store massive amounts of information, much of it complex text or multimedia files. They must be able to process those files to extract keywords or other attributes for searches. The amount of data and its complexity demand a scalable and flexible platform for indexing.
          • Besides the difficulty of handling the data, a good search engine must be able to assess the intent and interests of the user when a search query arrives. The word "chips" in a query may refer to fried food or to electronic components. Delivering meaningful results requires that the system make a good guess between the two. Looking at the user's recent activity and history can help.
  • 41. BigData: Use Cases 10 – Data sandbox
      What can you do with new data?
      • The Summary
          Companies, even those with established enterprise data warehouses, often need a cost-effective, flexible way to store, explore and analyze new types of complex data. Many companies have created "data sandboxes" using Hadoop, where users can play with data, decide what to do with it and determine whether it should be added to the data warehouse. Analysts can look for new relationships in the data, mine it and use techniques like machine learning and natural language processing on it to get new insights.
      • The Challenge
          • The variety, complexity and volume of data available to enterprises today are changing the way that those enterprises think. There is value and insight locked up in complex data. The best companies in the world are unlocking that value by building new analytical capacity.
          • With shifting and complex data and emerging analytics, enterprises need a new platform to store and explore the information they collect. Cost, scalability and flexibility are all forcing a move away from the single-architecture data warehouse of the past, toward a more flexible and comprehensive data management infrastructure.
  • 42. When to use Hadoop?
  • 44. eConvergence Inc. 09/29/13  eConvergence, Inc.  eConvergence Inc. is a software development company that provides cost-effective, customer-centric IT consulting services to customers across the globe.
  • 45. Strengths @ eConvergence  Proven and experienced management team.  Expertise in business transformation outsourcing.  Extensive industry expertise.  Broad and evolving service offerings.
  • 46. Key Domains/Technology Expertise  Data Warehousing  Teradata  Web Development  Business Intelligence  Mobile Development (iOS and Android)  Manual & Automation Testing  Business Analyst  DataStage/Informatica  BIGDATA/Hadoop  PMP
  • 47. Training Mode  Onsite  Online  Batch size is limited  Sessions are live  Sessions are interactive
  • 48. Batch Timings  Generally, the trainer covers 6 hrs/week.  Weekdays: 9:00-11:00 pm EST (6:00-8:00 pm PST), three days a week  Weekends: 3 hours each on Saturday and Sunday  Depending on the course, training can run 6-8 weeks.
  • 49. Technologies to Assist…  We use the latest training methodologies to make learning easier and more effective:  GoToMeeting/WebEx conferencing software.  Daily PPTs/exercises forwarded after the session.  Recordings of the sessions.  Queries to be posted in the Yahoo group.
  • 50. Installation/Software  Our technical team will assist you in installing the software required for training:  Installation on desktop/laptop.  Server support for certain courses.  Hands-on assistance.  Technical assistance.
  • 51. Post-Training  Resume Preparation  Interview Preparation  Marketing Support  Post-Placement Support
  • 52. Immigration Support  We provide assistance in all immigration activities such as:  H1 Transfers  New H1s  GC Processing  E-Verify  F1 OPTs
  • 53. Course Contents  Session 1  DW Definition, DW Architecture  Operational Databases (ODB)  Data Modeling: 3NF and Dimensional Modeling  OLTP and OLAP, ETL concepts  Top-down/bottom-up approaches  Bill Inmon approach – advantages and disadvantages  Ralph Kimball approach – advantages and disadvantages  Star and Snowflake schemas  Dimensional modeling design considerations  Normalization techniques with live examples  Data Mart project examples  Customer, Products, Geo dimensional concepts  Hierarchy structures  Master Data Management systems  Information Management systems  NoSQL – Big Data, ACID Model, CAP Model  Session 2  Hadoop Architecture  HDFS Architecture  HDFS Features  Introduction to NameNode & DataNode  File storage & Replication  Build a Hadoop cluster on EC2/AWS  Hadoop Configuration  Session 3  MapReduce  MapReduce Features  MapReduce Job recovery  MapReduce Job Check  Cluster Rebalancing  Secondary NameNode features  Practice Hadoop Commands
  • 54. Course Contents  Session 4  Introduction to Hive  Installation of Hive  Hive SQL  Hive internal and external tables  Hive Partitions  Introduction to Sqoop  Installation of Sqoop  Sqoop practice with Hadoop and HBase  Session 5  Introduction to Pig  Installation of Pig  Pig Relations, Bags, Tuples, Fields  Pig: Expressions  Pig: Schemas  Pig: Join and Split Optimization  Pig: JSON  Pig: COGROUP  Session 6  Introduction to HBase  Architecture  Install HBase  Region Servers, Master  HBase with Hive  HBase with Sqoop  HBase with Pig  HBase practice  Session 7  Installation on VMware  Installation of CDH4  Introduction to the R language and Revo Analytics  Session 8  Performance Tuning  Certification discussion – CCA-410  Practical Example