Webinar: Make Big Data Easy
with the Right Tools and Talent
October 2012
- MetaScale Expertise and Kognitio Analytics
Accelerate Hadoop for Organizations Large and Small
Today’s webinar
• 45 minutes with 15 minutes Q&A
• We will email you a link to the slides
• Feel free to use the Q & A feature
Agenda
• Opening introduction
• MetaScale Expertise
– Case study: Sears Holdings
• Kognitio Analytics
– Hadoop acceleration explained
• Summary
• Q&A
Presenters:
Michael Hiskey
VP Marketing & Business Development
Kognitio
Dr. Phil Shelley
CEO, MetaScale
CTO, Sears Holdings
Host:
Roger Gaskell
CTO
Kognitio
Big Data <> Hadoop
Big Data is high volume, velocity and variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision-making
 Volume – (not only) size
 Velocity – speed of input/output
 Variety – lots of data sources
 Value – not the SIZE of your data, but what you can DO with it!
OK, so you’ve decided to put data in Hadoop...
Now what?
Dr. Phil Shelley
CEO – MetaScale
CTO Sears Holdings
Where Did We Start at Sears?
Where Did We Start?
 Issues with meeting production schedules
 Multiple copies of data, no single point of truth
 ETL complexity, cost of software and cost to manage
 Time taken to set up ETL data sources for projects
 Latency in data, up to weeks in some cases
 Enterprise Data Warehouses unable to handle the load
 Mainframe workload over-consuming capacity
 IT budgets not growing – BUT data volumes escalating
Why Hadoop?
[Diagram: Traditional Databases & Warehouses vs. Hadoop – an Ecosystem]
Enterprise Integration
 Data Sourcing
 Connecting to legacy source systems
 Loaders and tools (speed considerations)
 Batch or near-real time
 Enterprise Data Model
 Establish a model and enterprise data strategy early
 Data Transformations
 The End of ETL as we know it
 Data Re-use
 Drive re-use of data
 Single point of truth is now a possibility
 Data Consumption and User Interaction
 Consume data in-place wherever possible
 Move data only if you have to
 Exporting to legacy systems can be done, but it duplicates data
 Loaders and tools (speed considerations)
 How will your users interact with the data?
Rethink Everything
The way you capture data
The way you store data
The structure of your data
The way you analyze data
The costs of data storage
The size of your data
What you can analyze
The speed of analysis
The skills of your team
The way users interact with data
The Learning from our Journey
• Big Data tools are here and ready for the Enterprise
• An Enterprise Data Architecture model is essential
• Hadoop can handle Enterprise workload
 To reduce strain on legacy platforms
 To reduce cost
 To bring new business opportunities
• Must be part of an overall data strategy
• Not to be underestimated
• The solution must be an Eco-System
 There has to be a simple way to consume the data
Hadoop Strengths & Weaknesses?
Strengths:
• Cost-effective platform
• Powerful / fast data processing environment
• Good at standard reporting
• Flexibility: programmable, any data type
• Huge scalability
Weaknesses:
• Barriers to entry: lots of engineering and coding
• High ongoing coding requirements
• Difficult to access with standard BI/analytical tools
• Ad hoc complex analytics difficult
• Too slow for interactive analytics
Reference Architecture
What is an “In-memory” Analytical Platform?
• A DBMS where all of the data of interest, or specific portions of it, has been permanently pre-loaded into random access memory (RAM)
• Not a large cache
– Data is held in structures that take advantage of the properties of RAM – NOT copies of frequently used disk blocks
– The database's query optimiser knows at all times exactly which data is in memory (and which is not)
In-Memory Analytical Database Management
Not a large cache:
• No disk access during query execution
– Temporary tables in RAM
– Result sets in RAM
• In-Memory means in high-speed RAM
– NOT slow flash-based SSDs that mimic mechanical disks
For more information:
• Gartner: “Who's Who in In-Memory DBMSs”
Roxanne Edjlali, Donald Feinberg
10 Sept 2012 www.gartner.com/id=2151315
Why In-memory: RAM is Faster Than Disk (Really!)
Actually, this is only part of the story:
• Analytics completely changes the workload characteristics on the database
• Simple reporting and transactional processing is all about “filtering” the data of interest
• Analytics is all about complex “crunching” of the data once it is filtered
• Crunching needs processing power and consumes CPU cycles
• Storing data on physical disks severely limits the rate at which data can be provided to the CPUs
• Accessing data directly from RAM allows much more CPU power to be deployed
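To put rough numbers on that last point, here is a small illustrative calculation. It is not from the deck; the throughput figures are generic assumptions (about 200 MB/s sequential read from one mechanical disk, about 10 GB/s scanning from RAM), chosen only to show the order-of-magnitude gap:

# Illustrative scan-time arithmetic; throughput figures are assumptions,
# not vendor benchmarks.
DATA_GB = 1024            # 1 TB of data of interest
DISK_MB_PER_S = 200       # sequential read from one mechanical disk (assumed)
RAM_GB_PER_S = 10         # effective in-memory scan rate (assumed)

disk_seconds = DATA_GB * 1024 / DISK_MB_PER_S   # ~5,240 s
ram_seconds = DATA_GB / RAM_GB_PER_S            # ~100 s

print(f"1 TB from disk: ~{disk_seconds / 60:.0f} minutes")
print(f"1 TB from RAM:  ~{ram_seconds:.0f} seconds")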
Analytics is about “CRUNCHING” through Data
• To understand what is happening in the data: joins, sorts, aggregations, grouping, analytical functions
• These operations are CPU cycle-intensive, so analytical platforms are CPU-bound
– Assuming disk I/O speeds are not a bottleneck
– In-memory removes the disk I/O bottleneck
• The more complex the analytics, the more pronounced this becomes
For Analytics, the CPU is King
• The key metric of any analytical platform should be GB per CPU core
– It needs to effectively utilize all available cores
– Hyper-threads are NOT the equivalent of cores
• Interactive/ad hoc analytics:
– THINK data-to-core ratios ≈ 10GB of data per CPU core
• Every cycle is precious – CPU cores need to be used efficiently
– Techniques such as “dynamic machine code generation” help
• Careful – performance impact of compression:
– Makes disk-based databases go faster
– Makes in-memory databases go slower
Speed & Scale are the Requirements
• Memory & CPU on an individual server = NOWHERE near enough for big data
– Moore's Law – the power of a processor doubles every two years
– Data volumes – double every year!!
• Every CPU core in every server needs to be efficiently involved in every query
– Data is split across all the CPU cores
– All database operations need to be parallelised with no points of serialisation – this is true MPP
• Combine the RAM of many individual servers: many CPU cores, spread across many CPUs, housed in many individual computers
• The only way to keep up is to parallelise, or scale out
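As a back-of-envelope illustration of how the ~10GB-per-core guideline above translates into a scale-out cluster. The 16-core / 256GB server spec here is a hypothetical example, not a Kognitio recommendation:

# Hypothetical sizing sketch using the ~10 GB-per-core guideline from the
# previous slide. The 16-core / 256 GB server spec is an assumption.
import math

data_gb = 5 * 1024          # 5 TB to hold in memory
gb_per_core = 10            # interactive/ad hoc guideline
cores_per_server = 16       # assumed server spec
ram_per_server_gb = 256     # assumed server spec

cores_needed = data_gb / gb_per_core                            # 512 cores
servers_by_cores = math.ceil(cores_needed / cores_per_server)   # 32
servers_by_ram = math.ceil(data_gb / ram_per_server_gb)         # 20

print(f"{data_gb} GB => ~{cores_needed:.0f} cores "
      f"=> {max(servers_by_cores, servers_by_ram)} servers of this spec")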
Hadoop Connectivity
Kognitio - External Tables
– Data held on disk in other systems can be seen as non-memory-resident tables by Kognitio users
– Users can select which data they wish to “suck” into memory
• Using GUI or scripts
– Kognitio seamlessly sucks data out of the source system into Kognitio memory
– All managed via SQL
Kognitio - Hadoop Connectors
– Two types
• HDFS Connector
• Filter Agent Connector
– Designed for high speed
• Multiple parallel load streams
• Demonstrable 14TB+/hour load rates
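Because all of this is managed via SQL, Hadoop-resident data exposed as an external table can be queried from any ODBC client. A minimal sketch in Python, assuming a pyodbc DSN, credentials, and an external table named hdfs_sales (all hypothetical); the DDL for defining external tables and pinning data into memory is Kognitio-specific and not shown here:

# Minimal sketch: querying a Kognitio external table from Python via ODBC.
# The DSN, credentials, table name, and columns are hypothetical; defining
# the external table itself uses Kognitio-specific DDL (see vendor docs).
import pyodbc

conn = pyodbc.connect("DSN=kognitio;UID=analyst;PWD=secret")
cur = conn.cursor()

# Ordinary SQL; the connector fetches the HDFS-resident rows, which can be
# read dynamically or pinned into memory (per the slides).
cur.execute("""
    select store_id, sum(sale_amount)
    from hdfs_sales
    group by store_id
""")
for store_id, total in cur.fetchall():
    print(store_id, total)

conn.close()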
Tight Hadoop Integration
HDFS Connector
• Connector defines access to the HDFS file system
• External table accesses row-based data in HDFS
• Dynamic access, or “pin” data into memory
• Complete HDFS file is loaded into memory
Filter Agent Connector
• Connector uploads an agent to the Hadoop nodes
• Query passes selections and relevant predicates to the agent
• Data filtering and projection take place locally on each Hadoop node
• Only data of interest is loaded into memory, via parallel load streams
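The practical difference between the two connectors is where the filtering happens. A conceptual sketch (plain Python, not Kognitio code) of why pushing predicates down to the Hadoop nodes shrinks what crosses the network:

# Conceptual model of connector data movement; not Kognitio code.
rows = [{"store": s % 500, "amount": s * 0.01} for s in range(1_000_000)]

def wanted(r):
    return r["store"] == 42          # the query's predicate

# HDFS Connector style: ship the complete file, then filter after loading.
shipped_all = rows                    # every row crosses the network
loaded_then_filtered = [r["amount"] for r in shipped_all if wanted(r)]

# Filter Agent style: filter and project locally; ship only matching values.
shipped_filtered = [r["amount"] for r in rows if wanted(r)]

assert loaded_then_filtered == shipped_filtered   # same answer either way
print(f"without pushdown: {len(shipped_all):,} rows shipped")
print(f"with pushdown:    {len(shipped_filtered):,} rows shipped")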
Not Only SQL
Kognitio V8 External Scripts
– Run third-party scripts embedded within SQL
• Perl, Python, Java, R, SAS, etc.
• Rows in to rows out: one-to-one, one-to-many, or zero-to-many
create interpreter perlinterp
command '/usr/bin/perl' sends 'csv' receives 'csv';

select top 1000 words, count(*)
from (external script using environment perlinterp
      receives (txt varchar(32000))
      sends (words varchar(100))
      script S'endofperl(
while(<>)
{
  chomp();
  s/[,.!_]//g;
  foreach $c (split(/ /))
  { if ($c =~ /^[a-zA-Z]+$/) { print "$c\n" } }
}
)endofperl'
      from (select comments from customer_enquiry)) dt
group by 1
order by 2 desc;
This reads long comment text from the customer_enquiry table; the in-line Perl converts the long text into an output stream of words (one word per row), and the query selects the top 1000 words by frequency using standard SQL aggregation.
Hardware Requirements for
In-memory Platforms
• Hadoop = industry-standard servers
• Be careful to avoid vendor lock-in
• Off-the-shelf, low-cost servers match neatly with Hadoop
– Intel or AMD CPU (x86)
– No special components
• Ethernet network
• Standard OS
Benefits of an In-memory Analytical Platform
• A seamless in-memory analytical layer on top of your data persistence layer(s):
Analytical queries that used to run in hours and minutes now run in minutes and seconds (often sub-second)
High query throughput = massively higher concurrency
Flexibility
• Enables greater query complexity
• Users freely interact with data
• Use preferred BI Tools (relational or OLAP)
Reduced complexity
• Administration de-skilled
• Reduced data duplication
The Learning from our Journey
• Big Data tools are here and ready for the Enterprise
• An Enterprise Data Architecture model is essential
• Hadoop can handle Enterprise workload
 To reduce strain on legacy platforms
 To reduce cost
 To bring new business opportunities
• Must be part of an overall data strategy
• Not to be underestimated
• The solution must be an Eco-System
 There has to be a simple way to consume the data
www.kognitio.com
kognitio.com/blog
twitter.com/kognitio
linkedin.com/companies/kognitio
facebook.com/kognitio
youtube.com/user/kognitio
Dr. Phil Shelley
CEO – MetaScale
CTO Sears Holdings
Michael Hiskey
Vice President
Marketing & Business Development
Michael.hiskey@kognitio.com
Phone: +1 (855) KOGNITIO
Upcoming Web Briefings: kognitio.com/briefings