SKILLWISE-BIG DATA
IBM Big Data Platform
Overview
Big Data is a Hot Topic Because Technology Makes it Possible to Analyze
ALL Available Data
Cost-effectively manage and analyze
all available data in its native form:
unstructured, structured, streaming
Example sources: ERP, CRM, RFID, website, network switches, social media, billing
BIG DATA is not just HADOOP
• Manage & store huge volumes of any data: Hadoop File System, MapReduce
• Manage streaming data: Stream Computing
• Analyze unstructured data: Text Analytics Engine
• Structure and control data: Data Warehousing
• Integrate and govern all data sources: Integration, Data Quality, Security, Lifecycle Management, MDM
• Understand and navigate federated big data sources: Federated Discovery and Navigation
Business-Centric Big Data Enables You to Start With a Critical Business Pain and Expand the Foundation for Future Requirements
• “Big data” isn’t just a technology: it’s a business strategy for capitalizing on information resources
• Getting started is crucial
• Success at each entry point is accelerated by products within the Big Data platform
• Build the foundation for future requirements by expanding further into the big data platform
1 – Unlock Big Data
Customer need
• Understand existing data sources
• Search and navigate data within existing
systems
• No copying of data
Value statement
• Get up and running quickly
• Discover and retrieve big data
• Let business users work directly with big data sources
Solution
• IBM InfoSphere Data Explorer (formerly Vivisimo Velocity)
2 – Analyze Raw Data
Customer need
• Ingest data as-is into Hadoop
• Combine it with data from DWH
• Process very large volumes of data
Value statement
• Gain new insight
• Overcome the high cost of converting
data from unstructured to structured
format
• Experiment with analyzing different data sets
and combining them with other sources
Solution
• IBM InfoSphere BigInsights
Merging the Traditional and Big Data Approaches
Traditional Approach: Structured & Repeatable Analysis
• Business users determine what question to ask
• IT structures the data to answer that question
• Examples: monthly sales reports, profitability analysis, customer surveys
Big Data Approach: Iterative & Exploratory Analysis
• IT delivers a platform to enable creative discovery
• Business explores what questions could be asked
• Examples: brand sentiment, product strategy, maximum asset utilization
InfoSphere BigInsights is more than just HADOOP
• IBM InfoSphere BigInsights is much more than HADOOP
• The IBM Big Data platform includes much more than IBM InfoSphere BigInsights
Hadoop
• Open-source software framework from Apache
• Inspired by
– Google MapReduce
– GFS (Google File System)
• Core components: HDFS (Hadoop Distributed File System) and MapReduce
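HDFS stores every file as a sequence of fixed-size blocks and keeps several replicas of each block on different nodes. A small worked sketch of what that means for one file, assuming the Hadoop 2.x+ defaults of 128 MB blocks (`dfs.blocksize`) and a replication factor of 3 (`dfs.replication`); both are configurable per cluster, and the 1000 MB file is hypothetical.

```python
import math

# Assumed defaults (Hadoop 2.x and later); both are set in hdfs-site.xml.
# Hadoop 1.x used 64 MB blocks by default.
BLOCK_SIZE_MB = 128
REPLICATION = 3


def hdfs_footprint(file_size_mb: float,
                   block_size_mb: int = BLOCK_SIZE_MB,
                   replication: int = REPLICATION) -> tuple[int, float]:
    """Return (number of HDFS blocks, raw cluster storage in MB) for one file."""
    blocks = math.ceil(file_size_mb / block_size_mb)
    # HDFS does not pad the last block, so raw storage is simply the
    # file size times the replication factor, not blocks * block size.
    raw_storage_mb = file_size_mb * replication
    return blocks, raw_storage_mb


if __name__ == "__main__":
    blocks, raw = hdfs_footprint(1000)  # hypothetical 1000 MB log file
    print(f"{blocks} blocks, {raw:.0f} MB raw storage")
```

The replication factor is what lets Hadoop run on cheap commodity hardware: losing one node still leaves two copies of each of its blocks.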
InfoSphere BigInsights: Platform for volume, variety, velocity
• Enhanced Hadoop foundation
• Analytics
– Text analytics & tooling
– Application accelerators
• Usability
– Web console
– Spreadsheet-style tool
– Ready-made “apps”
• Enterprise class
– Storage, security, cluster management
• Integration
– Connectivity to Netezza, DB2, JDBC databases, etc.
Editions (from Apache Hadoop up to Enterprise Edition, increasing in breadth of capabilities and enterprise-class features):
• Basic Edition: free download, integrated install, online InfoCenter, BigData Univ.
• Enterprise Edition (licensed): application accelerators, pre-built applications, text analytics, spreadsheet-style tool, RDBMS and warehouse connectivity, administrative tools and security, Eclipse development tools, performance enhancements, . . . .
• Can also run on top of …
Spreadsheet-style Analysis
• Web-based analysis and visualization
• Spreadsheet-like interface
• Define and manage long-running data collection jobs
• Analyze the text content of the retrieved pages
Build a Big Data Program – MapReduce example
Eclipse tools for Jaql, Hive, Pig, Java MapReduce, BigSheets plug-ins, text analytics, etc.
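The classic first MapReduce program is word count. The sketch below runs the map, shuffle and reduce phases inside a single Python process to show how data flows between them; it is not the Hadoop Java API. On a real cluster the framework runs many map tasks in parallel over HDFS blocks and performs the shuffle/sort between the two phases.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group intermediate values by key, as the framework does between phases."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reducer: sum the partial counts for one word."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Chain map -> shuffle -> reduce over all input lines."""
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    print(word_count(["big data is not just hadoop", "hadoop is big"]))
```

Because each mapper only sees its own lines and each reducer only sees one key's values, both phases can be spread across nodes without coordination.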
Jaql – IBM’s query language for the Hadoop world
• Jaql is a complete solution environment supporting all other BigInsights components
• Integration point for various analytics
– Text analytics
– Statistical analysis
– Machine learning
– Ad-hoc analysis
• Integration point for various data sources
– Local and distributed file systems
– NoSQL databases
– Content repositories
– Relational sources (warehouses, operational databases)
Architecture: Jaql (Jaql I/O, Jaql core operators, Jaql modules) sits between the analytics layer (BigInsights Text Analytics, Statistical Analysis (R module), Machine Learning (SystemML), Ad-hoc Analysis (BigSheets), Integration (DB2, Netezza, Streams, …)) and the data sources (DFS, NoSQL, RDBMS, file system).
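BigInsights and the data warehouse
• BigInsights filters, transforms and aggregates big data and passes the results to the data warehouse
• Traditional analytic tools work on the data warehouse
• Big data analytic applications work on BigInsights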
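The filter → transform → aggregate flow from BigInsights into the warehouse (the kind of pipeline Jaql chains with its `->` operator) can be sketched over JSON-like records. This is plain Python, not Jaql syntax or a BigInsights API; the field names (`customer`, `channel`, `amount`), the records and the validity rule are hypothetical.

```python
from collections import defaultdict

# Hypothetical raw records, e.g. parsed JSON log entries landed in Hadoop.
RAW = [
    {"customer": "c1", "channel": "web", "amount": 120.0},
    {"customer": "c2", "channel": "shop", "amount": 80.0},
    {"customer": "c1", "channel": "web", "amount": -5.0},  # invalid entry
    {"customer": "c3", "channel": "web", "amount": 42.5},
]


def filter_step(records):
    """Filter: drop records failing a validity rule (here: non-positive amounts)."""
    return (r for r in records if r["amount"] > 0)


def transform_step(records):
    """Transform: keep and rename only the fields the warehouse needs."""
    return ({"chan": r["channel"], "amt": r["amount"]} for r in records)


def aggregate_step(records):
    """Aggregate: total amount per channel, a compact summary to load into the DWH."""
    totals = defaultdict(float)
    for r in records:
        totals[r["chan"]] += r["amt"]
    return dict(totals)


summary = aggregate_step(transform_step(filter_step(RAW)))
print(summary)
```

Only the small aggregated result travels to the warehouse; the bulky raw records stay in Hadoop.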
3 – Simplify your warehouse
Customer need
• Significantly improve DWH performance
• Significantly reduce DWH administration costs
Value statement
• Speed: 10–100x better performance
• Simplicity: administration costs reduced by 75–90%
• Scalability
• Smart system
• In-database analytics
• Out-of-the-box integration with SPSS
Solution
• IBM Netezza, renamed IBM PureData System for Analytics
Analyst: I need to evaluate the possible relationship between client salary and overdrafts.
IT: OK. We have to evaluate a lot of statistics and set the correct DB indexes and DB partitioning. It will take us 5 days.
IT: Done. You can run your analytical query.
Analyst: Great. Thanks a lot. I’m going to check the results.
Analyst: Great. I can see some nice correlations here. Now I need to look at it from a different perspective.
IT: Ohhh, welcome, dear friend. Understood. So, it’s … another 5 days of our work.
Analyst: Noooo!!! It’s impossible to work here!
And now with Netezza ...
Analyst: I need to evaluate the possible relationship between client salary and overdrafts. I will use Netezza.
Analyst: Great. I can see some nice correlations here. Now I need to look at it from a different perspective. With Netezza I can run the query immediately, and the response comes back just as fast.
IT can do something else, much more useful.
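The analyst’s question boils down to a correlation statistic. A minimal sketch of the Pearson coefficient r, computed client-side in Python over hypothetical salary and overdraft figures; on an appliance such as Netezza the point is that this kind of aggregate runs inside the database (in-database analytics), but no Netezza-specific function is assumed here.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient r between two equally long samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # r = cov / (sigma_x * sigma_y); the 1/n normalization factors cancel.
    return cov / math.sqrt(var_x * var_y)


# Hypothetical sample: monthly salary vs. average overdraft per client.
salaries = [1800, 2500, 3200, 4100, 5200]
overdrafts = [520, 410, 300, 260, 90]
print(f"r = {pearson(salaries, overdrafts):.3f}")
```

Looking at the data “from a different perspective” just means rerunning such a query with other columns or filters, which is cheap only if no re-indexing or re-partitioning is needed first.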
Built-In Expertise Makes This as Simple as an Appliance
• Dedicated device
• Optimized for purpose
• Complete solution
• Fast installation
• Very easy operation
• Standard interfaces
• Low cost
IBM Netezza was renamed IBM PureData System for Analytics in October 2012.
Netezza Genesis in T-Mobile CZ
Proof-of-Concept Project
– New Enterprise Data Warehouse platform selection
– Comparison of existing and other platforms
– Selection criteria
• Performance
• Operational savings
… and the winner was: Netezza
Netezza Genesis in T-Mobile CZ
Expectations
Significant response improvement:
– Faster platform means faster report response times
Direct data availability:
– Higher trust in data, one version of the truth
– Aggregation reduction
– Any attribute available
Operational benefits:
– Storage savings (no data replicas)
– Reduced administration (DBA) costs
Infrastructure simplification:
– Lower environment complexity
Netezza Genesis in T-Mobile CZ
Project Implementation
– EDW platform migration
• Netezza platform implementation
• ETL graphs/processes redesign
– BI front-end tool migration
• SAP BusinessObjects implementation
• Redesign of all reports
Main integration partner: T-Systems CZ
Netezza Genesis in T-Mobile CZ
Current Status
• All relevant ETL processing redesigned
• Parallel run on the original and Netezza platforms finished
• Netezza is now the only primary platform
Report | Original platform | Netezza
Workflow Reporting | 2 hours | 1 minute
Invoicing and Payments reporting:
Payment discipline of current month invoices | 33 minutes | 17 seconds
Overdue Debt of Invoices – in Current Month | 10 hours | 23 seconds
Average Monthly Invoice Figures | 50 minutes | 38 seconds
RESPONSE TIME MASSIVELY IMPROVED
Real Netezza experience from T-Mobile Czech Rep.
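Dividing each original response time by the Netezza response time gives a speedup factor per report. A short worked sketch using only the T-Mobile CZ figures reported above, converted to seconds; the dictionary keys are shortened report names.

```python
# Reported response times in seconds: (original platform, Netezza).
TIMINGS = {
    "Workflow reporting": (2 * 3600, 60),
    "Payment discipline": (33 * 60, 17),
    "Overdue debt": (10 * 3600, 23),
    "Average monthly invoice": (50 * 60, 38),
}


def speedups(timings: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Speedup factor = original time / new time for each report."""
    return {name: before / after for name, (before, after) in timings.items()}


for name, factor in speedups(TIMINGS).items():
    print(f"{name}: {factor:.0f}x faster")
```

By this arithmetic the reported improvements range from about 79x (average monthly invoice figures) to about 1565x (overdue debt of invoices).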
4 – Reduce costs with Hadoop
Customer need
• Too much data => too expensive to store and maintain
• A large portion is kept “just in case”
• Data volumes keep growing => costs keep rising
• => too expensive to keep all data in a standard DWH
Value statement
• Leverage Hadoop’s parallel processing architecture
• Hadoop runs on cheap commodity HW
• Let business users keep working in the same or a similar way
Solution
• IBM InfoSphere BigInsights
BigInsights and the data warehouse
• BigInsights as a query-ready archive for “cold” data warehouse data
• Traditional analytic tools can reach it, e.g. from Cognos BI via Hive JDBC; big data analytic applications use it directly
SQL interface stack (top to bottom): Application → SQL language → JDBC / ODBC driver → JDBC / ODBC server → SQL interface engine in InfoSphere BigInsights → data sources (Hive tables, HBase tables, CSV files)
Future: The SQL interface . . . .
• Rich SQL query capabilities
– SQL-92 and SQL:2011 features
– Correlated subqueries
– Windowed aggregates
• SQL access to all data stored in InfoSphere BigInsights
• Robust JDBC/ODBC support
• Take advantage of key features of each data source
• Leverage MapReduce parallelism or achieve low latency
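The two SQL features named above, correlated subqueries and windowed aggregates, can be illustrated independently of BigInsights. The sketch below uses Python’s built-in `sqlite3` module purely as a stand-in engine (window functions require SQLite 3.25 or later); the `sales` table and its rows are hypothetical, and nothing here reflects the BigInsights SQL interface itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (region TEXT, month INTEGER, amount REAL);
INSERT INTO sales VALUES
  ('north', 1, 100), ('north', 2, 150), ('north', 3, 120),
  ('south', 1, 80),  ('south', 2, 60),  ('south', 3, 90);
""")

# Windowed aggregate: running total per region, ordered by month.
running = conn.execute("""
    SELECT region, month,
           SUM(amount) OVER (PARTITION BY region ORDER BY month) AS running_total
    FROM sales
    ORDER BY region, month
""").fetchall()

# Correlated subquery: the inner query is re-evaluated per outer row,
# returning months whose amount beats that same region's average.
above_avg = conn.execute("""
    SELECT s.region, s.month
    FROM sales AS s
    WHERE s.amount > (SELECT AVG(t.amount) FROM sales AS t
                      WHERE t.region = s.region)
    ORDER BY s.region, s.month
""").fetchall()

print(running)
print(above_avg)
```

Expressing the same logic directly as MapReduce jobs would take several hand-written passes, which is the motivation for a rich SQL layer on top of Hadoop.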
5 – Analyze Streaming Data
Customer need
• Process and leverage streaming data
• Select valuable data from the data stream for future processing
• Quickly process data that becomes useless if it is not processed immediately
Value statement
• React in real time to seize an opportunity before it expires
• Periodically adjust streaming models based on analysis of data at rest
Solution
• IBM InfoSphere Streams
Streams Computing
Streaming data sources → ACTION
Why and when to use InfoSphere Streams?
For applications that need on-the-fly processing, filtering and analysis of streaming data. At least two of the criteria below should apply:
• Sensors: environmental, industrial, GPS, …; images, videos, …
• Data exhaust: network data, system logs (web server, app server), …
• High-rate transaction data: financial transactions, CDRs
• Isolation: processing in isolation … or in limited windows (time / number of records)
• Non-traditional formats included: spatial data, images, text, voice, …
• Integration challenges: different connection methods, different data rates, different processing requirements
• Multiple processing nodes: volume / rate very high => scalability required
• Sub-millisecond latency: immediate analysis and response
• Store & mine approach doesn’t work: because of the very high volume of data (and its rate)
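The “limited windows (time / number of records)” criterion refers to windowed stream processing. A minimal sketch of both window kinds in plain Python, not in SPL (the language Streams applications are written in): a count-based window averaging the last N values, and a time-based window summing the values seen in the last `span` seconds. The function names and input shapes are illustrative assumptions.

```python
from collections import deque
from typing import Iterable, Iterator


def count_window_avg(values: Iterable[float], size: int) -> Iterator[float]:
    """Count-based sliding window: average of the last `size` values."""
    window: deque = deque(maxlen=size)  # oldest value drops out automatically
    for v in values:
        window.append(v)
        yield sum(window) / len(window)


def time_window_sum(events: Iterable[tuple[float, float]],
                    span: float) -> Iterator[float]:
    """Time-based sliding window over (timestamp, value) events arriving in
    time order: sum of values with timestamps in (newest - span, newest]."""
    window: deque = deque()
    total = 0.0
    for ts, value in events:
        window.append((ts, value))
        total += value
        # Evict events that have fallen out of the time window.
        while window and window[0][0] <= ts - span:
            total -= window.popleft()[1]
        yield total


if __name__ == "__main__":
    print(list(count_window_avg([1, 2, 3, 4], size=2)))
    print(list(time_window_sum([(0, 1), (1, 2), (2, 3), (5, 4)], span=3)))
```

Each event is handled once and memory is bounded by the window, which is what makes this style workable when storing everything first and mining later is not.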
Streams and BigInsights – Integrated Analytics on Data in Motion & Data at Rest
• InfoSphere Streams: data ingest, preparation, online analysis, model validation
• InfoSphere BigInsights, database & warehouse: data integration, data mining, machine learning, statistical modeling
• Visualization of real-time and historical insights
Data and control flow between the two:
1. Data ingest
2. Bootstrap/enrich
3. Adaptive analytics model
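The loop between data at rest and data in motion (and the earlier value statement about periodically adjusting streaming models) can be sketched as: train a model on historical data, score live events against it, land the new events at rest, and retrain. The “model” here is a deliberately simple, hypothetical anomaly threshold (mean + k standard deviations); real deployments would use the mining and machine-learning tools listed above.

```python
import statistics
from typing import Iterable, Sequence


def train_threshold(history: Iterable[float], k: float = 3.0) -> float:
    """Data-at-rest step: derive an anomaly threshold (mean + k * stdev)
    from historical values, e.g. in a batch job over Hadoop or the DWH."""
    data = list(history)
    return statistics.fmean(data) + k * statistics.pstdev(data)


def score_stream(events: Iterable[float], threshold: float) -> list[bool]:
    """Data-in-motion step: flag each incoming value against the current model."""
    return [value > threshold for value in events]


def adaptive_run(history: Sequence[float],
                 batches: Iterable[Sequence[float]],
                 k: float = 3.0) -> tuple[list[list[bool]], float]:
    """Score each batch with the current model, then store it and retrain."""
    stored = list(history)  # copy, so the caller's history is not mutated
    threshold = train_threshold(stored, k)
    flags = []
    for batch in batches:
        flags.append(score_stream(batch, threshold))
        stored.extend(batch)                    # land new data at rest
        threshold = train_threshold(stored, k)  # periodic model refresh
    return flags, threshold
```

For example, `adaptive_run([8, 12], [[13, 9]], k=1.0)` flags 13 against the initial threshold of 12, then raises the threshold after retraining on all four values.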
The Platform Advantage
Analytic Applications: BI / Reporting, Exploration / Visualization, Functional App, Industry App, Predictive Analytics, Content Analytics
IBM Big Data Platform:
• Visualization & Discovery
• Application Development
• Systems Management
• Accelerators
• Hadoop System
• Stream Computing
• Data Warehouse
• Information Integration & Governance
BENEFITS IN DETAIL
• Increase over time
– By moving from the entry project to a 2nd and 3rd project
• Lowering deployment costs
– Shared components
– Integration
• Points of leverage
– Shared text analytics for Streams and BigInsights
– HDFS connectors (data integration (ETL, …), Streams)
– Accelerators
– Build across multiple engines
Skillwise Big Data part 2

Weitere ähnliche Inhalte

Was ist angesagt?

2012 10 bigdata_overview
2012 10 bigdata_overview2012 10 bigdata_overview
2012 10 bigdata_overview
jdijcks
 
BarbaraZigmanResume 2016
BarbaraZigmanResume 2016BarbaraZigmanResume 2016
BarbaraZigmanResume 2016
bzigman
 

Was ist angesagt? (20)

2012 10 bigdata_overview
2012 10 bigdata_overview2012 10 bigdata_overview
2012 10 bigdata_overview
 
Data Warehouse Design on Cloud ,A Big Data approach Part_One
Data Warehouse Design on Cloud ,A Big Data approach Part_OneData Warehouse Design on Cloud ,A Big Data approach Part_One
Data Warehouse Design on Cloud ,A Big Data approach Part_One
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
 
Microsoft Power BI: AI Powered Analytics
Microsoft Power BI: AI Powered AnalyticsMicrosoft Power BI: AI Powered Analytics
Microsoft Power BI: AI Powered Analytics
 
Cloud Based Data Warehousing and Analytics
Cloud Based Data Warehousing and AnalyticsCloud Based Data Warehousing and Analytics
Cloud Based Data Warehousing and Analytics
 
Big Data: Setting Up the Big Data Lake
Big Data: Setting Up the Big Data LakeBig Data: Setting Up the Big Data Lake
Big Data: Setting Up the Big Data Lake
 
Data lake benefits
Data lake benefitsData lake benefits
Data lake benefits
 
NRB - LUXEMBOURG MAINFRAME DAY 2017 - Data Spark and the Data Federation
NRB - LUXEMBOURG MAINFRAME DAY 2017 - Data Spark and the Data FederationNRB - LUXEMBOURG MAINFRAME DAY 2017 - Data Spark and the Data Federation
NRB - LUXEMBOURG MAINFRAME DAY 2017 - Data Spark and the Data Federation
 
Big Data: It’s all about the Use Cases
Big Data: It’s all about the Use CasesBig Data: It’s all about the Use Cases
Big Data: It’s all about the Use Cases
 
Creating a Next-Generation Big Data Architecture
Creating a Next-Generation Big Data ArchitectureCreating a Next-Generation Big Data Architecture
Creating a Next-Generation Big Data Architecture
 
BarbaraZigmanResume 2016
BarbaraZigmanResume 2016BarbaraZigmanResume 2016
BarbaraZigmanResume 2016
 
J1 - Keynote Data Platform - Rohan Kumar
J1 - Keynote Data Platform - Rohan KumarJ1 - Keynote Data Platform - Rohan Kumar
J1 - Keynote Data Platform - Rohan Kumar
 
Data summit connect fall 2020 - rise of data ops
Data summit connect fall 2020 - rise of data opsData summit connect fall 2020 - rise of data ops
Data summit connect fall 2020 - rise of data ops
 
Building enterprise advance analytics platform
Building enterprise advance analytics platformBuilding enterprise advance analytics platform
Building enterprise advance analytics platform
 
The importance of efficient data management for Digital Transformation
The importance of efficient data management for Digital TransformationThe importance of efficient data management for Digital Transformation
The importance of efficient data management for Digital Transformation
 
What exactly is Business Intelligence?
What exactly is Business Intelligence?What exactly is Business Intelligence?
What exactly is Business Intelligence?
 
Integrating BigInsights and Puredata system for analytics with query federati...
Integrating BigInsights and Puredata system for analytics with query federati...Integrating BigInsights and Puredata system for analytics with query federati...
Integrating BigInsights and Puredata system for analytics with query federati...
 
Use cases for Hadoop and Big Data Analytics - InfoSphere BigInsights
Use cases for Hadoop and Big Data Analytics - InfoSphere BigInsightsUse cases for Hadoop and Big Data Analytics - InfoSphere BigInsights
Use cases for Hadoop and Big Data Analytics - InfoSphere BigInsights
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lake
 
Incorporating the Data Lake into Your Analytic Architecture
Incorporating the Data Lake into Your Analytic ArchitectureIncorporating the Data Lake into Your Analytic Architecture
Incorporating the Data Lake into Your Analytic Architecture
 

Andere mochten auch (7)

SKILLWISE-BIGDATA ANALYSIS
SKILLWISE-BIGDATA ANALYSISSKILLWISE-BIGDATA ANALYSIS
SKILLWISE-BIGDATA ANALYSIS
 
Skillwise-IMS DB
Skillwise-IMS DBSkillwise-IMS DB
Skillwise-IMS DB
 
Skillwise Elementary Java Programming
Skillwise Elementary Java ProgrammingSkillwise Elementary Java Programming
Skillwise Elementary Java Programming
 
DB2 on Mainframe
DB2 on MainframeDB2 on Mainframe
DB2 on Mainframe
 
SKILLWISE-DB2 DBA
SKILLWISE-DB2 DBASKILLWISE-DB2 DBA
SKILLWISE-DB2 DBA
 
Advanced REXX Programming Techniques
Advanced REXX Programming TechniquesAdvanced REXX Programming Techniques
Advanced REXX Programming Techniques
 
SKILLWISE - OOPS CONCEPT
SKILLWISE - OOPS CONCEPTSKILLWISE - OOPS CONCEPT
SKILLWISE - OOPS CONCEPT
 

Ähnlich wie Skillwise Big Data part 2

Accelerate Self-Service Analytics with Virtualization and Visualisation (Thai)
Accelerate Self-Service Analytics with Virtualization and Visualisation (Thai)Accelerate Self-Service Analytics with Virtualization and Visualisation (Thai)
Accelerate Self-Service Analytics with Virtualization and Visualisation (Thai)
Denodo
 
Creatinganext generationbigdataarchitecture-141204150317-conversion-gate02
Creatinganext generationbigdataarchitecture-141204150317-conversion-gate02Creatinganext generationbigdataarchitecture-141204150317-conversion-gate02
Creatinganext generationbigdataarchitecture-141204150317-conversion-gate02
email2jl
 
MongoDB Breakfast Milan - Mainframe Offloading Strategies
MongoDB Breakfast Milan -  Mainframe Offloading StrategiesMongoDB Breakfast Milan -  Mainframe Offloading Strategies
MongoDB Breakfast Milan - Mainframe Offloading Strategies
MongoDB
 
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Precisely
 

Ähnlich wie Skillwise Big Data part 2 (20)

Accelerating Big Data Analytics
Accelerating Big Data AnalyticsAccelerating Big Data Analytics
Accelerating Big Data Analytics
 
Hadoop and Your Enterprise Data Warehouse
Hadoop and Your Enterprise Data WarehouseHadoop and Your Enterprise Data Warehouse
Hadoop and Your Enterprise Data Warehouse
 
Transform your DBMS to drive engagement innovation with Big Data
Transform your DBMS to drive engagement innovation with Big DataTransform your DBMS to drive engagement innovation with Big Data
Transform your DBMS to drive engagement innovation with Big Data
 
Accelerate Self-Service Analytics with Virtualization and Visualisation (Thai)
Accelerate Self-Service Analytics with Virtualization and Visualisation (Thai)Accelerate Self-Service Analytics with Virtualization and Visualisation (Thai)
Accelerate Self-Service Analytics with Virtualization and Visualisation (Thai)
 
Customer value analysis of big data products
Customer value analysis of big data productsCustomer value analysis of big data products
Customer value analysis of big data products
 
SoftServe BI/BigData Workshop in Utah
SoftServe BI/BigData Workshop in UtahSoftServe BI/BigData Workshop in Utah
SoftServe BI/BigData Workshop in Utah
 
Creatinganext generationbigdataarchitecture-141204150317-conversion-gate02
Creatinganext generationbigdataarchitecture-141204150317-conversion-gate02Creatinganext generationbigdataarchitecture-141204150317-conversion-gate02
Creatinganext generationbigdataarchitecture-141204150317-conversion-gate02
 
MongoDB Breakfast Milan - Mainframe Offloading Strategies
MongoDB Breakfast Milan -  Mainframe Offloading StrategiesMongoDB Breakfast Milan -  Mainframe Offloading Strategies
MongoDB Breakfast Milan - Mainframe Offloading Strategies
 
The Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
The Practice of Big Data - The Hadoop ecosystem explained with usage scenariosThe Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
The Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
 
Retail & CPG
Retail & CPGRetail & CPG
Retail & CPG
 
Accelerating Data Warehouse Modernization
Accelerating Data Warehouse ModernizationAccelerating Data Warehouse Modernization
Accelerating Data Warehouse Modernization
 
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
 
Horses for Courses: Database Roundtable
Horses for Courses: Database RoundtableHorses for Courses: Database Roundtable
Horses for Courses: Database Roundtable
 
Hadoop in 2015: Keys to Achieving Operational Excellence for the Real-Time En...
Hadoop in 2015: Keys to Achieving Operational Excellence for the Real-Time En...Hadoop in 2015: Keys to Achieving Operational Excellence for the Real-Time En...
Hadoop in 2015: Keys to Achieving Operational Excellence for the Real-Time En...
 
How does Microsoft solve Big Data?
How does Microsoft solve Big Data?How does Microsoft solve Big Data?
How does Microsoft solve Big Data?
 
Big Data Made Easy: A Simple, Scalable Solution for Getting Started with Hadoop
Big Data Made Easy:  A Simple, Scalable Solution for Getting Started with HadoopBig Data Made Easy:  A Simple, Scalable Solution for Getting Started with Hadoop
Big Data Made Easy: A Simple, Scalable Solution for Getting Started with Hadoop
 
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
 
Choosing technologies for a big data solution in the cloud
Choosing technologies for a big data solution in the cloudChoosing technologies for a big data solution in the cloud
Choosing technologies for a big data solution in the cloud
 
Hadoop and the Data Warehouse: When to Use Which
Hadoop and the Data Warehouse: When to Use Which Hadoop and the Data Warehouse: When to Use Which
Hadoop and the Data Warehouse: When to Use Which
 
Big Data in Azure
Big Data in AzureBig Data in Azure
Big Data in Azure
 

Mehr von Skillwise Group

Mehr von Skillwise Group (20)

Skillwise Consulting New updated
Skillwise Consulting New updatedSkillwise Consulting New updated
Skillwise Consulting New updated
 
Email Etiquette
Email Etiquette Email Etiquette
Email Etiquette
 
Healthcare profile
Healthcare profileHealthcare profile
Healthcare profile
 
Manufacturing courses
Manufacturing coursesManufacturing courses
Manufacturing courses
 
Retailing & logistics profile
Retailing & logistics profileRetailing & logistics profile
Retailing & logistics profile
 
Skillwise orientation
Skillwise orientationSkillwise orientation
Skillwise orientation
 
Overview- Skillwise Consulting
Overview- Skillwise Consulting Overview- Skillwise Consulting
Overview- Skillwise Consulting
 
Skillwise corporate presentation
Skillwise corporate presentationSkillwise corporate presentation
Skillwise corporate presentation
 
Skillwise Profile
Skillwise ProfileSkillwise Profile
Skillwise Profile
 
Skillwise Softskill Training Workshop
Skillwise Softskill Training WorkshopSkillwise Softskill Training Workshop
Skillwise Softskill Training Workshop
 
Skillwise Insurance profile
Skillwise Insurance profileSkillwise Insurance profile
Skillwise Insurance profile
 
Skillwise Train and Hire Services
Skillwise Train and Hire ServicesSkillwise Train and Hire Services
Skillwise Train and Hire Services
 
Skillwise Digital Technology
Skillwise Digital Technology Skillwise Digital Technology
Skillwise Digital Technology
 
Skillwise Boot Camp Training
Skillwise Boot Camp TrainingSkillwise Boot Camp Training
Skillwise Boot Camp Training
 
Skillwise Academy Profile
Skillwise Academy ProfileSkillwise Academy Profile
Skillwise Academy Profile
 
Skillwise Overview
Skillwise OverviewSkillwise Overview
Skillwise Overview
 
Skillwise - Business writing
Skillwise - Business writing Skillwise - Business writing
Skillwise - Business writing
 
Imc.ppt
Imc.pptImc.ppt
Imc.ppt
 
Skillwise cics part 1
Skillwise cics part 1Skillwise cics part 1
Skillwise cics part 1
 
Skillwise AML
Skillwise AMLSkillwise AML
Skillwise AML
 

Kürzlich hochgeladen

EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
Earley Information Science
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 

Kürzlich hochgeladen (20)

04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 

Skillwise Big Data part 2

  • 2. IBM Big Data Platform Overview
  • 3. Big Data is a Hot Topic Because Technology Makes it Possible to Analyze ALL Available Data Cost effectively manage and analyze all available data in its native form unstructured, structured, streaming ERP CRM RFID Website Network Switches Social Media Billing
  • 4. BIG DATA is not just HADOOP Manage & store huge volume of any data Hadoop File System MapReduce Manage streaming data Stream Computing Analyze unstructured data Text Analytics Engine Data WarehousingStructure and control data Integrate and govern all data sources Integration, Data Quality, Security, Lifecycle Management, MDM Understand and navigate federated big data sources Federated Discovery and Navigation
  • 5. Business-Centric Big Data Enables You to Start With a Critical Business Pain and Expand the Foundation for Future Requirements  “Big data” isn’t just a technology—it’s a business strategy for capitalizing on information resources  Getting started is crucial  Success at each entry point is accelerated by products within the Big Data platform  Build the foundation for future requirements by expanding further into the big data platform
  • 6. 1 – Unlock Big Data Customer need • Understand existing data sources • Search and navigate data within existing systems • No copying of data Value statement • Get up and running quickly • Discover and retrieve big data • Work even with big data sources – by business users Solution • Vivisimo Velocity renamed to • IBM InfoSphere DataDiscovery
  • 7. 2 – Analyze Raw Data Customer need • Ingest data as-is into Hadoop • Combine it with data from DWH • Process very large volume of data Value statement • Gain new insight • Overcome the high cost of converting data from unstructured to structured format • Experiment with analysis on different data and combine them with other sources Solution • IBM InfoSphere BigInsights
  • 8. Merging the Traditional and Big Data Approaches IT Structures the data to answer that question IT Delivers a platform to enable creative discovery Business Explores what questions could be asked Business Users Determine what question to ask Monthly sales reports Profitability analysis Customer surveys Brand sentiment Product strategy Maximum asset utilization Big Data Approach Iterative & Exploratory Analysis Traditional Approach Structured & Repeatable Analysis
  • 9. InfoSphere BigInsights is more than just HADOOP IBM InfoSphere Big Insights • Is much more than HADOOP IBM Big data platform • Includes much more than IBM InfoSphere Big Insights
  • 10. Hadoop  Open-source software framework from Apache  Inspired by  Google MapReduce  GFS (Google File System)  HDFS  Map/Reduce
  • 11. InfoSphere BigInsightsPlatform for volume, variety, velocity  Enhanced Hadoop foundation Analytics  Text analytics & tooling  Application accelerators Usability  Web console  Spreadsheet-style tool  Ready-made “apps” Enterprise Class  Storage, security, cluster management Integration  Connectivity to Netezza, DB2, JDBC databases, etc Apache Hadoop Basic Edition Enterprise Edition Licensed Application accelerators Pre-built applications Text analytics Spreadsheet-style tool RDBMS, warehouse connectivity Administrative tools, security Eclipse development tools Performance enhancements . . . . Free download Integrated install Online InfoCenter BigData Univ. Breadth of capabilities Enterpriseclass Can run also on top of
  • 12. Spreadsheet-style Analysis
    • Web-based analysis and visualization
    • Spreadsheet-like interface
    • Define and manage long-running data collection jobs
    • Analyze the text content of the pages that have been retrieved
  • 13. Build a Big Data Program – MapReduce example
    • Eclipse tools for Jaql, Hive, Pig, Java MapReduce, BigSheets plug-ins, text analytics, etc.
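The MapReduce model behind this example can be illustrated without a cluster. The plain-Python sketch below walks through the classic word count: a mapper emits `(word, 1)` pairs, a shuffle step groups them by key, and a reducer sums each group. The function names are hypothetical; a real job would use the Hadoop Java API (or Jaql, Hive, Pig) and the framework would run the phases in parallel across nodes.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    # Mapper: emit (word, 1) for every word in one input line
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Shuffle: group all emitted values by key, as the framework does between phases
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Reducer: sum the counts collected for one word
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    pairs = (pair for line in lines for pair in map_phase(line))
    # Keys are processed in sorted order, mirroring Hadoop's sort before reduce
    return dict(reduce_phase(k, v) for k, v in sorted(shuffle(pairs).items()))


print(word_count(["big data", "Big data is not just Hadoop"]))
# {'big': 2, 'data': 2, 'hadoop': 1, 'is': 1, 'just': 1, 'not': 1}
```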
  • 14. Jaql – IBM's programming language in the Hadoop world
    • Jaql is a complete solutions environment supporting all other BigInsights components
    • Integration point for various analytics: text analytics, statistical analysis, machine learning, ad-hoc analysis
    • Integration point for various data sources: local and distributed file systems, NoSQL databases, content repositories, relational sources (warehouses, operational databases)
    Diagram: BigInsights text analytics, statistical analysis (R module), machine learning (SystemML), ad-hoc analysis (BigSheets) and integration (DB2, Netezza, Streams, …) sit on top of Jaql (Jaql I/O, Jaql core operators, Jaql modules), which accesses DFS, NoSQL, RDBMS and file systems
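Jaql queries are written as pipelines of operators over JSON data (read, filter, transform, group, write). The Python sketch below only mimics that dataflow style on in-memory JSON records; it is not Jaql syntax, and the record fields and the `pipeline` function are hypothetical.

```python
import json
from collections import Counter

# Hypothetical JSON input, standing in for a file that Jaql would read from HDFS
RAW = """[
  {"customer": "A", "channel": "web",  "amount": 120},
  {"customer": "B", "channel": "shop", "amount": 40},
  {"customer": "C", "channel": "web",  "amount": 75}
]"""


def pipeline(raw: str, min_amount: int) -> dict[str, int]:
    records = json.loads(raw)                                 # read
    large = (r for r in records if r["amount"] >= min_amount)  # filter
    channels = (r["channel"] for r in large)                  # transform
    return dict(Counter(channels))                            # group by + count


print(pipeline(RAW, 50))  # {'web': 2}
```

Each stage consumes the previous stage's output lazily, which is the same shape a Jaql pipeline has before the engine compiles it into MapReduce jobs.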
  • 16. 3 – Simplify your warehouse
    Customer need
    • Make DWH performance significantly better
    • Reduce DWH administration costs significantly
    Value statement
    • Speed: 10–100x better performance
    • Simplicity: administration costs reduced by 75–90%
    • Scalability
    • Smart system
    • In-database analytics
    • Out-of-the-box integration with SPSS
    Solution
    • IBM Netezza, renamed to IBM PureData System for Analytics
  • 17. Analyst: I need to evaluate the possible relationship between client salary and overdrafts.
    IT: OK. We have to evaluate a lot of statistics and set the correct DB indexes and DB partitioning. It will take us 5 days.
  • 18. IT: Done. You can run your analytical query.
    Analyst: Great. Thanks a lot. I'm going to check the results.
  • 19. Analyst: Great. I can see some nice correlations here. Now I need to look at it from a different perspective.
    IT: Ohhh, welcome, dear friend. Understood. So it's… another 5 days of our work.
    Analyst: Noooo!!! It's not possible to work here!
  • 20. And now with Netezza ...
  • 21. Analyst: I need to evaluate the possible relationship between client salary and overdrafts. I will use Netezza.
  • 22. Analyst: Great. I can see some nice correlations here. Now I need to look at it from a different perspective. With Netezza I can run the query immediately.
    The response comes back just as fast, and in the meantime IT can do something else, much more useful.
  • 24. Built-In Expertise Makes This as Simple as an Appliance
    • Dedicated device
    • Optimized for purpose
    • Complete solution
    • Fast installation
    • Very easy operation
    • Standard interfaces
    • Low cost
  • 25. IBM Netezza was renamed to IBM PureData System for Analytics in October 2012
  • 26. Netezza Genesis in T-Mobile CZ
    Proof-of-concept project
    • New enterprise data warehouse platform selection
    • Comparison of the existing and other platforms
    • Selection criteria: performance, operational savings
    …and the winner was: Netezza
  • 27. Netezza Genesis in T-Mobile CZ
    Expectations
    • Significant response improvement: a faster platform means better report response times
    • Direct data availability: higher trust in data, one version of the truth, aggregation reduction, any attribute available
    • Operational benefits: storage savings (no data replicas), administration (DBA) cost reduction
    • Infrastructure simplification: lower environment complexity
  • 28. Netezza Genesis in T-Mobile CZ
    Project implementation
    • EDW platform migration: Netezza platform implementation, ETL graphs/processes redesign
    • BI front-end tool migration: SAP BusinessObjects implementation, redesign of all reports
    Main integration partner: T-Systems CZ
  • 29. Netezza Genesis in T-Mobile CZ
    Current status
    • All relevant ETL processing redesigned
    • Parallel run of the original and Netezza platforms finished
    • Netezza is now the only primary platform
  • 30. Real Netezza experience from T-Mobile Czech Rep.: response time massively improved
    Report                                            Original platform   Netezza
    Workflow reporting                                2 hours             1 minute
    Invoicing and payments reporting:
      Payment discipline of current-month invoices    33 minutes          17 seconds
      Overdue debt of invoices in current month       10 hours            23 seconds
      Average monthly invoice figures                 50 minutes          38 seconds
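To put the T-Mobile timings into numbers, the speedup factor is the original response time divided by the Netezza response time. The short sketch below only does that arithmetic on the four timings reported on the slide; the row labels are shortened and the `speedups` helper is hypothetical.

```python
# Response times from the T-Mobile CZ slide, in seconds: (original, Netezza)
timings = {
    "Workflow reporting": (2 * 3600, 60),
    "Payment discipline of current-month invoices": (33 * 60, 17),
    "Overdue debt of invoices in current month": (10 * 3600, 23),
    "Average monthly invoice figures": (50 * 60, 38),
}


def speedups(t: dict[str, tuple[int, int]]) -> dict[str, float]:
    # Speedup = original response time / Netezza response time, 1 decimal place
    return {name: round(orig / new, 1) for name, (orig, new) in t.items()}


for name, factor in speedups(timings).items():
    print(f"{name}: {factor}x")
# Workflow reporting: 120.0x
# Payment discipline of current-month invoices: 116.5x
# Overdue debt of invoices in current month: 1565.2x
# Average monthly invoice figures: 78.9x
```

All four reports land well above the "10–100x" range quoted on the "Simplify your warehouse" slide, with the overdue-debt report over three orders of magnitude faster.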
  • 31. 4 – Reduce costs with Hadoop
    Customer need: reduce costs significantly
    • Too much data: too expensive to store and to maintain
    • A big portion is kept "just in case"
    • The data volume keeps growing, so costs keep rising
    • As a result, keeping all data in a standard DWH is too expensive
    Value statement
    • Leverage the parallel-processing architecture of Hadoop
    • Hadoop runs on cheap commodity hardware
    • Business users can still work in the same or a similar way
    Solution
    • IBM InfoSphere BigInsights
  • 32. BigInsights and the data warehouse
    • BigInsights as a query-ready archive for "cold" warehouse data
    Diagram: data warehouse, BigInsights, big data analytic applications, traditional analytic tools; BigInsights accessed from Cognos BI via Hive JDBC
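The "query-ready archive" idea means rows that are rarely used move out of the warehouse but stay queryable. The sketch below shows only the splitting step under a hypothetical age-based policy: the 365-day cutoff, the `(row_id, event_date)` row layout and the `split_hot_cold` function are all assumptions, not part of the BigInsights product.

```python
from datetime import date, timedelta

# Hypothetical policy: rows older than this many days count as "cold"
COLD_AFTER_DAYS = 365


def split_hot_cold(rows: list[tuple[int, date]], today: date,
                   cold_after_days: int = COLD_AFTER_DAYS):
    """Split (row_id, event_date) rows into warehouse ("hot") and archive ("cold") sets."""
    cutoff = today - timedelta(days=cold_after_days)
    hot = [row for row in rows if row[1] >= cutoff]   # stays in the warehouse
    cold = [row for row in rows if row[1] < cutoff]   # offloaded to the archive
    return hot, cold


rows = [(1, date(2013, 5, 1)), (2, date(2011, 1, 15)), (3, date(2012, 12, 31))]
hot, cold = split_hot_cold(rows, today=date(2013, 6, 1))
print([r[0] for r in hot], [r[0] for r in cold])  # [1, 3] [2]
```

In the architecture on the slide, the cold set would live in BigInsights and remain reachable from the same BI tools, so users keep one query path while the warehouse holds less data.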
  • 33. Future: the SQL interface
    Diagram: an application sends SQL through a JDBC/ODBC driver to a JDBC/ODBC server and the SQL interface engine in InfoSphere BigInsights, which reads data sources such as Hive tables, HBase tables and CSV files
    • Rich SQL query capabilities: SQL-92 and SQL:2011 features, correlated subqueries, windowed aggregates
    • SQL access to all data stored in InfoSphere BigInsights
    • Robust JDBC/ODBC support
    • Take advantage of key features of each data source
    • Leverage MapReduce parallelism or achieve low latency
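Windowed aggregates and correlated subqueries are standard SQL features. To show what they do, the sketch below runs both on a small in-memory SQLite database from Python's standard library. It is not the BigInsights SQL engine; the `invoices` table and its rows are hypothetical, and window functions need SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE invoices (customer TEXT, month INTEGER, amount REAL);
INSERT INTO invoices VALUES
  ('A', 1, 100), ('A', 2, 150), ('A', 3, 50),
  ('B', 1, 80),  ('B', 2, 120);
""")

# Windowed aggregate: running total per customer, ordered by month
running = conn.execute("""
SELECT customer, month,
       SUM(amount) OVER (PARTITION BY customer ORDER BY month) AS running_total
FROM invoices
ORDER BY customer, month
""").fetchall()

# Correlated subquery: the inner AVG is re-evaluated for each outer row's customer
above_avg = conn.execute("""
SELECT customer, month, amount
FROM invoices AS i
WHERE amount > (SELECT AVG(amount) FROM invoices WHERE customer = i.customer)
ORDER BY customer, month
""").fetchall()

print(running)    # [('A', 1, 100.0), ('A', 2, 250.0), ('A', 3, 300.0), ('B', 1, 80.0), ('B', 2, 200.0)]
print(above_avg)  # [('A', 2, 150.0), ('B', 2, 120.0)]
```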
  • 34. 5 – Analyze Streaming Data
    Customer need
    • Process and leverage streaming data
    • Select valuable data from the data stream for future processing
    • Quickly process data that becomes useless if it is not processed immediately
    Value statement
    • React in real time to take an opportunity before it expires
    • Periodically adjust streaming models based on analysis of data at rest
    Solution
    • IBM InfoSphere Streams
    Diagram: streaming data sources → Streams computing → action
  • 35. Why and when to use InfoSphere Streams?
    For applications that need on-the-fly processing, filtering and analysis of streaming data.
    Typical streaming data sources
    • Sensors: environmental, industrial, GPS, …; images, videos, …
    • Data exhaust: network data, system logs (web server, app server), …
    • High-rate transaction data: financial transactions, CDRs
    At least 2 of the criteria below should be fulfilled
    • Isolation: processing in isolation, or in limited windows (time / number of records)
    • Non-traditional formats included: spatial data, images, text, voice, …
    • Integration challenges: different connection methods, data rates and processing requirements
    • Multiple processing nodes: volume / rate very high, so scalability is required
    • Sub-millisecond latency: immediate analysis and response
    • Store & mine approach doesn't work: the volume of data (and its rate) is too high
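The "limited windows (time / number of records)" criterion refers to computing over a bounded slice of the stream instead of a stored data set. InfoSphere Streams applications express windows in their own language (SPL); the Python generator below is only a hypothetical illustration of a count-based sliding window that emits a moving average for each arriving tuple once the window is full.

```python
from collections import deque
from typing import Iterable, Iterator


def sliding_count_window(stream: Iterable[float], size: int) -> Iterator[float]:
    """Yield the average of the last `size` tuples each time a new tuple arrives.

    Nothing is emitted until the window holds `size` tuples; after that the
    oldest tuple is evicted on every arrival, so memory stays bounded.
    """
    window: deque[float] = deque(maxlen=size)
    total = 0.0
    for value in stream:
        if len(window) == size:
            total -= window[0]  # the oldest value is evicted by the append below
        window.append(value)
        total += value
        if len(window) == size:
            yield total / size


print(list(sliding_count_window([1, 2, 3, 4, 5], size=3)))  # [2.0, 3.0, 4.0]
```

Keeping a running total makes each update constant-time, which matters when tuples arrive at rates where re-summing the window would not keep up.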
  • 36. Streams and BigInsights – Integrated Analytics on Data in Motion & Data at Rest
    Diagram: 1. Data ingest; 2. Bootstrap/enrich; 3. Adaptive analytics model
    • InfoSphere BigInsights, database & warehouse: data integration, data mining, machine learning, statistical modeling
    • InfoSphere Streams: data ingest, preparation, online analysis, model validation
    • Visualization of real-time and historical insights
    • Data and control flow between both sides
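Step 3 of the diagram, the adaptive analytics model, pairs a model built on data at rest with scoring on data in motion, refreshed periodically. The sketch below assumes the simplest possible model, a threshold of mean plus k standard deviations, chosen purely for illustration (the slide does not name a model); the function names and the `refit_every` parameter are hypothetical.

```python
import statistics
from typing import Iterable, Iterator


def fit_threshold(history: list[float], k: float = 3.0) -> float:
    # "Data at rest" step: derive a simple model (mean + k * population stdev)
    return statistics.mean(history) + k * statistics.pstdev(history)


def score_stream(stream: Iterable[float], history: list[float],
                 refit_every: int = 100, k: float = 3.0) -> Iterator[float]:
    """'Data in motion' step: yield values above the current threshold, and
    every `refit_every` values refit the threshold on history plus new data."""
    threshold = fit_threshold(history, k)
    seen = list(history)
    for i, value in enumerate(stream, start=1):
        if value > threshold:
            yield value
        seen.append(value)
        if i % refit_every == 0:
            threshold = fit_threshold(seen, k)  # periodic model adjustment


print(list(score_stream([9, 11, 10, 50], history=[10, 10, 10, 10])))  # [11, 50]
```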
  • 37. The Platform Advantage
    Diagram: analytic applications (BI / reporting, exploration / visualization, functional apps, industry apps, predictive analytics, content analytics) on top of the IBM Big Data Platform (visualization & discovery, application development, systems management, accelerators, Hadoop system, stream computing, data warehouse, information integration & governance)
    Benefits in detail
    • Benefits increase over time, by moving from the entry project to a 2nd and 3rd project
    • Lower deployment costs: shared components, integration
    • Points of leverage: shared text analytics for Streams and BigInsights; HDFS connectors (data integration (ETL, …), Streams); accelerators built across multiple engines