1. BUSINESS INTELLIGENCE
& ADVANCED ANALYTICS
The Search for Patterns,
Waldo, and Black Swans
Barrett Peterson, C.P.A.
ICPAS Chicago Metro Chapter, September 25, 2013
ICPAS Metro Chapter Barrett Peterson September 25, 2013 1
3. BIG DATA AND ANALYTICS - WHY
PREDICTION and PATTERN IDENTIFICATION
4. • Digitization and datafication
• Correlation, more than causality
• Reduced emphasis on sampling
• “Messy” data usable for many
applications, but not all
BIG DATA AND ANALYTICS –
CRITICAL ATTRIBUTES
5. • Reduced privacy and handling “private” data
• Overreliance on, and overconfidence in, data
and analysis
• Currency – correlations can change over time
• Predictions are hard to make, especially about
the future. - Niels Bohr [Not Yogi Berra].
BIG DATA AND ANALYTICS – RISKS
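The "currency" risk can be illustrated with a small sketch: the Pearson correlation between two series is computed separately for an earlier and a later period, and the relationship reverses. The monthly figures below are invented for illustration only; they are not real data.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical monthly figures, invented for illustration only.
ad_spend = [10, 12, 14, 16, 18, 20, 22, 24]
sales = [50, 54, 59, 63, 60, 57, 55, 52]

split = 4  # first four months vs. last four months
early = pearson(ad_spend[:split], sales[:split])
late = pearson(ad_spend[split:], sales[split:])
print(f"early r = {early:.2f}, late r = {late:.2f}")
```

A model fitted only on the early months would predict the wrong direction for the later ones, which is why correlations found in big data need periodic re-testing.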
7. • The idea of computer-based business
intelligence systems is middle-aged –
about 40 years old.
Previously described as:
– Decision support systems [DSS]
– Executive information systems [EIS]
– Management information systems [MIS]
A LITTLE BACKGROUND
HISTORY
A trip down memory lane
8. • Internet Development
– ARPANET and others – 1960s
– Internet Protocols – 1982, presumably by Al Gore
• IBM researcher Edgar Codd is credited with developing
relational database theory in 1970.
• IBM’s Donald Chamberlin and Raymond Boyce developed
Structured Query Language [SQL] in the early 1970s to
manipulate and retrieve data from IBM’s early relational
database management system [System R]
• Tim Berners-Lee invented the World Wide Web and the first
web browser in 1990 by combining the internet, hypertext
mark-up language, and the Uniform Resource Locator [URL]
system. The browser, WorldWideWeb, was later renamed Nexus.
• Mosaic, co-developed by Marc Andreessen, popularized the
graphical web browser; Andreessen then co-founded Netscape,
whose Navigator became a leading commercial web browser.
• Development of big data enabling database designs and
high speed processing during the last 15 years.
A LITTLE BACKGROUND
History: Important Technology Inventions
9. • Development of the primary infrastructure
– Database design
– Processing and Storage Hardware
– Server Development and Massively Parallel Processing
• Improved telecommunications speed
• Hardware miniaturization, capacity, and speed
– Memory [RAM] capacity
– Storage capacity and transfer speed
– Bus speed
– Video processing capacity and speed
• Increased hardware speed and capacity
• Digital formats for sensors, cameras, RFID, and other data
collection sources
• Mobile computing
• “Cloud” capability exploits many of these developments
A LITTLE BACKGROUND
History: Drivers Enabling BI and Advanced Analytics
10. • Analytics
• Business Intelligence
• Knowledge Management
• Content Management
• Data Mining
• Big Data
• Data Integration
• Datafication
• Gamification
• Blob [Binary Large Object]
A LITTLE BACKGROUND
TERMINOLOGY
A consultant’s collection of confusing names – a sampler
11. • CPU speed and power
– Moore’s law
– Multi-core chips
– Solid State Memory
• Storage improvement and cost reduction
– Greatly increased capacity – petabytes and more;
IBM’s first hard drive, in 1956, held 3.75 MB
– Greatly increased access/transfer speed
– Greatly reduced cost
• Data collection from a wide range of devices
• Data communications – speed and volume
• Database management techniques and
software
• Application speed and power
A LITTLE BACKGROUND
Drivers and Enablers of Big Data
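To put the storage growth in perspective, a short calculation compares the 3.75 MB cited above with one petabyte, assuming decimal units (1 MB = 10^6 bytes, 1 PB = 10^15 bytes). A second function projects Moore's law, assuming the commonly quoted doubling period of two years; both results are rough order-of-magnitude illustrations, not measurements.

```python
MB = 10**6   # bytes, decimal convention (assumption)
PB = 10**15  # bytes, decimal convention (assumption)

def capacity_ratio(new_bytes, old_bytes):
    """How many times larger the new capacity is than the old one."""
    return new_bytes / old_bytes

def moores_law_factor(years, doubling_period=2.0):
    """Growth factor in transistor count, assuming a fixed doubling period."""
    return 2 ** (years / doubling_period)

ratio = capacity_ratio(1 * PB, 3.75 * MB)
print(f"1 PB is about {ratio:,.0f} times 3.75 MB")
print(f"40 years of doubling every 2 years: x{moores_law_factor(40):,.0f}")
```

One petabyte works out to roughly 267 million of those early drives, and 40 years of two-year doublings is a factor of 2^20, about one million.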
13. A system composed of “computer”
hardware, storage hardware,
operating system, database software,
file systems, and application software
to:
• Collect, “clean”, filter, “tag”, and integrate
data
• Store data [hardware and software]
• Provide knowledge management, analytical,
and presentation tools to translate data into
decision useful information
TONIGHT’S CRITICAL DEFINITIONS
Business Intelligence
14. • Prehistoric – Mainframe Era
– DSS, EIS, MIS
– Hierarchical Master Data Files
• The Current Era [Primarily] – Business Intelligence
– Primarily “structured” data [data that can be
represented in relational/dimensional tables or flat
files], and BLOB [binary large object] formats
– Analysis of “known,” defined patterns
– Presented in tables, simple charts, and dashboards
• Emerging – Big Data and Advanced Analytics
– to discover new, changing, or variable patterns
– A wide variety of “unstructured” digital data
formats added to “structured” data
– Emerging storage structures
– “Exploratory” analytics
– Zoomable User Interfaces [ZUIs]
– Solid State Memory and Solid-State Drives
TONIGHT’S CRITICAL DEFINITIONS
Business Intelligence Generations
16. • Computer – CPU, Memory, and Operating System Software
• Data Collection
– Master Data Management
– Collection Processes and Devices
– Data Cleansing Processes and Software
• Data Storage – Petabyte capable
– Physical Devices and Storage Management Software
– Data Management and Integration
– Database Software Storage
• Relational – Traditional ERP/Transaction systems
• Dimensional – Traditional Data Warehouse, including
associated BLOB
• Distributed, Multiple-Server Storage Systems
• NoSQL [Not Only SQL] Distributed Operational Stores
• Apache Hadoop for Highly Parallel Processing and certain
Intensive Data Analytics Applications
• DBMS Examples: Apache Cassandra, Amazon Dynamo
• Middleware Software
• High-Speed Data Communications – to keep pace with petaflop-capable processing
• Business Intelligence Application Software
– OLAP, Dashboard, and Chart Reports
– Statistical Analysis and Presentation Tools
BUSINESS INTELLIGENCE ELEMENTS
Principal Components for Maximum Application
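Distributed, multiple-server stores such as Amazon Dynamo and Apache Cassandra spread records across servers by hashing each record's key onto a ring of nodes (consistent hashing). The following is a simplified sketch of that idea only, without the replication and virtual nodes that real systems add; the server names and keys are hypothetical.

```python
import bisect
import hashlib

class HashRing:
    """Minimal consistent-hash ring: a key maps to the next node clockwise."""

    def __init__(self, nodes):
        self._ring = sorted((self._hash(n), n) for n in nodes)
        self._points = [h for h, _ in self._ring]

    @staticmethod
    def _hash(value):
        # MD5 is used only to spread values evenly, not for security.
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def node_for(self, key):
        """Return the first node whose ring position follows the key's hash."""
        i = bisect.bisect_right(self._points, self._hash(key)) % len(self._ring)
        return self._ring[i][1]

    def add_node(self, node):
        bisect.insort(self._ring, (self._hash(node), node))
        self._points = [h for h, _ in self._ring]

ring = HashRing(["server-a", "server-b", "server-c"])
keys = [f"customer-{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}
ring.add_node("server-d")
after = {k: ring.node_for(k) for k in keys}
moved = sum(before[k] != after[k] for k in keys)
print(f"{moved} of {len(keys)} keys moved after adding a server")
```

The design point is that adding a server only moves the keys that now fall to that server; every other key stays where it was, which keeps rebalancing cheap as a cluster grows.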
17. • Data Governance and Management
– Uniform terminology
– Uniform meaning
– Uniform units of measure
– Metadata
• Data Structure and Attributes
– Structured - Relational/Dimensional
– Unstructured
– Rate of change, context, and other attributes
• Data Collection and Preparation
– Filtering, particularly of “Big Data,” and “tagging”
– Extract, Transform, Load [ETL] for “structured” data
• Database File Systems
• Data Storage and Retrieval
– Capacity
– Access/Retrieval speed
BUSINESS INTELLIGENCE ELEMENTS
DATA ISSUES: THE CORNERSTONE
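The data issues above (uniform terminology, uniform units of measure, ETL for structured data) can be shown in a tiny extract-transform-load sketch: records from two hypothetical source systems use different field names and units, are mapped to one definition, and are loaded into an in-memory SQLite table. The field names, unit codes, and table layout are assumptions for illustration.

```python
import sqlite3

# Extract: rows as two hypothetical source systems might deliver them.
source_a = [{"item": "SKU-1", "weight_lb": 2.0}]
source_b = [{"product": "SKU-2", "weight": 1500, "unit": "g"}]

# Uniform unit of measure for the warehouse: kilograms.
KG_PER_UNIT = {"kg": 1.0, "g": 0.001, "lb": 0.45359237}

def transform():
    """Map both sources to one name ('sku') and one unit of measure (kg)."""
    for r in source_a:
        yield r["item"], r["weight_lb"] * KG_PER_UNIT["lb"]
    for r in source_b:
        yield r["product"], r["weight"] * KG_PER_UNIT[r["unit"]]

# Load: write the conformed rows into the target table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (sku TEXT PRIMARY KEY, weight_kg REAL NOT NULL)")
conn.executemany("INSERT INTO product VALUES (?, ?)", transform())

for sku, kg in conn.execute("SELECT sku, weight_kg FROM product ORDER BY sku"):
    print(sku, round(kg, 3))
```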
18. • Metadata management
– Business definitions, rules, sources
– Technical attributes, such as type, scale,
transformation methods
– Processing requirements – filtering, tagging, ETL,
aggregation, summarization
• Data Definitions and data dictionaries
– Name
– Unit(s) of measure
• Data collection and filtering or transforming requirements
– Sources – internal and external
– Context addition/filtering requirements
• Data integration specifications
– Multiple platforms and applications
– Mapping to intermediate data marts
• Privacy requirements
– Personally identifiable information [PII]
– Laws: HIPAA, Privacy Act
BUSINESS INTELLIGENCE ELEMENTS
MASTER DATA GOVERNANCE AND MANAGEMENT
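One common way to handle personally identifying data in an analytics store is pseudonymization: identifying fields are replaced with a keyed hash, so records can still be joined and counted without exposing the identifier itself. The sketch below uses Python's standard `hmac` module; the field names, the list of PII fields, and the secret key are hypothetical.

```python
import hashlib
import hmac

SECRET_KEY = b"replace-with-a-managed-secret"  # hypothetical; store outside the dataset
PII_FIELDS = {"name", "ssn"}                    # hypothetical data-dictionary PII flags

def pseudonymize(record):
    """Return a copy of `record` with PII fields replaced by HMAC-SHA256 tokens."""
    out = {}
    for field, value in record.items():
        if field in PII_FIELDS:
            digest = hmac.new(SECRET_KEY, str(value).encode("utf-8"), hashlib.sha256)
            out[field] = digest.hexdigest()[:16]  # shortened token for readability
        else:
            out[field] = value
    return out

patient = {"name": "Jane Doe", "ssn": "123-45-6789", "zip": "60601", "visits": 3}
print(pseudonymize(patient))
```

The same input always produces the same token, so analysts can still link one person's records across tables, while only holders of the key can regenerate the mapping.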
19. • Data Structures
– “Structured” Data, principally text and
numbers capable of incorporation in relational
or dimensional tables
– “Unstructured” Data, not suitable for relational
tables, many in newer data formats, including
images
• Big Data Attributes
– Both “structured” and “unstructured”
– The four major “Vs” of big data
• Volume - huge
• Velocity – fast changing, unlike structured data
• Variety – format and content
• Variability – lacks the consistency, and perhaps
precision, of structured data
BUSINESS INTELLIGENCE ELEMENTS
Data Structures and Attributes Are Critical Drivers
20. • Content Structure – Traditional Financial Data
– Numerical
– Sign/Debit or Credit
– Text Descriptions
• Database Management Structures
– Legacy Systems: Hierarchical and Network
– Transaction Systems: Relational
• Relations [tables], attributes [columns], instances [rows]
• Rules: no duplicate rows; a single value for each attribute
– Warehouse Systems: Dimensional
• Facts [data items, usually a dollar amount or unit count]
• Measures – dollar or count for facts
• Dimensions – groups of hierarchies and descriptors of
various aspects or context for the facts/measures
– Big Data Databases: Unstructured
• Microsoft Office and Similar File Formats
• Photography and Art
BUSINESS INTELLIGENCE ELEMENTS
Data Structures: IT Lingo
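The dimensional (warehouse) terms above can be made concrete with a minimal star schema: a fact table holding a dollar measure, keyed to a date dimension and a product dimension, and a SQL query that rolls the facts up the dimensions. Table names, columns, and values are invented for illustration; SQLite, through Python's standard `sqlite3` module, stands in for a warehouse DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimensions: descriptive context, including a year > month hierarchy.
    CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category TEXT, name TEXT);
    -- Fact table: one row per sale, holding the dollar measure.
    CREATE TABLE fact_sales (
        date_key INTEGER REFERENCES dim_date(date_key),
        product_key INTEGER REFERENCES dim_product(product_key),
        amount_usd REAL
    );
    INSERT INTO dim_date VALUES (1, 2013, 8), (2, 2013, 9);
    INSERT INTO dim_product VALUES (10, 'Frozen', 'Fries'), (11, 'Frozen', 'Peas'),
                                   (12, 'Fresh', 'Apples');
    INSERT INTO fact_sales VALUES (1, 10, 100.0), (1, 12, 40.0),
                                  (2, 11, 60.0), (2, 12, 25.0);
""")

# Roll the measure up by month and product category.
query = """
    SELECT d.year, d.month, p.category, SUM(f.amount_usd)
    FROM fact_sales f
    JOIN dim_date d ON f.date_key = d.date_key
    JOIN dim_product p ON f.product_key = p.product_key
    GROUP BY d.year, d.month, p.category
    ORDER BY d.year, d.month, p.category
"""
for row in conn.execute(query):
    print(row)
```

Keeping context in small dimension tables and numbers in one narrow fact table is what lets OLAP tools slice the same facts by any combination of dimensions.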
21. RELATIONAL TABLE ILLUSTRATION
“Tuple” is borrowed from mathematics
and set theory. In database design it
refers to a row: the attribute values
describing one “item” of the kind the
table is about. Examples of such items
include customers, vendors, orders, and product SKUs.
Business Intelligence Elements
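The relational rules listed earlier (no duplicate rows, one value per attribute) and the idea of a tuple can be sketched with SQLite: a primary key makes the DBMS reject a duplicate row, and each row comes back to Python as a tuple. The `customer` table and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One table per subject ("customers"); each column is an attribute holding one value.
conn.execute("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        city TEXT
    )
""")
conn.execute("INSERT INTO customer VALUES (1, 'Acme Corp', 'Chicago')")

try:
    # Same key again: the primary key constraint forbids a duplicate row.
    conn.execute("INSERT INTO customer VALUES (1, 'Acme Corp', 'Chicago')")
except sqlite3.IntegrityError as exc:
    print("duplicate row rejected:", exc)

row = conn.execute("SELECT * FROM customer").fetchone()
print(row)  # each row is returned as a Python tuple
```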
23. • Numbers and words/letters
– Relational/Dimensional
– Spreadsheets
– Word Processing documents
• Sound and Music
• Photo
• Video
• Video Game
• CAD Design
• Graphical
– PDF
– Raster, Vector Graphics
– Statistical Visualization
• Scientific
• Signal
• XML [Web-based mark-up formats]
• Geo-Location
• Web Logs
BUSINESS INTELLIGENCE ELEMENTS
DATA FILE TYPE CATEGORIES, ALMOST ENGLISH
24. • Collection
– Company transaction/ERP systems
– Purchased, such as Nielsen, IRI
– Vendor supplied, such as bank transactions
– Sensor readings
– Cameras
– Mobile device traffic – Phones, Tablets
• Filtering
– Adding context such as date or location
– Eliminating “chatter” from high volume data
– Error correction
• Aggregation & Integration
BUSINESS INTELLIGENCE ELEMENTS
DATA COLLECTION AND PREPARATION
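The filtering steps listed above (adding context, eliminating chatter, dealing with errors) might look like this for a stream of sensor readings. The reading format, the location tag, and the valid range are assumptions chosen for illustration; here out-of-range values are flagged and blanked rather than silently "corrected".

```python
from datetime import datetime, timezone

VALID_RANGE = (-40.0, 60.0)  # hypothetical plausible temperature range, Celsius

def prepare(readings, location):
    """Tag readings with context, drop repeated 'chatter', and flag bad values."""
    cleaned, last_value = [], None
    low, high = VALID_RANGE
    for ts, value in readings:
        if value == last_value:
            continue  # chatter: an identical consecutive reading adds no information
        last_value = value
        in_range = low <= value <= high
        cleaned.append({
            "time": datetime.fromtimestamp(ts, tz=timezone.utc).isoformat(),  # context
            "location": location,                                           # context
            "value": value if in_range else None,  # error handling: blank bad value
            "flagged": not in_range,
        })
    return cleaned

# (Unix timestamp, reading) pairs from a hypothetical sensor.
raw = [(1380067200, 21.5), (1380067260, 21.5), (1380067320, 21.7), (1380067380, 999.0)]
for rec in prepare(raw, location="Dock 3"):
    print(rec)
```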
25. DATA COLLECTION - RFID
[Images: an RFID tag and an RFID tag reader]
27. DATA FILTERING AND CLEANSING IS IMPORTANT
28. • Relational – SQL
• Dimensional – SQL, OLAP
• Binary Large Object [BLOB] – binary data, most often
photos, video, audio, or PDF files
• Massively Parallel-Processing [MPP]
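• Apache Hadoop Distributed File System [HDFS] – written in Java
– Google File System [GFS], used internally by Google
– Google MapReduce
• Amazon S3 [Simple Storage Service], offered by Amazon Web Services
• NoSQL databases; MySQL [an open-source relational DBMS]
• Storm [distributed real-time stream processing]
• Resource Description Framework [RDF] databases, which store linked data as subject-predicate-object “triples”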
BUSINESS INTELLIGENCE ELEMENTS
DATABASE FILE SYSTEMS
29. BUSINESS INTELLIGENCE ELEMENTS
SELECT BIG DATA DATABASE MANAGEMENT SYSTEMS
• Significant Originators
– Google MapReduce
– Google File System [GFS]
– Amazon S3 filesystem
• Continuing Developments
– Apache Software Foundation
• Apache Cassandra distributed database management
system
• Apache Hadoop software framework to support data-
intensive distributed applications
• Apache Hive, a data warehouse structure built on
Hadoop
• Apache Pig – high-level language [Pig Latin] for creating
MapReduce programs with Hadoop
– Significant to Technology Development
• Facebook [uses MySQL as its DBMS, with
Memcached]
• Yahoo
• LinkedIn [Project Voldemort]
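Google MapReduce, and Hadoop's implementation of the same model, splits work into a map step that emits key-value pairs, a shuffle that groups the pairs by key, and a reduce step that combines each group. The single-process Python sketch below imitates those three phases for a word count; it illustrates the programming model only, not the Hadoop API, and the input documents are invented.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    """Map: emit (word, 1) for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the values for one key."""
    return key, sum(values)

documents = ["big data gets bigger", "Big data and analytics", "data data"]
# In Hadoop, map and reduce tasks run in parallel across many servers;
# here every phase runs sequentially in one process.
mapped = chain.from_iterable(map_phase(d) for d in documents)
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)
```

Because each map call sees only its own split and each reduce call sees only one key, both phases can be spread over as many servers as the data requires.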
30. • Convergence of mainframes and
servers
• Massively parallel, multiple-server,
distributed processing in multiple
data centers – grid computing
• Multi-core, high-capacity, lower-power
CPUs
• Memory servers providing RAM from
DRAM on Fully Buffered Dual
Inline Memory Modules [FBDIMM]
• Solid state flash drive storage
• Greatly improved, and less costly, hard
drive storage
BUSINESS INTELLIGENCE ELEMENTS
COMPUTER HARDWARE CONSIDERATIONS
31. BI CONFIGURATION SIZES
• Small – BI, but not Big Data capable
• Medium
• Large – IBM Sequoia at Livermore Labs
32. • Data Storage Terminology
– Memory – directly connected to the CPU, often called RAM
– Storage – not directly connected to the CPU
• Data Storage Device Types
– Memory
• DRAM-based
– Storage
• Flash memory-based Solid-State Drives [SSDs]
• Hard Disk Drives [HDD]
• Optical Drives – CDs, DVDs
• Data Storage Systems
– Direct Attached
– Network Attached Storage [NAS]
– Storage Area Network [SAN]
– pNFS – Parallel NFS [Network File System]
BUSINESS INTELLIGENCE ELEMENTS
DATA STORAGE HARDWARE/SOFTWARE
33. • Traditional Reporting Systems
– ERP systems, including extract and presentation tools
– Downloads to Excel and similar programs for analysis
using functions and pivot tables
• Presentation Tools
• Specialized Analytics
– IBM InfoSphere BigInsights and InfoSphere Streams
– IBM Netezza
– ParAccel Analytic Database
– EMC Greenplum
– SAS High Performance Computing
– Information Builders WebFocus
• Exploratory Tools, like IBM SPSS [originally Statistical Package
for the Social Sciences]
– Data mining with specialized algorithms
– Statistical analysis and related charting software
BUSINESS INTELLIGENCE ELEMENTS
BI APPLICATION SOFTWARE
34. • BI Reporting
• Predictive Analytics
• Data Exploration - correlation
• Data Visualization - graphical
• Instrumentation Analytics
• Content Analytics
• Web Analytics
• Functional Applications
• Industry Applications
• Location Tracking
BUSINESS INTELLIGENCE ELEMENTS
ADVANCED ANALYTICS APPLICATION TYPES
39. • Sales and Operations Planning
• Financial Instruments Modeling
• Production Control
• Online Retail
• Economics and Policy Development
• Agriculture/Farming
• Weather Analysis/Prediction
• Environmental Impact Assessment
• Healthcare Diagnosis and Records Management
• Genomic Analytics and Pharmaceutical and Medical Research
• Natural Resource Exploration
• Research Physics
• Road and Rail Traffic Management
• Security Surveillance: Business, Government
• Astronomy
• Logistics Management, Including GPS Tracking
• Electrical and Telecommunications Grid Management
• Social Media – Facebook, LinkedIn, Google+, Twitter, YouTube, Pinterest
• TV shows – Star Trek, Person of Interest
• Cloud Services – Computing, Storage
• Credit Scoring
SELECTED EXAMPLES OF USE
40. • Retail
– Amazon
– Dell
– Delta Sonic Car Washes
• Data Services
– IBM
– Google
– Amazon
• Financial Services
• Manufacturing
– McCain Foods – Frozen foods
– Boeing
• Transportation and Logistics
– Logistics – UPS, FedEx
– Rail – UP, CSX, TTX
– Air – United, AMR, Southwest
• Social Media
– LinkedIn
– Facebook
• Government
– NSA PRISM and other tools
– CIA – Palantir Software
• Medicine and Health
– Centers for Disease Control and Prevention [CDC]
– J. Craig Venter Institute
• Science
– Livermore Labs
SELECTED USERS
41. • Technical Elements
– Direct on-line access
– Amazon specialized “Big Data” database
– Distributed and extremely large data
centers
– Highly automated, high technology
warehouses
– High supplier [vendor] integration
• User Benefits
– Favorable prices
– Suggested associated purchases
– Advertising based on individual interests
SELECTED EXAMPLES OF USE
AMAZON
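"Suggested associated purchases" rest on finding items that tend to be bought together. Amazon's production method is far more elaborate; the sketch below only counts how often pairs of items co-occur in past orders and suggests an item's most frequent companions. The order data and item names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical past orders, each a set of items bought together.
orders = [
    {"camera", "memory card", "tripod"},
    {"camera", "memory card"},
    {"camera", "case"},
    {"tripod", "case"},
]

# Count how many orders contain each unordered pair of items.
pair_counts = Counter()
for order in orders:
    for a, b in combinations(sorted(order), 2):
        pair_counts[(a, b)] += 1

def suggest(item, top_n=2):
    """Items most often bought with `item`, ranked by co-occurrence count."""
    companions = Counter()
    for (a, b), n in pair_counts.items():
        if item == a:
            companions[b] += n
        elif item == b:
            companions[a] += n
    return [other for other, _ in companions.most_common(top_n)]

print(suggest("camera"))
```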
42. • Technical Elements
– Web driven order entry and custom
purchase configuration
– Tracking of sales correspondence with
promotional offers
– Supplier re-order integration
• User Benefits
– Ability to customize purchase
– Reasonable cost
– Prompt delivery
SELECTED EXAMPLES OF USE
DELL
43. • Technical components
– Shared component and assembly designs
– More detailed quality specifications and
product tolerances
– Control of assembly schedule
– “Real time” exchange of technical
information
– Dissemination of best practices
• Customer benefits
– Faster deliveries
– Increased product quality
– Reduced defects
SELECTED EXAMPLES OF USE
BOEING
44. • Techniques employed
– Collect cellphone and GPS signals, traffic
cameras, and roadside sensors
– Identify accidents, traffic jams, and road damage
– Emergency vehicles can be dispatched
– Update traffic websites
– Send messages to drivers’ GPS devices and
cellphones
– Use supercomputers running the INRIX application
• Benefits
– Clears traffic congestion faster
– More timely relief for accident victims
– Facilitates road paving scheduling
SELECTED EXAMPLES OF USE
NEW JERSEY DEPARTMENT OF TRANSPORTATION
45. • Technical Elements
– General LinkedIn Structure
• Personal Profile
• Individual Connections
• Groups
• Company and Other Searches
• Endorsements
• Attached application partners
– SlideShare, owned by LinkedIn
• User Benefits
– Networking with professional contacts
– Personal branding capabilities
– Business Development
– Job Search enhancement
SELECTED EXAMPLES OF USE
LINKEDIN
46. LINKEDIN PROFILE PAGE SAMPLE
48. TRENDS
• More, bigger, faster – big data gets bigger
• Cloud services continue to expand
• Mobile computing expands
• Hadoop becomes more common
• Interactive data visualization will expand
• Social media type platforms will increase
their prominence
• Demand for analytics skills will increase
• Privacy issues will become more prominent
49. RESOURCES
• Books
• Competing on Analytics, Davenport & Harris
• Analytics at Work, Davenport, Harris, & Morison
• The Data Asset, Fisher
• Data Strategy, Adelman, Moss, Abai
• Big Data, Mayer-Schönberger & Cukier
• Websites
• The Data Warehouse Institute – tdwi.org
• IBM data analytics: www.ibm.com, Smarter Planet
50. SUMMARY
WHY USE BI AND ADVANCED ANALYTICS
INSIGHT FROM DATA