Business Intelligence Data Analytics June 28 2012 Icpas V4 Final 20120625 8am
1. BUSINESS INTELLIGENCE
& ADVANCED ANALYTICS
The Search for Patterns,
Waldo, and Black Swans
Barrett Peterson, C.P.A.
ICPAS Fox River Trail Chapter, June 28, 2012
4. A LITTLE BACKGROUND
HISTORY: A trip down memory lane
• Computer-based business intelligence systems are an idea that is middle aged – about 40 years old.
Previously described as:
– Decision support systems [DSS]
– Executive information systems [EIS]
– Management information systems [MIS]
5. A LITTLE BACKGROUND
History: Important Technology Inventions
• Internet Development
  – ARPANET and others – 1960s
  – Internet Protocols – 1982, presumably by Al Gore
• IBM researcher Edgar Codd credited with development of relational database theory in 1970.
• IBM’s Donald Chamberlin and Raymond Boyce develop structured query language [SQL] in the early 1970s to manipulate and retrieve data from IBM’s early relational database management system.
• World Wide Web and 1st web browser invented by Tim Berners-Lee in 1990 by combining the internet, hypertext mark-up language, and Uniform Resource Locator [URL] system. The browser was later renamed Nexus.
• Mosaic, co-developed by Marc Andreessen, was the forerunner of the first commercial web browser [Netscape].
• Development of big data enabling database designs and high speed processing during the last 15 years.
6. A LITTLE BACKGROUND
History: Drivers Enabling BI and Advanced Analytics
• Development of the primary infrastructure
  – Database design
  – Processing and Storage Hardware
  – Server Development and Massively Parallel Processing
• Improved telecommunications speed
• Hardware miniaturization, capacity, and speed
  – Memory [RAM] capacity
  – Storage capacity and transfer speed
  – Bus speed
  – Video processing capacity and speed
• Increased hardware speed and capacity
• Digital formats for sensors, cameras, RFID, and other data collection sources
• Mobile computing
• “Cloud” capability exploits many of these developments
7. A LITTLE BACKGROUND
TERMINOLOGY: A consultant’s collection of confusing names – a sampler
• Analytics
• Business Intelligence
• Knowledge Management
• Content Management
• Data Mining
• Big Data
• Data Integration
• Gamification
• Blob [Binary Large Object]
8. A LITTLE BACKGROUND
Drivers and Enablers of Big Data
• CPU speed and power
  – Moore’s law
  – Multi-core chips
  – Solid State Memory
• Storage improvement and cost reduction
  – Greatly increased capacity
  – Greatly increased access/transfer speed
  – Greatly reduced cost
• Data collection from a wide range of devices
• Data communications – speed and volume
• Database management techniques and software
• Application speed and power
10. TONIGHT’S CRITICAL DEFINITIONS
Business Intelligence
A system composed of “computer” hardware and software to:
• Collect, “clean”, filter, and integrate data
• Store data [hardware and software]
• Provide knowledge management, analytical, and presentation tools to translate data into decision-useful information
11. TONIGHT’S CRITICAL DEFINITIONS
Business Intelligence Generations
• Prehistoric – Mainframe Era
  – DSS, EIS, MIS
  – Hierarchical Master Data Files
• The Current Era [Primarily] – Business Intelligence
  – Primarily “structured” data [data that can be represented in relational/dimensional tables or flat files], and BLOB [binary large object] formats
  – Analysis of “known” patterns
  – Presented in tables, simple charts, and dashboards
• Emerging – Big Data and Advanced Analytics
  – To discover new, changing, or variable patterns
  – A wide variety of “unstructured” digital data formats added to “structured” data
  – Emerging storage structures
  – “Exploratory” analytics
  – Zoomable User Interface [ZUIs]
  – Solid State Memory and Solid-State Drives
13. BUSINESS INTELLIGENCE ELEMENTS
Principal Components for Maximum Application
• Computer – CPU, Memory, and Operating System Software
• Data Collection
  – Master Data Management
  – Collection Processes and Devices
  – Data Cleansing Processes and Software
• Data Storage
  – Physical Devices and Storage Management Software
  – Data Management and Integration
  – Database Software Storage
    • Relational – Traditional ERP/Transaction systems
    • Dimensional – Traditional Data Warehouse, including associated BLOB
    • Distributed, Multiple Server, Storage Systems
    • NoSQL [Not Only SQL] Distributed Operational Stores
    • Hadoop for Highly Parallel Processing and Intensive Data Analytics Applications
• Middleware Software
• Business Intelligence Application Software
  – OLAP, Dashboard, and Chart Reports
  – Statistical Analysis and Presentation Tools
14. BUSINESS INTELLIGENCE ELEMENTS
DATA ISSUES: THE CORNERSTONE
• Data Governance and Management
  – Uniform terminology
  – Uniform meaning
  – Uniform units of measure
  – Metadata
• Data Structure and Attributes
  – Structured – Relational/Dimensional
  – Unstructured
  – Rate of change, context, and other attributes
• Data Collection and Preparation
  – Filtering, particularly “Big Data”
  – Extract, Transform, Load [ETL] for “structured” data
• Data Base File Systems
• Data Storage and Retrieval
  – Capacity
  – Access/Retrieval speed
15. BUSINESS INTELLIGENCE ELEMENTS
MASTER DATA GOVERNANCE AND MANAGEMENT
• Metadata management
  – Business definitions, rules, sources
  – Technical attributes, such as type, scale, transformation methods
  – Processing requirements – filtering, ETL, aggregation, summarization
• Data Definitions and data dictionaries
  – Name
  – Unit(s) of measure
• Data collection and filtering or transforming requirements
  – Sources – internal and external
  – Context addition/filtering requirements
• Data integration specifications
  – Multiple platforms and applications
  – Mapping to intermediate data marts
• Privacy requirements
  – Personal Identifying data
  – Laws: HIPAA, Privacy Act
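The data dictionary items above (name, unit of measure, source, privacy flag) can be sketched as a small record type. This is a minimal illustration, not a governance tool: the class name `DataDefinition`, its field names, and the two sample entries are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataDefinition:
    """One data dictionary entry (field names are illustrative assumptions)."""
    name: str
    business_definition: str
    unit_of_measure: str
    data_type: str
    source: str  # "internal" or "external"
    contains_personal_data: bool = False


def build_dictionary(entries):
    """Index entries by name and reject duplicate names (uniform terminology)."""
    dictionary = {}
    for entry in entries:
        if entry.name in dictionary:
            raise ValueError(f"duplicate definition for {entry.name!r}")
        dictionary[entry.name] = entry
    return dictionary


# Hypothetical sample entries.
entries = [
    DataDefinition("net_sales", "Invoiced sales less returns", "USD", "decimal", "internal"),
    DataDefinition("patient_id", "Identifier assigned at intake", "none", "text", "internal",
                   contains_personal_data=True),
]
dictionary = build_dictionary(entries)

# Fields that fall under privacy requirements.
personal_fields = sorted(
    name for name, definition in dictionary.items() if definition.contains_personal_data
)
print(personal_fields)
```

Keeping the privacy flag on the definition itself means any downstream report can check it before exposing a field.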
16. BUSINESS INTELLIGENCE ELEMENTS
Data Structures and Attributes Are Critical Drivers
• Data Structures
  – “Structured” Data, principally text and numbers capable of incorporation in relational or dimensional tables
  – “Unstructured” Data, not suitable for relational tables, many in newer data formats
• Big Data Attributes
  – Both “structured” and “unstructured”
  – The four major “Vs” of big data
    • Volume – huge
    • Velocity – fast changing, unlike structured
    • Variety – format and content
    • Variability – lacks the consistency of structured data
17. BUSINESS INTELLIGENCE ELEMENTS
Data Structures: IT Lingo
• Content Structure – Traditional Financial Data
  – Numerical
  – Sign/Debit or Credit
  – Text Descriptions
• Database Management Structures
  – Legacy Systems: Hierarchical and Network
  – Transaction Systems: Relational
    • Relations [Tables], Attributes [Columns], Instances [Rows]
    • Rules: no duplicate rows; single value for attributes
  – Warehouse Systems: Dimensional
    • Facts [data items, usually a dollar amount or unit count]
    • Measures – dollar or count for facts
    • Dimensions – groups of hierarchies and descriptors of various aspects or context for the facts/measures
• Microsoft Office and Similar File Formats
• Photography and Art
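The dimensional terms on this slide (facts, measures, dimensions) can be illustrated with a minimal sketch. The fact rows, the dimension names `region`, `product`, and `month`, and the helper `roll_up` are hypothetical. A real warehouse would do this with SQL or an OLAP tool rather than plain Python.

```python
from collections import defaultdict

# Hypothetical fact rows: each carries dimension values plus one measure ("amount").
sales_facts = [
    {"region": "Midwest", "product": "Widget", "month": "2012-05", "amount": 1200.0},
    {"region": "Midwest", "product": "Gadget", "month": "2012-05", "amount": 800.0},
    {"region": "South", "product": "Widget", "month": "2012-05", "amount": 500.0},
    {"region": "South", "product": "Widget", "month": "2012-06", "amount": 700.0},
]


def roll_up(facts, dimension, measure="amount"):
    """Total one measure across all facts, grouped by a single dimension."""
    totals = defaultdict(float)
    for fact in facts:
        totals[fact[dimension]] += fact[measure]
    return dict(totals)


by_region = roll_up(sales_facts, "region")
by_month = roll_up(sales_facts, "month")
print(by_region)
print(by_month)
```

The same facts answer different questions depending on which dimension they are grouped by, which is the point of the dimensional layout.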
18. Business Intelligence Elements
RELATIONAL TABLE ILLUSTRATION
“Tuple” is borrowed from mathematics and set theory. In database design it refers to the set of attribute values that describes one “item” [row] of the kind named in the table’s title. Examples of such items include customers, vendors, orders, and product SKUs.
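A minimal sketch of a relational table, using Python's built-in sqlite3 module with an in-memory database. The `customers` table, its columns, and the sample rows are hypothetical. The primary key is one common way to enforce the "no duplicate rows" rule, assumed here for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,  -- uniquely identifies each row
        name        TEXT NOT NULL,        -- one value per attribute
        city        TEXT NOT NULL
    )
    """
)

# Each Python tuple below becomes one row [tuple] of the relation.
conn.executemany(
    "INSERT INTO customers (customer_id, name, city) VALUES (?, ?, ?)",
    [(1, "Acme Supply", "Aurora"), (2, "Fox Valley Foods", "Elgin")],
)

# Inserting a second row with an existing key violates the primary key.
try:
    conn.execute("INSERT INTO customers VALUES (1, 'Acme Supply', 'Aurora')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

rows = conn.execute(
    "SELECT customer_id, name, city FROM customers ORDER BY customer_id"
).fetchall()
print(rows, duplicate_rejected)
```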
20. BUSINESS INTELLIGENCE ELEMENTS
DATA FILE TYPE CATEGORIES, ALMOST ENGLISH
• Numbers and words/letters
  – Relational/Dimensional
  – Spreadsheets
  – Word Processing documents
• Sound and Music
• Photo
• Video
• Video Game
• CAD Design
• Graphical
  – PDF
  – Raster, Vector Graphics
  – Statistical Visualization
• Scientific
• Signal
• XML [Web based mark-up formats]
• Geo-Location
• Web Logs
21. BUSINESS INTELLIGENCE ELEMENTS
DATA COLLECTION AND PREPARATION
• Collection
  – Company transaction/ERP systems
  – Purchased, such as Nielsen, IRI
  – Vendor supplied, such as bank transactions
• Filtering
  – Adding context such as date or location
  – Eliminating “chatter” from high volume data
  – Error correction
• Aggregation & Integration
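The preparation steps above (eliminate chatter, correct errors, add context, aggregate) can be sketched in a few lines. The readings, the field names, and the rule that non-numeric values count as chatter are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date

# Hypothetical raw readings as they might arrive from a collection device.
raw_readings = [
    {"store": "A", "units": "12"},
    {"store": "A", "units": "n/a"},  # unusable value: treated as chatter
    {"store": "B", "units": " 7 "},  # stray whitespace: corrected
    {"store": "B", "units": "3"},
]


def prepare(readings, collected_on):
    """Drop unusable readings, fix formatting, and add a date for context."""
    cleaned = []
    for reading in readings:
        text = reading["units"].strip()  # error correction
        if not text.isdigit():           # filtering
            continue
        cleaned.append(
            {"store": reading["store"], "units": int(text), "collected_on": collected_on}
        )
    return cleaned


def aggregate(cleaned):
    """Total units per store."""
    totals = defaultdict(int)
    for row in cleaned:
        totals[row["store"]] += row["units"]
    return dict(totals)


cleaned = prepare(raw_readings, date(2012, 6, 28))
totals = aggregate(cleaned)
print(totals)
```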
25. BUSINESS INTELLIGENCE ELEMENTS
DATA BASE FILE SYSTEMS
• Relational – SQL
• Dimensional – SQL, OLAP
• Binary Large Object [BLOB] – binary data, most often photos, video, audio, or PDF files
• Massively Parallel-Processing [MPP]
• Apache Hadoop Distributed File System [HDFS] – Java
  – Google File System [GFS], used solely by Google
  – Google MapReduce
• Amazon S3 filesystem [used by Amazon]
• NoSQL
• Resource Description Framework [RDF] Databases, like Big Data
26. BUSINESS INTELLIGENCE ELEMENTS
SELECT BIG DATA DATABASE MANAGEMENT SYSTEMS
• Significant Originators
  – Google MapReduce
  – Google File System [GFS]
  – Amazon S3 filesystem
• Continuing Developments
  – Apache Software Foundation
    • Apache Cassandra distributed database management system
    • Apache Hadoop software framework to support data-intensive distributed applications
    • Apache Hive, a data warehouse structure built on Hadoop
    • Pig – high level programming language for creating MapReduce programs with Hadoop
  – Significant to Technology Development
    • Facebook
    • Yahoo
    • LinkedIn [Project Voldemort]
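The MapReduce idea named on this slide can be shown with a single-process word count. This sketch only illustrates the map, shuffle, and reduce steps. It does not use the Hadoop API, and the function names and sample documents are hypothetical. In a real cluster the map and reduce steps run in parallel across many servers.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: combine each key's values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


documents = ["big data gets bigger", "big data gets faster"]
pairs = [pair for document in documents for pair in map_phase(document)]
counts = reduce_phase(shuffle(pairs))
print(counts)
```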
27. BUSINESS INTELLIGENCE ELEMENTS
COMPUTER HARDWARE CONSIDERATIONS
• Convergence aspect of mainframes and servers
• Massively parallel, multiple server, distributed processing, in multiple data centers – grid computing
• Multi-core, high capacity, lower power consumption CPUs
• Memory servers for RAM employing DRAM composed of Fully Buffered Dual In-line Memory Modules [FB-DIMM]
• Solid state flash drive storage
• Greatly improved, and less costly, hard drive storage
29. BUSINESS INTELLIGENCE ELEMENTS
DATA STORAGE HARDWARE/SOFTWARE
• Data Storage Terminology
  – Memory – CPU direct connected, often called RAM
  – Storage – not directly connected to the CPU
• Data Storage Device Types
  – Memory
    • DRAM-based
    • Flash memory-based Solid-State Drives [SSDs]
  – Storage
    • Hard Disk Drives [HDD]
    • Optical Drives – CDs, DVDs
• Data Storage Systems
  – Direct Attached
  – Network Attached Storage [NAS]
  – Storage Area Network [SAN]
  – pNFS – Parallel Network file systems
30. BUSINESS INTELLIGENCE ELEMENTS
BI APPLICATION SOFTWARE
• Traditional Reporting Systems
  – ERP systems, including extract and presentation tools
  – Downloads to Excel and similar programs for analysis using functions and pivot tables
• Presentation Tools
• Specialized Analytics
  – IBM InfoSphere BigInsights and InfoSphere Streams
  – IBM Netezza
  – ParAccel Analytic Database
  – EMC Greenplum
  – SAS High Performance Computing
  – Information Builders WebFocus
• Exploratory Tools, like IBM SPSS [originally Statistical Package for the Social Sciences]
  – Data mining with specialized algorithms
  – Statistical analysis and related charting software
31. BUSINESS INTELLIGENCE ELEMENTS
ADVANCED ANALYTICS APPLICATION TYPES
• BI Reporting
• Predictive Analytics
• Data Exploration – correlation
• Data Visualization – graphical
• Instrumentation Analytics
• Content Analytics
• Web Analytics
• Functional Applications
• Industry Applications
36. SELECTED EXAMPLES OF USES
• Sales and Operations Planning
• Financial Instruments Modeling
• Production Control
• Online Retail
• Economics and Policy Development
• Agriculture/Farming
• Weather Analysis/Prediction
• Environmental Impact Assessment
• Healthcare Diagnosis and Records Management
• Genomic Analytics and Pharmaceutical and Medical Research
• Natural Resource Exploration
• Research Physics
• Road, Rail Traffic Management
• Security Surveillance
• Astronomy
• Logistics Management, Including GPS Tracking
• Electrical and Telecommunications Grids Management
• Social Media – Facebook, LinkedIn, Google+, Twitter, YouTube, Pinterest
• TV shows – Star Trek, Person of Interest
37. SELECTED USERS
• Retail
  – Amazon
  – Dell
  – Delta Sonic Car Washes
• Data Services
  – IBM
  – Google
  – Amazon
• Financial Services
• Manufacturing
  – McCain Foods – Frozen foods
  – Boeing
• Transportation and Logistics
  – Logistics – UPS, FedEx
  – Rail – UP, CSX, TTX
  – Air – United, AMR, Southwest
• Social Media
  – LinkedIn
  – Facebook
• Medicine and Health
  – Centers for Disease Control and Prevention (CDC)
  – J. Craig Venter Institute
• Science
  – Livermore Labs
38. SELECTED EXAMPLES OF USE
AMAZON
• Technical Elements
  – Direct on-line access
  – Amazon specialized “Big Data” database
  – Distributed and extremely large data centers
  – Highly automated, high technology warehouses
  – High supplier [vendors] integration
• User Benefits
  – Favorable prices
  – Suggested associated purchases
  – Individual interest advertising
39. SELECTED EXAMPLES OF USE
DELL
• Technical Elements
  – Web driven order entry and custom purchase configuration
  – Tracking of sales correspondence with promotional offers
  – Supplier re-order integration
• User Benefits
  – Ability to customize purchase
  – Reasonable cost
  – Prompt delivery
40. SELECTED EXAMPLES OF USE
BOEING
• Technical components
  – Shared component and assembly designs
  – More detailed quality specifications and product tolerances
  – Control of assembly schedule
  – “Real time” exchange of technical information
  – Dissemination of best practices
• Customer benefits
  – Faster deliveries
  – Increased product quality
  – Reduced defects
41. SELECTED EXAMPLES OF USE
NEW JERSEY DEPARTMENT OF TRANSPORTATION
• Techniques employed
  – Collects cellphone and GPS signals, traffic camera feeds, and roadside sensor data
  – Identifies accidents, traffic jams, and road damage
  – Dispatches emergency vehicles
  – Updates traffic websites
  – Sends messages to drivers’ GPS devices and cellphones
  – Uses supercomputers running an INRIX application
• Benefits
  – Clears traffic congestion faster
  – Provides more timely relief for accident victims
  – Facilitates road paving scheduling
42. SELECTED EXAMPLES OF USE
LINKEDIN
• Technical Elements
  – General LinkedIn Structure
    • Personal Profile
    • Individual Connections
    • Groups
    • Company Searches
    • Questions and Answers
  – Attached application partners
    • Amazon – Reading List
    • Slideshare
• User Benefits
  – Networking with professional contacts
  – Personal branding capabilities
  – Business Development
  – Job Search enhancement
45. TRENDS
• More, bigger, faster – big data gets bigger
• Cloud services continue to expand
• Mobile computing expands
• Hadoop becomes more common
• Interactive data visualization will expand
• Social media type platforms will increase
their prominence
• Analytics skills demands will increase
46. RESOURCES
• Books
• Competing on Analytics, Davenport & Harris
• Analytics at Work, Davenport, Harris, & Morison
• The Data Asset, Fisher
• Data Strategy, Adelman, Moss, Abai
• Websites
• The Data Warehousing Institute [TDWI] – tdwi.org
• IBM data analytics: www.ibm.com, smarter planet