6. OUR PERSPECTIVE
Big Data is RELATIVE not ABSOLUTE
Big Data
When the volume, velocity, and variety of data exceed an
organization’s storage or compute capacity for accurate
and timely decision-making
7. Going Big: KEY DIMENSIONS
• Analytics: which kind?
• Data lifecycle
• Speed / granularity @ “scale”
• Decisions
9. Analytics: ESTABLISHING DIFFERENTIATION
Reactive              Proactive
Alerts                Optimization
OLAP                  Predictive Modeling
Ad Hoc Reports        Forecasting
Standard Reports      Statistical Analysis
10. THE ANALYTICS LIFECYCLE
Stages (a cycle): Identify / Formulate Business Problem → Data Preparation → Data Exploration → Transform & Select → Build Model → Validate Model → Deploy Model → Evaluate / Monitor Results → back to Identify / Formulate
Roles around the cycle:
• Business Manager (domain expert): makes decisions; evaluates processes and ROI
• Business Analyst: data exploration, data visualization, report creation
• Data Scientist: exploratory analysis, descriptive analytics, predictive modeling, data preparation
• IT Systems / Data Management: model validation, model deployment, model monitoring
11. Big Data and Analytics: KEY CONSIDERATIONS
Analytics
Data
Platforms
The “Cloud”
Mobile
High Performance Analytics
12. TRENDS IN BIG DATA: STORAGE AND COST CONSIDERATIONS
Cost of storage, memory, and computing (cost per gigabyte / cost per terabyte)
In 2000, a GB of disk cost $17; today it is < $0.07
In 2000, a GB of RAM cost $1,800; today it is < $1
In 2009, a TB of RDBMS was $70K; today it is < $20K
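A quick back-of-the-envelope check on the decline factors implied by those price points (the figures are the slide's; the script is just arithmetic):

```python
# Decline factors implied by the price points above (then-price / now-price).
prices = {
    "disk ($/GB, 2000 -> today)":  (17.0, 0.07),
    "RAM ($/GB, 2000 -> today)":   (1800.0, 1.0),
    "RDBMS ($/TB, 2009 -> today)": (70_000.0, 20_000.0),
}
for name, (then, now) in prices.items():
    print(f"{name}: ~{then / now:,.0f}x cheaper")
```

Storage and memory have fallen by two to three orders of magnitude, while RDBMS cost per terabyte has dropped only a few-fold; that gap is part of why commodity scale-out platforms look attractive.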
13. TRENDS IN PROCESSORS
Dude! Where is my 20 GHz processor?
Clock speeds plateaued in the mid-2000s; vendors have added cores instead of gigahertz.
14. TRENDS IN PLATFORM DESIGNS
Multi-socket, multi-core platforms; commodity blades; chassis of blades
Grid computing environments and multicore processors are increasingly cost-effective.
Typical blade:
• 2 x {12, 16}-core 2.2 GHz processors
• {64, 96, 128, 256} GB RAM
• 2 x {200, 300, 600, 900} GB drives
Rack of 48 blades: {1152, 1536} cores
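The rack-level core counts follow directly from the blade options above:

```python
# cores per rack = blades per rack x sockets per blade x cores per socket
blades, sockets = 48, 2
for cores_per_socket in (12, 16):
    print(blades * sockets * cores_per_socket)  # 1152 for 12-core, 1536 for 16-core
```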
Performance is gained by breaking work into tasks that can be done in parallel by nodes or processes
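A minimal sketch of that split-and-combine pattern using Python's standard library; `parallel_sum` and its chunking are illustrative stand-ins for a real grid scheduler, not a SAS API, and the thread pool stands in for worker nodes:

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum(chunk):
    # Stand-in unit of work; on a real grid this would be a task
    # shipped to a worker node or process.
    return sum(chunk)

def parallel_sum(data, n_workers=4):
    # Break the work into one chunk per worker ...
    size = -(-len(data) // n_workers)  # ceiling division
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    # ... run the chunks in parallel, then combine the partial results.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return sum(pool.map(partial_sum, chunks))
```

For CPU-bound work, `ProcessPoolExecutor` (or a genuine MPP engine) would replace the thread pool, since Python threads share one interpreter lock.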
16. Common Factors in Analytical Problems
Large data volumes needing:
• Flexible models
• Powerful algorithms
• Effective visualization techniques
• Easy deployment to enable wider access to the power of analytics
a) Breadth: the full range of analytics technology and organizational competencies (tools, business solutions).
b) Depth: functional business processes, industry domain knowledge, and the depth of algorithms/techniques, including new, innovative algorithms.
c) Percentage of R&D employees with a PhD: 9.3% (as of July 2012).
Big data constitutes:
• Volume: growing volumes of data, and how much data must be processed within a time window.
• Variety: structured tables, documents, e-mail, metering data, video, image, audio, stock ticker data, and more.
• Velocity: how fast data is produced and processed to meet demand, and the ability to respond once a problem or opportunity is detected.
A data environment can become extreme along any one of these dimensions, or along two or all three at once. Hence it is important to determine and evaluate the “relevant” data needed to answer your complex set of questions before they become obsolete. SAS’s role here is to help determine what is relevant and what is not.
In any given situation, whether you are looking at pole-top transformers, coal-fired turbines, oil-drilling equipment 6,000 feet below sea level, or coupon redemption rates at the local grocery store, big data in and of itself is not that interesting. Good data management practices are the answer to managing big data, but the only way to leverage big data for valuable insights is with game-changing analytics from SAS. It is an opportunity to excel in your market, or a ball and chain that holds you back if you don’t embrace it effectively.
** Note how the values are relative and vary by customer. ** Share examples of your own: a global oil and gas company; marketing analytics; a service provider to manufacturing, CPG, and retail.
TRANSITION: so how do you thrive in big data? A solid process, and leveraging the right technology: ANALYTICS, ANALYTICS, ANALYTICS.
The text below is from the Jim Davis analytics video on YouTube. There is an overwhelming amount of data today, and different types of data: data structured in databases, and unstructured data like voice, video, and text. Some call it the data deluge; others say we are drowning in data.
We don’t need to look at it that way. Look at data as opportunity. We may be comfortable making decisions based on gut feel, but that’s not going to cut it [in the smart grid era]. The stakes are much higher now. We’ve got to make decisions based on facts. How do we do that? Easy: analytics can and should be the differentiator.
"Big data" is a popular term generally used to acknowledge the exponential growth, availability, and use of information, both structured and unstructured. A lot has been written lately on the big data trend and how it will become a key basis of competition, innovation, and growth.
How does SAS define or view the term “big data”? Big data is a relative term, not an absolute one: when an organization’s ability to handle, store, and analyze data (from a volume, variety, and velocity perspective) exceeds its current capacity, i.e., it is beyond your comfort zone, then it qualifies as having a “big data” problem.
** You don’t need to give in just because it is something outside of your comfort zone; there are plenty of practical solutions, and steps to take to get business value.
1. A fundamental set of analytics types, which are core to our business and our analytical applications.
2. Customers use a combination of analytical techniques, for example data mining and text mining.
3. On the front end, data management is important because end users spend a lot of time and effort preparing data for analytics.
4. On the downstream end, sharing analytical insights through easy-to-use visualization/BI tools is important.
5. An integrated set of components.
1. Iterative (discover → model → deploy → monitor → discover …).
2. Interactive (different types of users play important roles at different stages of the life cycle).
3. SAS provides products to address needs at each step of the analytical life cycle.
4. SAS helps move customers along the analytical maturity curve (e.g., from level 2 to level 3).
Grid computing environments and multicore processors are increasingly cost-effective. Performance is gained by breaking work into tasks that can be done in parallel by nodes or processes:
• Reliance on Massively Parallel Processing (MPP) architecture
• Commodity hardware
• In-memory processing co-located with data
Typical blade server (2011): 2 x 12-core 2.2 GHz processors; 64 GB RAM; 2 x 200 GB drives. Rack of 48 blades: 1,152 cores, 3 TB of memory, 28 TB of storage.
New methods and techniques are being advanced by industry as well as academia. Solutions to these complex problems often span multiple analytical disciplines and industry domains.
We don’t want the amount or kind of data to limit the analytics you can do.
Big data: the next frontier for innovation, competition, and productivity. Big value can only be realized with big analytics.