Enterprises are beginning to consider deploying data science and data warehouse platforms on hybrid (public cloud, private cloud, and on-premises) infrastructure. This gives you the flexibility and freedom to deploy your analytics wherever you need them and to build an adaptable, agile analytics platform.
But the market is conspiring against customer desire for innovation...
Leading public cloud vendors are interested in pushing their new, but proprietary, analytic stacks, locking customers into subpar Analytics as a Service (AaaS) for years to come.
In tandem, legacy data warehouse vendors are trying to extend the lifecycle of their costly and aging appliances with new features of marginal value, simply imitating the same limiting models as the public cloud vendors.
New vendors are coming up with interesting ideas, but these ideas often lack critical features and do not support hybrid solutions, limiting their immediate value to users.
It is 2017, and you can, in fact, have your analytics cake and eat it too! Solve your short-term cost and capability challenges, and establish a long-term hybrid data strategy, by running the same open source analytics platform on your infrastructure as it exists today.
In this webinar you will learn how Pivotal can help you build a modern analytical architecture that runs on the public cloud, private cloud, or on-premises platform of your choice, while fully leveraging proven open source technologies and supporting the needs of diverse analytical users.
Let’s have a productive discussion about how to deploy a solid cloud analytics strategy.
Presenter: Jacque Istok, Head of Data Technical Field for Pivotal
https://content.pivotal.io/webinars/jul-20-how-to-build-modern-data-architectures-both-on-premises-and-in-the-cloud
9. User Centered Design
“A design approach that supports the entire development process with user-centered activities, in order to create a product that is easy to use and of added value to the intended users.”
www.usabilitynet.org
11. Users
Different Users Want Different Things
IT
● Tasked with legacy system integration
● Controls security access to comply with policy and laws
● Operationalization
● Enterprise Architecture
Developers
● Build applications to interoperate
● Develop reports and dashboards
● Extract and Transform data
Business Analysts
● Subject Matter Experts
● Primary consumer of analytical models
● SQL or BI expert
Data Scientists
● Mathematically astute
● Intellectual curiosity, analytical exploration
● Domain Knowledge
● Communication in the form of visualization
● SQL and analytical libraries expert
17. Linear Systems
• Sparse and Dense Solvers
• Linear Algebra
Matrix Factorization
• Singular Value Decomposition (SVD)
• Low Rank
Generalized Linear Models
• Linear Regression
• Logistic Regression
• Multinomial Logistic Regression
• Ordinal Regression
• Cox Proportional Hazards Regression
• Elastic Net Regularization
• Robust Variance (Huber-White), Clustered Variance, Marginal Effects
Other Machine Learning Algorithms
• Principal Component Analysis (PCA)
• Association Rules (Apriori)
• Topic Modeling (Parallel LDA)
• Decision Trees
• Random Forest
• Conditional Random Field (CRF)
• Clustering (K-means)
• Cross Validation
• Naïve Bayes
• Support Vector Machines (SVM)
• Prediction Metrics
• K-Nearest Neighbors
Descriptive Statistics
Sketch-Based Estimators
• CountMin (Cormode-Muthukrishnan)
• FM (Flajolet-Martin)
• MFV (Most Frequent Values)
Correlation and Covariance
Summary
Utility Modules
Array and Matrix Operations
Sparse Vectors
Random Sampling
Probability Functions
Data Preparation
PMML Export
Conjugate Gradient
Stemming
Sessionization
Pivot
Path Functions
Encoding Categorical Variables
Inferential Statistics
Hypothesis Tests
Time Series
• ARIMA
May 2017
Graph
• PageRank
• Single Source Shortest Path
Native Interfaces
Machine Learning, Statistical, Graph, Path Analytics
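To make one entry of the list above concrete, here is a minimal sketch of the idea behind a CountMin sketch-based estimator: a small fixed-size table of counters that approximates item frequencies and never undercounts. The class name, the `width` and `depth` parameters, and the salted SHA-256 hashing are assumptions made for this illustration, not the platform's actual implementation.

```python
import hashlib


class CountMinSketch:
    """Approximate frequency counter; estimates may overcount but never undercount."""

    def __init__(self, width=256, depth=4):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row, item):
        # One hash function per row, derived by salting the item with the row number.
        digest = hashlib.sha256(f"{row}:{item}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, item, count=1):
        # Increment one counter in every row.
        for row in range(self.depth):
            self.rows[row][self._index(row, item)] += count

    def estimate(self, item):
        # Collisions can only inflate a counter, so the minimum across rows
        # is the tightest available upper bound on the true count.
        return min(self.rows[row][self._index(row, item)] for row in range(self.depth))


sketch = CountMinSketch()
for word in ["sql", "sql", "graph", "sql", "ml", "graph"]:
    sketch.add(word)

print(sketch.estimate("sql"))
```

The memory used depends only on `width` and `depth`, not on the number of distinct items, which is why estimators of this kind suit very large tables.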
18. Designed for very large graphs (billions of vertices/edges)
No need to move and transform data for an external graph engine
Familiar SQL interface
Algorithms:
• All pairs shortest path*
• Breadth first traversal*
• Connected components*
• Multiple graph measures*
• PageRank
• Single source shortest path
Native Interfaces
Graph Analytics
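As an illustration of what "single source shortest path" computes, the sketch below treats a graph as a table of (source, destination, weight) rows, much as an edge table would be stored in a database. Dijkstra's algorithm is used here only as a compact stand-in; it is not a claim about how the platform computes the result, and the function and variable names are hypothetical.

```python
import heapq


def single_source_shortest_path(edges, source):
    """Dijkstra's algorithm over (src, dest, weight) rows with non-negative weights."""
    adjacency = {}
    for src, dest, weight in edges:
        adjacency.setdefault(src, []).append((dest, weight))
        adjacency.setdefault(dest, [])

    distances = {source: 0}
    frontier = [(0, source)]
    while frontier:
        dist, vertex = heapq.heappop(frontier)
        if dist > distances.get(vertex, float("inf")):
            continue  # stale queue entry; a shorter path was already found
        for neighbor, weight in adjacency.get(vertex, []):
            candidate = dist + weight
            if candidate < distances.get(neighbor, float("inf")):
                distances[neighbor] = candidate
                heapq.heappush(frontier, (candidate, neighbor))
    return distances  # unreachable vertices are simply absent


# Hypothetical edge table: one row per directed edge.
edge_table = [("a", "b", 4), ("a", "c", 1), ("c", "b", 2), ("b", "d", 5)]
distances = single_source_shortest_path(edge_table, "a")
print(distances)
```

Keeping the graph as plain edge rows is what allows the analysis to run where the data already lives, rather than exporting it to a separate graph engine.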
19. Native Interfaces
Programmatic
• Current Computing Interfaces
• User Defined Types
• User Defined Functions
• User Defined Aggregates
• Foundational work for containerized
Python and R compute environments
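A user-defined aggregate in a massively parallel database is typically built from three pieces: a transition function applied row by row on each segment, a merge step that combines the per-segment states, and a final function that produces the result. The sketch below mimics that flow in plain Python for an average; the function names and the list-of-lists stand-in for segments are assumptions for illustration, not a database API.

```python
from functools import reduce


def avg_transition(state, value):
    """Fold one row into the running (total, count) state on a segment."""
    total, count = state
    return (total + value, count + 1)


def avg_merge(left, right):
    """Combine the partial states produced by two segments."""
    return (left[0] + right[0], left[1] + right[1])


def avg_final(state):
    """Turn the merged state into the aggregate result."""
    total, count = state
    return total / count if count else None


def run_aggregate(segments, initial=(0.0, 0)):
    # Each inner list stands in for the rows stored on one segment.
    partials = [reduce(avg_transition, rows, initial) for rows in segments]
    merged = reduce(avg_merge, partials, initial)
    return avg_final(merged)


segments = [[10.0, 20.0], [30.0], [], [40.0, 50.0, 60.0]]
result = run_aggregate(segments)
print(result)
```

Because only the small partial states travel between segments, the same pattern scales to aggregates over data that never fits on one machine.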
25. Analyze, interact, and engage with diverse data sources, localities and temperatures
Real Separation of Compute and Data Source
Hadoop Data Lakes
Public Cloud Data Lakes
Hybrid
Local
Massively Parallel
Analytics Environment
26. Spring Cloud Data Flow (SCDF)
Spring Cloud Data Flow is a microservices toolkit for building data integration and real-time data processing pipelines.
The Data Flow server provides interfaces to compose and deploy pipelines onto modern runtimes such as Cloud Foundry, Kubernetes, Apache Mesos, or Apache YARN.
Ingest - Route - Filter - Enrich
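The four stages named above can be pictured as small, composable steps chained into one stream. The following is a conceptual sketch in plain Python generators, not the Spring Cloud Data Flow API (which composes Spring-based microservices); every function name, field name, and sample record here is hypothetical.

```python
def ingest(source):
    """Ingest: read raw records from a source (here, an in-memory list)."""
    for record in source:
        yield dict(record)  # copy so later stages never mutate the source


def route(events):
    """Route: tag each event with a destination channel."""
    for event in events:
        channel = "alerts" if event["type"] == "error" else "metrics"
        yield {**event, "channel": channel}


def keep_server_errors(events):
    """Filter: keep only alert events with a 5xx status code."""
    for event in events:
        if event["channel"] == "alerts" and event["code"] >= 500:
            yield event


def enrich(events, regions):
    """Enrich: attach reference data (the host's region) to each event."""
    for event in events:
        yield {**event, "region": regions.get(event["host"], "unknown")}


raw_feed = [
    {"host": "web-1", "type": "error", "code": 500},
    {"host": "web-2", "type": "info", "code": 200},
    {"host": "db-1", "type": "error", "code": 503},
    {"host": "web-1", "type": "error", "code": 404},
]
host_regions = {"web-1": "us-east", "web-2": "eu-west"}

pipeline_output = list(
    enrich(keep_server_errors(route(ingest(raw_feed))), host_regions)
)
for event in pipeline_output:
    print(event)
```

Each stage only knows about the events flowing through it, which is the property that lets stages be deployed, scaled, or replaced independently in a real pipeline.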
27. Apache Kafka and SCDF
Data Feeds
Integrated Data Ingest layer
SCDF (Cloud ETL 2.0)
29. Run Your Analytics Anywhere
On-Premises Private Cloud Public Cloud
• Infrastructure Agnostic: A portable, 100% software solution
• Same platform, no switching/migration cost
30. Modern Cloud Analytics Platform
ANALYTICAL APPLICATIONS
• Custom Apps
• BI / Reporting
• Machine Learning
• AI
USERS
• IT
• Dev
• Business Analysts
• Data Scientists
NATIVE INTERFACES
• SQL: ANSI SQL, Teradata SQL, Other DB SQL
• ML/Statistics/Graph: Apache MADlib
• Programmatic: Python, R, Java, Perl, C
• Text: Apache Solr
• GeoSpatial: PostGIS
MULTI-STRUCTURED DATA SOURCES & PIPELINES
• Structured Data: JDBC, ODBC
• JSON, Apache Avro, Apache Parquet, XML, & More
• Local Storage
• Other RDBMSes
• Spark
• GemFire
• Cloud Object Storage
• HDFS
• Kafka
• ETL
• Spring Cloud Data Flow
FLEXIBLE DEPLOYMENT
• On-Premises
• Public Clouds
• Private Clouds
• Fully Managed Clouds
MODERN CLOUD ANALYTICS PLATFORM
• Massively Parallel (MPP)
• PostgreSQL Kernel
• Petabyte Scale Loading
• Query Optimizer (GPORCA)
• Workload Manager
• Polymorphic Storage
• Command Center
• SQL Compatibility (Hyper-Q)
32. FRAUD MANAGEMENT RISK MANAGEMENT
CYBERSECURITY MANUFACTURING
PREDICTIVE MAINTENANCE
ELECTRICITY GRID
Pivotal Greenplum: Not just a Database
An Analytics Solution for every challenge
33. Pivotal Greenplum: Learn More
Find out more about Pivotal Greenplum at
https://pivotal.io/pivotal-greenplum
OR learn more about the open source at
http://greenplum.org/
OR give it a try yourself at
Amazon AWS or Microsoft Azure or via Download