Semantic ‘Radar’
Steers Users to Insights
in the Data Lake
By infusing information with intelligence,
users can discover meaning in the digital
data that envelops people, organizations,
processes, products and things.
Executive Summary
In the past several years, we’ve seen Amazon grow to nearly $75 billion in
sales without operating a single storefront, in part by using information
about customers’ past purchases to recommend new offers to them. More
recently, Airbnb managed 10 million stays in 2013 without owning a single
hotel; instead, it encourages members to offer space in their own homes to
rent out to others and publishes online ratings of guests and hosts.
These are only two examples of how businesses are transforming customer
experiences, establishing new business models and redefining their markets.
At the heart of their success is an innate ability to continuously understand
and manage the digital data generated by people, organizations, processes,
products and things1
— what we call a Code Halo™.
In an increasingly “data-rich, meaning-poor” business world, companies
everywhere are attempting to learn the value of signal and the cost of noise
when coming to grips with their big data and analytics initiatives. Extracting
meaning from this data empowers companies, brands and employers
to better understand and meet customer needs, and to create more
personalized experiences for them. Doing so requires new tools to manage
and understand the torrents of data that will come from the projected tens
of billions of devices as the “Internet of Things” links everything from cars
to clothing.2
It also requires algorithms that filter data “noise,” generate
new insights and create personalized and enriching experiences that help
Code Halo pioneers like Pandora and Netflix treat us as if we were individual
markets of one.
Combining streams of data from different sources into a “data lake”3
can
help steer business decision-makers to the insights they need more quickly
and efficiently. At its core is high-performance, low-cost data storage technology to store and manage large volumes of frequently used data.
Multiple analytics engines speed the delivery of a personalized experience
to each customer at the point of interaction. Semantic models standardize
the definition and mapping of internal and external data. Domain expertise,
such as knowledge of which patterns in the data represent risk or
opportunity, is embedded in the semantic models4
to guide business users
to insights. Finally, self-service features help business users find and analyze data without delay.
This white paper highlights the challenges that organizations face when
using today’s data architectures to find and act on business-changing
big data insights. It describes how a “data lake” can help, and the central
role that semantic technology plays by slashing the time required to find,
integrate, share and analyze the data. Finally, it describes how enterprises can begin creating their own data lakes.
Today’s Information Architectures:
Too Slow, Expensive and Hard to Understand
A data lake eliminates many of the roadblocks that now stand between business
users and business insights. Among these are:
•	Onboarding and integrating data is slow and expensive. Creating personalized customer experiences and tracking complex markets requires collecting, integrating, analyzing and responding to data from many different sources. However, the ETL (extract, transform and load) function requires custom processes and technologies for each source system, causing delays when skilled staff is busy on other projects.
•	Optimizing data for different users is difficult and costly. Departments such
as finance, sales and manufacturing may organize and define data differently.
Finance, for example, may organize customers based on their annual purchases
or credit ratings, while sales organizes them by geography, and manufacturing by
the products they purchase. The cost and delay of building data marts tailored
to the needs of each unit can make it impossible to quickly uncover vital insights.
•	Moving and reorganizing data introduces delays and cost. Data storage is becoming steadily less expensive, but it still represents a capital and operational expense. Hampered by insufficient funding and the high costs of onboarding data, organizations often wait until there is a defined need for a set of data before loading it into an analytics database. By then, a competitor may have already captured the data and acted on the analysis.
•	Data provenance and meaning are often lost in translation. Before knowing what data to query, a user must understand what that data represents and how it is used. One example: “This number (X) represents defect rate per thousand, which we use to justify our premium pricing for a customer.” As data knowledge is transferred from expert to designer to developer, meaning is often replaced with technical syntax and rules that the business user cannot understand, effectively removing all-important context from “corporate memory.” Maintaining the provenance and meaning allows the data to tell a story that matters to a real-life decision-maker.
•	Who transformed what data, and why? To ensure that users aren’t making decisions based on faulty data, it is essential to understand who changed the format of the data, cleansed it to eliminate inconsistencies or changed values to comply with standards. Such management artifacts are often tracked only in spreadsheets and don’t capture the latest changes made to live data. This problem extends to tracking changes to data after it is onboarded, increasing the costs and complexity of performing data analysis.
•	Models are too rigid or incomplete. Excessive standardization of the vocabulary used to define and organize data may make it harder to conduct specific types of analysis. On the other hand, the lack of a model makes it difficult or impossible to identify what data is available, how to access it and how to integrate it with other data. That means delays when users must search for the required data, determine who owns it, request permission to access it, and integrate it with other data.
•	The speed of change. With every new competitor, product offering, business
model or distribution channel, an organization must rapidly assimilate and query
new types of data, organized and optimized in new and innovative ways. Lengthy
data design and standardization processes are not effective in today’s business
environment.
All of these issues are solvable with a “smart” data lake strategy. Figure 1 depicts
best practices that overcome key points of friction that organizations encounter
when using traditional approaches.
Data Lake Essentials
Filling the data lake to generate unique customer experiences and unlock market insights requires new processes and technologies across the data and query management lifecycle (see Figure 2). At each step, semantic technology plays a central role (see the sidebar, “Semantics Maps the Data Lake”).
•	Data ingestion: This involves finding and importing the data needed for Code Halo analysis. Structured and unstructured data can be collected through traditional scheduled batch loading, data streaming or real-time, on-demand queries for internally- and externally-hosted data.
Figure 1: The Data Lake Defined

Onboard new data
•	Traditional technologies and practices: Data is prioritized based on the current business case for analytics. Comprehensive analysis creates rigid structures that are difficult to change. Minimal definition of data organization during onboarding requires users to have a detailed understanding of data contents.
•	Data lake technologies and practices: Data is prioritized based on the data’s potential ability to drive insights. Flexible data models can be revised or extended without redesign of the database. Agile, evolutionary refinement of the data organization makes it easier to use the data and leverage new insights. A catalog of data assets captures all details of onboarded data.

Load data
•	Traditional: Comprehensive, custom ETL processes capture, cleanse and load the data into the data repository.
•	Data lake: Model-based ETL processing is defined and generated automatically, with data loaded unfiltered and untransformed. Self-service features speed time to business value.

Connect external data to the data repository
•	Traditional: Source data is collected and loaded into the data repository, and refreshed on a scheduled frequency.
•	Data lake: A common model maps to all internal and external data assets. Frequently queried data is collected and loaded to provide predictable query performance; internal and external data is queried through common methods, with real-time values delivered at query time.

Organize data for specific purposes
•	Traditional: Data is copied to a separate region of the data repository, with little documentation, organization or management. Data silos proliferate.
•	Data lake: Automation facilitates data organization and synchronization within the repository. All data remains connected to data definitions, sources and provenance, and is updated automatically. Automation manages data retention.

Discover data and data relationships
•	Traditional: Users must create projects to mine and analyze data for currently unknown data relationships.
•	Data lake: Semantic search helps end users ask questions about the data as needed to improve their understanding of it. Data discovery tools continuously mine and analyze the data for currently unknown data relationships and recommend updates to the models that increase their business value.
Organizations can quickly analyze the data needed without loading every con-
ceivably useful bit of data beforehand. The combination of semantic models and
automated data movement greatly reduces the need for manual coding of ETL
processes by defining all the available data, how it is organized, where it is located
and what data movement/transformations must be performed. During ingestion,
the data is semantically tagged and categorized so it can be easily identified for
future needs.
Pre-modeling also lets business users perform self-service data mapping and
queries of new data that will only be needed for a short time or that are awaiting
integration with formal models. They can then use this data for rapid, on-demand
Code Halo analysis to meet urgent needs without waiting for help from IT.
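As a rough illustration of how semantic tagging during ingestion might work, the following Python sketch loads a record unfiltered and untransformed while attaching tags drawn from an ontology mapping and writing a catalog entry. The tag vocabulary, field names and catalog structure are hypothetical and are shown only to make the mechanics concrete, not to describe a specific product.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical mapping from source field names to semantic concepts.
# In practice this would be generated from the enterprise model (e.g., OWL/RDF).
ONTOLOGY_TAGS = {
    "cust_id": "enterprise:Customer",
    "dropped_calls": "telecom:ServiceQualityMetric",
    "monthly_spend": "finance:RecurringRevenue",
}

@dataclass
class CatalogEntry:
    """Minimal data-assets catalog record kept alongside the raw data."""
    source: str
    ingested_at: str
    tags: list = field(default_factory=list)
    provenance: dict = field(default_factory=dict)

def ingest(record: dict, source: str, catalog: list) -> dict:
    """Load a record as-is and tag it semantically for later discovery."""
    tags = sorted({ONTOLOGY_TAGS[k] for k in record if k in ONTOLOGY_TAGS})
    catalog.append(CatalogEntry(
        source=source,
        ingested_at=datetime.now(timezone.utc).isoformat(),
        tags=tags,
        provenance={"loaded_as_is": True, "fields": list(record)},
    ))
    return record  # raw data is stored unfiltered; transformation happens at query time

catalog: list = []
raw = {"cust_id": "C-1001", "dropped_calls": 7, "monthly_spend": 82.50}
ingest(raw, source="billing_feed", catalog=catalog)
print(catalog[0].tags)
```

The key point the sketch tries to convey is that the record itself is never altered during loading; only the catalog entry, tags and provenance are added so the data can be found and mapped later.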
Figure 2: Data Lake Components
The figure depicts the layers of the data lake: data ingestion (model-driven, self-service collection of structured and unstructured data through scheduled batch loads, streaming and on-demand queries, with semantic tagging, drawing from desktop and mobile devices, social media and cloud, operational systems and the Internet of Things); data management (HDFS storage with semantic, graph, HPC, columnar, in-memory, linked data, streaming, NoSQL and MapReduce engines); data lake management (models, governance, workflow, access/security, a data assets catalog, data movement and provenance); and query management (semantic search, data discovery, analytics directed to the best query engine, capture/embed/share of analytics expertise, and queries over data, metadata and provenance).
Quick Take: Semantics Maps the Data Lake
A smart semantics model can help organizations make use of their Code Halos by capturing meaning, as well as the related domain expertise, from structured, unstructured or semi-structured data. The building blocks for such a model are standards and technologies such as:
•	The Resource Description Framework (RDF), which organizes data in a graph structure, reducing development time and cost while delivering business value sooner.
•	The Web Ontology Language (OWL), which provides a comprehensive model of data definitions and relationships that is human- and machine-readable.
•	The SPARQL query language, a SQL-like query language for semantic data that can leverage ontological relationships to ask smarter questions across multiple databases in a single query.
•	Inferencing, which makes it easier for users to construct queries by capturing and embedding expertise in the ontology model.
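To make these building blocks concrete, the following sketch uses the open source rdflib library in Python to build a small RDF graph, declare a subclass relationship and run a SPARQL query that exploits it. The namespace, class names and triples are invented for illustration and are not drawn from any specific ontology.

```python
from rdflib import Graph, Literal, Namespace, RDF, RDFS, XSD

EX = Namespace("http://example.org/retail#")  # hypothetical ontology namespace
g = Graph()
g.bind("ex", EX)

# A tiny ontology: PremiumCustomer is a kind of Customer.
g.add((EX.Customer, RDF.type, RDFS.Class))
g.add((EX.PremiumCustomer, RDFS.subClassOf, EX.Customer))

# Instance data expressed as RDF triples (a graph, not rows and columns).
g.add((EX.alice, RDF.type, EX.PremiumCustomer))
g.add((EX.alice, EX.annualSpend, Literal(12500, datatype=XSD.integer)))

# A SPARQL query that leverages the subclass relationship: asking for Customers
# also returns PremiumCustomers by following rdfs:subClassOf paths.
query = """
PREFIX ex: <http://example.org/retail#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?cust ?spend WHERE {
  ?type rdfs:subClassOf* ex:Customer .
  ?cust a ?type ;
        ex:annualSpend ?spend .
}
"""
for cust, spend in g.query(query):
    print(cust, spend)
```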
By performing a semantic analysis of the data during data loading, organizations
can capture an understanding of the content, which makes it easier to embed
intelligence in the data. This intelligence might include which characteristics
or patterns indicate the likelihood that a customer will “churn” from one cable
provider to another, or that a homeowner is at increased risk of falling behind in
loan payments. A trigger can generate an offer to customers that meets their
needs and enhances the relationship.
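Continuing the rdflib sketch above, domain expertise such as a churn-risk pattern could be captured once as a SPARQL CONSTRUCT rule and applied whenever new data lands. The threshold, property names and namespace below are illustrative assumptions only.

```python
from rdflib import Graph, Literal, Namespace, RDF, XSD

EX = Namespace("http://example.org/telecom#")  # hypothetical namespace
g = Graph()
g.add((EX.cust42, RDF.type, EX.Subscriber))
g.add((EX.cust42, EX.droppedCallsLast30Days, Literal(9, datatype=XSD.integer)))

# Domain expertise ("many dropped calls signals churn risk") captured as a reusable rule.
CHURN_RULE = """
PREFIX ex: <http://example.org/telecom#>
CONSTRUCT { ?s a ex:ChurnRisk }
WHERE {
  ?s a ex:Subscriber ;
     ex:droppedCallsLast30Days ?n .
  FILTER(?n > 5)
}
"""
inferred = g.query(CHURN_RULE).graph   # triples produced by the rule
g += inferred                          # merge the inference back into the model

if (EX.cust42, RDF.type, EX.ChurnRisk) in g:
    print("Trigger retention offer for", EX.cust42)
```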
•	Data management: Data lakes benefit from modern, low-cost, high-performance storage technologies, such as the Hadoop Distributed File System (HDFS), that provide high availability and automated failover. Specialized database and analytic engines, such as high-performance computing, NoSQL and semantic engines, can be configured to speed Code Halo analytics at the lowest possible cost.
The intelligent data management layer directs the model-driven movement of
data, while provenance capabilities capture and track details about its ingestion,
transformation, movement and the queries run against it.
The data management layer supports the various engines required for different
types of analytics, including future technologies, as an organization’s use of Code
Halos matures. It also allows the engines to access and analyze data directly from
high-performance storage, eliminating the need for custom ETL processes.
•	Data lake management: Like any other corporate resource, the data lake requires management and governance to ensure that it operates as efficiently, consistently and securely as possible. The management layer directs data ingestion and movement workflows while managing access and balancing performance, cost and security to deliver the best results at the lowest cost.
The model manager allows analysts to embed expert knowledge, such as a complex calculation or set of logic, into the ontology model for use by non-experts. For example, a researcher can easily share a new risk-assessment model for home loans with underwriters in the field. A genomics specialist can share a model that maps new biomarker indications to patients’ genetic backgrounds to help researchers at clinical trial sites test a new cancer treatment.
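One way to picture the model manager’s role is as a registry of named, versioned pieces of expert logic that non-experts can invoke without understanding the internals. The following sketch is a deliberately simplified stand-in for that idea; the scoring formula, model name and field names are invented for illustration and do not represent a real risk model.

```python
from typing import Callable

# A toy "model manager": expert logic is registered once, then reused by name.
MODEL_REGISTRY: dict[str, Callable[[dict], float]] = {}

def register_model(name: str):
    def wrap(fn):
        MODEL_REGISTRY[name] = fn
        return fn
    return wrap

@register_model("home_loan_risk_v2")
def home_loan_risk(applicant: dict) -> float:
    """Researcher-authored scoring logic (illustrative formula, not a real model)."""
    dti = applicant["monthly_debt"] / applicant["monthly_income"]   # debt-to-income
    ltv = applicant["loan_amount"] / applicant["property_value"]    # loan-to-value
    return round(0.6 * dti + 0.4 * ltv, 3)

# An underwriter in the field only needs the model's name, not its internals.
score = MODEL_REGISTRY["home_loan_risk_v2"](
    {"monthly_debt": 2200, "monthly_income": 7500,
     "loan_amount": 310_000, "property_value": 380_000}
)
print(score)
```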
Proper governance is required to ensure the data is accurate and consistent and
that it complies with organizational and industry standards. This should cover
both master and reference data, and take advantage of industry standards, such
as the Financial Industry Business Ontology (FIBO)5
in banking/financial services
and CDISC/RDF6
in life sciences. The model manager captures governance-based
standards to ensure consistency across models that reference shared data.
Business units will organize views of required data that include enterprise- and
business unit-specific data. The business unit-level data models will inherit the
definitions of enterprise standard models while adding business unit-specific
data relationships and terminology (see Figure 3).
The model manager enables organizations to construct data views to support a
specific analytic or class of analytics, without the cost of building data marts or
custom data movement processing. By capturing the details of data relationships
and data movement that directly support the analytics process, organizations
can enable automation that reduces delays and costs.
The data catalog manager captures the details of both internal and external
data assets, describing the content, definitions and organization of the data and
supporting multiple views of shared data. End-users can define workflows for
scheduled data movement.
Figure 3: Different Data Models for Different Users
The figure relates an industry ontology, an enterprise ontology and business unit-level ontologies, such as a North America market ontology and an international market ontology.
Access management features help users find data assets available in the data
lake, automate registration and access to new data assets, and provide secure
access to the data. Monitoring functions ensure that everything from the storage
technology to the analytics and reporting functions is working properly, with
tools to support intervention when necessary.
•	The query management layer: This helps users understand what data is available and is most likely to deliver the richest Code Halo insights. Semantic search uses natural language queries to analyze the data and metadata to best identify the data that will support an analytical process. To reduce the time and effort required to find and model business-critical data, data discovery functions analyze the data to find and link related data concepts. Optimized data structures speed analytics, with queries directed to the analytics engine that delivers the best performance.
The query management layer thus enables “smart” analytics in which users can define not only which data to use, how to transform it, which analytics engine to use and which analysis to run, but also which type of chart to plot the results in, which file format to save the results in and whom to send the results to. Workflow tools allow users to automate the analytics process.
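A “smart” analytics request of this kind could be expressed declaratively, roughly as sketched below. The field names, engine labels and the toy runner are assumptions made for illustration rather than a description of any particular query management product.

```python
# Declarative description of an end-to-end analytics run: data selection,
# transformation, engine choice, analysis, output format and delivery.
workflow = {
    "dataset": "semantic:CustomerInteractions",   # resolved through the semantic model
    "filter": "region = 'EMEA' AND event_date >= '2014-01-01'",
    "engine": "columnar",                         # routed to the best-fit engine
    "analysis": "churn_propensity_v3",            # expert model registered centrally
    "output": {"chart": "bar", "file_format": "xlsx"},
    "deliver_to": ["retention-team@example.com"],
    "schedule": "daily@06:00",
}

def run(wf: dict) -> None:
    """Toy runner: in a real platform each step is handled by the management layers."""
    print(f"Querying {wf['dataset']} on the {wf['engine']} engine")
    print(f"Running {wf['analysis']} and plotting a {wf['output']['chart']} chart")
    print(f"Saving as {wf['output']['file_format']}; mailing to {', '.join(wf['deliver_to'])}")

run(workflow)
```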
•	Rules-based data sourcing: This helps users fine-tune their queries using context such as metadata and provenance. For example, a user could specify a query that says, “Read the data from the data lake, unless that data is more than X hours old, in which case read it directly from the source system.”
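The freshness rule quoted above might be implemented along the following lines; the six-hour threshold, the connector stubs and the table name are placeholders chosen for illustration.

```python
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(hours=6)  # the "X hours" in the rule; an illustrative threshold

class LakeStub:
    """Stand-in for a data lake connector that exposes provenance metadata."""
    def last_load_time(self, table):
        return datetime.now(timezone.utc) - timedelta(hours=9)
    def read(self, table):
        return [f"{table}: lake copy"]

class SourceStub:
    """Stand-in for the operational source system."""
    def read(self, table):
        return [f"{table}: live source"]

def read_with_freshness_rule(table, lake, source):
    """Read from the lake unless its copy is older than MAX_AGE; then go to the source."""
    age = datetime.now(timezone.utc) - lake.last_load_time(table)
    return lake.read(table) if age <= MAX_AGE else source.read(table)

print(read_with_freshness_rule("orders", LakeStub(), SourceStub()))  # lake copy is stale, so the source is read
```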
While the data lake is most often used by analysts or business decision-makers,
true Code Halo value is achieved when it responds to a customer’s click with
an immediate, personalized response. This is enabled by features that expose
analytics processes as a Web service, or through an application programming interface (API) that is callable from a Web page or mobile app.
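Exposing an analytics process at the point of interaction can be as simple as wrapping it in a lightweight web service. The sketch below uses Flask only as an example; the route, scoring function and returned fields are hypothetical.

```python
from flask import Flask, jsonify

app = Flask(__name__)

def next_best_offer(customer_id: str) -> dict:
    """Placeholder for the real analytics process run against the data lake."""
    return {"customer_id": customer_id, "offer": "loyalty-upgrade", "confidence": 0.87}

@app.route("/api/v1/offers/<customer_id>", methods=["GET"])
def offers(customer_id):
    # Called from a Web page or mobile app when the customer clicks.
    return jsonify(next_best_offer(customer_id))

if __name__ == "__main__":
    app.run(port=8080)
```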
Moving Forward
Building an intelligent semantic model on top of your current information architecture may sound daunting. But it can and must be done, if you hope to discover and act on the next game-changing insight before your competitors do. You can get started with these important but gradual changes:
•	Prioritize the onboarding of data by its ability to create truly individualized
customer experiences or business-changing insights into market needs or
operations.
•	Onboard data assets as they become available, without waiting for a specific
use case. Those uses will emerge when you least expect them, and when they do,
you’ll need the data immediately. At a minimum, map the data to the semantic
model for easier access.
•	Load the data without filtering or transforming it, since new data rules may
override old rules. Filtering and transformation rules can be applied as the data
is moved to an analytics engine or during query execution.
•	Model the data using familiar terminology. This makes it easier to change the model as needed without physically moving the data. Customize models for specific business groups, encouraging them to ensure the models’ accuracy and completeness.
•	Enable search mechanisms that make it easier for business users to see what
data is available and accessible.
•	Balance legal and compliance needs for security with the imperative to improve
the customer experience through analytics.
Code Halos are powerful sources of industry-changing insights. Too often, this
knowledge is difficult to find within the volume and complexity of the data. A data
lake provides the radar that allows end users to quickly see past the chaos of too
much data and focus on the actionable insights they need.
Footnotes
1 For more on Code Halo thinking, see Malcolm Frank, Paul Roehrig and Ben Pring, “Code Rules: A Playbook for Managing at the Crossroads,” Cognizant Technology Solutions, June 2013, http://www.cognizant.com/Futureofwork/Documents/code-rules.pdf, or read our book Code Halos: How the Digital Lives of People, Things, and Organizations are Changing the Rules of Business, by Malcolm Frank, Paul Roehrig and Ben Pring, John Wiley & Sons, April 2014, http://www.wiley.com/WileyCDA/WileyTitle/productCd-1118862074.html.
2 “Gartner Says the Internet of Things Installed Base Will Grow to 26 Billion Units by 2020,” Gartner, Inc., Dec. 13, 2013, http://www.gartner.com/newsroom/id/2636073.
3 James Dixon, “Pentaho, Big Data, and Data Lakes,” Pentaho, October 2010, http://jamesdixon.wordpress.com/2010/10/14/pentaho-hadoop-and-data-lakes/.
4 Thomas Kelly, “Semantic Technology Drives Agile Business,” Cognizant Technology Solutions, August 2013, http://www.cognizant.com/InsightsWhitepapers/How-Semantic-Technology-Drives-Agile-Business.pdf.
5 “Financial Industry Business Ontology,” Object Management Group, http://www.omg.org/hot-topics/fibo.htm.
6 “Representing Regulations and Guidance in RDF,” Phuse Wiki, http://bit.ly/18WAxxf.
Note: The logos and company names presented throughout this white paper are the
property of their respective trademark owners, are not affiliated with Cognizant
Technology Solutions, and are displayed for illustrative purposes only. Use of the
logo does not imply endorsement of the organization by Cognizant, nor vice versa.
About the Authors
Thomas Kelly is a Director in Cognizant’s Enterprise Information Management (EIM)
Practice and heads its Semantic Technology Center of Excellence, a technology
specialty of Cognizant Business Consulting. He has 20-plus years of technology
consulting experience in leading data warehousing, business intelligence and big
data projects, focused primarily on the life sciences and healthcare industries. He
can be reached at Thomas.Kelly@cognizant.com.
Robert Hoyle Brown is an Associate Vice President in Cognizant’s Center for the
Future of Work, and drives strategy and market outreach for the Business Process
Services Practice. Robert is responsible for looking at how Code Halos will change
enterprise business process services in the coming years and is a regular contributor
to the blog www.unevenlydistributed.com, “Signals from the Future of Work.” Prior
to joining Cognizant, he was Managing Vice President of the Business and Applications Services team at Gartner, and as a research analyst, he was a recognized
subject matter expert in BPO, cloud services/BPaaS and HR services. He also held
roles at Hewlett-Packard and G2 Research, a boutique outsourcing research firm in
Silicon Valley. He holds a Bachelor of Arts degree from the University of California
at Berkeley and, prior to his graduation, attended the London School of Economics
as a Hansard Scholar. He can be reached at Robert.H.Brown@cognizant.com.
The authors would like to thank Editorial Director Alan Alper for his insights and
contributions to this white paper.
World Headquarters
500 Frank W. Burr Blvd.
Teaneck, NJ 07666 USA
Phone: +1 201 801 0233
Fax: +1 201 801 0243
Toll Free: +1 888 937 3277
inquiry@cognizant.com
European Headquarters
1 Kingdom Street
Paddington Central
London W2 6BD
Phone: +44 (0) 207 297 7600
Fax: +44 (0) 207 121 0102
infouk@cognizant.com
India Operations Headquarters
#5/535, Old Mahabalipuram Road
Okkiyam Pettai, Thoraipakkam
Chennai, 600 096 India
Phone: +91 (0) 44 4209 6000
Fax: +91 (0) 44 4209 6060
inquiryindia@cognizant.com
© Copyright 2014, Cognizant. All rights reserved. No part of this document may be reproduced, stored in a retrieval system, transmitted in any form or by any means,
electronic, mechanical, photocopying, recording, or otherwise, without the express written permission from Cognizant. The information contained herein is subject to
change without notice. All other trademarks mentioned herein are the property of their respective owners.
About Cognizant
Cognizant (NASDAQ: CTSH) is a leading provider of information technology, consulting, and business process outsourcing services, dedicated to helping the world’s leading companies build stronger businesses. Headquartered in Teaneck, New Jersey (U.S.), Cognizant combines a passion for client satisfaction, technology innovation, deep industry and business process expertise, and a global, collaborative workforce that embodies the future of work. With over 75 development and delivery centers worldwide and approximately 187,400 employees as of June 30, 2014, Cognizant is a member of the NASDAQ-100, the S&P 500, the Forbes Global 2000, and the Fortune 500 and is ranked among the top performing and fastest growing companies in the world. Visit us online at www.cognizant.com or follow us on Twitter: Cognizant.

Weitere ähnliche Inhalte

Was ist angesagt?

Data-Ed Online Presents: Data Warehouse Strategies
Data-Ed Online Presents: Data Warehouse StrategiesData-Ed Online Presents: Data Warehouse Strategies
Data-Ed Online Presents: Data Warehouse StrategiesDATAVERSITY
 
data-to-insight-to-action-taking-a-business-process-view-for-analytics-to-del...
data-to-insight-to-action-taking-a-business-process-view-for-analytics-to-del...data-to-insight-to-action-taking-a-business-process-view-for-analytics-to-del...
data-to-insight-to-action-taking-a-business-process-view-for-analytics-to-del...Sokho TRINH
 
Reference data management in financial services industry
Reference data management in financial services industryReference data management in financial services industry
Reference data management in financial services industryNIIT Technologies
 
Business Intelligence Overview
Business Intelligence OverviewBusiness Intelligence Overview
Business Intelligence Overviewnetpeachteam
 
Role of business intelligence in knowledge management
Role of business intelligence in knowledge managementRole of business intelligence in knowledge management
Role of business intelligence in knowledge managementShakthi Fernando
 
The opportunity of the business data lake
The opportunity of the business data lakeThe opportunity of the business data lake
The opportunity of the business data lakeCapgemini
 
What to focus on when choosing a Business Intelligence tool?
What to focus on when choosing a Business Intelligence tool? What to focus on when choosing a Business Intelligence tool?
What to focus on when choosing a Business Intelligence tool? Marketplanet
 
Microsoft Business Intelligence - Practical Approach & Overview
Microsoft Business Intelligence - Practical Approach & OverviewMicrosoft Business Intelligence - Practical Approach & Overview
Microsoft Business Intelligence - Practical Approach & OverviewLi Ken Chong
 
201407 Global Insights and Actions for Banks in the Digital Age - Eyes Wide Shut
201407 Global Insights and Actions for Banks in the Digital Age - Eyes Wide Shut201407 Global Insights and Actions for Banks in the Digital Age - Eyes Wide Shut
201407 Global Insights and Actions for Banks in the Digital Age - Eyes Wide ShutFrancisco Calzado
 
Lead to Cash: The Value of Big Data and Analytics for Telco
Lead to Cash: The Value of Big Data and Analytics for TelcoLead to Cash: The Value of Big Data and Analytics for Telco
Lead to Cash: The Value of Big Data and Analytics for TelcoSam Thomsett
 
Business intellegence
Business intellegenceBusiness intellegence
Business intellegenceGaurav Khatri
 
Business Intellegence
Business IntellegenceBusiness Intellegence
Business IntellegenceKallol Sarkar
 
History of Business Intelligence
History of Business IntelligenceHistory of Business Intelligence
History of Business IntelligenceNic Smith
 
Business intellegence erp
Business intellegence erpBusiness intellegence erp
Business intellegence erpSargam Sethi
 
Addressing Storage Challenges to Support Business Analytics and Big Data Work...
Addressing Storage Challenges to Support Business Analytics and Big Data Work...Addressing Storage Challenges to Support Business Analytics and Big Data Work...
Addressing Storage Challenges to Support Business Analytics and Big Data Work...IBM India Smarter Computing
 

Was ist angesagt? (20)

Data-Ed Online Presents: Data Warehouse Strategies
Data-Ed Online Presents: Data Warehouse StrategiesData-Ed Online Presents: Data Warehouse Strategies
Data-Ed Online Presents: Data Warehouse Strategies
 
data-to-insight-to-action-taking-a-business-process-view-for-analytics-to-del...
data-to-insight-to-action-taking-a-business-process-view-for-analytics-to-del...data-to-insight-to-action-taking-a-business-process-view-for-analytics-to-del...
data-to-insight-to-action-taking-a-business-process-view-for-analytics-to-del...
 
Data valuation
Data valuationData valuation
Data valuation
 
Offers bank dss
Offers bank dssOffers bank dss
Offers bank dss
 
Reference data management in financial services industry
Reference data management in financial services industryReference data management in financial services industry
Reference data management in financial services industry
 
Business Intelligence Overview
Business Intelligence OverviewBusiness Intelligence Overview
Business Intelligence Overview
 
Role of business intelligence in knowledge management
Role of business intelligence in knowledge managementRole of business intelligence in knowledge management
Role of business intelligence in knowledge management
 
The opportunity of the business data lake
The opportunity of the business data lakeThe opportunity of the business data lake
The opportunity of the business data lake
 
What to focus on when choosing a Business Intelligence tool?
What to focus on when choosing a Business Intelligence tool? What to focus on when choosing a Business Intelligence tool?
What to focus on when choosing a Business Intelligence tool?
 
Microsoft Business Intelligence - Practical Approach & Overview
Microsoft Business Intelligence - Practical Approach & OverviewMicrosoft Business Intelligence - Practical Approach & Overview
Microsoft Business Intelligence - Practical Approach & Overview
 
Bi orientations
Bi orientationsBi orientations
Bi orientations
 
201407 Global Insights and Actions for Banks in the Digital Age - Eyes Wide Shut
201407 Global Insights and Actions for Banks in the Digital Age - Eyes Wide Shut201407 Global Insights and Actions for Banks in the Digital Age - Eyes Wide Shut
201407 Global Insights and Actions for Banks in the Digital Age - Eyes Wide Shut
 
Lead to Cash: The Value of Big Data and Analytics for Telco
Lead to Cash: The Value of Big Data and Analytics for TelcoLead to Cash: The Value of Big Data and Analytics for Telco
Lead to Cash: The Value of Big Data and Analytics for Telco
 
Business intellegence
Business intellegenceBusiness intellegence
Business intellegence
 
Business Intellegence
Business IntellegenceBusiness Intellegence
Business Intellegence
 
History of Business Intelligence
History of Business IntelligenceHistory of Business Intelligence
History of Business Intelligence
 
Business intellegence erp
Business intellegence erpBusiness intellegence erp
Business intellegence erp
 
BI Presentation
BI PresentationBI Presentation
BI Presentation
 
Addressing Storage Challenges to Support Business Analytics and Big Data Work...
Addressing Storage Challenges to Support Business Analytics and Big Data Work...Addressing Storage Challenges to Support Business Analytics and Big Data Work...
Addressing Storage Challenges to Support Business Analytics and Big Data Work...
 
Vendor comparisons: the end game in business intelligence
Vendor comparisons: the end game in business intelligenceVendor comparisons: the end game in business intelligence
Vendor comparisons: the end game in business intelligence
 

Ähnlich wie Semantic 'Radar' Steers Users to Insights in the Data Lake

Semantic 'Radar' Steers Users to Insights in the Data Lake
Semantic 'Radar' Steers Users to Insights in the Data LakeSemantic 'Radar' Steers Users to Insights in the Data Lake
Semantic 'Radar' Steers Users to Insights in the Data LakeThomas Kelly, PMP
 
Evolving Big Data Strategies: Bringing Data Lake and Data Mesh Vision to Life
Evolving Big Data Strategies: Bringing Data Lake and Data Mesh Vision to LifeEvolving Big Data Strategies: Bringing Data Lake and Data Mesh Vision to Life
Evolving Big Data Strategies: Bringing Data Lake and Data Mesh Vision to LifeSG Analytics
 
Implementing Data Mesh WP LTIMindtree White Paper
Implementing Data Mesh WP LTIMindtree White PaperImplementing Data Mesh WP LTIMindtree White Paper
Implementing Data Mesh WP LTIMindtree White Papershashanksalunkhe12
 
Data Analytics And Business Decision.pdf
Data Analytics And Business Decision.pdfData Analytics And Business Decision.pdf
Data Analytics And Business Decision.pdfCiente
 
Data Analytics And Business Decision.pdf
Data Analytics And Business Decision.pdfData Analytics And Business Decision.pdf
Data Analytics And Business Decision.pdfCiente
 
Data Mesh in Azure using Cloud Scale Analytics (WAF)
Data Mesh in Azure using Cloud Scale Analytics (WAF)Data Mesh in Azure using Cloud Scale Analytics (WAF)
Data Mesh in Azure using Cloud Scale Analytics (WAF)Nathan Bijnens
 
Running head Database and Data Warehousing design1Database and.docx
Running head Database and Data Warehousing design1Database and.docxRunning head Database and Data Warehousing design1Database and.docx
Running head Database and Data Warehousing design1Database and.docxhealdkathaleen
 
Running head Database and Data Warehousing design1Database and.docx
Running head Database and Data Warehousing design1Database and.docxRunning head Database and Data Warehousing design1Database and.docx
Running head Database and Data Warehousing design1Database and.docxtodd271
 
Big Data is Here for Financial Services White Paper
Big Data is Here for Financial Services White PaperBig Data is Here for Financial Services White Paper
Big Data is Here for Financial Services White PaperExperian
 
intelligent-data-lake_executive-brief
intelligent-data-lake_executive-briefintelligent-data-lake_executive-brief
intelligent-data-lake_executive-briefLindy-Anne Botha
 
Datawarehousing
DatawarehousingDatawarehousing
Datawarehousingwork
 
The Second Big Bang
The Second Big BangThe Second Big Bang
The Second Big BangConnexica
 
A&D In Memory POV R2.2
A&D In Memory POV R2.2A&D In Memory POV R2.2
A&D In Memory POV R2.2berrygibson
 
Tips --Break Down the Barriers to Better Data Analytics
Tips --Break Down the Barriers to Better Data AnalyticsTips --Break Down the Barriers to Better Data Analytics
Tips --Break Down the Barriers to Better Data AnalyticsAbhishek Sood
 
The CFO in the Age of Digital Analytics
The CFO in the Age of Digital AnalyticsThe CFO in the Age of Digital Analytics
The CFO in the Age of Digital AnalyticsAnametrix
 

Ähnlich wie Semantic 'Radar' Steers Users to Insights in the Data Lake (20)

Semantic 'Radar' Steers Users to Insights in the Data Lake
Semantic 'Radar' Steers Users to Insights in the Data LakeSemantic 'Radar' Steers Users to Insights in the Data Lake
Semantic 'Radar' Steers Users to Insights in the Data Lake
 
Abstract
AbstractAbstract
Abstract
 
Evolving Big Data Strategies: Bringing Data Lake and Data Mesh Vision to Life
Evolving Big Data Strategies: Bringing Data Lake and Data Mesh Vision to LifeEvolving Big Data Strategies: Bringing Data Lake and Data Mesh Vision to Life
Evolving Big Data Strategies: Bringing Data Lake and Data Mesh Vision to Life
 
Implementing Data Mesh WP LTIMindtree White Paper
Implementing Data Mesh WP LTIMindtree White PaperImplementing Data Mesh WP LTIMindtree White Paper
Implementing Data Mesh WP LTIMindtree White Paper
 
Data Analytics And Business Decision.pdf
Data Analytics And Business Decision.pdfData Analytics And Business Decision.pdf
Data Analytics And Business Decision.pdf
 
Data Analytics And Business Decision.pdf
Data Analytics And Business Decision.pdfData Analytics And Business Decision.pdf
Data Analytics And Business Decision.pdf
 
Data Analytics.pptx
Data Analytics.pptxData Analytics.pptx
Data Analytics.pptx
 
Data Mesh in Azure using Cloud Scale Analytics (WAF)
Data Mesh in Azure using Cloud Scale Analytics (WAF)Data Mesh in Azure using Cloud Scale Analytics (WAF)
Data Mesh in Azure using Cloud Scale Analytics (WAF)
 
Unlocking big data
Unlocking big dataUnlocking big data
Unlocking big data
 
IT Ready - DW: 1st Day
IT Ready - DW: 1st Day IT Ready - DW: 1st Day
IT Ready - DW: 1st Day
 
Running head Database and Data Warehousing design1Database and.docx
Running head Database and Data Warehousing design1Database and.docxRunning head Database and Data Warehousing design1Database and.docx
Running head Database and Data Warehousing design1Database and.docx
 
Running head Database and Data Warehousing design1Database and.docx
Running head Database and Data Warehousing design1Database and.docxRunning head Database and Data Warehousing design1Database and.docx
Running head Database and Data Warehousing design1Database and.docx
 
Big Data is Here for Financial Services White Paper
Big Data is Here for Financial Services White PaperBig Data is Here for Financial Services White Paper
Big Data is Here for Financial Services White Paper
 
intelligent-data-lake_executive-brief
intelligent-data-lake_executive-briefintelligent-data-lake_executive-brief
intelligent-data-lake_executive-brief
 
Datawarehousing
DatawarehousingDatawarehousing
Datawarehousing
 
The Second Big Bang
The Second Big BangThe Second Big Bang
The Second Big Bang
 
A&D In Memory POV R2.2
A&D In Memory POV R2.2A&D In Memory POV R2.2
A&D In Memory POV R2.2
 
Tips --Break Down the Barriers to Better Data Analytics
Tips --Break Down the Barriers to Better Data AnalyticsTips --Break Down the Barriers to Better Data Analytics
Tips --Break Down the Barriers to Better Data Analytics
 
The CFO in the Age of Digital Analytics
The CFO in the Age of Digital AnalyticsThe CFO in the Age of Digital Analytics
The CFO in the Age of Digital Analytics
 
Achieving Business Success with Data.pdf
Achieving Business Success with Data.pdfAchieving Business Success with Data.pdf
Achieving Business Success with Data.pdf
 

Mehr von Cognizant

Using Adaptive Scrum to Tame Process Reverse Engineering in Data Analytics Pr...
Using Adaptive Scrum to Tame Process Reverse Engineering in Data Analytics Pr...Using Adaptive Scrum to Tame Process Reverse Engineering in Data Analytics Pr...
Using Adaptive Scrum to Tame Process Reverse Engineering in Data Analytics Pr...Cognizant
 
Data Modernization: Breaking the AI Vicious Cycle for Superior Decision-making
Data Modernization: Breaking the AI Vicious Cycle for Superior Decision-makingData Modernization: Breaking the AI Vicious Cycle for Superior Decision-making
Data Modernization: Breaking the AI Vicious Cycle for Superior Decision-makingCognizant
 
It Takes an Ecosystem: How Technology Companies Deliver Exceptional Experiences
It Takes an Ecosystem: How Technology Companies Deliver Exceptional ExperiencesIt Takes an Ecosystem: How Technology Companies Deliver Exceptional Experiences
It Takes an Ecosystem: How Technology Companies Deliver Exceptional ExperiencesCognizant
 
Intuition Engineered
Intuition EngineeredIntuition Engineered
Intuition EngineeredCognizant
 
The Work Ahead: Transportation and Logistics Delivering on the Digital-Physic...
The Work Ahead: Transportation and Logistics Delivering on the Digital-Physic...The Work Ahead: Transportation and Logistics Delivering on the Digital-Physic...
The Work Ahead: Transportation and Logistics Delivering on the Digital-Physic...Cognizant
 
Enhancing Desirability: Five Considerations for Winning Digital Initiatives
Enhancing Desirability: Five Considerations for Winning Digital InitiativesEnhancing Desirability: Five Considerations for Winning Digital Initiatives
Enhancing Desirability: Five Considerations for Winning Digital InitiativesCognizant
 
The Work Ahead in Manufacturing: Fulfilling the Agility Mandate
The Work Ahead in Manufacturing: Fulfilling the Agility MandateThe Work Ahead in Manufacturing: Fulfilling the Agility Mandate
The Work Ahead in Manufacturing: Fulfilling the Agility MandateCognizant
 
The Work Ahead in Higher Education: Repaving the Road for the Employees of To...
The Work Ahead in Higher Education: Repaving the Road for the Employees of To...The Work Ahead in Higher Education: Repaving the Road for the Employees of To...
The Work Ahead in Higher Education: Repaving the Road for the Employees of To...Cognizant
 
Engineering the Next-Gen Digital Claims Organisation for Australian General I...
Engineering the Next-Gen Digital Claims Organisation for Australian General I...Engineering the Next-Gen Digital Claims Organisation for Australian General I...
Engineering the Next-Gen Digital Claims Organisation for Australian General I...Cognizant
 
Profitability in the Direct-to-Consumer Marketplace: A Playbook for Media and...
Profitability in the Direct-to-Consumer Marketplace: A Playbook for Media and...Profitability in the Direct-to-Consumer Marketplace: A Playbook for Media and...
Profitability in the Direct-to-Consumer Marketplace: A Playbook for Media and...Cognizant
 
Green Rush: The Economic Imperative for Sustainability
Green Rush: The Economic Imperative for SustainabilityGreen Rush: The Economic Imperative for Sustainability
Green Rush: The Economic Imperative for SustainabilityCognizant
 
Policy Administration Modernization: Four Paths for Insurers
Policy Administration Modernization: Four Paths for InsurersPolicy Administration Modernization: Four Paths for Insurers
Policy Administration Modernization: Four Paths for InsurersCognizant
 
The Work Ahead in Utilities: Powering a Sustainable Future with Digital
The Work Ahead in Utilities: Powering a Sustainable Future with DigitalThe Work Ahead in Utilities: Powering a Sustainable Future with Digital
The Work Ahead in Utilities: Powering a Sustainable Future with DigitalCognizant
 
AI in Media & Entertainment: Starting the Journey to Value
AI in Media & Entertainment: Starting the Journey to ValueAI in Media & Entertainment: Starting the Journey to Value
AI in Media & Entertainment: Starting the Journey to ValueCognizant
 
Operations Workforce Management: A Data-Informed, Digital-First Approach
Operations Workforce Management: A Data-Informed, Digital-First ApproachOperations Workforce Management: A Data-Informed, Digital-First Approach
Operations Workforce Management: A Data-Informed, Digital-First ApproachCognizant
 
Five Priorities for Quality Engineering When Taking Banking to the Cloud
Five Priorities for Quality Engineering When Taking Banking to the CloudFive Priorities for Quality Engineering When Taking Banking to the Cloud
Five Priorities for Quality Engineering When Taking Banking to the CloudCognizant
 
Getting Ahead With AI: How APAC Companies Replicate Success by Remaining Focused
Getting Ahead With AI: How APAC Companies Replicate Success by Remaining FocusedGetting Ahead With AI: How APAC Companies Replicate Success by Remaining Focused
Getting Ahead With AI: How APAC Companies Replicate Success by Remaining FocusedCognizant
 
Crafting the Utility of the Future
Crafting the Utility of the FutureCrafting the Utility of the Future
Crafting the Utility of the FutureCognizant
 
Utilities Can Ramp Up CX with a Customer Data Platform
Utilities Can Ramp Up CX with a Customer Data PlatformUtilities Can Ramp Up CX with a Customer Data Platform
Utilities Can Ramp Up CX with a Customer Data PlatformCognizant
 
The Work Ahead in Intelligent Automation: Coping with Complexity in a Post-Pa...
The Work Ahead in Intelligent Automation: Coping with Complexity in a Post-Pa...The Work Ahead in Intelligent Automation: Coping with Complexity in a Post-Pa...
The Work Ahead in Intelligent Automation: Coping with Complexity in a Post-Pa...Cognizant
 

Mehr von Cognizant (20)

Using Adaptive Scrum to Tame Process Reverse Engineering in Data Analytics Pr...
Using Adaptive Scrum to Tame Process Reverse Engineering in Data Analytics Pr...Using Adaptive Scrum to Tame Process Reverse Engineering in Data Analytics Pr...
Using Adaptive Scrum to Tame Process Reverse Engineering in Data Analytics Pr...
 
Data Modernization: Breaking the AI Vicious Cycle for Superior Decision-making
Data Modernization: Breaking the AI Vicious Cycle for Superior Decision-makingData Modernization: Breaking the AI Vicious Cycle for Superior Decision-making
Data Modernization: Breaking the AI Vicious Cycle for Superior Decision-making
 
It Takes an Ecosystem: How Technology Companies Deliver Exceptional Experiences
It Takes an Ecosystem: How Technology Companies Deliver Exceptional ExperiencesIt Takes an Ecosystem: How Technology Companies Deliver Exceptional Experiences
It Takes an Ecosystem: How Technology Companies Deliver Exceptional Experiences
 
Intuition Engineered
Intuition EngineeredIntuition Engineered
Intuition Engineered
 
The Work Ahead: Transportation and Logistics Delivering on the Digital-Physic...
The Work Ahead: Transportation and Logistics Delivering on the Digital-Physic...The Work Ahead: Transportation and Logistics Delivering on the Digital-Physic...
The Work Ahead: Transportation and Logistics Delivering on the Digital-Physic...
 
Enhancing Desirability: Five Considerations for Winning Digital Initiatives
Enhancing Desirability: Five Considerations for Winning Digital InitiativesEnhancing Desirability: Five Considerations for Winning Digital Initiatives
Enhancing Desirability: Five Considerations for Winning Digital Initiatives
 
The Work Ahead in Manufacturing: Fulfilling the Agility Mandate
The Work Ahead in Manufacturing: Fulfilling the Agility MandateThe Work Ahead in Manufacturing: Fulfilling the Agility Mandate
The Work Ahead in Manufacturing: Fulfilling the Agility Mandate
 
The Work Ahead in Higher Education: Repaving the Road for the Employees of To...
The Work Ahead in Higher Education: Repaving the Road for the Employees of To...The Work Ahead in Higher Education: Repaving the Road for the Employees of To...
The Work Ahead in Higher Education: Repaving the Road for the Employees of To...
 
Engineering the Next-Gen Digital Claims Organisation for Australian General I...
Engineering the Next-Gen Digital Claims Organisation for Australian General I...Engineering the Next-Gen Digital Claims Organisation for Australian General I...
Engineering the Next-Gen Digital Claims Organisation for Australian General I...
 
Profitability in the Direct-to-Consumer Marketplace: A Playbook for Media and...
Profitability in the Direct-to-Consumer Marketplace: A Playbook for Media and...Profitability in the Direct-to-Consumer Marketplace: A Playbook for Media and...
Profitability in the Direct-to-Consumer Marketplace: A Playbook for Media and...
 
Green Rush: The Economic Imperative for Sustainability
Green Rush: The Economic Imperative for SustainabilityGreen Rush: The Economic Imperative for Sustainability
Green Rush: The Economic Imperative for Sustainability
 
Policy Administration Modernization: Four Paths for Insurers
Policy Administration Modernization: Four Paths for InsurersPolicy Administration Modernization: Four Paths for Insurers
Policy Administration Modernization: Four Paths for Insurers
 
The Work Ahead in Utilities: Powering a Sustainable Future with Digital
The Work Ahead in Utilities: Powering a Sustainable Future with DigitalThe Work Ahead in Utilities: Powering a Sustainable Future with Digital
The Work Ahead in Utilities: Powering a Sustainable Future with Digital
 
AI in Media & Entertainment: Starting the Journey to Value
AI in Media & Entertainment: Starting the Journey to ValueAI in Media & Entertainment: Starting the Journey to Value
AI in Media & Entertainment: Starting the Journey to Value
 
Operations Workforce Management: A Data-Informed, Digital-First Approach
Operations Workforce Management: A Data-Informed, Digital-First ApproachOperations Workforce Management: A Data-Informed, Digital-First Approach
Operations Workforce Management: A Data-Informed, Digital-First Approach
 
Five Priorities for Quality Engineering When Taking Banking to the Cloud
Five Priorities for Quality Engineering When Taking Banking to the CloudFive Priorities for Quality Engineering When Taking Banking to the Cloud
Five Priorities for Quality Engineering When Taking Banking to the Cloud
 
Getting Ahead With AI: How APAC Companies Replicate Success by Remaining Focused
Getting Ahead With AI: How APAC Companies Replicate Success by Remaining FocusedGetting Ahead With AI: How APAC Companies Replicate Success by Remaining Focused
Getting Ahead With AI: How APAC Companies Replicate Success by Remaining Focused
 
Crafting the Utility of the Future
Crafting the Utility of the FutureCrafting the Utility of the Future
Crafting the Utility of the Future
 
Utilities Can Ramp Up CX with a Customer Data Platform
Utilities Can Ramp Up CX with a Customer Data PlatformUtilities Can Ramp Up CX with a Customer Data Platform
Utilities Can Ramp Up CX with a Customer Data Platform
 
The Work Ahead in Intelligent Automation: Coping with Complexity in a Post-Pa...
The Work Ahead in Intelligent Automation: Coping with Complexity in a Post-Pa...The Work Ahead in Intelligent Automation: Coping with Complexity in a Post-Pa...
The Work Ahead in Intelligent Automation: Coping with Complexity in a Post-Pa...
 

Semantic 'Radar' Steers Users to Insights in the Data Lake

  • 1. Semantic ‘Radar’ Steers Users to Insights in the Data Lake By infusing information with intelligence, users can discover meaning in the digital data that envelops people, organizations, processes, products and things.
  • 2. 2 KEEP CHALLENGING September 2014 Executive Summary In the past several years, we’ve seen Amazon grow to nearly $75 billion in sales without operating a single storefront, in part by using information about customers’ past purchases to recommend new offers to them. More recently, Airbnb managed 10 million stays in 2013 without owning a single hotel; instead, it encourages members to offer space in their own homes to rent out to others and publishes online ratings of guests and hosts. These are only two examples of how businesses are transforming customer experiences, establishing new business models and redefining their markets. At the heart of their success is an innate ability to continuously understand and manage the digital data generated by people, organizations, processes, products and things1 — what we call a Code Halo™. In an increasingly “data-rich, meaning-poor” business world, companies everywhere are attempting to learn the value of signal and the cost of noise when coming to grips with their big data and analytics initiatives. Extracting meaning from this data empowers companies, brands and employers to better understand and meet customer needs, and to create more personalized experiences for them. Doing so requires new tools to manage and understand the torrents of data that will come from the projected tens of billions of devices as the “Internet of Things” links everything from cars to clothing.2 It also requires algorithms that filter data “noise,” generate new insights and create personalized and enriching experiences that help Code Halo pioneers like Pandora and Netflix treat us as if we were individual markets of one. Combining streams of data from different sources into a “data lake”3 can help steer business decision-makers to the insights they need more quickly and efficiently. At its core is high-performance, low-cost data storage technology to store and manage frequently used, large volumes of data. Multiple analytics engines speed the delivery of a personalized experience to each customer at the point of interaction. Semantic models standardize the definition and mapping of internal and external data. Domain expertise,
  • 3. SEMANTIC ‘RADAR’ STEERS USERS TO INSIGHTS IN THE DATA LAKE 3 such as knowledge of which patterns in the data represent risk or opportunity, is embedded in the semantic models4 to guide business users to insights. Finally, self-service features help business users rapidly find and analyze data without delay. This white paper highlights the challenges that organizations face when using today’s data architectures to find and act on business-changing big data insights. It describes how a “data lake“ can help, and the central role that semantic technology plays by slashing the time required to find, integrate, share and analyze the data. It finally describes how enterprises can begin creating their own data lakes.
  • 4. 4 KEEP CHALLENGING September 2014 Today’s Information Architectures: Too Slow, Expensive and Hard to Understand A data lake eliminates many of the roadblocks that now stand between business users and business insights. Among these are: • Onboarding and integrating data is slow and expensive. Creating personalized customer experiences and tracking complex markets requires collecting, inte- grating, analyzing and responding to data from many different sources. However, the ETL (extract, transform and load) function requires custom processes and technologies for each source system, causing delays when skilled staff is busy on other projects. • Optimizing data for different users is difficult and costly. Departments such as finance, sales and manufacturing may organize and define data differently. Finance, for example, may organize customers based on their annual purchases or credit ratings, while sales organizes them by geography, and manufacturing by the products they purchase. The cost and delay of building data marts tailored to the needs of each unit can make it impossible to quickly uncover vital insights. • Moving and reorganizing data introduces delays and cost. Data storage is be- coming steadily less expensive, but it still represents a capital and operational ex- pense. Hampered by insufficient funding and the high costs of onboarding data, organizations often wait until there is a defined need for a set of data before loading it into an analytics database. By then, a competitor may have already captured the data and acted on the analysis. • Data provenance and meaning is often lost in translation. Before knowing what data to query, a user must understand what that data represents and how it is used. One example: “This number (X) represents defect rate per thousand, which we use to justify our premium pricing for a customer.” As data knowledge is transferred from expert to designer to developer, meaning is often replaced with technical syntax and rules that the business user cannot understand, effec- tively removing all-important context from “corporate memory.” Maintaining the provenance and meaning allows the data to tell a story that matters to a real-life decision-maker. • Who transformed what data, and why? In order to ensure that users aren’t making decisions based on faulty data, it is essential to understand who changed the format of the data, cleansed it to eliminate inconsistencies or changed values to comply with standards. Such management artifacts are often tracked only in spreadsheets and don’t capture the latest changes made to live data. This prob- lem extends to tracking changes to data after it is onboarded, increasing the costs and complexity of performing data analysis. • Models are too rigid or incomplete. Excessive standardization of the vocabulary used to define and organize data may make it harder to conduct specific types of analysis. On the other hand, the lack of a model makes it difficult or impossible to identify what data is available, how to access it and how to integrate it with other data. That means delays when users must search for the required data, de- termine who owns it, request permission to access it, and integrate it with other data. • The speed of change. With every new competitor, product offering, business model or distribution channel, an organization must rapidly assimilate and query new types of data, organized and optimized in new and innovative ways. 
Lengthy data design and standardization processes are not effective in today’s business environment.
All of these issues are solvable with a "smart" data lake strategy. Figure 1 depicts best practices that overcome key points of friction that organizations encounter when using traditional approaches.

Figure 1: The Data Lake Defined — how data lake technologies and practices address each business need compared with traditional technologies and practices.

•	Onboard new data. Traditional: Data is prioritized based on the current business case for analytics. Comprehensive analysis creates rigid structures that are difficult to change. Minimal definition of data organization during onboarding requires users to have a detailed understanding of data contents. Data lake: Data is prioritized based on the data's potential ability to drive insights. Flexible data models can be revised or extended without redesign of the database. Agile, evolutionary refinement of the data organization makes it easier to use the data and leverage new insights. A catalog of data assets captures all details of onboarded data.

•	Load data. Traditional: Comprehensive, custom ETL processes capture, cleanse and load the data into the data repository. Data lake: Model-based ETL processing is defined and generated automatically, with data loaded unfiltered and untransformed. Self-service features speed time to business value.

•	Connect external data to the data repository. Traditional: Source data is collected and loaded into the data repository. The data is refreshed on a scheduled frequency. Data lake: A common model maps to all internal and external data assets. Frequently queried data is collected and loaded to provide predictable query performance. Internal and external data is queried through common methods, with real-time values delivered at query time.

•	Organize data for specific purposes. Traditional: Data is copied to a separate region of the data repository, with little documentation, organization or management. Data silos proliferate. Data lake: Automation facilitates data organization and synchronization within the repository. All data remains connected to data definitions, sources and provenance, and it is updated automatically. Automation manages data retention.

•	Discover data and data relationships. Traditional: Users must create projects to mine and analyze data for currently unknown data relationships. Data lake: Semantic search helps end users ask questions about the data as needed to improve their understanding of it. Data discovery tools continuously mine and analyze the data for currently unknown data relationships and recommend updates to the models that increase their business value.

Data Lake Essentials
Filling the data lake to generate unique customer experiences and unlock market insights requires new processes and technologies across the data and query management lifecycle (see Figure 2). At each step, semantic technology plays a central role (see the Quick Take sidebar, "Semantics Maps the Data Lake").

•	Data ingestion: This involves finding and importing the data needed for Code Halo analysis. Structured and unstructured data can be collected through traditional scheduled batch loading, data streaming or real-time, on-demand queries for internally- and externally-hosted data.
Organizations can quickly analyze the data needed without loading every conceivably useful bit of data beforehand. The combination of semantic models and automated data movement greatly reduces the need for manual coding of ETL processes by defining all the available data, how it is organized, where it is located and what data movement/transformations must be performed. During ingestion, the data is semantically tagged and categorized so it can be easily identified for future needs.

Pre-modeling also lets business users perform self-service data mapping and queries of new data that will only be needed for a short time or that is awaiting integration with formal models. They can then use this data for rapid, on-demand Code Halo analysis to meet urgent needs without waiting for help from IT.

Figure 2: Data Lake Components. The figure shows structured and unstructured data from desktop and mobile devices, social media and cloud, operational systems and the Internet of Things entering through a data ingestion layer (scheduled batch load, streaming, on-demand query, self-service, semantic tagging); storage and analytics engines (HDFS, MapReduce, NoSQL, streaming, linked data, in-memory, columnar, HPC and semantic graph); a data lake management layer (models, governance, workflow, access/security, data assets catalog, data movement, provenance); and analytics capabilities (semantic search, data discovery, queries directed to the best query engine, and analytics expertise that is captured, embedded and shared, with queries spanning data, metadata and provenance).
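To make the semantic tagging step described above concrete, the following is a minimal sketch in Python using the rdflib library: each raw record is tagged with a semantic type and basic provenance as it is loaded. The "ex" namespace, the Customer class and the attribute names are illustrative assumptions, not definitions from this paper.

```python
# A minimal sketch of semantic tagging during ingestion, assuming Python with rdflib.
# The ex: namespace, the Customer class and the attribute names are hypothetical.
from datetime import datetime, timezone

from rdflib import Graph, Literal, Namespace, RDF, URIRef
from rdflib.namespace import XSD

EX = Namespace("http://example.org/datalake/")    # hypothetical ontology namespace
PROV = Namespace("http://www.w3.org/ns/prov#")    # W3C provenance vocabulary


def ingest_record(graph: Graph, record: dict, source: str) -> URIRef:
    """Tag one raw source record with semantic types and provenance at load time."""
    subject = URIRef(EX[f"customer/{record['id']}"])
    graph.add((subject, RDF.type, EX.Customer))  # semantic category for later discovery
    graph.add((subject, EX.monthlySpend, Literal(record["spend"], datatype=XSD.decimal)))
    graph.add((subject, EX.supportCalls, Literal(record["calls"], datatype=XSD.integer)))
    # Provenance: where the data came from and when it was loaded.
    graph.add((subject, PROV.wasDerivedFrom, URIRef(EX[f"source/{source}"])))
    graph.add((subject, EX.loadedAt,
               Literal(datetime.now(timezone.utc).isoformat(), datatype=XSD.dateTime)))
    return subject


g = Graph()
ingest_record(g, {"id": "c-1001", "spend": "42.50", "calls": 7}, source="crm_batch")
print(g.serialize(format="turtle"))
```

Because the tags and provenance travel with the data, later users can find the record by its meaning ("customers ingested from the CRM batch feed") rather than by knowing which table or file it landed in.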
Quick Take: Semantics Maps the Data Lake

A smart semantics model can help organizations make use of their Code Halos by capturing meaning, as well as the related domain expertise, from structured, unstructured or semi-structured data. The building blocks for such a model are standards and technologies such as:

•	The Resource Description Framework (RDF), which organizes data in a graph structure, reducing development time and cost while delivering business value sooner.

•	The Web Ontology Language (OWL), which provides a comprehensive model of data definitions and relationships that is human- and machine-readable.

•	The SPARQL Query Language, a SQL-like query language for semantic data that can leverage ontological relationships to execute smarter questions across multiple databases in a single query.

•	Inferencing, which makes it easier for users to construct queries by capturing and embedding expertise in the ontology model.

By performing a semantic analysis of the data during loading, organizations can capture an understanding of the content, which makes it easier to embed intelligence in the data. This intelligence might include which characteristics or patterns indicate the likelihood that a customer will "churn" from one cable provider to another, or that a homeowner is at increased risk of falling behind in loan payments. A trigger can generate an offer to customers that meets their needs and enhances the relationship.

•	Data management: Data lakes benefit from modern, low-cost, high-performance storage technologies, such as the Hadoop Distributed File System (HDFS), that provide high availability and automated failover. Specialized database and analytics engines, such as high-performance computing, NoSQL and semantic engines, can be configured to speed Code Halo analytics at the lowest possible cost. The intelligent data management layer directs the model-driven movement of data, while provenance capabilities capture and track details about its ingestion, transformation, movement and the queries run against it. The data management layer supports the various engines required for different types of analytics, including future technologies, as an organization's use of Code Halos matures. It also allows the engines to access and analyze data directly from high-performance storage, eliminating the need for custom ETL processes.

•	Data lake management: Like any other corporate resource, the data lake requires management and governance to ensure that it operates as efficiently, consistently and securely as possible. The management layer directs data ingestion and movement workflows while managing access and balancing performance, cost and security to deliver the best results at the lowest cost.

The model manager allows analysts to embed expert knowledge, such as a complex calculation or set of logic, into the ontology model for use by non-experts. For example, a researcher can easily share a new risk-assessment model for home loans with underwriters in the field. A genomics specialist can share a model that maps new biomarker indications to patients' genetic backgrounds to help researchers at clinical trial sites test a new cancer treatment.
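To illustrate how inferencing lets an analyst embed such expertise once and share it with non-experts, here is a minimal sketch, again assuming Python with rdflib: a SPARQL CONSTRUCT rule encodes a hypothetical "at-risk home loan" pattern, and end users can then query the derived class without knowing the underlying logic. The vocabulary, the rule and the thresholds are illustrative assumptions, not the paper's own model.

```python
# A minimal sketch of embedding expert logic in the semantic model, assuming rdflib.
# The ex: vocabulary and the risk rule itself are hypothetical, for illustration only.
from rdflib import Graph, Literal, Namespace, RDF
from rdflib.namespace import XSD

EX = Namespace("http://example.org/datalake/")

g = Graph()
g.bind("ex", EX)
g.add((EX["loan/L-77"], RDF.type, EX.HomeLoan))
g.add((EX["loan/L-77"], EX.missedPayments, Literal(2, datatype=XSD.integer)))
g.add((EX["loan/L-77"], EX.loanToValue, Literal(0.97, datatype=XSD.decimal)))

# Expert knowledge captured once as a rule: high LTV plus missed payments => at risk.
RISK_RULE = """
PREFIX ex: <http://example.org/datalake/>
CONSTRUCT { ?loan a ex:AtRiskLoan . }
WHERE {
  ?loan a ex:HomeLoan ;
        ex:missedPayments ?missed ;
        ex:loanToValue    ?ltv .
  FILTER (?missed >= 2 && ?ltv > 0.9)
}
"""

# Materialize the inferred facts so non-expert users can simply ask for ex:AtRiskLoan.
g += g.query(RISK_RULE).graph

for loan in g.subjects(RDF.type, EX.AtRiskLoan):
    print("Flagged for underwriter review:", loan)
```

An underwriter's query never needs to repeat the filter logic; if the researcher revises the rule, every downstream query that asks for the derived class picks up the change automatically.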
Proper governance is required to ensure the data is accurate and consistent and that it complies with organizational and industry standards. This should cover both master and reference data, and take advantage of industry standards, such as the Financial Industry Business Ontology (FIBO)5 in banking/financial services and CDISC/RDF6 in life sciences. The model manager captures governance-based standards to ensure consistency across models that reference shared data.

Business units will organize views of required data that include enterprise- and business unit-specific data. The business unit-level data models will inherit the definitions of enterprise standard models while adding business unit-specific data relationships and terminology (see Figure 3). The model manager enables organizations to construct data views to support a specific analytic or class of analytics, without the cost of building data marts or custom data movement processing. By capturing the details of data relationships and data movement that directly support the analytics process, organizations can enable automation that reduces delays and costs.

The data catalog manager captures the details of both internal and external data assets, describing the content, definitions and organization of the data and supporting multiple views of shared data. End users can define workflows for scheduled data movement.

Figure 3: Different Data Models for Different Users. The figure shows an industry ontology, an enterprise ontology, and North America and international market ontologies, with the business unit-level models inheriting from the shared enterprise model.
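The layering shown in Figure 3 can be pictured with a small sketch, assuming Python with rdflib: a hypothetical business-unit ontology imports an enterprise ontology and specializes its classes, so queries written against the shared enterprise terms still resolve for the business unit's data. All URIs and names below are illustrative assumptions.

```python
# A minimal sketch of layered (enterprise vs. business unit) ontologies, assuming rdflib.
# All URIs, classes and properties are hypothetical.
from rdflib import Graph, Namespace
from rdflib.namespace import RDFS

ENT = Namespace("http://example.org/ontology/enterprise#")
NA = Namespace("http://example.org/ontology/north-america#")

# Enterprise ontology: shared, governed definitions.
enterprise_ttl = """
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ent:  <http://example.org/ontology/enterprise#> .

ent:Customer a owl:Class ; rdfs:label "Customer" .
ent:annualPurchases a owl:DatatypeProperty ; rdfs:domain ent:Customer .
"""

# Business unit ontology: imports the enterprise model and adds local terminology.
north_america_ttl = """
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ent:  <http://example.org/ontology/enterprise#> .
@prefix na:   <http://example.org/ontology/north-america#> .

na: a owl:Ontology ; owl:imports ent: .
na:RetailAccount a owl:Class ;
    rdfs:subClassOf ent:Customer ;          # inherits the enterprise definition
    rdfs:label "Retail account (NA sales terminology)" .
"""

model = Graph()
model.parse(data=enterprise_ttl, format="turtle")
model.parse(data=north_america_ttl, format="turtle")

# The business unit term resolves back to the shared enterprise concept.
for parent in model.objects(NA.RetailAccount, RDFS.subClassOf):
    print("na:RetailAccount specializes", parent)
```

The design point is that the business unit never redefines "customer"; it only extends the governed definition with its own relationships and labels, which is what keeps analytics consistent across units.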
Access management features help users find data assets available in the data lake, automate registration and access to new data assets, and provide secure access to the data. Monitoring functions ensure that everything from the storage technology to the analytics and reporting functions is working properly, with tools to support intervention when necessary.

•	The query management layer: This helps users understand what data is available and which data is most likely to deliver the richest Code Halo insights. Semantic search uses natural language queries to analyze the data and metadata and identify the data that will best support an analytical process. To reduce the time and effort required to find and model business-critical data, data discovery functions analyze the data to find and link related data concepts. Optimized data structures speed analytics, with queries directed to the analytics engine that delivers the best performance. The query management layer thus enables "smart" analytics in which users define not only which data to use, how to transform it, which analytics engine to use and which analysis to run, but also the type of chart on which to plot the results, the file format in which to save them and the people to whom they should be mailed. Workflow tools allow users to automate the analytics process.

•	Rules-based data sourcing: This helps users fine-tune their queries using context such as metadata and provenance. For example, a user could specify a query that says, "Read the data from the data lake, unless that data is more than X hours old, in which case it should be read directly from the source system." (A minimal sketch of such a freshness rule appears after the Moving Forward list below.)

While the data lake is most often used by analysts or business decision-makers, true Code Halo value is achieved when it responds to a customer's click with an immediate, personalized response. This is enabled by features that expose analytics processes as a Web service, or through an application programming interface (API) that is callable from a Web page or mobile app.

Moving Forward
Building an intelligent semantic model on top of your current information architecture may sound daunting. But it can and must be done if you hope to discover and act on the next game-changing insight before your competitors do. You can get started with these important but gradual changes:

•	Prioritize the onboarding of data by its ability to create truly individualized customer experiences or business-changing insights into market needs or operations.

•	Onboard data assets as they become available, without waiting for a specific use case. Those uses will emerge when you least expect them, and when they do, you'll need the data immediately. At a minimum, map the data to the semantic model for easier access.

•	Load the data without filtering or transforming it, since new data rules may override old rules. Filtering and transformation rules can be applied as the data is moved to an analytics engine or during query execution.

•	Model the data using familiar terminology. This makes it easier to change the model as needed without physically moving the data. Customize models for specific business groups, encouraging them to ensure the models' accuracy and completeness.

•	Enable search mechanisms that make it easier for business users to see what data is available and accessible.

•	Balance legal and compliance needs for security with the imperative to improve the customer experience through analytics.
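As promised above, here is a minimal sketch of the rules-based data sourcing example, in Python. The data lake reader, source-system reader, provenance lookup and the four-hour threshold are hypothetical stand-ins; the point is simply that provenance metadata about when data was loaded can decide where a query is served from.

```python
# A minimal sketch of rules-based data sourcing, assuming hypothetical reader functions.
from datetime import datetime, timedelta, timezone
from typing import Callable

MAX_AGE = timedelta(hours=4)  # the "X hours" from the example; value chosen for illustration


def read_with_freshness_rule(
    dataset: str,
    loaded_at: Callable[[str], datetime],  # provenance lookup: when was this dataset loaded?
    read_lake: Callable[[str], list],      # hypothetical data lake reader
    read_source: Callable[[str], list],    # hypothetical source system reader
) -> list:
    """Serve the query from the data lake unless the copy is stale; then go to the source."""
    age = datetime.now(timezone.utc) - loaded_at(dataset)
    if age <= MAX_AGE:
        return read_lake(dataset)
    return read_source(dataset)


# Toy usage with stubbed readers and provenance.
rows = read_with_freshness_rule(
    "customer_interactions",
    loaded_at=lambda name: datetime.now(timezone.utc) - timedelta(hours=6),
    read_lake=lambda name: [{"from": "lake", "dataset": name}],
    read_source=lambda name: [{"from": "source", "dataset": name}],
)
print(rows)  # falls back to the source system because the lake copy is six hours old
```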
Code Halos are powerful sources of industry-changing insights. Too often, however, this knowledge is difficult to find within the volume and complexity of the data. A data lake provides the radar that allows end users to quickly see past the chaos of too much data and focus on the actionable insights they need.

Footnotes
1 For more on Code Halo thinking, see Malcolm Frank, Paul Roehrig and Ben Pring, "Code Rules: A Playbook for Managing at the Crossroads," Cognizant Technology Solutions, June 2013, http://www.cognizant.com/Futureofwork/Documents/code-rules.pdf, or read our book Code Halos: How the Digital Lives of People, Things, and Organizations are Changing the Rules of Business, by Malcolm Frank, Paul Roehrig and Ben Pring, John Wiley & Sons, April 2014, http://www.wiley.com/WileyCDA/WileyTitle/productCd-1118862074.html.
2 "Gartner Says the Internet of Things Installed Base Will Grow to 26 Billion Units by 2020," Gartner, Inc., Dec. 13, 2013, http://www.gartner.com/newsroom/id/2636073.
3 James Dixon, "Pentaho, Big Data, and Data Lakes," Pentaho, October 2010, http://jamesdixon.wordpress.com/2010/10/14/pentaho-hadoop-and-data-lakes/.
4 Thomas Kelly, "Semantic Technology Drives Agile Business," Cognizant Technology Solutions, August 2013, http://www.cognizant.com/InsightsWhitepapers/How-Semantic-Technology-Drives-Agile-Business.pdf.
5 "Financial Industry Business Ontology," Object Management Group, http://www.omg.org/hot-topics/fibo.htm.
6 "Representing Regulations and Guidance in RDF," PhUSE Wiki, http://bit.ly/18WAxxf.

Note: The logos and company names presented throughout this white paper are the property of their respective trademark owners, are not affiliated with Cognizant Technology Solutions, and are displayed for illustrative purposes only. Use of the logos does not imply endorsement of the organizations by Cognizant, nor vice versa.
About the Authors

Thomas Kelly is a Director in Cognizant's Enterprise Information Management (EIM) Practice and heads its Semantic Technology Center of Excellence, a technology specialty of Cognizant Business Consulting. He has 20-plus years of technology consulting experience in leading data warehousing, business intelligence and big data projects, focused primarily on the life sciences and healthcare industries. He can be reached at Thomas.Kelly@cognizant.com.

Robert Hoyle Brown is an Associate Vice President in Cognizant's Center for the Future of Work and drives strategy and market outreach for the Business Process Services Practice. Robert is responsible for looking at how Code Halos will change enterprise business process services in the coming years and is a regular contributor to the blog www.unevenlydistributed.com, "Signals from the Future of Work." Prior to joining Cognizant, he was Managing Vice President of the Business and Applications Services team at Gartner, and as a research analyst, he was a recognized subject matter expert in BPO, cloud services/BPaaS and HR services. He also held roles at Hewlett-Packard and G2 Research, a boutique outsourcing research firm in Silicon Valley. He holds a Bachelor of Arts degree from the University of California at Berkeley and, prior to his graduation, attended the London School of Economics as a Hansard Scholar. He can be reached at Robert.H.Brown@cognizant.com.

The authors would like to thank Editorial Director Alan Alper for his insights and contributions to this white paper.
World Headquarters: 500 Frank W. Burr Blvd., Teaneck, NJ 07666 USA. Phone: +1 201 801 0233. Fax: +1 201 801 0243. Toll Free: +1 888 937 3277. inquiry@cognizant.com

European Headquarters: 1 Kingdom Street, Paddington Central, London W2 6BD. Phone: +44 (0) 207 297 7600. Fax: +44 (0) 207 121 0102. infouk@cognizant.com

India Operations Headquarters: #5/535, Old Mahabalipuram Road, Okkiyam Pettai, Thoraipakkam, Chennai 600 096, India. Phone: +91 (0) 44 4209 6000. Fax: +91 (0) 44 4209 6060. inquiryindia@cognizant.com

© Copyright 2014, Cognizant. All rights reserved. No part of this document may be reproduced, stored in a retrieval system, transmitted in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the express written permission from Cognizant. The information contained herein is subject to change without notice. All other trademarks mentioned herein are the property of their respective owners.

About Cognizant
Cognizant (NASDAQ: CTSH) is a leading provider of information technology, consulting, and business process outsourcing services, dedicated to helping the world's leading companies build stronger businesses. Headquartered in Teaneck, New Jersey (U.S.), Cognizant combines a passion for client satisfaction, technology innovation, deep industry and business process expertise, and a global, collaborative workforce that embodies the future of work. With over 75 development and delivery centers worldwide and approximately 187,400 employees as of June 30, 2014, Cognizant is a member of the NASDAQ-100, the S&P 500, the Forbes Global 2000, and the Fortune 500 and is ranked among the top performing and fastest growing companies in the world. Visit us online at www.cognizant.com or follow us on Twitter: Cognizant.