Take a look at this review of current industry problems concerning data quality, and learn more about how companies are addressing quality problems with customer, product, and other types of corporate data. Read about products and use cases from SAP to see how vendors are supporting data cleansing.
The Path to Product Excellence: Avoiding Common Pitfalls and Enhancing Commun...
Developing A Universal Approach to Cleansing Customer and Product Data
1. SAP Thought Leadership
Information Management
Developing a Universal approach to cleansing
cUstomer anD proDUct Data
2.
3. content
4 Executive Summary
5 Data Quality: What’s The Problem?
5 What Is Data Quality?
5 Measuring Data Quality
6 Components of a Data Quality Program
7 A Business Process Perspective to Data Quality
7 Data Quality Problems by Business Entity
8 The Role of Master Data
8 Types of Data Involved in Managing Data Quality
9 Cleansing Customer and Product Data
9 Product Data Example
10 Supporting Standards
10 Supporting National Languages
10 The Issue of Increasing Complexity
11 A Universal Approach to Data Quality Management
Is Required
12 SAP BusinessObjects Information Management
Solutions
12 Data Integration
12 Data Quality
12 Metadata Management
12 SAP BusinessObjects Data Insight
12 SAP BusinessObjects Data Quality Management
13 SAP BusinessObjects Universal Data Cleanse
14 Product Data
14 Multinational Data
15 Summary
15 For More Information
About the Author
Colin White, BI Research; BI Research is a research and consulting company
whose goal is to help companies understand and exploit new developments in
business intelligence and business integration. When combined, business intelli-
e
gence and business integration enable an organization to become a smart and
agile business. For more information, visit www.bi-research.com.
4. eXecUtive sUmmarY
Data quality has always been an impor- are addressing quality problems with why companies need to take an enter-
tant issue for companies, and this is customer, product, and other types of prise-wide approach to data cleansing
even more the case today. Business corporate data. It explains why data and data quality management. Products
and legislative pressures coupled with quality projects often evolve into mas- and use cases from SAP are used in
the explosion in the amount of data ter data management projects, and the paper to demonstrate how vendors
created by organizations is leading to discusses how master data can be are supporting data cleansing, and to
increased corporate attention on extracted from unstructured business illustrate how customers are using such
improving the quality and accuracy content and integrated into traditional functionality.
of business data. corporate data systems.
This paper reviews current industry The paper looks at the role of data
problems concerning data quality, and cleansing tools in helping improve data
takes a detailed look at how companies consistency and accuracy, and explains
4 SAP Thought Leadership – Information Management
5. Data QUalitY: What’s the proBlem?
THE COST OF POOR DATA QUALITy
There has been a steady stream of arti- Although nobody disputes that poor 100% safe. The degree to which a cus-
cles in the press about the impact of data quality has a significant financial tomer can accept a less than perfect or
poor data quality on the business ever impact on companies, the figures in safe product will depend on the type of
since companies began building data these articles are so variable and so customer buying the product, and on
warehouses and examining the quality imprecise that there is a risk of reader how the product will be used. The qual-
of data flowing into them. Here are fatigue when discussing data quality ity and safety of a product, however, is
some early examples: issues at this superficial level of detail. also dependent on how much money
Maybe this is why so many reports a manufacturer is willing to spend to
“The survey showed that estimates of show that a high percentage of organi- develop and produce the product,
poor information quality costs vary zations feel they have a data quality which in turn is dependent on how
from one to 20% of total company rev- problem, while at the same time indicat- much money a customer is willing to
enues. This disparity in cost estimation ing that an equally large number of orga- pay for it. All of these factors apply
existed not only from company to com- nizations are also doing little about it. equally to providing and supporting
pany but among senior information data quality.
technology managers within the same The problem is that “data quality”
organization.”1 is a vague term that means different Measuring Data Quality
things to different people. Before
“The Data Warehousing Institute data quality problems can be dis- While researching ISO 8402, I came
(TDWI) estimates that poor quality cus- cussed in detail, data quality must across an excellent 1994 paper by
tomer data costs U.S. businesses a be defined more precisely. Robert Walker, chairperson of the
staggering $611 billion a year in post- Association for Geographic Information
age, printing, and staff overhead.”2 What Is Data Quality? (AGI) Standards Data Quality Working
Group. He had this to say about the
More recent articles show the situation A useful definition of quality can be quality of data:
has not improved: found in the ISO 8402 specification,
which defines quality as, “The totality “Clearly, quality issues vary depending
“Poor data quality costs the typical of characteristics of an entity that bear on the type of data and the purpose to
company at least 10% of revenue; on its ability to satisfy stated or implied which it is put. The criteria that are
20% is probably a better estimate.”3 needs.” If we apply this definition to widely used for assessing data quality
data, then it means that data quality is are as follows:
“Companies are investing billions of dependent not only the characteristics • Lineage: This covers the ancestry of
dollars in CRM applications and data of the data itself, but also on the busi- the dataset. It describes the source
integration projects to gain a better ness context in which it is used (that is, material from which the data was
view of their customers – only to dis- on the business processes and busi- derived and the methods of deriva-
cover that conflicting data makes them ness users that employ it). tion. The description of the source
blind. Gartner estimates that more than material should include details of edi-
25% of critical data within large busi- This is somewhat analogous to product tions and dates of that source materi-
nesses is somehow inaccurate or quality. All product manufacturers have al and any ancillary information used
incomplete. And that imprecise data quality assurance programs, but no for updates. The methods of deriva-
is wreaking havoc.”4 product is ever 100% perfect, or tion should include details of the
1. Andrea Malcolm, “Poor Data Quality Costs 10% of Revenues, Survey Reveals,” ComputerWorld, July 1998.
2. Wayne W. Eckerson, “Data Quality and the Bottom Line,” TDWI Report Series, 2002.
3. Thomas C. Redman, “Data: An Unfolding Quality Disaster,” DM Review, August 2004.
4. Rick Whiting, “Hamstrung By Defective Data,” InformationWeek, May 2006.
SAP Thought Leadership – Information Management 5
6. operations used, the logical process- The first task is to measure and analyze additional data elements can be added
es (for example, algorithms and (or profile) the data to determine the as required to the data to enhance its
transformations) and details of any type and number of defects that may information content.
subsequent processing used in exist. This step provides the knowledge
producing the final digital data. required to develop a strategy to Data associated with a business pro-
• Accuracy: This is the closeness of improve the quality of the data. Once cess may come from a variety of differ-
the topic data associated with an this has been done, the next task is to ent sources, and although the data may
object or feature, to the true values, cleanse (parse, standardize, and cor- be consistent and correct for any given
or values that are accepted as being rect) the data. This requires parsing it source, merging it with other sources
true. It is usually recorded as a per- into its individual components, standard- can lead to duplicate records, and
centage correctness for each topic izing data formats and values based on redundant or inconsistent data ele-
or attribute. internal and industry business definitions ments. The role of the match and con-
• Currency: This is the level at which and rules, and applying cleansing algo- solidate tasks is to resolve and remove
the data is current (as opposed to rithms to remove syntactical or semantic this duplicate, redundant, and inconsis-
the date of the last update). data errors. At this stage in the process, tent data. Once the data has been
• Logical consistency: This is the
fidelity of data in relation to the Continuous
Monitoring
data structures. Measures ongoing data quality
• Completeness: This is a measure scores and provides alerts
of the correspondence between the Measure
real world and the specified dataset. Quantifies the number
and types of defects
It includes a statement of the rules
Consolidate
for including or excluding items from Combines unique
Analyze
the dataset.”5 data elements from
Assesses the
matched records into
a single source nature and
Although these comments concerned cause of data
defects
geographic data, the attributes of Match
lineage, accuracy, currency, logical
consistency, and completeness apply
Identifies duplicate
records within
Your
equally to all forms of data, regardless
multiple tables or
databases
Data
of its type or source. To maintain
data quality, organizations need to Match & Consolidate
constantly monitor and measure these Parse
Enhance Identifies and
attributes as a part of an ongoing data Appends additional isolates data
quality program. data increasing the elements in
value of the information data structures
Components of a Data Quality Data Enhancement
Standardize
Program Normalizes data value and
Correct formats according to your
Figure 1 (courtesy of SAP) illustrates Verifies, scrubs and rules and that of third-party
references
the main tasks of a data quality appends data based on
a set of sophisticated
program. algorithms that work with
secondary source
Data Cleansing
Figure 1: Components of a Data Quality Program [Source: SAP]
5. R.S. Walker, Data quality standard, In: Proceedings of AGI’94 Conference, 1994.
6 SAP Thought Leadership – Information Management
7. cleansed, enhanced, consolidated, and An example of the business case for required data quality, and a business
merged, it is ready for use. The quality approaching data quality improvement case built to address those data
of the data, however, must be moni- from a business process and business quality issues that provide the best
tored and profiled on a continuing basis entity perspective can be found in a return on investment. Unfortunately,
to measure its accuracy, currency, con- reply to an April 2006 Business Intelli- this understanding is severely lacking
sistency, and completeness. gence Network blog (“Surprise! Poor in many companies.
Data Quality Costs a Lot of Money”)
A Business Process Perspective by David Loshin: Data Quality Problems by
to Data Quality Business Entity
“At Shell, the business thought it was
IT staff usually think of data quality ini- selling 20,000 product combinations to A TDWI survey published in an April
tiatives in terms of individual applica- 20,000 commercial customers, and 2006 research report on data quality
tions. From a business perspective, was actually selling 5,000 products to (“Taking Data Quality to the Enterprise
however, the problems caused by poor 6,000 customers! It can be seen that through Data Governance” by Philip
data quality typically surface in the the potential for cost saving in a situa- Russom) showed that customer and
business processes of the organiza- tion like that was significant.” product data are the two main types
tion. Examples include: of business entity that are especially
• Wasted marketing efforts and costs Data quality initiatives to correct susceptible to quality problems. Sur-
• Customer dissatisfaction problems like those at Shell can only prisingly, as can be seen in Figure 2,
• Shipment, delivery, and billing errors succeed, however, if an organization financial data came third in the table of
• Excess inventory understands its business processes, results. Customer and product are two
• Incorrect financial statements and the data and data quality required of the main components of the master
• Delays in new product introduction by those processes. In this way, actual data of an organization.
data quality can be measured against
To improve data quality, IT staff must
change their application-specific
approach to data quality initiatives, and Type of Business Data Percentage of Respondents
align them instead with key corporate Customer Data 74%
business processes, such as customer Product Data 43%
marketing and supply chain optimiza- Financial Data 36%
tion, and underlying business entities
Sales Contact Data 27%
like customer and product. This align-
Data from ERP Systems 25%
ment helps improve data consistency
across application systems, which in Employee Data 16%
turn simplifies data integration tasks, International Data 12%
such as building an integrated product Other 10%
catalog or creating a single view of
the customer. Figure 2: Business Entities Susceptible to Data Quality Problems
SAP Thought Leadership – Information Management 7
8. The Role of Master Data and simple data structures, has simple As companies begin to realize that they
keys and governance rules, is often must focus data quality efforts on spe-
Master data is defined as data about standardized (U.S. state codes, for cific business processes and business
the key business entities of an organi- example), involves only a few applica- entities, the ability to manage master
zation. Examples include customer, tions, and is reasonably stable. data becomes increasingly important.
product, organizational structure, and This is why many data quality initiatives
chart of accounts. A common question Master business entity data, such as often evolve into master data manage-
about master data is, “What is the dif- customer and product, on the other ment projects. The Business Intelli-
ference between master data and refer- hand, is usually ill-defined, has complex gence Network recently published a
ence data?” Some people take the data structures and relationships, report (“The Pillars of Master Data
position that they are the same thing, requires compound and intelligent keys Management: Data Profiling, Data Inte-
but it can be argued that not all refer- and complex governance rules, is not gration and Data Quality”) by David
ence data is master data. For example, usually standardized, involves many Loshin that discusses this topic in
lookup and code tables that are used applications, and changes frequently. detail and is recommended reading.
to encode information, such as state
names and order codes, are not strictly Does this distinction really matter? Types of Data Involved in Manag-
master data tables. When developing data quality manage- ing Data Quality
ment and master data management
The dividing line between master data systems, it does. Cleaning and manag- A data quality program involves many
and reference data is not always clear- ing master reference data is a reason- different types of data. The data may
cut. One solution is to break master ably easy job. Cleaning master data, on be current or historic, detailed or sum-
data into two types: master reference the other hand, is a complex task and marized, and may have a structured,
data and master business entity data. requires data cleansing tools that are unstructured, or semi-structured for-
Master reference data has well-defined specifically designed for that purpose. mat. It may consist of master business
entity or master reference data, or
activity data created by business trans-
action (order entry, billing, shipping),
BI (reporting, analytics), planning (bud-
geting, forecasting), and collaborative
(e-mail, blogging) applications. All of
this data must be structurally and
semantically consistent with its associ-
ated technical and business rules. One
of the main objectives of a data quality
program is to assess and correct data
consistency and accuracy problems
caused by applications that do not fully
enforce those rules.
Most data quality initiatives focus
almost exclusively on structured data,
which represents only a fraction of the
data that exists in an organization.
There is, however, increasing interest in
8 SAP Thought Leadership – Information Management
9. leveraging the value of unstructured customers are providing data in lan- (CD, Jumbo CD, and so on), and wish
data in both master data and business guages other than English. These to standardize the values for yield and
intelligence (BI) applications. This is problems are causing considerable minimum deposit.
especially the case for customer and business inefficiency and are leading
product data, which may be managed to costly mistakes being made. Solving These types of data cleansing opera-
in unstructured files, or encapsulated in these problems, therefore, can provide tions can be used for all types of
a single field with a structured file or significant business benefit. unstructured information. To date,
database. To be useful this unstruc- most companies have focused on
tured data must be cleansed and trans- Product Data Example customer data, but many are now
formed into a semi-structured (XML, Figure 3 illustrates how unstructured beginning to process product and
for example) or structured format. data describing financial product offer- other types of data.
ings can be decomposed into separate
Cleansing and transforming unstruc- data elements, and then used to Customer data is usually easier to
tured data improves operational effi- populate a product table. The original deconstruct and clean than product
ciency and can add significant value to unstructured data could come from a data. Although there are variations in
existing decision-making applications. Web page or a product catalog, but in people names, company names, titles,
Examples of applications here include: some cases may simply be encoded phone numbers, and addresses, cus-
• Customer and market intelligence – in a single field of a structured data- tomer data records are reasonably con-
Internet Web pages base record. sistent in the number of data elements
• Customer sentiment and complaint involved. This is mainly due to the fact
analysis – Web logs (blogs) and In this example, the customer wants a that most countries have standardized
customer support center call records data quality solution to parse, standard- name and address formats. This is not
• Product safety and quality analysis – ize, and cleanse the financial data, and the case for product data, which is
Service center records to identify the financial institutions that more variable, and involves a wider
• Product master data management – are offering certificates of deposit range of industry standards. Some
Product catalogs (CD). They also want information about of these standards are very specific,
• Legal discovery – E-mails and maturity (3 months, 1 year, and so on) while others are global in nature.
instant messages and the type of CD being offered
Cleansing Customer and
Product Data
Every company struggles with the enor-
mous amount of customer and product
data that is represented in a variety of
ways across disparate systems or
departments. A product name such as
LIGHT BULB-120 VOLT 100 WATT
may be entered multiple times with dif-
ferent capitalization, spelling, abbrevia-
tions, and punctuation. Adding to this
issue is the problem of supporting
multiple languages in today’s global
marketplace where suppliers and
Figure 3: Cleansing Unstructured Product Data
SAP Thought Leadership – Information Management 9
10. Supporting Standards Organizations that deal with global
information often need to adjust data
Many organizations create their own fields for unique regional attributes and
hierarchical product codes, while others relationships. In some countries there
follow industry standards where possi- are titles and relationships that may not
ble. Supply chain efficiency can be be known in the English language.
gained by appending industry standard Sonya Gonzalez viuda de Martinez, for
product codes to product descriptions. example, translates to Sonya Gonzalez
A common master product identifier widow of Martinez. In this case, the
shared by all suppliers makes it much text widow of might not be recognized
easier to support roll-up and drill-down properly by the organization if it is just
reports and analyses based on a family processing standard first and last name
of products. elements. This could be important for a
banking institution that is trying to dis-
Well-known standards include: burse the funds of a will to the correct
• United Nations Standard Products Sonya Gonzalez.
and Services Code (UNSPSC)
• Export Control Classification Number Another example is parsing vendor
(ECCN−used by the import/export information for paying invoices. An
and transportation industries input string in Portuguese could read
• International Organization for Computadoras Brasileiras Incorpora-
Standardization (ISO) das representante autorizado de Micro-
• National Stock Number Codes soft. The text representante autorizado
(NSN)−used by the military de is the equivalent to authorized agent
for (in this case, Microsoft). What
It is very important, therefore, that exists here is a business relationship The Issue of Increasing
data cleansing tools provide support between Computadoras Brasileiras Complexity
for these standards. Incorporadas and Microsoft, and this
understanding is crucial if a payment is Corporate data is created and managed
Supporting National Languages to be made payable to the right party. It by many different business processes
is important when dealing with interna- (see Figure 4) and in most companies,
Whether an organization has custom- tional data to have a data quality solu- the applications supporting those busi-
ers, suppliers, or products in the United tion that is Unicode-enabled, and that ness processes have been developed
States, Europe, Asia, or any combina- recognizes international characters oth- over many years. Some involve legacy
tion of countries, it is important that the er than those found in the English lan- applications on centralized mainframes,
level of data quality is the same across guage. By applying data quality global- while others employ distributed comput-
all data stores, and it is vital to be able ly, organizations have a more complete ing technologies, such as client/server
to trust the data, regardless of what view of customer and product informa- computing, Web-based computing,
language the original data is in. Equally tion, which improves business deci- or Web services. Some are developed
important is the ability to present data sions, customer service, and supply in-house, while others use packaged
in local languages, and to be able to chain management. solutions from applications vendors
tailor data quality program rules to suit
a specific language or region.
10 SAP Thought Leadership – Information Management
11. or software-as-a-service vendors. A Universal Approach to Data dynamically by applications (see
Company acquisitions and mergers Quality Management Is Required Figure 4). This allows business rules for
add to the application mix. These appli- data quality enforcement to be moved
cations have usually been developed The trend by organizations toward a outside of applications and applied
and deployed independently of each services-oriented architecture (SOA) universally at a business process level,
other, without any notion that one day for building new applications and rather than at the individual application
the data they create and manage may integrating existing ones reduces level. These services can be called
be used by other applications for point-to-point application connections proactively by applications as data is
other purposes. and eases application integration. It entered into an application system,
also encourages developers to think or reactively by batch data quality man-
It is this complex and unmanaged envi- in terms of supporting business pro- agement tools after the data has been
ronment that has led to many of the cesses as a set of services, rather than created. Single-record rule enforcement
data quality problems that exist today, as a package of monolithic applications. can be done proactively, whereas rule
and given the huge growth in the data enforcement that spans records, data-
being generated both inside and out- The SOA approach supports any task bases, and applications systems will
side of companies, application com- that can be defined as a service. This typically be done reactively.
plexity and data quality is likely to concept not only applies to business
steadily get worse unless companies process tasks, but also to data and
pay more attention to enforcing data system management tasks as well.
quality and evolve to support a univer- This enables the tasks associated with
sal approach to managing data cleans- a data quality program to be deployed
ing and data quality management. as a set of services that can be called
Figure 4: An SOA Can Enable a Universal
Approach to Data Quality Management
web external and
content syndicated
Business Business plans
digital information
collaboration planning forecasts
media processes email processes budgets
instant messages
workgroup
documents
Service-oriented
corporate architecture
low-latency
documents Business Business data
transaction intelligence
transaction processes processes historical
data data
Master Data quality
integrated services
master data data
processes
SAP Thought Leadership – Information Management 11
12. sap Businessobjects inFormation
management solUtions
SAP is an industry-leading provider of products, relational databases, model- • An open metadata repository that
business intelligence and information ing tools, and third-party solutions. IT allows the tool to be used by
management solutions. The objective staff and business analysts use a thin- custom-built applications
of information management is to pro- client Web interface to explore metada- • Scheduling of profiling tasks to run
vide customers with a flexible solution ta and gain a better understanding of any time without operator
for building an enterprise-wide data the metadata used in projects, to sup- intervention
integration and data quality manage- port impact analysis, and to report on
ment environment. metadata and data lineage. The knowledge gained from SAP
BusinessObjects Data Insight concern-
SAP BusinessObjects solutions sup- SAP BusinessObjects ing existing data quality issues can be
port three key information management Data Insight™ used to develop a detailed strategy for
functions in data integration, data cleansing and correcting data defects
quality, and metadata management. SAP BusinessObjects Data Insight using SAP BusinessObjects Data
software is a data profiling tool that Quality Management software.
Data Integration handles the data assessment task (see
Key data integration solutions include: Figure 1) of a data quality program. It is SAP BusinessObjects Data
• SAP BusinessObjects Data Integra- used to monitor, measure, analyze, and Quality Management
tor software, a batch-oriented report on the quality of data stored in
extract, transformation, and load relational database systems and flat SAP BusinessObjects Data Quality
(ETL) tool supporting enterprise data files. It helps users gain visibility and a Management handles the data cleans-
consolidation and federation detailed understanding of the data qual- ing, enhancement, matching, and
• SAP BusinessObjects Data ity defects in the source data. Key fea- consolidation tasks of the data quality
Federator software, an enterprise tures of SAP BusinessObjects Data program shown in Figure 1. The
information integration tool enabling Insight include: main facilities provided by SAP
on-demand data access to a variety • Flexible processing models that can BusinessObjects Data Quality
of data sources process source data in place or after Management include:
• SAP BusinessObjects Rapid Marts® extracting it into SAP BusinessOb- • Parsing and standardizing customer
packages, packaged data integration jects Data Insight and other types of operational data
solutions for enterprise applications • Automatic discovery of business • Correcting data based on supplied
such as Oracle, PeopleSoft, and rules and relationships and user-defined dictionaries and
Siebel • Support for data redundancy profil- business rules
ing, drill-down frequency distribu- • Appending additional information
Data Quality tions, cross-column and cross-table such as phone numbers to provide
SAP BusinessObjects Data Insight™ comparisons, and pattern recognition a more complete set of data
software and BusinessObjects Data • Allows users to establish thresholds • Matching data to build data relation-
Quality Management software enable and set up automated alerts to notify ships, and identify and remove dupli-
the profiling, cleansing, enhancement, if analysis results exceed a specific cate records
matching, consolidation, and monitoring quality metric • Consolidating data to create a single
of data. • Graphical and dashboard reports view across databases
(using Crystal Reports® software)
Metadata Management that provide summary profiles, fre-
SAP BusinessObjects Metadata quency distributions, and referential
Management software collects and integrity information
unifies metadata from BI objects, ETL
12 SAP Thought Leadership – Information Management
13. SAP BusinessObjects Data Quality
Data Quality
Project Architect Management now includes universal
data cleanse (UDC), an add-on option
that allows user-defined transforms and
their associated dictionaries and busi-
Web Browser Universal
Data ness rules to be created for cleansing
Web Server
Cleanse
Data other types of data (unstructured prod-
Insight uct data, for example). Key features of
Project the data cleansing capability include:
portal
Data • Identifies, standardizes, and corrects
Packaged Quality
Applications Data quality global data for over 200 countries
web services Server
Data Quality • Provides international name (firm and
( NET & Java)
Repository person, prefix, title) parsing packages
(workflows, for numerous countries
Custom
transforms
Applications
and runtime • Identifies customer information
metadata) (addresses, phone numbers, social
Metadata Repository security numbers, dates) and verifies
(runtime statistics) that it is properly formatted
• Allows users to add custom dictionar-
Figure 5: SAP BusinessObjects Data Quality Management ies and rules to parse and standard-
ize other data elements, such as part
At the heart of SAP BusinessObjects Support for Web services and an SOA numbers, product codes, purchase
Data Quality Management is a data environment in SAP BusinessObjects orders, SKUs, customer identification
quality server (see Figure 5) that Data Quality Management allows data numbers, and so forth
executes workflows and workflow quality software services to be shared • Handles free-form text up to 8,000
transforms to carry out data quality by multiple business processes. Sepa- characters in size and parses it into
management tasks that have been rating data quality rules from the pro- appropriate structured data fields
defined using the provided GUI-driven cesses that use them improves flexibili-
project architect. These workflows ty and makes it possible for the rules User-defined dictionaries can be
can be called dynamically as a Web to be dynamically maintained to meet populated interactively using the
service, run as a standalone batch job, changing business needs. project architect, or can be bulk loaded
or integrated with external systems and from an XML file. The bulk load capabili-
applications, such as SAP software, SAP BusinessObjects Universal ty is useful for loading industry standard
Informatica, Oracle, PeopleSoft, Data Cleanse product codes and classifications,
and Siebel. weights and measures tables, interna-
The data cleansing process of tional language translation tables, and
The capability of SAP BusinessObjects SAP BusinessObjects Data Quality data classifications identified using SAP
Data Quality Management to call a data Management standardizes data to BusinessObjects Data Insight software.
quality workflow as a Web service in an ensure a consistent record format, and
SOA environment enables the work- corrects data based on classification
flows to be used dynamically by an dictionaries and business rules. Prior
organization’s operational business to the current release, the product
processes and the quality of informa- provided cleansing transforms that
tion to be checked, in-flight, as it flows focused primarily on customer data.
between systems and applications.
SAP Thought Leadership – Information Management 13
14. Product Data One SAP BusinessObjects customer processes, regardless of what language
uses this approach to reconcile data for the original data is in. Equally important
An important use case for UDC is the 140,000 products that are managed in is the ability to present data in local
cleansing, matching, and consolidation 11 different data repositories. These languages. UDC is Unicode-enabled,
of structured and unstructured product techniques could also be extended to supports mixed language product
data. A goal for a manufacturer, for support the reconciling of product data descriptions, and incorporates
example, may be to improve the consis- for integration into a single master data language-specific lexicons for colors,
tency between item descriptions on a management system. sizes, and weights. It is, therefore,
Web site and those in product cata- ideally suited for cleansing data in
logs. To achieve consistency, the com- Multinational Data companies whose operations span
pany would parse the item descriptions many different countries.
on the Web site and in the catalogs, Another scenario for UDC is for the
understand the relationships between processing of multinational data. For
the items, and use the results to cor- an increasing number of organizations,
rect the item entries in each location to it is important to be able to create
make them consistent. intelligent and standardized parsing
14 SAP Thought Leadership – Information Management
15. sUmmarY
Companies today cannot operate effec- It is important that data quality tasks For More Information
tively, or compete successfully, unless are not buried in individual applications,
they give their users timely access to but are implemented instead as a set of To learn more about SAP
accurate and consistent information. services that can be used by any appli- BusinessObjects information manage-
The three key words here are timely, cation from anywhere in the organiza- ment solutions from SAP, visit our
accurate, and consistent. tion. This approach ensures that busi- Web site at www.sap.com/
ness processes obey a set of common sapbusinessobjects.
The problem is that many organizations business rules, and permits data quality
don’t know how consistent or accurate service levels to be adjusted without
their data is, or how much poor data directly affecting existing applications.
quality data is costing them. This is It enables a data quality program to be
because these organizations don’t con- deployed in a phased approach, one
sider data quality requirements from business process at a time, and allows
the perspective of individual business applications to be migrated to a com-
users and processes. Data quality is mon data quality approach in an
not an all-or-nothing concept. Different orderly fashion.
users need different levels of data qual-
ity, and it is essential to categorize data Executives have a lot at stake when it
by business needs and required data comes to data quality and the tough
quality service levels. It is important to corporate compliance environment that
profile and constantly monitor data to we live in today. It is crucial, therefore,
ensure that it is satisfying those ser- that data quality, like data integration,
vice levels, and to rapidly correct any be viewed from an enterprise perspec-
deficiencies that are degrading service tive. SAP is a company that recognizes
levels. Organizations also want to this need and, through its information
ensure that data quality processes are management solutions, is addressing
scalable and sharable, and provide the that need.
flexibility to allow users to dynamically
change business rules based on busi-
ness needs and context.
SAP Thought Leadership – Information Management 15