SlideShare a Scribd company logo
1 of 16
Download to read offline
SAP Thought Leadership
                                 Information Management




Developing a Universal approach to cleansing
cUstomer anD proDUct Data
content




 4   Executive Summary

 5   Data Quality: What’s The Problem?
 5   What Is Data Quality?
 5   Measuring Data Quality
 6   Components of a Data Quality Program
 7   A Business Process Perspective to Data Quality
 7   Data Quality Problems by Business Entity
 8   The Role of Master Data
 8   Types of Data Involved in Managing Data Quality
 9   Cleansing Customer and Product Data
 9   Product Data Example
10   Supporting Standards
10   Supporting National Languages
10   The Issue of Increasing Complexity
11   A Universal Approach to Data Quality Management
     Is Required

12   SAP BusinessObjects Information Management
     Solutions
12   Data Integration
12   Data Quality
12   Metadata Management
12   SAP BusinessObjects Data Insight
12   SAP BusinessObjects Data Quality Management
13   SAP BusinessObjects Universal Data Cleanse
14   Product Data
14   Multinational Data

15   Summary
15   For More Information




About the Author

Colin White, BI Research; BI Research is a research and consulting company
whose goal is to help companies understand and exploit new developments in
business intelligence and business integration. When combined, business intelli-
                    e
gence and business integration enable an organization to become a smart and
agile business. For more information, visit www.bi-research.com.
eXecUtive sUmmarY




Data quality has always been an impor-        are addressing quality problems with      why companies need to take an enter-
tant issue for companies, and this is         customer, product, and other types of     prise-wide approach to data cleansing
even more the case today. Business            corporate data. It explains why data      and data quality management. Products
and legislative pressures coupled with        quality projects often evolve into mas-   and use cases from SAP are used in
the explosion in the amount of data           ter data management projects, and         the paper to demonstrate how vendors
created by organizations is leading to        discusses how master data can be          are supporting data cleansing, and to
increased corporate attention on              extracted from unstructured business      illustrate how customers are using such
improving the quality and accuracy            content and integrated into traditional   functionality.
of business data.                             corporate data systems.

This paper reviews current industry           The paper looks at the role of data
problems concerning data quality, and         cleansing tools in helping improve data
takes a detailed look at how companies        consistency and accuracy, and explains




4   SAP Thought Leadership – Information Management
Data QUalitY: What’s the proBlem?
THE COST OF POOR DATA QUALITy




There has been a steady stream of arti-           Although nobody disputes that poor               100% safe. The degree to which a cus-
cles in the press about the impact of             data quality has a significant financial         tomer can accept a less than perfect or
poor data quality on the business ever            impact on companies, the figures in              safe product will depend on the type of
since companies began building data               these articles are so variable and so            customer buying the product, and on
warehouses and examining the quality              imprecise that there is a risk of reader         how the product will be used. The qual-
of data flowing into them. Here are               fatigue when discussing data quality             ity and safety of a product, however, is
some early examples:                              issues at this superficial level of detail.      also dependent on how much money
                                                  Maybe this is why so many reports                a manufacturer is willing to spend to
“The survey showed that estimates of              show that a high percentage of organi-           develop and produce the product,
poor information quality costs vary               zations feel they have a data quality            which in turn is dependent on how
from one to 20% of total company rev-             problem, while at the same time indicat-         much money a customer is willing to
enues. This disparity in cost estimation          ing that an equally large number of orga-        pay for it. All of these factors apply
existed not only from company to com-             nizations are also doing little about it.        equally to providing and supporting
pany but among senior information                                                                  data quality.
technology managers within the same               The problem is that “data quality”
organization.”1                                   is a vague term that means different             Measuring Data Quality
                                                  things to different people. Before
“The Data Warehousing Institute                   data quality problems can be dis-                While researching ISO 8402, I came
(TDWI) estimates that poor quality cus-           cussed in detail, data quality must              across an excellent 1994 paper by
tomer data costs U.S. businesses a                be defined more precisely.                       Robert Walker, chairperson of the
staggering $611 billion a year in post-                                                            Association for Geographic Information
age, printing, and staff overhead.”2              What Is Data Quality?                            (AGI) Standards Data Quality Working
                                                                                                   Group. He had this to say about the
More recent articles show the situation           A useful definition of quality can be            quality of data:
has not improved:                                 found in the ISO 8402 specification,
                                                  which defines quality as, “The totality          “Clearly, quality issues vary depending
“Poor data quality costs the typical              of characteristics of an entity that bear        on the type of data and the purpose to
company at least 10% of revenue;                  on its ability to satisfy stated or implied      which it is put. The criteria that are
20% is probably a better estimate.”3              needs.” If we apply this definition to           widely used for assessing data quality
                                                  data, then it means that data quality is         are as follows:
“Companies are investing billions of              dependent not only the characteristics           • Lineage: This covers the ancestry of
dollars in CRM applications and data              of the data itself, but also on the busi-          the dataset. It describes the source
integration projects to gain a better             ness context in which it is used (that is,         material from which the data was
view of their customers – only to dis-            on the business processes and busi-                derived and the methods of deriva-
cover that conflicting data makes them            ness users that employ it).                        tion. The description of the source
blind. Gartner estimates that more than                                                              material should include details of edi-
25% of critical data within large busi-           This is somewhat analogous to product              tions and dates of that source materi-
nesses is somehow inaccurate or                   quality. All product manufacturers have            al and any ancillary information used
incomplete. And that imprecise data               quality assurance programs, but no                 for updates. The methods of deriva-
is wreaking havoc.”4                              product is ever 100% perfect, or                   tion should include details of the




1.   Andrea Malcolm, “Poor Data Quality Costs 10% of Revenues, Survey Reveals,” ComputerWorld, July 1998.
2.   Wayne W. Eckerson, “Data Quality and the Bottom Line,” TDWI Report Series, 2002.
3.   Thomas C. Redman, “Data: An Unfolding Quality Disaster,” DM Review, August 2004.
4.   Rick Whiting, “Hamstrung By Defective Data,” InformationWeek, May 2006.


                                                                                           SAP Thought Leadership – Information Management   5
operations used, the logical process-          The first task is to measure and analyze                    additional data elements can be added
    es (for example, algorithms and                (or profile) the data to determine the                      as required to the data to enhance its
    transformations) and details of any            type and number of defects that may                         information content.
    subsequent processing used in                  exist. This step provides the knowledge
    producing the final digital data.              required to develop a strategy to                           Data associated with a business pro-
•   Accuracy: This is the closeness of             improve the quality of the data. Once                       cess may come from a variety of differ-
    the topic data associated with an              this has been done, the next task is to                     ent sources, and although the data may
    object or feature, to the true values,         cleanse (parse, standardize, and cor-                       be consistent and correct for any given
    or values that are accepted as being           rect) the data. This requires parsing it                    source, merging it with other sources
    true. It is usually recorded as a per-         into its individual components, standard-                   can lead to duplicate records, and
    centage correctness for each topic             izing data formats and values based on                      redundant or inconsistent data ele-
    or attribute.                                  internal and industry business definitions                  ments. The role of the match and con-
•   Currency: This is the level at which           and rules, and applying cleansing algo-                     solidate tasks is to resolve and remove
    the data is current (as opposed to             rithms to remove syntactical or semantic                    this duplicate, redundant, and inconsis-
    the date of the last update).                  data errors. At this stage in the process,                  tent data. Once the data has been
•   Logical consistency: This is the
    fidelity of data in relation to the                                                         Continuous
                                                                                                Monitoring
    data structures.                                                           Measures ongoing data quality
•   Completeness: This is a measure                                               scores and provides alerts
    of the correspondence between the                                                                                            Measure
    real world and the specified dataset.                                                                                        Quantifies the number
                                                                                                                                 and types of defects
    It includes a statement of the rules
                                                             Consolidate
    for including or excluding items from               Combines unique
                                                                                                                                             Analyze
    the dataset.”5                                    data elements from
                                                                                                                                             Assesses the
                                                     matched records into
                                                          a single source                                                                    nature and
Although these comments concerned                                                                                                            cause of data
                                                                                                                                             defects
geographic data, the attributes of                                Match
lineage, accuracy, currency, logical
consistency, and completeness apply
                                                     Identifies duplicate
                                                         records within
                                                                                                        Your
equally to all forms of data, regardless
                                                       multiple tables or
                                                              databases
                                                                                                        Data
of its type or source. To maintain
data quality, organizations need to                Match & Consolidate
constantly monitor and measure these                                                                                                         Parse
                                                         Enhance                                                                             Identifies and
attributes as a part of an ongoing data                  Appends additional                                                                  isolates data
quality program.                                         data increasing the                                                                 elements in
                                                         value of the information                                                            data structures

Components of a Data Quality                            Data Enhancement
                                                                                                                              Standardize
Program                                                                                                                       Normalizes data value and
                                                                                              Correct                         formats according to your
Figure 1 (courtesy of SAP) illustrates                                                        Verifies, scrubs and            rules and that of third-party
                                                                                                                              references
the main tasks of a data quality                                                              appends data based on
                                                                                              a set of sophisticated
program.                                                                                      algorithms that work with
                                                                                              secondary source

                                                                                                                                              Data Cleansing

                                                   Figure 1: Components of a Data Quality Program [Source: SAP]

5. R.S. Walker, Data quality standard, In: Proceedings of AGI’94 Conference, 1994.


6     SAP Thought Leadership – Information Management
cleansed, enhanced, consolidated, and         An example of the business case for               required data quality, and a business
merged, it is ready for use. The quality      approaching data quality improvement              case built to address those data
of the data, however, must be moni-           from a business process and business              quality issues that provide the best
tored and profiled on a continuing basis      entity perspective can be found in a              return on investment. Unfortunately,
to measure its accuracy, currency, con-       reply to an April 2006 Business Intelli-          this understanding is severely lacking
sistency, and completeness.                   gence Network blog (“Surprise! Poor               in many companies.
                                              Data Quality Costs a Lot of Money”)
A Business Process Perspective                by David Loshin:                                  Data Quality Problems by
to Data Quality                                                                                 Business Entity
                                              “At Shell, the business thought it was
IT staff usually think of data quality ini-   selling 20,000 product combinations to            A TDWI survey published in an April
tiatives in terms of individual applica-      20,000 commercial customers, and                  2006 research report on data quality
tions. From a business perspective,           was actually selling 5,000 products to            (“Taking Data Quality to the Enterprise
however, the problems caused by poor          6,000 customers! It can be seen that              through Data Governance” by Philip
data quality typically surface in the         the potential for cost saving in a situa-         Russom) showed that customer and
business processes of the organiza-           tion like that was significant.”                  product data are the two main types
tion. Examples include:                                                                         of business entity that are especially
• Wasted marketing efforts and costs          Data quality initiatives to correct               susceptible to quality problems. Sur-
• Customer dissatisfaction                    problems like those at Shell can only             prisingly, as can be seen in Figure 2,
• Shipment, delivery, and billing errors      succeed, however, if an organization              financial data came third in the table of
• Excess inventory                            understands its business processes,               results. Customer and product are two
• Incorrect financial statements              and the data and data quality required            of the main components of the master
• Delays in new product introduction          by those processes. In this way, actual           data of an organization.
                                              data quality can be measured against
To improve data quality, IT staff must
change their application-specific
approach to data quality initiatives, and     Type of Business Data                             Percentage of Respondents
align them instead with key corporate         Customer Data                                     74%
business processes, such as customer          Product Data                                      43%
marketing and supply chain optimiza-          Financial Data                                    36%
tion, and underlying business entities
                                              Sales Contact Data                                27%
like customer and product. This align-
                                              Data from ERP Systems                             25%
ment helps improve data consistency
across application systems, which in          Employee Data                                     16%
turn simplifies data integration tasks,       International Data                                12%
such as building an integrated product        Other                                             10%
catalog or creating a single view of
the customer.                                 Figure 2: Business Entities Susceptible to Data Quality Problems




                                                                                        SAP Thought Leadership – Information Management   7
The Role of Master Data                       and simple data structures, has simple      As companies begin to realize that they
                                              keys and governance rules, is often         must focus data quality efforts on spe-
Master data is defined as data about          standardized (U.S. state codes, for         cific business processes and business
the key business entities of an organi-       example), involves only a few applica-      entities, the ability to manage master
zation. Examples include customer,            tions, and is reasonably stable.            data becomes increasingly important.
product, organizational structure, and                                                    This is why many data quality initiatives
chart of accounts. A common question          Master business entity data, such as        often evolve into master data manage-
about master data is, “What is the dif-       customer and product, on the other          ment projects. The Business Intelli-
ference between master data and refer-        hand, is usually ill-defined, has complex   gence Network recently published a
ence data?” Some people take the              data structures and relationships,          report (“The Pillars of Master Data
position that they are the same thing,        requires compound and intelligent keys      Management: Data Profiling, Data Inte-
but it can be argued that not all refer-      and complex governance rules, is not        gration and Data Quality”) by David
ence data is master data. For example,        usually standardized, involves many         Loshin that discusses this topic in
lookup and code tables that are used          applications, and changes frequently.       detail and is recommended reading.
to encode information, such as state
names and order codes, are not strictly       Does this distinction really matter?        Types of Data Involved in Manag-
master data tables.                           When developing data quality manage-        ing Data Quality
                                              ment and master data management
The dividing line between master data         systems, it does. Cleaning and manag-       A data quality program involves many
and reference data is not always clear-       ing master reference data is a reason-      different types of data. The data may
cut. One solution is to break master          ably easy job. Cleaning master data, on     be current or historic, detailed or sum-
data into two types: master reference         the other hand, is a complex task and       marized, and may have a structured,
data and master business entity data.         requires data cleansing tools that are      unstructured, or semi-structured for-
Master reference data has well-defined        specifically designed for that purpose.     mat. It may consist of master business
                                                                                          entity or master reference data, or
                                                                                          activity data created by business trans-
                                                                                          action (order entry, billing, shipping),
                                                                                          BI (reporting, analytics), planning (bud-
                                                                                          geting, forecasting), and collaborative
                                                                                          (e-mail, blogging) applications. All of
                                                                                          this data must be structurally and
                                                                                          semantically consistent with its associ-
                                                                                          ated technical and business rules. One
                                                                                          of the main objectives of a data quality
                                                                                          program is to assess and correct data
                                                                                          consistency and accuracy problems
                                                                                          caused by applications that do not fully
                                                                                          enforce those rules.

                                                                                          Most data quality initiatives focus
                                                                                          almost exclusively on structured data,
                                                                                          which represents only a fraction of the
                                                                                          data that exists in an organization.
                                                                                          There is, however, increasing interest in




8   SAP Thought Leadership – Information Management
leveraging the value of unstructured         customers are providing data in lan-            (CD, Jumbo CD, and so on), and wish
data in both master data and business        guages other than English. These                to standardize the values for yield and
intelligence (BI) applications. This is      problems are causing considerable               minimum deposit.
especially the case for customer and         business inefficiency and are leading
product data, which may be managed           to costly mistakes being made. Solving          These types of data cleansing opera-
in unstructured files, or encapsulated in    these problems, therefore, can provide          tions can be used for all types of
a single field with a structured file or     significant business benefit.                   unstructured information. To date,
database. To be useful this unstruc-                                                         most companies have focused on
tured data must be cleansed and trans-       Product Data Example                            customer data, but many are now
formed into a semi-structured (XML,          Figure 3 illustrates how unstructured           beginning to process product and
for example) or structured format.           data describing financial product offer-        other types of data.
                                             ings can be decomposed into separate
Cleansing and transforming unstruc-          data elements, and then used to                 Customer data is usually easier to
tured data improves operational effi-        populate a product table. The original          deconstruct and clean than product
ciency and can add significant value to      unstructured data could come from a             data. Although there are variations in
existing decision-making applications.       Web page or a product catalog, but in           people names, company names, titles,
Examples of applications here include:       some cases may simply be encoded                phone numbers, and addresses, cus-
• Customer and market intelligence –         in a single field of a structured data-         tomer data records are reasonably con-
  Internet Web pages                         base record.                                    sistent in the number of data elements
• Customer sentiment and complaint                                                           involved. This is mainly due to the fact
  analysis – Web logs (blogs) and            In this example, the customer wants a           that most countries have standardized
  customer support center call records       data quality solution to parse, standard-       name and address formats. This is not
• Product safety and quality analysis –      ize, and cleanse the financial data, and        the case for product data, which is
  Service center records                     to identify the financial institutions that     more variable, and involves a wider
• Product master data management –           are offering certificates of deposit            range of industry standards. Some
  Product catalogs                           (CD). They also want information about          of these standards are very specific,
• Legal discovery – E-mails and              maturity (3 months, 1 year, and so on)          while others are global in nature.
  instant messages                           and the type of CD being offered

Cleansing Customer and
Product Data

Every company struggles with the enor-
mous amount of customer and product
data that is represented in a variety of
ways across disparate systems or
departments. A product name such as
LIGHT BULB-120 VOLT 100 WATT
may be entered multiple times with dif-
ferent capitalization, spelling, abbrevia-
tions, and punctuation. Adding to this
issue is the problem of supporting
multiple languages in today’s global
marketplace where suppliers and


                                             Figure 3: Cleansing Unstructured Product Data



                                                                                     SAP Thought Leadership – Information Management   9
Supporting Standards                           Organizations that deal with global
                                               information often need to adjust data
Many organizations create their own            fields for unique regional attributes and
hierarchical product codes, while others       relationships. In some countries there
follow industry standards where possi-         are titles and relationships that may not
ble. Supply chain efficiency can be            be known in the English language.
gained by appending industry standard          Sonya Gonzalez viuda de Martinez, for
product codes to product descriptions.         example, translates to Sonya Gonzalez
A common master product identifier             widow of Martinez. In this case, the
shared by all suppliers makes it much          text widow of might not be recognized
easier to support roll-up and drill-down       properly by the organization if it is just
reports and analyses based on a family         processing standard first and last name
of products.                                   elements. This could be important for a
                                               banking institution that is trying to dis-
Well-known standards include:                  burse the funds of a will to the correct
• United Nations Standard Products             Sonya Gonzalez.
  and Services Code (UNSPSC)
• Export Control Classification Number         Another example is parsing vendor
  (ECCN−used by the import/export              information for paying invoices. An
  and transportation industries                input string in Portuguese could read
• International Organization for               Computadoras Brasileiras Incorpora-
  Standardization (ISO)                        das representante autorizado de Micro-
• National Stock Number Codes                  soft. The text representante autorizado
  (NSN)−used by the military                   de is the equivalent to authorized agent
                                               for (in this case, Microsoft). What
It is very important, therefore, that          exists here is a business relationship       The Issue of Increasing
data cleansing tools provide support           between Computadoras Brasileiras             Complexity
for these standards.                           Incorporadas and Microsoft, and this
                                               understanding is crucial if a payment is     Corporate data is created and managed
Supporting National Languages                  to be made payable to the right party. It    by many different business processes
                                               is important when dealing with interna-      (see Figure 4) and in most companies,
Whether an organization has custom-            tional data to have a data quality solu-     the applications supporting those busi-
ers, suppliers, or products in the United      tion that is Unicode-enabled, and that       ness processes have been developed
States, Europe, Asia, or any combina-          recognizes international characters oth-     over many years. Some involve legacy
tion of countries, it is important that the    er than those found in the English lan-      applications on centralized mainframes,
level of data quality is the same across       guage. By applying data quality global-      while others employ distributed comput-
all data stores, and it is vital to be able    ly, organizations have a more complete       ing technologies, such as client/server
to trust the data, regardless of what          view of customer and product informa-        computing, Web-based computing,
language the original data is in. Equally      tion, which improves business deci-          or Web services. Some are developed
important is the ability to present data       sions, customer service, and supply          in-house, while others use packaged
in local languages, and to be able to          chain management.                            solutions from applications vendors
tailor data quality program rules to suit
a specific language or region.




10   SAP Thought Leadership – Information Management
or software-as-a-service vendors.           A Universal Approach to Data                    dynamically by applications (see
Company acquisitions and mergers            Quality Management Is Required                  Figure 4). This allows business rules for
add to the application mix. These appli-                                                    data quality enforcement to be moved
cations have usually been developed         The trend by organizations toward a             outside of applications and applied
and deployed independently of each          services-oriented architecture (SOA)            universally at a business process level,
other, without any notion that one day      for building new applications and               rather than at the individual application
the data they create and manage may         integrating existing ones reduces               level. These services can be called
be used by other applications for           point-to-point application connections          proactively by applications as data is
other purposes.                             and eases application integration. It           entered into an application system,
                                            also encourages developers to think             or reactively by batch data quality man-
It is this complex and unmanaged envi-      in terms of supporting business pro-            agement tools after the data has been
ronment that has led to many of the         cesses as a set of services, rather than        created. Single-record rule enforcement
data quality problems that exist today,     as a package of monolithic applications.        can be done proactively, whereas rule
and given the huge growth in the data                                                       enforcement that spans records, data-
being generated both inside and out-        The SOA approach supports any task              bases, and applications systems will
side of companies, application com-         that can be defined as a service. This          typically be done reactively.
plexity and data quality is likely to       concept not only applies to business
steadily get worse unless companies         process tasks, but also to data and
pay more attention to enforcing data        system management tasks as well.
quality and evolve to support a univer-     This enables the tasks associated with
sal approach to managing data cleans-       a data quality program to be deployed
ing and data quality management.            as a set of services that can be called

                                                                                            Figure 4: An SOA Can Enable a Universal
                                                                                            Approach to Data Quality Management
         web                         external and
      content                        syndicated
                     Business                            Business       plans
        digital                      information
                   collaboration                          planning      forecasts
        media       processes        email               processes      budgets
                                     instant messages
   workgroup
   documents

                                   Service-oriented
    corporate                        architecture
                                                                        low-latency
   documents         Business                             Business      data
                    transaction                          intelligence
   transaction       processes                            processes     historical
         data                                                           data



                                        Master          Data quality
                     integrated                          services
                    master data          data
                                      processes




                                                                                     SAP Thought Leadership – Information Management   11
sap Businessobjects inFormation
management solUtions




SAP is an industry-leading provider of         products, relational databases, model-        • An open metadata repository that
business intelligence and information          ing tools, and third-party solutions. IT        allows the tool to be used by
management solutions. The objective            staff and business analysts use a thin-         custom-built applications
of information management is to pro-           client Web interface to explore metada-       • Scheduling of profiling tasks to run
vide customers with a flexible solution        ta and gain a better understanding of           any time without operator
for building an enterprise-wide data           the metadata used in projects, to sup-          intervention
integration and data quality manage-           port impact analysis, and to report on
ment environment.                              metadata and data lineage.                    The knowledge gained from SAP
                                                                                             BusinessObjects Data Insight concern-
SAP BusinessObjects solutions sup-             SAP BusinessObjects                           ing existing data quality issues can be
port three key information management          Data Insight™                                 used to develop a detailed strategy for
functions in data integration, data                                                          cleansing and correcting data defects
quality, and metadata management.              SAP BusinessObjects Data Insight              using SAP BusinessObjects Data
                                               software is a data profiling tool that        Quality Management software.
Data Integration                               handles the data assessment task (see
Key data integration solutions include:        Figure 1) of a data quality program. It is    SAP BusinessObjects Data
• SAP BusinessObjects Data Integra-            used to monitor, measure, analyze, and        Quality Management
  tor software, a batch-oriented               report on the quality of data stored in
  extract, transformation, and load            relational database systems and flat          SAP BusinessObjects Data Quality
  (ETL) tool supporting enterprise data        files. It helps users gain visibility and a   Management handles the data cleans-
  consolidation and federation                 detailed understanding of the data qual-      ing, enhancement, matching, and
• SAP BusinessObjects Data                     ity defects in the source data. Key fea-      consolidation tasks of the data quality
  Federator software, an enterprise            tures of SAP BusinessObjects Data             program shown in Figure 1. The
  information integration tool enabling        Insight include:                              main facilities provided by SAP
  on-demand data access to a variety           • Flexible processing models that can         BusinessObjects Data Quality
  of data sources                                 process source data in place or after      Management include:
• SAP BusinessObjects Rapid Marts®                extracting it into SAP BusinessOb-         • Parsing and standardizing customer
  packages, packaged data integration             jects Data Insight                           and other types of operational data
  solutions for enterprise applications        • Automatic discovery of business             • Correcting data based on supplied
  such as Oracle, PeopleSoft, and                 rules and relationships                      and user-defined dictionaries and
  Siebel                                       • Support for data redundancy profil-           business rules
                                                  ing, drill-down frequency distribu-        • Appending additional information
Data Quality                                      tions, cross-column and cross-table          such as phone numbers to provide
SAP BusinessObjects Data Insight™                 comparisons, and pattern recognition         a more complete set of data
software and BusinessObjects Data              • Allows users to establish thresholds        • Matching data to build data relation-
Quality Management software enable                and set up automated alerts to notify        ships, and identify and remove dupli-
the profiling, cleansing, enhancement,            if analysis results exceed a specific        cate records
matching, consolidation, and monitoring           quality metric                             • Consolidating data to create a single
of data.                                       • Graphical and dashboard reports               view across databases
                                                  (using Crystal Reports® software)
Metadata Management                               that provide summary profiles, fre-
SAP BusinessObjects Metadata                      quency distributions, and referential
Management software collects and                  integrity information
unifies metadata from BI objects, ETL




12   SAP Thought Leadership – Information Management
SAP BusinessObjects Data Quality
                                                 Data Quality
                                               Project Architect                                   Management now includes universal
                                                                                                   data cleanse (UDC), an add-on option
                                                                                                   that allows user-defined transforms and
                                                                                                   their associated dictionaries and busi-
     Web Browser                                                    Universal
                                                                     Data                          ness rules to be created for cleansing
                          Web Server
                                                                    Cleanse
                                                                                   Data            other types of data (unstructured prod-
                                                                                  Insight          uct data, for example). Key features of
                             Project                                                               the data cleansing capability include:
                             portal
                                                     Data                                          • Identifies, standardizes, and corrects
       Packaged                                     Quality
      Applications         Data quality                                                              global data for over 200 countries
                          web services              Server
                                                                   Data Quality                    • Provides international name (firm and
                          ( NET & Java)
                                                                   Repository                        person, prefix, title) parsing packages
                                                                   (workflows,                       for numerous countries
        Custom
                                                                    transforms
      Applications
                                                                   and runtime                     • Identifies customer information
                                                                     metadata)                       (addresses, phone numbers, social
                                  Metadata Repository                                                security numbers, dates) and verifies
                                   (runtime statistics)                                              that it is properly formatted
                                                                                                   • Allows users to add custom dictionar-
Figure 5: SAP BusinessObjects Data Quality Management                                                ies and rules to parse and standard-
                                                                                                     ize other data elements, such as part
At the heart of SAP BusinessObjects              Support for Web services and an SOA                 numbers, product codes, purchase
Data Quality Management is a data                environment in SAP BusinessObjects                  orders, SKUs, customer identification
quality server (see Figure 5) that               Data Quality Management allows data                 numbers, and so forth
executes workflows and workflow                  quality software services to be shared            • Handles free-form text up to 8,000
transforms to carry out data quality             by multiple business processes. Sepa-               characters in size and parses it into
management tasks that have been                  rating data quality rules from the pro-             appropriate structured data fields
defined using the provided GUI-driven            cesses that use them improves flexibili-
project architect. These workflows               ty and makes it possible for the rules            User-defined dictionaries can be
can be called dynamically as a Web               to be dynamically maintained to meet              populated interactively using the
service, run as a standalone batch job,          changing business needs.                          project architect, or can be bulk loaded
or integrated with external systems and                                                            from an XML file. The bulk load capabili-
applications, such as SAP software,              SAP BusinessObjects Universal                     ty is useful for loading industry standard
Informatica, Oracle, PeopleSoft,                 Data Cleanse                                      product codes and classifications,
and Siebel.                                                                                        weights and measures tables, interna-
                                                 The data cleansing process of                     tional language translation tables, and
The capability of SAP BusinessObjects            SAP BusinessObjects Data Quality                  data classifications identified using SAP
Data Quality Management to call a data           Management standardizes data to                   BusinessObjects Data Insight software.
quality workflow as a Web service in an          ensure a consistent record format, and
SOA environment enables the work-                corrects data based on classification
flows to be used dynamically by an               dictionaries and business rules. Prior
organization’s operational business              to the current release, the product
processes and the quality of informa-            provided cleansing transforms that
tion to be checked, in-flight, as it flows       focused primarily on customer data.
between systems and applications.




                                                                                            SAP Thought Leadership – Information Management   13
Product Data                                   One SAP BusinessObjects customer            processes, regardless of what language
                                               uses this approach to reconcile data for    the original data is in. Equally important
An important use case for UDC is the           140,000 products that are managed in        is the ability to present data in local
cleansing, matching, and consolidation         11 different data repositories. These       languages. UDC is Unicode-enabled,
of structured and unstructured product         techniques could also be extended to        supports mixed language product
data. A goal for a manufacturer, for           support the reconciling of product data     descriptions, and incorporates
example, may be to improve the consis-         for integration into a single master data   language-specific lexicons for colors,
tency between item descriptions on a           management system.                          sizes, and weights. It is, therefore,
Web site and those in product cata-                                                        ideally suited for cleansing data in
logs. To achieve consistency, the com-         Multinational Data                          companies whose operations span
pany would parse the item descriptions                                                     many different countries.
on the Web site and in the catalogs,           Another scenario for UDC is for the
understand the relationships between           processing of multinational data. For
the items, and use the results to cor-         an increasing number of organizations,
rect the item entries in each location to      it is important to be able to create
make them consistent.                          intelligent and standardized parsing




14   SAP Thought Leadership – Information Management
sUmmarY




Companies today cannot operate effec-         It is important that data quality tasks       For More Information
tively, or compete successfully, unless       are not buried in individual applications,
they give their users timely access to        but are implemented instead as a set of       To learn more about SAP
accurate and consistent information.          services that can be used by any appli-       BusinessObjects information manage-
The three key words here are timely,          cation from anywhere in the organiza-         ment solutions from SAP, visit our
accurate, and consistent.                     tion. This approach ensures that busi-        Web site at www.sap.com/
                                              ness processes obey a set of common           sapbusinessobjects.
The problem is that many organizations        business rules, and permits data quality
don’t know how consistent or accurate         service levels to be adjusted without
their data is, or how much poor data          directly affecting existing applications.
quality data is costing them. This is         It enables a data quality program to be
because these organizations don’t con-        deployed in a phased approach, one
sider data quality requirements from          business process at a time, and allows
the perspective of individual business        applications to be migrated to a com-
users and processes. Data quality is          mon data quality approach in an
not an all-or-nothing concept. Different      orderly fashion.
users need different levels of data qual-
ity, and it is essential to categorize data   Executives have a lot at stake when it
by business needs and required data           comes to data quality and the tough
quality service levels. It is important to    corporate compliance environment that
profile and constantly monitor data to        we live in today. It is crucial, therefore,
ensure that it is satisfying those ser-       that data quality, like data integration,
vice levels, and to rapidly correct any       be viewed from an enterprise perspec-
deficiencies that are degrading service       tive. SAP is a company that recognizes
levels. Organizations also want to            this need and, through its information
ensure that data quality processes are        management solutions, is addressing
scalable and sharable, and provide the        that need.
flexibility to allow users to dynamically
change business rules based on busi-
ness needs and context.




                                                                                     SAP Thought Leadership – Information Management   15
50 094 570 (09/03)
©2009 by SAP AG.

All rights reserved. SAP, R/3, SAP NetWeaver, Duet, PartnerEdge,
ByDesign, SAP Business ByDesign, and other SAP products and services
mentioned herein as well as their respective logos are trademarks or
registered trademarks of SAP AG in Germany and other countries.

Business Objects and the Business Objects logo, BusinessObjects,
Crystal Reports, Crystal Decisions, Web Intelligence, Xcelsius, and other
Business Objects products and services mentioned herein as well as their
respective logos are trademarks or registered trademarks of Business
Objects S.A. in the United States and in other countries. Business
Objects is an SAP company.

All other product and service names mentioned are the trademarks of
their respective companies. Data contained in this document serves
informational purposes only. National product specifications may vary.

These materials are subject to change without notice. These materials
are provided by SAP AG and its affiliated companies (“SAP Group”)
for informational purposes only, without representation or warranty of
any kind, and SAP Group shall not be liable for errors or omissions with
respect to the materials. The only warranties for SAP Group products and
services are those that are set forth in the express warranty statements
accompanying such products and services, if any. Nothing herein should
be construed as constituting an additional warranty.




   www.sap.com /contactsap

More Related Content

What's hot

What is the price of bad customer data?
What is the price of bad customer data?What is the price of bad customer data?
What is the price of bad customer data?
DataValueTalk
 
Managing Dirty Data In Organization Using Erp
Managing Dirty Data In Organization Using ErpManaging Dirty Data In Organization Using Erp
Managing Dirty Data In Organization Using Erp
Donovan Mulder
 
Data-Ed: Unlock Business Value through Data Quality Engineering
Data-Ed: Unlock Business Value through Data Quality EngineeringData-Ed: Unlock Business Value through Data Quality Engineering
Data-Ed: Unlock Business Value through Data Quality Engineering
DATAVERSITY
 
Hcd wp-2012-better dataleadstobetteranalytics
Hcd wp-2012-better dataleadstobetteranalyticsHcd wp-2012-better dataleadstobetteranalytics
Hcd wp-2012-better dataleadstobetteranalytics
Health Care DataWorks
 

What's hot (20)

What is the price of bad customer data?
What is the price of bad customer data?What is the price of bad customer data?
What is the price of bad customer data?
 
Example data specifications and info requirements framework OVERVIEW
Example data specifications and info requirements framework OVERVIEWExample data specifications and info requirements framework OVERVIEW
Example data specifications and info requirements framework OVERVIEW
 
Data Quality in Data Warehouse and Business Intelligence Environments - Disc...
Data Quality in  Data Warehouse and Business Intelligence Environments - Disc...Data Quality in  Data Warehouse and Business Intelligence Environments - Disc...
Data Quality in Data Warehouse and Business Intelligence Environments - Disc...
 
Information Management Capabilities, Competencies & Staff Maturity Assessment
Information Management Capabilities, Competencies & Staff Maturity AssessmentInformation Management Capabilities, Competencies & Staff Maturity Assessment
Information Management Capabilities, Competencies & Staff Maturity Assessment
 
Managing Dirty Data In Organization Using Erp
Managing Dirty Data In Organization Using ErpManaging Dirty Data In Organization Using Erp
Managing Dirty Data In Organization Using Erp
 
The Comparison of Big Data Strategies in Corporate Environment
The Comparison of Big Data Strategies in Corporate EnvironmentThe Comparison of Big Data Strategies in Corporate Environment
The Comparison of Big Data Strategies in Corporate Environment
 
Big Data Journeys: Review of roadmaps taken by early adopters to achieve thei...
Big Data Journeys: Review of roadmaps taken by early adopters to achieve thei...Big Data Journeys: Review of roadmaps taken by early adopters to achieve thei...
Big Data Journeys: Review of roadmaps taken by early adopters to achieve thei...
 
Managing Data as a Strategic Resource – Foundation of the Digital and Data-Dr...
Managing Data as a Strategic Resource – Foundation of the Digital and Data-Dr...Managing Data as a Strategic Resource – Foundation of the Digital and Data-Dr...
Managing Data as a Strategic Resource – Foundation of the Digital and Data-Dr...
 
Data-Ed: Unlock Business Value through Data Quality Engineering
Data-Ed: Unlock Business Value through Data Quality EngineeringData-Ed: Unlock Business Value through Data Quality Engineering
Data-Ed: Unlock Business Value through Data Quality Engineering
 
Hcd wp-2012-better dataleadstobetteranalytics
Hcd wp-2012-better dataleadstobetteranalyticsHcd wp-2012-better dataleadstobetteranalytics
Hcd wp-2012-better dataleadstobetteranalytics
 
Ibm data governance framework
Ibm data governance frameworkIbm data governance framework
Ibm data governance framework
 
Big Data Forum - Phoenix
Big Data Forum - PhoenixBig Data Forum - Phoenix
Big Data Forum - Phoenix
 
Bigdata
BigdataBigdata
Bigdata
 
A Data Quality Model for Asset Management in Engineering Organisations
A Data Quality Model for Asset Management in Engineering OrganisationsA Data Quality Model for Asset Management in Engineering Organisations
A Data Quality Model for Asset Management in Engineering Organisations
 
Data-Ed Online: Engineering Solutions to Data Quality Challenges
Data-Ed Online: Engineering Solutions to Data Quality ChallengesData-Ed Online: Engineering Solutions to Data Quality Challenges
Data-Ed Online: Engineering Solutions to Data Quality Challenges
 
05. Physical Data Specification Template
05. Physical Data Specification Template05. Physical Data Specification Template
05. Physical Data Specification Template
 
Data Science and its Relationship to Big Data and Data-Driven Decision Making
Data Science and its Relationship to Big Data and Data-Driven Decision MakingData Science and its Relationship to Big Data and Data-Driven Decision Making
Data Science and its Relationship to Big Data and Data-Driven Decision Making
 
Organizing Master Data Management
Organizing Master Data ManagementOrganizing Master Data Management
Organizing Master Data Management
 
04. Logical Data Definition template
04. Logical Data Definition template04. Logical Data Definition template
04. Logical Data Definition template
 
Boosting Cybersecurity with Data Governance (peer reviewed)
Boosting Cybersecurity with Data Governance (peer reviewed)Boosting Cybersecurity with Data Governance (peer reviewed)
Boosting Cybersecurity with Data Governance (peer reviewed)
 

Viewers also liked

RAZONES DATA CLEANSING
RAZONES DATA CLEANSINGRAZONES DATA CLEANSING
RAZONES DATA CLEANSING
Noel Poler
 
Data analysis and cleansing
Data analysis and cleansingData analysis and cleansing
Data analysis and cleansing
DemandGen
 

Viewers also liked (8)

What to Expect When Investing in Data Cleansing Services?
What to Expect When Investing in Data Cleansing Services?What to Expect When Investing in Data Cleansing Services?
What to Expect When Investing in Data Cleansing Services?
 
RAZONES DATA CLEANSING
RAZONES DATA CLEANSINGRAZONES DATA CLEANSING
RAZONES DATA CLEANSING
 
Data analysis and cleansing
Data analysis and cleansingData analysis and cleansing
Data analysis and cleansing
 
Data Cleansing introduction (for BigClean Prague 2011)
Data Cleansing introduction (for BigClean Prague 2011)Data Cleansing introduction (for BigClean Prague 2011)
Data Cleansing introduction (for BigClean Prague 2011)
 
Data Quality - The Cleansing Process
Data Quality - The Cleansing ProcessData Quality - The Cleansing Process
Data Quality - The Cleansing Process
 
Data cleansing
Data cleansingData cleansing
Data cleansing
 
Data cleansing
Data cleansingData cleansing
Data cleansing
 
Best Practices to Implement an Effective Change Control Program Company Wide
Best Practices to Implement an Effective Change Control Program Company WideBest Practices to Implement an Effective Change Control Program Company Wide
Best Practices to Implement an Effective Change Control Program Company Wide
 

Similar to Developing A Universal Approach to Cleansing Customer and Product Data

Modernizing Data Quality & Governance: Unlocking Performance & Reducing Risk
Modernizing Data Quality & Governance: Unlocking Performance & Reducing RiskModernizing Data Quality & Governance: Unlocking Performance & Reducing Risk
Modernizing Data Quality & Governance: Unlocking Performance & Reducing Risk
Worldwide Business Research
 

Similar to Developing A Universal Approach to Cleansing Customer and Product Data (20)

Overall Approach To Data Quality Roi
Overall Approach To Data Quality RoiOverall Approach To Data Quality Roi
Overall Approach To Data Quality Roi
 
AI-Led-Cognitive-Data-Quality.pdf
AI-Led-Cognitive-Data-Quality.pdfAI-Led-Cognitive-Data-Quality.pdf
AI-Led-Cognitive-Data-Quality.pdf
 
From Compliance to Customer 360: Winning with Data Quality & Data Governance
From Compliance to Customer 360: Winning with Data Quality & Data GovernanceFrom Compliance to Customer 360: Winning with Data Quality & Data Governance
From Compliance to Customer 360: Winning with Data Quality & Data Governance
 
D&B Whitepaper The Big Payback On Data Quality
D&B Whitepaper The Big Payback On Data QualityD&B Whitepaper The Big Payback On Data Quality
D&B Whitepaper The Big Payback On Data Quality
 
Qlik wp 2021_q3_data_governance_in_the_modern_data_analytics_pipeline
Qlik wp 2021_q3_data_governance_in_the_modern_data_analytics_pipelineQlik wp 2021_q3_data_governance_in_the_modern_data_analytics_pipeline
Qlik wp 2021_q3_data_governance_in_the_modern_data_analytics_pipeline
 
Fuel your Data-Driven Ambitions with Data Governance
Fuel your Data-Driven Ambitions with Data GovernanceFuel your Data-Driven Ambitions with Data Governance
Fuel your Data-Driven Ambitions with Data Governance
 
Overall Approach to Data Quality ROI
Overall Approach to Data Quality ROIOverall Approach to Data Quality ROI
Overall Approach to Data Quality ROI
 
Big Data Expo 2015 - Trillium software Big Data and the Data Quality
Big Data Expo 2015 - Trillium software Big Data and the Data QualityBig Data Expo 2015 - Trillium software Big Data and the Data Quality
Big Data Expo 2015 - Trillium software Big Data and the Data Quality
 
Making Data Quality a Way of Life
Making Data Quality a Way of LifeMaking Data Quality a Way of Life
Making Data Quality a Way of Life
 
Driving Business Performance with effective Enterprise Information Management
Driving Business Performance with effective Enterprise Information ManagementDriving Business Performance with effective Enterprise Information Management
Driving Business Performance with effective Enterprise Information Management
 
The New Age Data Quality
The New Age Data QualityThe New Age Data Quality
The New Age Data Quality
 
Tom Kunz
Tom KunzTom Kunz
Tom Kunz
 
Why data governance is the new buzz?
Why data governance is the new buzz?Why data governance is the new buzz?
Why data governance is the new buzz?
 
Data Quality Presentation
Data Quality PresentationData Quality Presentation
Data Quality Presentation
 
Dw19 t1+ +dq+fundamentals-cvs+template
Dw19 t1+ +dq+fundamentals-cvs+templateDw19 t1+ +dq+fundamentals-cvs+template
Dw19 t1+ +dq+fundamentals-cvs+template
 
Optimizing Solution Value – Dynamic Data Quality, Governance, and MDM
Optimizing Solution Value – Dynamic Data Quality, Governance, and MDMOptimizing Solution Value – Dynamic Data Quality, Governance, and MDM
Optimizing Solution Value – Dynamic Data Quality, Governance, and MDM
 
Modernizing Data Quality & Governance: Unlocking Performance & Reducing Risk
Modernizing Data Quality & Governance: Unlocking Performance & Reducing RiskModernizing Data Quality & Governance: Unlocking Performance & Reducing Risk
Modernizing Data Quality & Governance: Unlocking Performance & Reducing Risk
 
Data Verification
Data VerificationData Verification
Data Verification
 
E outsource asia 2010
E outsource asia 2010E outsource asia 2010
E outsource asia 2010
 
Data Quality Strategy: A Step-by-Step Approach
Data Quality Strategy: A Step-by-Step ApproachData Quality Strategy: A Step-by-Step Approach
Data Quality Strategy: A Step-by-Step Approach
 

More from FindWhitePapers

More from FindWhitePapers (20)

The state of privacy and data security compliance
The state of privacy and data security complianceThe state of privacy and data security compliance
The state of privacy and data security compliance
 
Is your data at risk? Why physical security is insufficient for laptop computers
Is your data at risk? Why physical security is insufficient for laptop computersIs your data at risk? Why physical security is insufficient for laptop computers
Is your data at risk? Why physical security is insufficient for laptop computers
 
Closing the gaps in enterprise data security: A model for 360 degrees protection
Closing the gaps in enterprise data security: A model for 360 degrees protectionClosing the gaps in enterprise data security: A model for 360 degrees protection
Closing the gaps in enterprise data security: A model for 360 degrees protection
 
Buyers Guide to Endpoint Protection Platforms
Buyers Guide to Endpoint Protection PlatformsBuyers Guide to Endpoint Protection Platforms
Buyers Guide to Endpoint Protection Platforms
 
VMware DRS: Why You Still Need Assured Application Delivery and Application D...
VMware DRS: Why You Still Need Assured Application Delivery and Application D...VMware DRS: Why You Still Need Assured Application Delivery and Application D...
VMware DRS: Why You Still Need Assured Application Delivery and Application D...
 
The ROI of Application Delivery Controllers in Traditional and Virtualized En...
The ROI of Application Delivery Controllers in Traditional and Virtualized En...The ROI of Application Delivery Controllers in Traditional and Virtualized En...
The ROI of Application Delivery Controllers in Traditional and Virtualized En...
 
The Economic Impact of File Virtualization
The Economic Impact of File VirtualizationThe Economic Impact of File Virtualization
The Economic Impact of File Virtualization
 
Geolocation and Application Delivery
Geolocation and Application DeliveryGeolocation and Application Delivery
Geolocation and Application Delivery
 
DNSSEC: The Antidote to DNS Cache Poisoning and Other DNS Attacks
DNSSEC: The Antidote to DNS Cache Poisoning and Other DNS AttacksDNSSEC: The Antidote to DNS Cache Poisoning and Other DNS Attacks
DNSSEC: The Antidote to DNS Cache Poisoning and Other DNS Attacks
 
Lean Business Intelligence - How and Why Organizations Are Moving to Self-Ser...
Lean Business Intelligence - How and Why Organizations Are Moving to Self-Ser...Lean Business Intelligence - How and Why Organizations Are Moving to Self-Ser...
Lean Business Intelligence - How and Why Organizations Are Moving to Self-Ser...
 
Inventory Optimization: A Technique for Improving Operational-Inventory Targets
Inventory Optimization: A Technique for Improving Operational-Inventory TargetsInventory Optimization: A Technique for Improving Operational-Inventory Targets
Inventory Optimization: A Technique for Improving Operational-Inventory Targets
 
Improving Organizational Performance Through Pervasive Business Intelligence
Improving Organizational Performance Through Pervasive Business IntelligenceImproving Organizational Performance Through Pervasive Business Intelligence
Improving Organizational Performance Through Pervasive Business Intelligence
 
IDC Energy Insights - Enterprise Risk Management
IDC Energy Insights - Enterprise Risk ManagementIDC Energy Insights - Enterprise Risk Management
IDC Energy Insights - Enterprise Risk Management
 
How to Use Technology to Support the Lean Enterprise
How to Use Technology to Support the Lean EnterpriseHow to Use Technology to Support the Lean Enterprise
How to Use Technology to Support the Lean Enterprise
 
High Efficiency in Manufacturing Operations
High Efficiency in Manufacturing OperationsHigh Efficiency in Manufacturing Operations
High Efficiency in Manufacturing Operations
 
Enterprise Knowledge Workers: Understanding Risks and Opportunities
Enterprise Knowledge Workers: Understanding Risks and OpportunitiesEnterprise Knowledge Workers: Understanding Risks and Opportunities
Enterprise Knowledge Workers: Understanding Risks and Opportunities
 
Enterprise Information Management: In Support of Operational, Analytic, and G...
Enterprise Information Management: In Support of Operational, Analytic, and G...Enterprise Information Management: In Support of Operational, Analytic, and G...
Enterprise Information Management: In Support of Operational, Analytic, and G...
 
Enabling Strategy and Innovation: Achieving Optimized Outcomes from Planning ...
Enabling Strategy and Innovation: Achieving Optimized Outcomes from Planning ...Enabling Strategy and Innovation: Achieving Optimized Outcomes from Planning ...
Enabling Strategy and Innovation: Achieving Optimized Outcomes from Planning ...
 
Data Migration: A White Paper by Bloor Research
Data Migration: A White Paper by Bloor ResearchData Migration: A White Paper by Bloor Research
Data Migration: A White Paper by Bloor Research
 
Automating Stimulus Fund Reporting: How New Technologies Simplify Federal Rep...
Automating Stimulus Fund Reporting: How New Technologies Simplify Federal Rep...Automating Stimulus Fund Reporting: How New Technologies Simplify Federal Rep...
Automating Stimulus Fund Reporting: How New Technologies Simplify Federal Rep...
 

Recently uploaded

unwanted pregnancy Kit [+918133066128] Abortion Pills IN Dubai UAE Abudhabi
unwanted pregnancy Kit [+918133066128] Abortion Pills IN Dubai UAE Abudhabiunwanted pregnancy Kit [+918133066128] Abortion Pills IN Dubai UAE Abudhabi
unwanted pregnancy Kit [+918133066128] Abortion Pills IN Dubai UAE Abudhabi
Abortion pills in Kuwait Cytotec pills in Kuwait
 
Russian Call Girls In Gurgaon ❤️8448577510 ⊹Best Escorts Service In 24/7 Delh...
Russian Call Girls In Gurgaon ❤️8448577510 ⊹Best Escorts Service In 24/7 Delh...Russian Call Girls In Gurgaon ❤️8448577510 ⊹Best Escorts Service In 24/7 Delh...
Russian Call Girls In Gurgaon ❤️8448577510 ⊹Best Escorts Service In 24/7 Delh...
lizamodels9
 
Call Girls Hebbal Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Hebbal Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Hebbal Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Hebbal Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
amitlee9823
 
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
dollysharma2066
 
0183760ssssssssssssssssssssssssssss00101011 (27).pdf
0183760ssssssssssssssssssssssssssss00101011 (27).pdf0183760ssssssssssssssssssssssssssss00101011 (27).pdf
0183760ssssssssssssssssssssssssssss00101011 (27).pdf
Renandantas16
 
Mifty kit IN Salmiya (+918133066128) Abortion pills IN Salmiyah Cytotec pills
Mifty kit IN Salmiya (+918133066128) Abortion pills IN Salmiyah Cytotec pillsMifty kit IN Salmiya (+918133066128) Abortion pills IN Salmiyah Cytotec pills
Mifty kit IN Salmiya (+918133066128) Abortion pills IN Salmiyah Cytotec pills
Abortion pills in Kuwait Cytotec pills in Kuwait
 
FULL ENJOY Call Girls In Majnu Ka Tilla, Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Majnu Ka Tilla, Delhi Contact Us 8377877756FULL ENJOY Call Girls In Majnu Ka Tilla, Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Majnu Ka Tilla, Delhi Contact Us 8377877756
dollysharma2066
 
Call Girls In DLf Gurgaon ➥99902@11544 ( Best price)100% Genuine Escort In 24...
Call Girls In DLf Gurgaon ➥99902@11544 ( Best price)100% Genuine Escort In 24...Call Girls In DLf Gurgaon ➥99902@11544 ( Best price)100% Genuine Escort In 24...
Call Girls In DLf Gurgaon ➥99902@11544 ( Best price)100% Genuine Escort In 24...
lizamodels9
 

Recently uploaded (20)

Forklift Operations: Safety through Cartoons
Forklift Operations: Safety through CartoonsForklift Operations: Safety through Cartoons
Forklift Operations: Safety through Cartoons
 
It will be International Nurses' Day on 12 May
It will be International Nurses' Day on 12 MayIt will be International Nurses' Day on 12 May
It will be International Nurses' Day on 12 May
 
VVVIP Call Girls In Greater Kailash ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...
VVVIP Call Girls In Greater Kailash ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...VVVIP Call Girls In Greater Kailash ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...
VVVIP Call Girls In Greater Kailash ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...
 
Pharma Works Profile of Karan Communications
Pharma Works Profile of Karan CommunicationsPharma Works Profile of Karan Communications
Pharma Works Profile of Karan Communications
 
unwanted pregnancy Kit [+918133066128] Abortion Pills IN Dubai UAE Abudhabi
unwanted pregnancy Kit [+918133066128] Abortion Pills IN Dubai UAE Abudhabiunwanted pregnancy Kit [+918133066128] Abortion Pills IN Dubai UAE Abudhabi
unwanted pregnancy Kit [+918133066128] Abortion Pills IN Dubai UAE Abudhabi
 
Ensure the security of your HCL environment by applying the Zero Trust princi...
Ensure the security of your HCL environment by applying the Zero Trust princi...Ensure the security of your HCL environment by applying the Zero Trust princi...
Ensure the security of your HCL environment by applying the Zero Trust princi...
 
Russian Call Girls In Gurgaon ❤️8448577510 ⊹Best Escorts Service In 24/7 Delh...
Russian Call Girls In Gurgaon ❤️8448577510 ⊹Best Escorts Service In 24/7 Delh...Russian Call Girls In Gurgaon ❤️8448577510 ⊹Best Escorts Service In 24/7 Delh...
Russian Call Girls In Gurgaon ❤️8448577510 ⊹Best Escorts Service In 24/7 Delh...
 
Monthly Social Media Update April 2024 pptx.pptx
Monthly Social Media Update April 2024 pptx.pptxMonthly Social Media Update April 2024 pptx.pptx
Monthly Social Media Update April 2024 pptx.pptx
 
Call Girls Hebbal Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Hebbal Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Hebbal Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Hebbal Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
 
0183760ssssssssssssssssssssssssssss00101011 (27).pdf
0183760ssssssssssssssssssssssssssss00101011 (27).pdf0183760ssssssssssssssssssssssssssss00101011 (27).pdf
0183760ssssssssssssssssssssssssssss00101011 (27).pdf
 
Dr. Admir Softic_ presentation_Green Club_ENG.pdf
Dr. Admir Softic_ presentation_Green Club_ENG.pdfDr. Admir Softic_ presentation_Green Club_ENG.pdf
Dr. Admir Softic_ presentation_Green Club_ENG.pdf
 
Mifty kit IN Salmiya (+918133066128) Abortion pills IN Salmiyah Cytotec pills
Mifty kit IN Salmiya (+918133066128) Abortion pills IN Salmiyah Cytotec pillsMifty kit IN Salmiya (+918133066128) Abortion pills IN Salmiyah Cytotec pills
Mifty kit IN Salmiya (+918133066128) Abortion pills IN Salmiyah Cytotec pills
 
FULL ENJOY Call Girls In Majnu Ka Tilla, Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Majnu Ka Tilla, Delhi Contact Us 8377877756FULL ENJOY Call Girls In Majnu Ka Tilla, Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Majnu Ka Tilla, Delhi Contact Us 8377877756
 
RSA Conference Exhibitor List 2024 - Exhibitors Data
RSA Conference Exhibitor List 2024 - Exhibitors DataRSA Conference Exhibitor List 2024 - Exhibitors Data
RSA Conference Exhibitor List 2024 - Exhibitors Data
 
Call Girls In DLf Gurgaon ➥99902@11544 ( Best price)100% Genuine Escort In 24...
Call Girls In DLf Gurgaon ➥99902@11544 ( Best price)100% Genuine Escort In 24...Call Girls In DLf Gurgaon ➥99902@11544 ( Best price)100% Genuine Escort In 24...
Call Girls In DLf Gurgaon ➥99902@11544 ( Best price)100% Genuine Escort In 24...
 
Regression analysis: Simple Linear Regression Multiple Linear Regression
Regression analysis:  Simple Linear Regression Multiple Linear RegressionRegression analysis:  Simple Linear Regression Multiple Linear Regression
Regression analysis: Simple Linear Regression Multiple Linear Regression
 
Monte Carlo simulation : Simulation using MCSM
Monte Carlo simulation : Simulation using MCSMMonte Carlo simulation : Simulation using MCSM
Monte Carlo simulation : Simulation using MCSM
 
Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...
Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...
Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...
 
The Path to Product Excellence: Avoiding Common Pitfalls and Enhancing Commun...
The Path to Product Excellence: Avoiding Common Pitfalls and Enhancing Commun...The Path to Product Excellence: Avoiding Common Pitfalls and Enhancing Commun...
The Path to Product Excellence: Avoiding Common Pitfalls and Enhancing Commun...
 

Developing A Universal Approach to Cleansing Customer and Product Data

  • 1. SAP Thought Leadership Information Management Developing a Universal approach to cleansing cUstomer anD proDUct Data
  • 2.
  • 3. content 4 Executive Summary 5 Data Quality: What’s The Problem? 5 What Is Data Quality? 5 Measuring Data Quality 6 Components of a Data Quality Program 7 A Business Process Perspective to Data Quality 7 Data Quality Problems by Business Entity 8 The Role of Master Data 8 Types of Data Involved in Managing Data Quality 9 Cleansing Customer and Product Data 9 Product Data Example 10 Supporting Standards 10 Supporting National Languages 10 The Issue of Increasing Complexity 11 A Universal Approach to Data Quality Management Is Required 12 SAP BusinessObjects Information Management Solutions 12 Data Integration 12 Data Quality 12 Metadata Management 12 SAP BusinessObjects Data Insight 12 SAP BusinessObjects Data Quality Management 13 SAP BusinessObjects Universal Data Cleanse 14 Product Data 14 Multinational Data 15 Summary 15 For More Information About the Author Colin White, BI Research; BI Research is a research and consulting company whose goal is to help companies understand and exploit new developments in business intelligence and business integration. When combined, business intelli- e gence and business integration enable an organization to become a smart and agile business. For more information, visit www.bi-research.com.
  • 4. eXecUtive sUmmarY Data quality has always been an impor- are addressing quality problems with why companies need to take an enter- tant issue for companies, and this is customer, product, and other types of prise-wide approach to data cleansing even more the case today. Business corporate data. It explains why data and data quality management. Products and legislative pressures coupled with quality projects often evolve into mas- and use cases from SAP are used in the explosion in the amount of data ter data management projects, and the paper to demonstrate how vendors created by organizations is leading to discusses how master data can be are supporting data cleansing, and to increased corporate attention on extracted from unstructured business illustrate how customers are using such improving the quality and accuracy content and integrated into traditional functionality. of business data. corporate data systems. This paper reviews current industry The paper looks at the role of data problems concerning data quality, and cleansing tools in helping improve data takes a detailed look at how companies consistency and accuracy, and explains 4 SAP Thought Leadership – Information Management
  • 5. Data QUalitY: What’s the proBlem? THE COST OF POOR DATA QUALITy There has been a steady stream of arti- Although nobody disputes that poor 100% safe. The degree to which a cus- cles in the press about the impact of data quality has a significant financial tomer can accept a less than perfect or poor data quality on the business ever impact on companies, the figures in safe product will depend on the type of since companies began building data these articles are so variable and so customer buying the product, and on warehouses and examining the quality imprecise that there is a risk of reader how the product will be used. The qual- of data flowing into them. Here are fatigue when discussing data quality ity and safety of a product, however, is some early examples: issues at this superficial level of detail. also dependent on how much money Maybe this is why so many reports a manufacturer is willing to spend to “The survey showed that estimates of show that a high percentage of organi- develop and produce the product, poor information quality costs vary zations feel they have a data quality which in turn is dependent on how from one to 20% of total company rev- problem, while at the same time indicat- much money a customer is willing to enues. This disparity in cost estimation ing that an equally large number of orga- pay for it. All of these factors apply existed not only from company to com- nizations are also doing little about it. equally to providing and supporting pany but among senior information data quality. technology managers within the same The problem is that “data quality” organization.”1 is a vague term that means different Measuring Data Quality things to different people. Before “The Data Warehousing Institute data quality problems can be dis- While researching ISO 8402, I came (TDWI) estimates that poor quality cus- cussed in detail, data quality must across an excellent 1994 paper by tomer data costs U.S. businesses a be defined more precisely. Robert Walker, chairperson of the staggering $611 billion a year in post- Association for Geographic Information age, printing, and staff overhead.”2 What Is Data Quality? (AGI) Standards Data Quality Working Group. He had this to say about the More recent articles show the situation A useful definition of quality can be quality of data: has not improved: found in the ISO 8402 specification, which defines quality as, “The totality “Clearly, quality issues vary depending “Poor data quality costs the typical of characteristics of an entity that bear on the type of data and the purpose to company at least 10% of revenue; on its ability to satisfy stated or implied which it is put. The criteria that are 20% is probably a better estimate.”3 needs.” If we apply this definition to widely used for assessing data quality data, then it means that data quality is are as follows: “Companies are investing billions of dependent not only the characteristics • Lineage: This covers the ancestry of dollars in CRM applications and data of the data itself, but also on the busi- the dataset. It describes the source integration projects to gain a better ness context in which it is used (that is, material from which the data was view of their customers – only to dis- on the business processes and busi- derived and the methods of deriva- cover that conflicting data makes them ness users that employ it). tion. The description of the source blind. Gartner estimates that more than material should include details of edi- 25% of critical data within large busi- This is somewhat analogous to product tions and dates of that source materi- nesses is somehow inaccurate or quality. All product manufacturers have al and any ancillary information used incomplete. And that imprecise data quality assurance programs, but no for updates. The methods of deriva- is wreaking havoc.”4 product is ever 100% perfect, or tion should include details of the 1. Andrea Malcolm, “Poor Data Quality Costs 10% of Revenues, Survey Reveals,” ComputerWorld, July 1998. 2. Wayne W. Eckerson, “Data Quality and the Bottom Line,” TDWI Report Series, 2002. 3. Thomas C. Redman, “Data: An Unfolding Quality Disaster,” DM Review, August 2004. 4. Rick Whiting, “Hamstrung By Defective Data,” InformationWeek, May 2006. SAP Thought Leadership – Information Management 5
  • 6. operations used, the logical process- The first task is to measure and analyze additional data elements can be added es (for example, algorithms and (or profile) the data to determine the as required to the data to enhance its transformations) and details of any type and number of defects that may information content. subsequent processing used in exist. This step provides the knowledge producing the final digital data. required to develop a strategy to Data associated with a business pro- • Accuracy: This is the closeness of improve the quality of the data. Once cess may come from a variety of differ- the topic data associated with an this has been done, the next task is to ent sources, and although the data may object or feature, to the true values, cleanse (parse, standardize, and cor- be consistent and correct for any given or values that are accepted as being rect) the data. This requires parsing it source, merging it with other sources true. It is usually recorded as a per- into its individual components, standard- can lead to duplicate records, and centage correctness for each topic izing data formats and values based on redundant or inconsistent data ele- or attribute. internal and industry business definitions ments. The role of the match and con- • Currency: This is the level at which and rules, and applying cleansing algo- solidate tasks is to resolve and remove the data is current (as opposed to rithms to remove syntactical or semantic this duplicate, redundant, and inconsis- the date of the last update). data errors. At this stage in the process, tent data. Once the data has been • Logical consistency: This is the fidelity of data in relation to the Continuous Monitoring data structures. Measures ongoing data quality • Completeness: This is a measure scores and provides alerts of the correspondence between the Measure real world and the specified dataset. Quantifies the number and types of defects It includes a statement of the rules Consolidate for including or excluding items from Combines unique Analyze the dataset.”5 data elements from Assesses the matched records into a single source nature and Although these comments concerned cause of data defects geographic data, the attributes of Match lineage, accuracy, currency, logical consistency, and completeness apply Identifies duplicate records within Your equally to all forms of data, regardless multiple tables or databases Data of its type or source. To maintain data quality, organizations need to Match & Consolidate constantly monitor and measure these Parse Enhance Identifies and attributes as a part of an ongoing data Appends additional isolates data quality program. data increasing the elements in value of the information data structures Components of a Data Quality Data Enhancement Standardize Program Normalizes data value and Correct formats according to your Figure 1 (courtesy of SAP) illustrates Verifies, scrubs and rules and that of third-party references the main tasks of a data quality appends data based on a set of sophisticated program. algorithms that work with secondary source Data Cleansing Figure 1: Components of a Data Quality Program [Source: SAP] 5. R.S. Walker, Data quality standard, In: Proceedings of AGI’94 Conference, 1994. 6 SAP Thought Leadership – Information Management
  • 7. cleansed, enhanced, consolidated, and An example of the business case for required data quality, and a business merged, it is ready for use. The quality approaching data quality improvement case built to address those data of the data, however, must be moni- from a business process and business quality issues that provide the best tored and profiled on a continuing basis entity perspective can be found in a return on investment. Unfortunately, to measure its accuracy, currency, con- reply to an April 2006 Business Intelli- this understanding is severely lacking sistency, and completeness. gence Network blog (“Surprise! Poor in many companies. Data Quality Costs a Lot of Money”) A Business Process Perspective by David Loshin: Data Quality Problems by to Data Quality Business Entity “At Shell, the business thought it was IT staff usually think of data quality ini- selling 20,000 product combinations to A TDWI survey published in an April tiatives in terms of individual applica- 20,000 commercial customers, and 2006 research report on data quality tions. From a business perspective, was actually selling 5,000 products to (“Taking Data Quality to the Enterprise however, the problems caused by poor 6,000 customers! It can be seen that through Data Governance” by Philip data quality typically surface in the the potential for cost saving in a situa- Russom) showed that customer and business processes of the organiza- tion like that was significant.” product data are the two main types tion. Examples include: of business entity that are especially • Wasted marketing efforts and costs Data quality initiatives to correct susceptible to quality problems. Sur- • Customer dissatisfaction problems like those at Shell can only prisingly, as can be seen in Figure 2, • Shipment, delivery, and billing errors succeed, however, if an organization financial data came third in the table of • Excess inventory understands its business processes, results. Customer and product are two • Incorrect financial statements and the data and data quality required of the main components of the master • Delays in new product introduction by those processes. In this way, actual data of an organization. data quality can be measured against To improve data quality, IT staff must change their application-specific approach to data quality initiatives, and Type of Business Data Percentage of Respondents align them instead with key corporate Customer Data 74% business processes, such as customer Product Data 43% marketing and supply chain optimiza- Financial Data 36% tion, and underlying business entities Sales Contact Data 27% like customer and product. This align- Data from ERP Systems 25% ment helps improve data consistency across application systems, which in Employee Data 16% turn simplifies data integration tasks, International Data 12% such as building an integrated product Other 10% catalog or creating a single view of the customer. Figure 2: Business Entities Susceptible to Data Quality Problems SAP Thought Leadership – Information Management 7
  • 8. The Role of Master Data and simple data structures, has simple As companies begin to realize that they keys and governance rules, is often must focus data quality efforts on spe- Master data is defined as data about standardized (U.S. state codes, for cific business processes and business the key business entities of an organi- example), involves only a few applica- entities, the ability to manage master zation. Examples include customer, tions, and is reasonably stable. data becomes increasingly important. product, organizational structure, and This is why many data quality initiatives chart of accounts. A common question Master business entity data, such as often evolve into master data manage- about master data is, “What is the dif- customer and product, on the other ment projects. The Business Intelli- ference between master data and refer- hand, is usually ill-defined, has complex gence Network recently published a ence data?” Some people take the data structures and relationships, report (“The Pillars of Master Data position that they are the same thing, requires compound and intelligent keys Management: Data Profiling, Data Inte- but it can be argued that not all refer- and complex governance rules, is not gration and Data Quality”) by David ence data is master data. For example, usually standardized, involves many Loshin that discusses this topic in lookup and code tables that are used applications, and changes frequently. detail and is recommended reading. to encode information, such as state names and order codes, are not strictly Does this distinction really matter? Types of Data Involved in Manag- master data tables. When developing data quality manage- ing Data Quality ment and master data management The dividing line between master data systems, it does. Cleaning and manag- A data quality program involves many and reference data is not always clear- ing master reference data is a reason- different types of data. The data may cut. One solution is to break master ably easy job. Cleaning master data, on be current or historic, detailed or sum- data into two types: master reference the other hand, is a complex task and marized, and may have a structured, data and master business entity data. requires data cleansing tools that are unstructured, or semi-structured for- Master reference data has well-defined specifically designed for that purpose. mat. It may consist of master business entity or master reference data, or activity data created by business trans- action (order entry, billing, shipping), BI (reporting, analytics), planning (bud- geting, forecasting), and collaborative (e-mail, blogging) applications. All of this data must be structurally and semantically consistent with its associ- ated technical and business rules. One of the main objectives of a data quality program is to assess and correct data consistency and accuracy problems caused by applications that do not fully enforce those rules. Most data quality initiatives focus almost exclusively on structured data, which represents only a fraction of the data that exists in an organization. There is, however, increasing interest in 8 SAP Thought Leadership – Information Management
  • 9. leveraging the value of unstructured customers are providing data in lan- (CD, Jumbo CD, and so on), and wish data in both master data and business guages other than English. These to standardize the values for yield and intelligence (BI) applications. This is problems are causing considerable minimum deposit. especially the case for customer and business inefficiency and are leading product data, which may be managed to costly mistakes being made. Solving These types of data cleansing opera- in unstructured files, or encapsulated in these problems, therefore, can provide tions can be used for all types of a single field with a structured file or significant business benefit. unstructured information. To date, database. To be useful this unstruc- most companies have focused on tured data must be cleansed and trans- Product Data Example customer data, but many are now formed into a semi-structured (XML, Figure 3 illustrates how unstructured beginning to process product and for example) or structured format. data describing financial product offer- other types of data. ings can be decomposed into separate Cleansing and transforming unstruc- data elements, and then used to Customer data is usually easier to tured data improves operational effi- populate a product table. The original deconstruct and clean than product ciency and can add significant value to unstructured data could come from a data. Although there are variations in existing decision-making applications. Web page or a product catalog, but in people names, company names, titles, Examples of applications here include: some cases may simply be encoded phone numbers, and addresses, cus- • Customer and market intelligence – in a single field of a structured data- tomer data records are reasonably con- Internet Web pages base record. sistent in the number of data elements • Customer sentiment and complaint involved. This is mainly due to the fact analysis – Web logs (blogs) and In this example, the customer wants a that most countries have standardized customer support center call records data quality solution to parse, standard- name and address formats. This is not • Product safety and quality analysis – ize, and cleanse the financial data, and the case for product data, which is Service center records to identify the financial institutions that more variable, and involves a wider • Product master data management – are offering certificates of deposit range of industry standards. Some Product catalogs (CD). They also want information about of these standards are very specific, • Legal discovery – E-mails and maturity (3 months, 1 year, and so on) while others are global in nature. instant messages and the type of CD being offered Cleansing Customer and Product Data Every company struggles with the enor- mous amount of customer and product data that is represented in a variety of ways across disparate systems or departments. A product name such as LIGHT BULB-120 VOLT 100 WATT may be entered multiple times with dif- ferent capitalization, spelling, abbrevia- tions, and punctuation. Adding to this issue is the problem of supporting multiple languages in today’s global marketplace where suppliers and Figure 3: Cleansing Unstructured Product Data SAP Thought Leadership – Information Management 9
  • 10. Supporting Standards Organizations that deal with global information often need to adjust data Many organizations create their own fields for unique regional attributes and hierarchical product codes, while others relationships. In some countries there follow industry standards where possi- are titles and relationships that may not ble. Supply chain efficiency can be be known in the English language. gained by appending industry standard Sonya Gonzalez viuda de Martinez, for product codes to product descriptions. example, translates to Sonya Gonzalez A common master product identifier widow of Martinez. In this case, the shared by all suppliers makes it much text widow of might not be recognized easier to support roll-up and drill-down properly by the organization if it is just reports and analyses based on a family processing standard first and last name of products. elements. This could be important for a banking institution that is trying to dis- Well-known standards include: burse the funds of a will to the correct • United Nations Standard Products Sonya Gonzalez. and Services Code (UNSPSC) • Export Control Classification Number Another example is parsing vendor (ECCN−used by the import/export information for paying invoices. An and transportation industries input string in Portuguese could read • International Organization for Computadoras Brasileiras Incorpora- Standardization (ISO) das representante autorizado de Micro- • National Stock Number Codes soft. The text representante autorizado (NSN)−used by the military de is the equivalent to authorized agent for (in this case, Microsoft). What It is very important, therefore, that exists here is a business relationship The Issue of Increasing data cleansing tools provide support between Computadoras Brasileiras Complexity for these standards. Incorporadas and Microsoft, and this understanding is crucial if a payment is Corporate data is created and managed Supporting National Languages to be made payable to the right party. It by many different business processes is important when dealing with interna- (see Figure 4) and in most companies, Whether an organization has custom- tional data to have a data quality solu- the applications supporting those busi- ers, suppliers, or products in the United tion that is Unicode-enabled, and that ness processes have been developed States, Europe, Asia, or any combina- recognizes international characters oth- over many years. Some involve legacy tion of countries, it is important that the er than those found in the English lan- applications on centralized mainframes, level of data quality is the same across guage. By applying data quality global- while others employ distributed comput- all data stores, and it is vital to be able ly, organizations have a more complete ing technologies, such as client/server to trust the data, regardless of what view of customer and product informa- computing, Web-based computing, language the original data is in. Equally tion, which improves business deci- or Web services. Some are developed important is the ability to present data sions, customer service, and supply in-house, while others use packaged in local languages, and to be able to chain management. solutions from applications vendors tailor data quality program rules to suit a specific language or region. 10 SAP Thought Leadership – Information Management
  • 11. or software-as-a-service vendors. A Universal Approach to Data dynamically by applications (see Company acquisitions and mergers Quality Management Is Required Figure 4). This allows business rules for add to the application mix. These appli- data quality enforcement to be moved cations have usually been developed The trend by organizations toward a outside of applications and applied and deployed independently of each services-oriented architecture (SOA) universally at a business process level, other, without any notion that one day for building new applications and rather than at the individual application the data they create and manage may integrating existing ones reduces level. These services can be called be used by other applications for point-to-point application connections proactively by applications as data is other purposes. and eases application integration. It entered into an application system, also encourages developers to think or reactively by batch data quality man- It is this complex and unmanaged envi- in terms of supporting business pro- agement tools after the data has been ronment that has led to many of the cesses as a set of services, rather than created. Single-record rule enforcement data quality problems that exist today, as a package of monolithic applications. can be done proactively, whereas rule and given the huge growth in the data enforcement that spans records, data- being generated both inside and out- The SOA approach supports any task bases, and applications systems will side of companies, application com- that can be defined as a service. This typically be done reactively. plexity and data quality is likely to concept not only applies to business steadily get worse unless companies process tasks, but also to data and pay more attention to enforcing data system management tasks as well. quality and evolve to support a univer- This enables the tasks associated with sal approach to managing data cleans- a data quality program to be deployed ing and data quality management. as a set of services that can be called Figure 4: An SOA Can Enable a Universal Approach to Data Quality Management web external and content syndicated Business Business plans digital information collaboration planning forecasts media processes email processes budgets instant messages workgroup documents Service-oriented corporate architecture low-latency documents Business Business data transaction intelligence transaction processes processes historical data data Master Data quality integrated services master data data processes SAP Thought Leadership – Information Management 11
  • 12. sap Businessobjects inFormation management solUtions SAP is an industry-leading provider of products, relational databases, model- • An open metadata repository that business intelligence and information ing tools, and third-party solutions. IT allows the tool to be used by management solutions. The objective staff and business analysts use a thin- custom-built applications of information management is to pro- client Web interface to explore metada- • Scheduling of profiling tasks to run vide customers with a flexible solution ta and gain a better understanding of any time without operator for building an enterprise-wide data the metadata used in projects, to sup- intervention integration and data quality manage- port impact analysis, and to report on ment environment. metadata and data lineage. The knowledge gained from SAP BusinessObjects Data Insight concern- SAP BusinessObjects solutions sup- SAP BusinessObjects ing existing data quality issues can be port three key information management Data Insight™ used to develop a detailed strategy for functions in data integration, data cleansing and correcting data defects quality, and metadata management. SAP BusinessObjects Data Insight using SAP BusinessObjects Data software is a data profiling tool that Quality Management software. Data Integration handles the data assessment task (see Key data integration solutions include: Figure 1) of a data quality program. It is SAP BusinessObjects Data • SAP BusinessObjects Data Integra- used to monitor, measure, analyze, and Quality Management tor software, a batch-oriented report on the quality of data stored in extract, transformation, and load relational database systems and flat SAP BusinessObjects Data Quality (ETL) tool supporting enterprise data files. It helps users gain visibility and a Management handles the data cleans- consolidation and federation detailed understanding of the data qual- ing, enhancement, matching, and • SAP BusinessObjects Data ity defects in the source data. Key fea- consolidation tasks of the data quality Federator software, an enterprise tures of SAP BusinessObjects Data program shown in Figure 1. The information integration tool enabling Insight include: main facilities provided by SAP on-demand data access to a variety • Flexible processing models that can BusinessObjects Data Quality of data sources process source data in place or after Management include: • SAP BusinessObjects Rapid Marts® extracting it into SAP BusinessOb- • Parsing and standardizing customer packages, packaged data integration jects Data Insight and other types of operational data solutions for enterprise applications • Automatic discovery of business • Correcting data based on supplied such as Oracle, PeopleSoft, and rules and relationships and user-defined dictionaries and Siebel • Support for data redundancy profil- business rules ing, drill-down frequency distribu- • Appending additional information Data Quality tions, cross-column and cross-table such as phone numbers to provide SAP BusinessObjects Data Insight™ comparisons, and pattern recognition a more complete set of data software and BusinessObjects Data • Allows users to establish thresholds • Matching data to build data relation- Quality Management software enable and set up automated alerts to notify ships, and identify and remove dupli- the profiling, cleansing, enhancement, if analysis results exceed a specific cate records matching, consolidation, and monitoring quality metric • Consolidating data to create a single of data. • Graphical and dashboard reports view across databases (using Crystal Reports® software) Metadata Management that provide summary profiles, fre- SAP BusinessObjects Metadata quency distributions, and referential Management software collects and integrity information unifies metadata from BI objects, ETL 12 SAP Thought Leadership – Information Management
  • 13. SAP BusinessObjects Data Quality Data Quality Project Architect Management now includes universal data cleanse (UDC), an add-on option that allows user-defined transforms and their associated dictionaries and busi- Web Browser Universal Data ness rules to be created for cleansing Web Server Cleanse Data other types of data (unstructured prod- Insight uct data, for example). Key features of Project the data cleansing capability include: portal Data • Identifies, standardizes, and corrects Packaged Quality Applications Data quality global data for over 200 countries web services Server Data Quality • Provides international name (firm and ( NET & Java) Repository person, prefix, title) parsing packages (workflows, for numerous countries Custom transforms Applications and runtime • Identifies customer information metadata) (addresses, phone numbers, social Metadata Repository security numbers, dates) and verifies (runtime statistics) that it is properly formatted • Allows users to add custom dictionar- Figure 5: SAP BusinessObjects Data Quality Management ies and rules to parse and standard- ize other data elements, such as part At the heart of SAP BusinessObjects Support for Web services and an SOA numbers, product codes, purchase Data Quality Management is a data environment in SAP BusinessObjects orders, SKUs, customer identification quality server (see Figure 5) that Data Quality Management allows data numbers, and so forth executes workflows and workflow quality software services to be shared • Handles free-form text up to 8,000 transforms to carry out data quality by multiple business processes. Sepa- characters in size and parses it into management tasks that have been rating data quality rules from the pro- appropriate structured data fields defined using the provided GUI-driven cesses that use them improves flexibili- project architect. These workflows ty and makes it possible for the rules User-defined dictionaries can be can be called dynamically as a Web to be dynamically maintained to meet populated interactively using the service, run as a standalone batch job, changing business needs. project architect, or can be bulk loaded or integrated with external systems and from an XML file. The bulk load capabili- applications, such as SAP software, SAP BusinessObjects Universal ty is useful for loading industry standard Informatica, Oracle, PeopleSoft, Data Cleanse product codes and classifications, and Siebel. weights and measures tables, interna- The data cleansing process of tional language translation tables, and The capability of SAP BusinessObjects SAP BusinessObjects Data Quality data classifications identified using SAP Data Quality Management to call a data Management standardizes data to BusinessObjects Data Insight software. quality workflow as a Web service in an ensure a consistent record format, and SOA environment enables the work- corrects data based on classification flows to be used dynamically by an dictionaries and business rules. Prior organization’s operational business to the current release, the product processes and the quality of informa- provided cleansing transforms that tion to be checked, in-flight, as it flows focused primarily on customer data. between systems and applications. SAP Thought Leadership – Information Management 13
  • 14. Product Data One SAP BusinessObjects customer processes, regardless of what language uses this approach to reconcile data for the original data is in. Equally important An important use case for UDC is the 140,000 products that are managed in is the ability to present data in local cleansing, matching, and consolidation 11 different data repositories. These languages. UDC is Unicode-enabled, of structured and unstructured product techniques could also be extended to supports mixed language product data. A goal for a manufacturer, for support the reconciling of product data descriptions, and incorporates example, may be to improve the consis- for integration into a single master data language-specific lexicons for colors, tency between item descriptions on a management system. sizes, and weights. It is, therefore, Web site and those in product cata- ideally suited for cleansing data in logs. To achieve consistency, the com- Multinational Data companies whose operations span pany would parse the item descriptions many different countries. on the Web site and in the catalogs, Another scenario for UDC is for the understand the relationships between processing of multinational data. For the items, and use the results to cor- an increasing number of organizations, rect the item entries in each location to it is important to be able to create make them consistent. intelligent and standardized parsing 14 SAP Thought Leadership – Information Management
  • 15. sUmmarY Companies today cannot operate effec- It is important that data quality tasks For More Information tively, or compete successfully, unless are not buried in individual applications, they give their users timely access to but are implemented instead as a set of To learn more about SAP accurate and consistent information. services that can be used by any appli- BusinessObjects information manage- The three key words here are timely, cation from anywhere in the organiza- ment solutions from SAP, visit our accurate, and consistent. tion. This approach ensures that busi- Web site at www.sap.com/ ness processes obey a set of common sapbusinessobjects. The problem is that many organizations business rules, and permits data quality don’t know how consistent or accurate service levels to be adjusted without their data is, or how much poor data directly affecting existing applications. quality data is costing them. This is It enables a data quality program to be because these organizations don’t con- deployed in a phased approach, one sider data quality requirements from business process at a time, and allows the perspective of individual business applications to be migrated to a com- users and processes. Data quality is mon data quality approach in an not an all-or-nothing concept. Different orderly fashion. users need different levels of data qual- ity, and it is essential to categorize data Executives have a lot at stake when it by business needs and required data comes to data quality and the tough quality service levels. It is important to corporate compliance environment that profile and constantly monitor data to we live in today. It is crucial, therefore, ensure that it is satisfying those ser- that data quality, like data integration, vice levels, and to rapidly correct any be viewed from an enterprise perspec- deficiencies that are degrading service tive. SAP is a company that recognizes levels. Organizations also want to this need and, through its information ensure that data quality processes are management solutions, is addressing scalable and sharable, and provide the that need. flexibility to allow users to dynamically change business rules based on busi- ness needs and context. SAP Thought Leadership – Information Management 15
  • 16. 50 094 570 (09/03) ©2009 by SAP AG. All rights reserved. SAP, R/3, SAP NetWeaver, Duet, PartnerEdge, ByDesign, SAP Business ByDesign, and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP AG in Germany and other countries. Business Objects and the Business Objects logo, BusinessObjects, Crystal Reports, Crystal Decisions, Web Intelligence, Xcelsius, and other Business Objects products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of Business Objects S.A. in the United States and in other countries. Business Objects is an SAP company. All other product and service names mentioned are the trademarks of their respective companies. Data contained in this document serves informational purposes only. National product specifications may vary. These materials are subject to change without notice. These materials are provided by SAP AG and its affiliated companies (“SAP Group”) for informational purposes only, without representation or warranty of any kind, and SAP Group shall not be liable for errors or omissions with respect to the materials. The only warranties for SAP Group products and services are those that are set forth in the express warranty statements accompanying such products and services, if any. Nothing herein should be construed as constituting an additional warranty. www.sap.com /contactsap