CHAPTER 5

[Figure: text framework highlighting Module II, Information Technologies. Other areas shown: Management Challenges, Business Applications, Development Processes, Foundation Concepts.]

DATA RESOURCE MANAGEMENT

Chapter Highlights
Section I
Technical Foundations of Database Management
Real World Case: Harrah’s Entertainment and Others: Protecting the Data Jewels
Database Management
Fundamental Data Concepts
Database Structures
Database Development
Section II
Managing Data Resources
Real World Case: Emerson and Sanofi: Data Stewards Seek Data Conformity
Data Resource Management
Types of Databases
Data Warehouses and Data Mining
Traditional File Processing
The Database Management Approach
Real World Case: Acxiom Corporation: Data Demands Respect

Learning Objectives
After reading and studying this chapter, you should be able to:
1. Explain the business value of implementing data resource management processes and technologies in an organization.
2. Outline the advantages of a database management approach to managing the data resources of a business, compared to a file processing approach.
3. Explain how database management software helps business professionals and supports the operations and management of a business.
4. Provide examples to illustrate each of the following concepts:
   a. Major types of databases.
   b. Data warehouses and data mining.
   c. Logical data elements.
   d. Fundamental database structures.
   e. Database development.




150 ● Module II / Information Technologies


  SECTION I                   Technical Foundations of
                              Database Management
Database Management           Just imagine how difficult it would be to get any information from an information system
                              if data were stored in an unorganized way, or if there were no systematic way to
                              retrieve them. Therefore, in all information systems, data resources must be organized
                              and structured in some logical manner so that they can be accessed easily, processed ef-
                              ficiently, retrieved quickly, and managed effectively. Data structures and access meth-
                              ods ranging from simple to complex have been devised to efficiently organize and
                              access data stored by information systems. In this chapter, we will explore these con-
                              cepts, as well as the managerial implications and value of data resource management.
                              See Figure 5.1.
                                  Read the Real World Case on data resources in the casino gaming and hospitality
                              industry. We can learn a lot from this case about the importance of protecting the data
                              resources of the organization.


Fundamental Data Concepts     Before we go any further, let’s discuss some fundamental concepts about how data are
                              organized in information systems. A conceptual framework of several levels of data has
                              been devised that differentiates between different groupings, or elements, of data.
                              Thus, data may be logically organized into characters, fields, records, files, and data-
                              bases, just as writing can be organized in letters, words, sentences, paragraphs, and
                              documents. Examples of these logical data elements are shown in Figure 5.2.

Character                     The most basic logical data element is the character, which consists of a single alpha-
                              betic, numeric, or other symbol. You might argue that the bit or byte is a more ele-
                              mentary data element, but remember that those terms refer to the physical storage
                              elements provided by the computer hardware, discussed in Chapter 3. Using that un-
                              derstanding, one way to think of a character is that it is a byte used to represent a par-
                              ticular character. From a user’s point of view (that is, from a logical as opposed to a
                              physical or hardware view of data), a character is the most basic element of data that
                              can be observed and manipulated.

Field                         The next higher level of data is the field, or data item. A field consists of a grouping of
                              related characters. For example, the grouping of alphabetic characters in a person’s
                              name may form a name field (or typically, last name, first name, and middle initial
                              fields), and the grouping of numbers in a sales amount forms a sales amount field.
                              Specifically, a data field represents an attribute (a characteristic or quality) of some
                              entity (object, person, place, or event). For example, an employee’s salary is an
                              attribute that is a typical data field used to describe an entity who is an employee of a
                              business. Generally speaking, fields are organized such that they represent some logical
                              order; for example, last_name, first_name, address, city, state, zipcode, and so on.

Record                        All of the fields used to describe the attributes of an entity are grouped to form a
                              record. Thus, a record represents a collection of attributes that describe an entity. An
                              example is a person’s payroll record, which consists of data fields describing attributes
                              such as the person’s name, Social Security number, and rate of pay. Fixed-length records
                              contain a fixed number of fixed-length data fields. Variable-length records contain a
                              variable number of fields and field lengths. Another way of looking at a record is that
                              it represents a single instance of an entity. Each record in an employee file describes
                              one specific employee.
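
The levels described so far can be sketched in a few lines of Python. This is only an illustration, not how any particular DBMS stores data: the class name `PayrollRecord` and the dictionary `human_resource_db` are made up for the example, while the sample values are taken from Figure 5.2.

```python
from dataclasses import dataclass


@dataclass
class PayrollRecord:
    """One record: the fields (attributes) that describe a single employee entity."""
    name: str    # field: a grouping of alphabetic characters
    ss_no: str   # field: Social Security number, kept as text
    salary: int  # field: annual salary in dollars


# A file (table) is a group of related records.
payroll_file = [
    PayrollRecord("Jones T. A.", "275-32-3874", 20_000),
    PayrollRecord("Klugman J. L.", "349-88-7913", 28_000),
]

# A database pools related files; here it is simply a dict keyed by file name.
human_resource_db = {"payroll": payroll_file}

first = human_resource_db["payroll"][0]  # one record = one instance of the entity
characters = list(first.name)            # the characters that make up one field
```

Each `PayrollRecord` here is a fixed set of fields, which loosely mirrors the fixed-length record described above; a variable-length record would need an optional or repeating field.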

File                          A group of related records is a data file, or table. Thus, an employee file would contain
                              the records of the employees of a firm. Files are frequently classified by the application
Chapter 5 / Data Resource Management ● 151




REAL WORLD CASE 1
Harrah’s Entertainment and Others: Protecting the Data Jewels

In the casino industry, one of the most valuable assets is the dossier that casinos keep on their affluent customers, the high rollers. But in 2003, casino operator Harrah’s Entertainment Inc. filed a lawsuit in Placer County, California, Superior Court charging that a former employee had copied the records of up to 450 wealthy customers before leaving the company to work at competitor Thunder Valley Casino in Lincoln, California.
     The complaint said the employee was seen printing the list (which included names, contact information, and credit and account histories) from a Harrah’s database. It also alleged that he tried to lure those players to Thunder Valley. The employee denies the charge of stealing Harrah’s trade secrets, and the case was still pending at this writing, but many similar cases have been filed in the past 20 years, legal experts say.
     While savvy companies are using business intelligence and customer relationship management systems to identify their most profitable customers, there’s a genuine danger of that information falling into the wrong hands. Broader access to those applications and the trend toward employees switching jobs more frequently have made protecting customer lists an even greater priority.
     Fortunately, there are managerial, legal, and technological steps you can take to help prevent, or at least discourage, departing employees from walking out the door with this vital information.
     For starters, organizations should make sure that certain employees, particularly those with frequent access to customer information, sign nondisclosure, noncompete, and nonsolicitation agreements that specifically mention customer lists. Through these documents, employees “acknowledge that they will be introduced to this information and agree not to disclose it on departure from the company,” says Suzanne Labrit, a partner at law firm Shutts & Bowen LLP in West Palm Beach, Florida.
     Although most states have enacted trade-secrets laws, Labrit says they have different attitudes about enforcing these laws with regard to customer lists. “But as a starting point, at least you have this understanding [with employees] that the customer information is being treated as confidential,” Labrit says. Then if an employee leaves to work for a competitor and uses this protected customer data, the employer will more likely be able to take legal action to stop the activity. “If you don’t treat it as confidential information internally,” she says, “the court will not treat it as confidential information, either.”
     It’s also important to educate employees about the confidentiality of customer lists, because many people wrongly assume they’re public information, says Tim Headley, a partner at the Houston law firm of Gardere Wynne Sewell LLP. “Most people think they can take the lists with them,” he says. “You have to show that you’ve kept it a secret and told employees it’s a valuable secret. [Customer lists] are at the core of how you bring revenue into the company. These are the decision-makers who are willing to buy your product.”
     From a management and process standpoint, organizations should try to limit access to customer lists to only employees, such as sales representatives, who need the information to do their jobs. “If you make it broadly available to employees, then it’s not considered confidential,” says Labrit.
     Physical security should also be considered, Labrit says. Visitors such as vendors shouldn’t be permitted to roam free in the hallways or into conference rooms. And security policies, such as a requirement that all computer systems have strong password protection, should be strictly enforced.
     Companies should instantly shut down access to computers and networks when employees leave, whether the reason is a layoff or a move to a new job. At the exit interview, the employee should be reminded of any signed agreements and corporate policies regarding customer lists and other confidential information. Employees should be told to turn over anything, including data that belongs to the company.
     In addition, employers should track the activities of employees who’ve given notice but will be around for a while. This includes monitoring systems to see if the employee is e-mailing company-owned documents outside the company.
     Some organizations rely on technology to help prevent the loss of customer lists and other critical data. Inflow Inc., a Denver-based provider of managed Web hosting services, uses a product from Opsware Inc. in Sunnyvale, California, that lets managers control access to specific systems, such as databases, from a central location.

FIGURE 5.1 While data management is a strategic initiative in every modern organization, those in the gaming industry believe their success lies in the protection and strategic management of their data resources.
Source: Jose Luis Palaez, Inc./Corbis.

     The company also uses an e-mail-scanning service that allows it to analyze messages that it suspects might contain proprietary files, says Lenny Monsour, general manager of application hosting and management. Inflow combines the use of this technology with practices such as monitoring employees who have access to data considered vital to the company.
     A major financial services provider is using a firewall from San Francisco-based Vontu Inc. that monitors outbound e-mail, Webmail, Web posts, and instant messages to ensure that no confidential data leave the company. The software includes search algorithms and can be customized to automatically detect specific types of data such as lists on a spreadsheet or even something as granular as a customer’s Social Security number. The firm began using the product after it went through layoffs in 2000 and 2001.
     “Losing customer information was a primary concern of ours,” says the firm’s chief information security officer, who asked to not be identified. “We were concerned about people leaving and sending e-mail to their home accounts.” In fact, he says, before using the firewall, the company had trouble with departing employees taking intellectual property and using it in their new jobs at rival firms, which sometimes led to lawsuits.
     Vijay Sonty, chief technology officer at advertising firm Foote Cone & Belding Worldwide in New York, says losing customer information to competitors is a growing concern, particularly in industries where companies go after many of the same clients.
     “We have a lot of account executives who are very close to the clients and have access to client lists,” Sonty says. “If an account executive leaves to join a competitor, he can take all this confidential information.” The widespread sharing of corporate data, such as customer contact information, has made it easier for people to do their jobs, but it has also increased the risk of losing confidential data, Sonty says.
     He says the firm, which mandates that some employees sign noncompete agreements, is looking into policies and guidelines regarding the proper use of customer information, as well as audit trails to see who’s accessing customer lists. “I think it makes good business sense to take precautions and steps to prevent this from happening,” Sonty says. “We could lose a lot of money if key people leave.”

Source: Adapted from Bob Violino, “Protecting the Data Jewels: Valuable Customer Lists,” Computerworld, July 19, 2004. Copyright © 2004 by Computerworld Inc., Framingham, MA 01701. All rights reserved.




CASE STUDY QUESTIONS
 1. Why have developments in IT helped to increase the value of the data resources of many companies?
 2. How have these capabilities increased the security challenges associated with protecting a company’s data resources?
 3. How can companies use IT to meet the challenges of data resource security?

REAL WORLD ACTIVITIES
 1. Companies are increasingly adopting a position that data is an asset that must be managed with the same level of attention as that of cash and other capital. Using the Internet, see if you can find examples of how companies treat their data. Does there seem to be any relationship between companies that look at their data as an asset and companies that are highly successful in their respective industries?
 2. The Real World Case above illustrates how valuable data resources are to the casino industry. Break into small groups with your classmates, and discuss other industries where their data are clearly their lifeblood. For example, it has been estimated that any firm in the financial industry would have a life expectancy of less than 100 hours if they were placed in a position where they could not access their organizational data. Do you agree with this estimate?


FIGURE 5.2 Examples of the logical data elements in information systems. Note especially the examples of how
data fields, records, files, and databases are related.


Human Resource Database
  Payroll File
    Employee Record 1:  Name Field = Jones T. A.     SS No. Field = 275-32-3874   Salary Field = 20,000
    Employee Record 2:  Name Field = Klugman J. L.   SS No. Field = 349-88-7913   Salary Field = 28,000
  Benefits File
    Employee Record 3:  Name Field = Alvarez J.S.    SS No. Field = 542-40-3718   Insurance Field = 100,000
    Employee Record 4:  Name Field = Porter M.L.     SS No. Field = 617-87-7915   Insurance Field = 50,000




                                      for which they are primarily used, such as a payroll file or an inventory file, or the type
                                      of data they contain, such as a document file or a graphical image file. Files are also classified
                                      by their permanence, for example, a payroll master file versus a payroll weekly transac-
                                      tion file. A transaction file, therefore, would contain records of all transactions occur-
                                      ring during a period and might be used periodically to update the permanent records
                                      contained in a master file. A history file is an obsolete transaction or master file retained
                                      for backup purposes or for long-term historical storage called archival storage.
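
The way a transaction file periodically updates a master file can be sketched as follows. This is a minimal illustration under stated assumptions: the employee numbers, the `ytd_pay` field, and the function `apply_transactions` are all hypothetical, and a real payroll run would involve far more validation.

```python
# Master file: permanent records keyed by employee number (hypothetical data).
master_file = {
    "E1": {"name": "Jones T. A.", "ytd_pay": 1000.0},
    "E2": {"name": "Klugman J. L.", "ytd_pay": 1500.0},
}

# Transaction file: every transaction that occurred during one pay period.
weekly_transactions = [
    {"emp_no": "E1", "gross_pay": 400.0},
    {"emp_no": "E2", "gross_pay": 550.0},
    {"emp_no": "E1", "gross_pay": 50.0},  # e.g., an adjustment for the same week
]


def apply_transactions(master, transactions):
    """Post each transaction to its master record; return any that match no record."""
    unmatched = []
    for txn in transactions:
        record = master.get(txn["emp_no"])
        if record is None:
            unmatched.append(txn)
            continue
        record["ytd_pay"] += txn["gross_pay"]
    return unmatched


unmatched = apply_transactions(master_file, weekly_transactions)
```

After the run, the processed transaction list would typically be kept as a history file rather than discarded.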

Database                              A database is an integrated collection of logically related data elements. A database
                                      consolidates records previously stored in separate files into a common pool of data
                                      elements that provides data for many applications. The data stored in a database are
                                      independent of the application programs using them and of the type of storage devices
                                      on which they are stored.
                                          Thus, databases contain data elements describing entities and relationships among
                                      entities. For example, Figure 5.3 outlines some of the entities and relationships in a


FIGURE 5.3 Some of the entities and relationships in a simplified electric utility database. Note a few of the business applications that access the data in the database.

[Figure: Electric Utility Database.
  Entities: customers, meters, bills, payments, meter readings.
  Relationships: bills sent to customers, customers make payments, customers use meters, . . .
  Business applications accessing the database: billing, payment processing, meter reading, service start / stop.]

Source: Adapted from Michael V. Mannino, Database Application Development and Design (Burr Ridge, IL: McGraw-Hill/Irwin, 2001), p. 6.

                               database for an electric utility. Also shown are some of the business applications (billing,
                               payment processing) that depend on access to the data elements in the database.
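
To make the idea of shared entities and relationships concrete, here is a small sketch that uses Python's built-in `sqlite3` module. The table and column names, the sample rows, and the function `balance_due` are assumptions made for illustration; only the entity names (customers, meters, bills, payments) come from Figure 5.3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (cust_id INTEGER PRIMARY KEY, name TEXT);
-- Relationship "customers use meters": each meter row points at a customer.
CREATE TABLE meters   (meter_id INTEGER PRIMARY KEY,
                       cust_id INTEGER REFERENCES customers(cust_id));
-- Relationship "bills sent to customers".
CREATE TABLE bills    (bill_id INTEGER PRIMARY KEY,
                       cust_id INTEGER REFERENCES customers(cust_id),
                       amount REAL);
-- Relationship "customers make payments".
CREATE TABLE payments (pay_id INTEGER PRIMARY KEY,
                       cust_id INTEGER REFERENCES customers(cust_id),
                       amount REAL);
INSERT INTO customers VALUES (1, 'Sample Customer');
INSERT INTO meters    VALUES (10, 1);
INSERT INTO bills     VALUES (100, 1, 80.0);
INSERT INTO payments  VALUES (1000, 1, 50.0);
""")


def balance_due(cust_id):
    """Combine data written by billing and by payment processing."""
    billed = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM bills WHERE cust_id = ?", (cust_id,)
    ).fetchone()[0]
    paid = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM payments WHERE cust_id = ?", (cust_id,)
    ).fetchone()[0]
    return billed - paid
```

The point of the sketch is that billing and payment processing are separate applications, yet both read and write the same pool of customer data.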


Database Structures            The relationships among the many individual data elements stored in databases are
                               based on one of several logical data structures, or models. Database management system
                               packages are designed to use a specific data structure to provide end users with
                               quick, easy access to information stored in databases. Five fundamental database struc-
                               tures are the hierarchical, network, relational, object-oriented, and multidimensional models.
                               Simplified illustrations of the first three database structures are shown in Figure 5.4.

Hierarchical Structure         Early mainframe DBMS packages used the hierarchical structure, in which the relationships
                               between records form a hierarchy or treelike structure. In the traditional
                               hierarchical model, all records are dependent and arranged in multilevel structures,


FIGURE 5.4 Example of three fundamental database structures. They represent three basic ways to develop and express the relationships among the data elements in a database.

Hierarchical Structure
  Department Data Element
    Project A Data Element
      Employee 1 Data Element
      Employee 2 Data Element
    Project B Data Element

Network Structure
  Departments:  Department A, Department B
  Employees:    Employee 1, Employee 2, Employee 3
  Projects:     Project A, Project B

Relational Structure
  Department Table
    Deptno   Dname   Dloc   Dmgr
    Dept A
    Dept B
    Dept C

  Employee Table
    Empno    Ename   Etitle   Esalary   Deptno
    Emp 1                               Dept A
    Emp 2                               Dept A
    Emp 3                               Dept B
    Emp 4                               Dept B
    Emp 5                               Dept C
    Emp 6                               Dept B


                              consisting of one root record and any number of subordinate levels. Thus, all of the
                              relationships among records are one-to-many, since each data element is related to only
                              one element above it. The data element or record at the highest level of the hierarchy
                              (the department data element in this illustration) is called the root element. Any data
                              element can be accessed by moving progressively downward from a root and along the
                              branches of the tree until the desired record (for example, the employee data element)
                              is located.
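
The downward, root-to-record access path of the hierarchical model can be sketched with a simple tree. The `Node` class and `find_path` function are hypothetical names, and placing both employee elements under Project A is an assumption based on the layout of Figure 5.4.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A data element with one parent (implicit) and any number of children."""
    name: str
    children: list = field(default_factory=list)


# Root element at the top, subordinate levels beneath it (one-to-many only).
root = Node("Department", [
    Node("Project A", [Node("Employee 1"), Node("Employee 2")]),
    Node("Project B"),
])


def find_path(node, target, path=()):
    """Move downward from the root along the branches until the target is located."""
    path = path + (node.name,)
    if node.name == target:
        return list(path)
    for child in node.children:
        found = find_path(child, target, path)
        if found:
            return found
    return None
```

Calling `find_path(root, "Employee 2")` returns the chain of elements from the root down to that employee. Because each element has only one element above it, there is exactly one such path.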

Network Structure             The network structure can represent more complex logical relationships and is still
                              used by some mainframe DBMS packages. It allows many-to-many relationships
                              among records; that is, the network model can access a data element by following one
                              of several paths, because any data element or record can be related to any number of
                              other data elements. For example, in Figure 5.4, departmental records can be related
                              to more than one employee record, and employee records can be related to more than
                              one project record. Thus, you could locate all employee records for a particular
                              department, or all project records related to a particular employee.
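
A many-to-many structure of this kind can be sketched as a set of two-way links between records. The helper names `relate` and `related` are hypothetical, and which specific records are linked is an assumption, chosen only so that some records have more than one relationship.

```python
from collections import defaultdict

# Each record maps to the set of records it is directly related to.
links = defaultdict(set)


def relate(a, b):
    """Record a two-way relationship, so either record can be the starting point."""
    links[a].add(b)
    links[b].add(a)


relate("Department A", "Employee 1")
relate("Department A", "Employee 2")
relate("Department B", "Employee 3")
relate("Employee 1", "Project A")
relate("Employee 2", "Project A")
relate("Employee 2", "Project B")
relate("Employee 3", "Project B")


def related(record, kind):
    """All records of a given kind (name prefix) linked to `record`."""
    return sorted(r for r in links[record] if r.startswith(kind))
```

Unlike the tree in the hierarchical model, the same record can be reached along several paths: `related("Department A", "Employee")` and `related("Project A", "Employee")` both lead to Employee 2.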

Relational Structure          The relational model is the most widely used of the three database structures. It is used
                              by most microcomputer DBMS packages, as well as by most midrange and mainframe
                              systems. In the relational model, all data elements within the database are viewed as
                              being stored in the form of simple two-dimensional tables sometimes referred to as
                              relations. The tables in a relational database have rows and columns. Each row repre-
                              sents a single record in the file, and each column represents a field.
                                  Figure 5.4 illustrates the relational database model with two tables representing some
                              of the relationships among departmental and employee records. Other tables, or rela-
                              tions, for this organization’s database might represent the data element relationships
                              among projects, divisions, product lines, and so on. Database management system pack-
                              ages based on the relational model can link data elements from various tables to provide
                              information to users. For example, a manager might want to retrieve and display an
                              employee’s name and salary from the employee table in Figure 5.4, and the name of the
                              employee’s department from the department table, by using their common department
                              number field (Deptno) to link or join the two tables. See Figure 5.5. The relational
                              model can relate data in any one file with data in another file if both files share a com-
                              mon data element or field. Because of this, information can be created by retrieving data
                              from multiple files even if they are not all stored in the same physical location.
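
The manager's request described above can be sketched with Python's built-in `sqlite3` module. The table and column names follow Figure 5.4, but the department names, employee names, and salaries are hypothetical values filled in for illustration, since the figure leaves those columns blank.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (Deptno TEXT PRIMARY KEY, Dname TEXT, Dloc TEXT, Dmgr TEXT);
CREATE TABLE employee   (Empno TEXT PRIMARY KEY, Ename TEXT, Etitle TEXT,
                         Esalary INTEGER,
                         Deptno TEXT REFERENCES department(Deptno));
-- Dname, Ename, and Esalary values below are made up for the example.
INSERT INTO department VALUES ('Dept A', 'Accounting', NULL, NULL);
INSERT INTO department VALUES ('Dept B', 'Billing',    NULL, NULL);
INSERT INTO department VALUES ('Dept C', 'Claims',     NULL, NULL);
INSERT INTO employee VALUES ('Emp 1', 'Jones',   NULL, 20000, 'Dept A');
INSERT INTO employee VALUES ('Emp 2', 'Klugman', NULL, 28000, 'Dept A');
INSERT INTO employee VALUES ('Emp 3', 'Alvarez', NULL, 32000, 'Dept B');
""")


def employee_with_department(empno):
    """Link the two tables through their common Deptno field."""
    return conn.execute(
        """SELECT e.Ename, e.Esalary, d.Dname
           FROM employee AS e
           JOIN department AS d ON e.Deptno = d.Deptno
           WHERE e.Empno = ?""",
        (empno,),
    ).fetchone()
```

The name and salary come from the employee table and the department name from the department table; the shared `Deptno` value is the only thing connecting them.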

Relational Operations         Three basic operations can be performed on a relational database to create useful sets
                              of data. The select operation is used to create a subset of records that meet a stated
                              criterion. For example, a select operation might be used on an employee database to
                              create a subset of records that contain all employees who make more than $30,000 per
                              year and who have been with the company more than three years. Another way to
                              think of the select operation is that it temporarily creates a table whose rows have
                              records that meet the selection criteria.



FIGURE 5.5  Joining the Employee and Department tables in a relational database enables
you to selectively access data in both tables at the same time.

    Department Table
    Deptno   Dname   Dloc   Dmgr
    Dept A
    Dept B
    Dept C

    Employee Table
    Empno    Ename   Etitle   Esalary   Deptno
    Emp 1                               Dept A
    Emp 2                               Dept A
    Emp 3                               Dept B
    Emp 4                               Dept B
    Emp 5                               Dept C
    Emp 6                               Dept B
156 ● Module II / Information Technologies

                                  The join operation can be used to temporarily combine two or more tables so that a
                              user can see relevant data in a form that looks like it is all in one big table. Using this
                              operation, a user can ask for data to be retrieved from multiple files or databases without
                              having to go to each one separately.
                                  Finally, the project operation is used to create a subset of the columns contained in
                              the temporary tables created by the select and join operations. Just as the select oper-
                              ation creates a subset of records that meet stated criteria, the project operation creates
                              a subset of the columns, or fields, that the user wants to see. Using a project operation,
                              the user can decide not to view all of the columns in the table but only those that have
                              data necessary to answer a particular question or to construct a specific report.
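
The three operations can also be pictured as small functions over tables held as lists of records. The sketch below is a teaching aid under simple assumptions, not how a DBMS is implemented: the function names mirror the operation names in the text, the Years field is a hypothetical stand-in for length of service, and all data values are invented.

```python
def select(relation, predicate):
    """Keep only the rows (records) that satisfy the stated criterion."""
    return [row for row in relation if predicate(row)]


def project(relation, columns):
    """Keep only the named columns (fields) of every row."""
    return [{c: row[c] for c in columns} for row in relation]


def join(left, right, key):
    """Combine rows of two tables that hold the same value in the key field."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]


employees = [
    {"Ename": "Avery", "Esalary": 52000, "Years": 6, "Deptno": "Dept A"},
    {"Ename": "Blake", "Esalary": 28000, "Years": 8, "Deptno": "Dept A"},
    {"Ename": "Casey", "Esalary": 41000, "Years": 2, "Deptno": "Dept B"},
]
departments = [
    {"Deptno": "Dept A", "Dname": "Accounting"},
    {"Deptno": "Dept B", "Dname": "Marketing"},
]

# Select: more than $30,000 per year and more than three years of service.
veterans = select(employees, lambda e: e["Esalary"] > 30000 and e["Years"] > 3)
# Join: bring in department data through the shared Deptno field.
combined = join(veterans, departments, "Deptno")
# Project: show only the columns needed for the report.
report = project(combined, ["Ename", "Dname"])
print(report)
```

Note that each step produces a temporary table that feeds the next one, which is the sense in which the text describes the results of select and join as temporary tables.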
                                  Because of the widespread use of the relational model, an abundance of commercial
                              products exists to create and manage relational databases. Leading mainframe relational
                              database applications include Oracle 10g from Oracle Corp. and DB2 from IBM. A very
                              popular midrange database application is SQL Server from Microsoft. The most commonly
                              used database application for the PC is Microsoft Access.

Multidimensional              The multidimensional model is a variation of the relational model that uses multidi-
Structure                     mensional structures to organize data and express the relationships between data. You
                              can visualize multidimensional structures as cubes of data and cubes within cubes
                              of data. Each side of the cube is considered a dimension of the data. Figure 5.6 is an
                              example that shows that each dimension can represent a different category, such as
                              product type, region, sales channel, and time [5].


FIGURE 5.6  An example of the different dimensions of a multidimensional database.

    [Figure: four views of the same data cube, each arranging the dimensions differently.
    Dimensions shown: product (Camera, TV, VCR, Audio); market (East, West, South, Denver,
    Los Angeles, San Francisco); measure (Sales, COGS, Margin, Total Expenses, Profit);
    scenario (Actual, Budget, Forecast, Variance); and time (January, February, March,
    April, Qtr 1).]
Chapter 5 / Data Resource Management ● 157


FIGURE 5.7  The checking and savings account objects can inherit common attributes and
operations from the bank account object.

    Bank Account Object
        Attributes:  Customer, Balance, Interest
        Operations:  Deposit (amount), Withdraw (amount), Get owner

    Checking Account Object (inherits from Bank Account Object)
        Attributes:  Credit line, Monthly statement
        Operations:  Calculate interest owed, Print monthly statement

    Savings Account Object (inherits from Bank Account Object)
        Attributes:  Number of withdrawals, Quarterly statement
        Operations:  Calculate interest paid, Print quarterly statement

                                  Source: Adapted from Ivar Jacobson, Maria Ericsson, and Agneta Jacobson, The Object
                                  Advantage: Business Process Reengineering with Object Technology (New York: ACM Press, 1995),
                                  p. 65. Copyright © 1995, Association for Computing Machinery. By permission.


                                  Each cell within a multidimensional structure contains aggregated data related to
                              elements along each of its dimensions. For example, a single cell may contain the total
                              sales for a product in a region for a specific sales channel in a single month. A major
                              benefit of multidimensional databases is that they are a compact and easy-to-understand
                              way to visualize and manipulate data elements that have many interrelationships. As a result,
                              multidimensional databases have become the most popular database structure for
                              the analytical databases that support online analytical processing (OLAP) applications,
                              in which fast answers to complex business queries are expected. We discuss OLAP
                              applications in Chapter 9.
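
One way to picture a cell of aggregated data is as an entry in a table keyed by one value from each dimension. The sketch below assumes the four dimensions named in the text (product, region, sales channel, and month); the sales figures and the roll_up helper are invented for illustration and do not describe any particular OLAP product.

```python
from collections import defaultdict

# Each fact is one sale: (product, region, channel, month, amount).
# All values are made up.
facts = [
    ("TV", "East", "Retail", "January", 1200),
    ("TV", "East", "Retail", "January", 800),
    ("TV", "West", "Online", "January", 500),
    ("VCR", "East", "Retail", "February", 300),
]

# Build the cube: one cell per combination of dimension values,
# holding the total sales for that combination.
cube = defaultdict(int)
for product, region, channel, month, amount in facts:
    cube[(product, region, channel, month)] += amount


def roll_up(cube, axis):
    """Total all cells along every dimension except the one at position axis."""
    totals = defaultdict(int)
    for coords, value in cube.items():
        totals[coords[axis]] += value
    return dict(totals)


cell = cube[("TV", "East", "Retail", "January")]
by_region = roll_up(cube, axis=1)
print(cell, by_region)
```

Because the totals are computed once when the cube is built, a question such as "total TV sales in the East through retail stores in January" becomes a single lookup rather than a scan of every transaction.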

Object-Oriented               The object-oriented model is considered to be one of the key technologies of a new
Structure                     generation of multimedia Web-based applications. As Figure 5.7 illustrates, an object
                              consists of data values describing the attributes of an entity, plus the operations that
                              can be performed upon the data. This encapsulation capability allows the object-
                              oriented model to more easily handle complex types of data (graphics, pictures, voice,
                              text) than other database structures.
                                  The object-oriented model also supports inheritance; that is, new objects can be
                              automatically created by replicating some or all of the characteristics of one or more
                              parent objects. Thus, in Figure 5.7, the checking and savings account objects can both
                              inherit the common attributes and operations of the parent bank account object. Such
                              capabilities have made object-oriented database management systems (OODBMS) popular
                              in computer-aided design (CAD) and in a growing number of applications. For exam-
                              ple, object technology allows designers to develop product designs, store them as ob-
                              jects in an object-oriented database, and replicate and modify them to create new
                              product designs. In addition, multimedia Web-based applications for the Internet and
                              corporate intranets and extranets have become a major application area for object
                              technology.
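
The inheritance relationship in Figure 5.7 can be sketched with classes in an object-oriented programming language. The attribute and operation names below follow the figure, but only a subset is shown, and the details (such as refusing a withdrawal larger than the balance) are assumptions made for the example.

```python
class BankAccount:
    """Parent object: attributes and operations common to all accounts."""

    def __init__(self, customer, balance=0.0):
        self.customer = customer
        self.balance = balance

    def deposit(self, amount):
        self.balance += amount

    def withdraw(self, amount):
        # Assumed rule for this sketch: no overdrafts.
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount

    def get_owner(self):
        return self.customer


class CheckingAccount(BankAccount):
    """Inherits everything from BankAccount and adds a credit line."""

    def __init__(self, customer, balance=0.0, credit_line=0.0):
        super().__init__(customer, balance)
        self.credit_line = credit_line


class SavingsAccount(BankAccount):
    """Inherits everything from BankAccount and counts withdrawals."""

    def __init__(self, customer, balance=0.0):
        super().__init__(customer, balance)
        self.number_of_withdrawals = 0

    def withdraw(self, amount):
        super().withdraw(amount)  # reuse the parent's operation
        self.number_of_withdrawals += 1


savings = SavingsAccount("Dana", 100.0)
savings.deposit(50.0)   # inherited unchanged from BankAccount
savings.withdraw(30.0)  # parent operation plus savings-specific bookkeeping
print(savings.get_owner(), savings.balance, savings.number_of_withdrawals)
```

Neither child class repeats the deposit or get-owner logic; both obtain it from the parent, which is the reuse that the text attributes to inheritance.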
                                  Object technology proponents argue that an object-oriented DBMS can work with
                              complex data types such as document and graphic images, video clips, audio segments,
                              and other subsets of Web pages much more efficiently than relational database
                              management systems. However, major relational DBMS vendors have countered by
                              adding object-oriented modules to their relational software. Examples include multimedia
                              object extensions to IBM’s DB2, and Oracle’s object-based “cartridges” for
                              Oracle 10g. See Figure 5.8.

FIGURE 5.8  This claims analysis graphics display provided by the CleverPath enterprise
portal is powered by the Jasmine ii object-oriented database management system of
Computer Associates.

                               Source: Courtesy of Computer Associates.

Evaluation of                  The hierarchical data structure was a natural model for the databases used for the
Database Structures            structured, routine types of transaction processing characteristic of many business op-
                               erations in the early years of data processing and computing. Data for these operations
                               can easily be represented by groups of records in a hierarchical relationship. However,
                               as time progressed, there were many cases where information was needed about
                               records that did not have hierarchical relationships. For example, in some organizations,
                               employees from more than one department can work on more than one project (refer
                               back to Figure 5.4). A network data structure could easily handle this many-to-many
                               relationship, whereas a hierarchical model could not. As such, the more flexible net-
                               work structure became popular for these types of business operations. However, like
                               the hierarchical structure, because its relationships must be specified in advance, the
                               network model was unable to easily handle ad hoc requests for information, thus
                               pointing out the need for the relational model.
                                   Relational databases allow an end user to easily receive information in response to
                               ad hoc requests. That’s because not all of the relationships between the data elements
                               in a relationally organized database need to be specified when the database is created.
                               Database management software (such as Oracle 10g, DB2, Access, and Approach) cre-
                               ates new tables of data relationships by using parts of the data from several tables.
                               Thus, relational databases are easier for programmers to work with and easier to main-
                               tain than the hierarchical and network models.
                                   The major limitation of the relational model is that relational database management
                               systems cannot process large numbers of business transactions as quickly and
                               efficiently as those based on the hierarchical and network models, or process complex,
                               high-volume applications as well as the object-oriented model. This performance
                               gap has narrowed with the development of advanced relational database software
                               with object-oriented extensions. The use of database management software based on
                               the object-oriented and multidimensional models is growing steadily, as these tech-
                               nologies are playing a greater role for OLAP and Web-based applications.



Experian          Experian Inc. (www.experian.com), a unit of London-based GUS PLC, runs one of
Automotive: The   the largest credit reporting agencies in the United States. But Experian wanted to
                  expand its business beyond credit checks for automobile loans. If it could collect
Business Value    vehicle data from the various motor-vehicle departments in the United States and
of Relational     blend that with other data, such as change-of-address records, then its Experian
Database          Automotive division could sell the enhanced data to a variety of customers. For
                  example, car dealers could use the data to make sure their inventory matches local
Management        buying preferences. And toll collectors could match license plates to addresses to
                  find motorists who sail past tollbooths without paying.
                      But to offer new services, Experian first needed a way to extract, transfer, and
                  load data from the systems of 50 different U.S. state departments of motor vehicles
                  (DMVs), plus Puerto Rico, into a single database. That was a big challenge. “Unlike
                  the credit industry that writes to a common format, the DMVs do not,” says Ken
                  Kauppila, vice president of IT at Experian Automotive in Costa Mesa, California.
                      Of course, Experian didn’t want to replicate the hodgepodge of file formats it
                  inherited when the project began in January 1999—175 formats among 18,000
                  files. So Kauppila decided to transform and map the data to a common relational
                  database format.
                      Fortunately, off-the-shelf software tools for extracting, transforming, and loading
                  data (called ETL tools) make it economical to combine very large data repositories.
                  Using ETL Extract from Evolutionary Technologies, Experian created a database
                  that can incorporate vehicle information within 48 hours of its entry into any of the
                  nation’s DMV computers. This is one of the areas in which data management soft-
                  ware tools can excel, says Guy Creese, analyst at Aberdeen Group in Boston. “It can
                  simplify the mechanics of multiple data feeds, and it can add to data quality, making
                  fixes possible before errors are propagated to data warehouses,” he says.
                      Using the ETL extraction and transformation tools along with IBM’s DB2 data-
                  base system, Experian Automotive created a database that processes 175 million
                  transactions per month and has created a variety of profitable new revenue streams.
                  Experian’s automotive database is the 10th largest database in the world—now, with
                  up to 16 billion rows of data. But the company says the relational database is man-
                  aged by just three IT professionals. Experian says this demonstrates how efficiently
                  database software like DB2 and the ETL tools can work with a large database to
                  handle vast amounts of data quickly.



Database          Database management packages like Microsoft Access or Lotus Approach allow end
                  users to easily develop the databases they need. See Figure 5.9. However, large orga-
Development       nizations usually place control of enterprisewide database development in the hands of
                  database administrators (DBAs) and other database specialists. This improves the in-
                  tegrity and security of organizational databases. Database developers use the data def-
                  inition language (DDL) in database management systems like Oracle 10g or IBM’s DB2
                  to develop and specify the data contents, relationships, and structure of each database,
                  and to modify these database specifications when necessary. Such information is cata-
                  loged and stored in a database of data definitions and specifications called a data dictio-
                  nary, or metadata repository, which is managed by the database management software
                  and maintained by the DBA.
                      A data dictionary is a database management catalog or directory containing
                  metadata, that is, data about data. A data dictionary relies on a specialized database
                  software component to manage a database of data definitions, that is, metadata about
                  the structure, data elements, and other characteristics of an organization’s databases.
                  For example, it contains the names and descriptions of all types of data records and
                  their interrelationships, as well as information outlining requirements for end users’
                  access and use of application programs, and database maintenance and security.
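
A small sketch can show both ideas at once: a data definition language statement that specifies a table, and a query against the catalog in which the DBMS records that definition. The example uses Python's built-in sqlite3 module rather than Oracle or DB2; the customer table and its columns are hypothetical. In SQLite the catalog is a table named sqlite_master, and PRAGMA table_info lists the columns of a given table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the contents and structure of a (hypothetical) customer table.
conn.execute("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        city        TEXT
    )
""")

# Metadata query 1: which tables does the catalog know about?
tables = [name for (name,) in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]

# Metadata query 2: one row per column, in the form
# (cid, name, type, notnull, dflt_value, pk).
columns = [
    (name, col_type, bool(notnull))
    for _cid, name, col_type, notnull, _default, _pk
    in conn.execute("PRAGMA table_info(customer)")
]

print(tables)
for column in columns:
    print(column)
```

The queries return data about the data (table names, column names, types, and whether a value is required) rather than any customer records, which is what the term metadata refers to.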

FIGURE 5.9  Creating a database table using the Table Wizard of Microsoft Access.

                              Source: Courtesy of Microsoft Corp.



                                  Data dictionaries can be queried by the database administrator to report the status
                              of any aspect of a firm’s metadata. The administrator can then make changes to the
                              definitions of selected data elements. Some active (versus passive) data dictionaries
                              automatically enforce standard data element definitions whenever end users and ap-
                              plication programs access an organization’s databases. For example, an active data dic-
                              tionary would not allow a data entry program to use a nonstandard definition of
                              a customer record, nor would it allow an employee to enter a name of a customer that
                              exceeded the defined size of that data element.
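
The enforcement performed by an active data dictionary can be sketched as a check that runs before any record is accepted. Everything in this example is hypothetical: the DATA_DICTIONARY structure, the field names, the 30-character limit, and the validate function are assumptions chosen to mirror the customer-name example in the text.

```python
# Hypothetical dictionary entries: record type -> field -> (type, maximum length).
DATA_DICTIONARY = {
    "customer": {
        "customer_name": (str, 30),
        "city": (str, 20),
    }
}


def validate(record_type, record):
    """Return a list of violations of the standard definitions (empty if none)."""
    definition = DATA_DICTIONARY[record_type]
    errors = []
    for field, value in record.items():
        if field not in definition:
            errors.append(f"{field}: not a standard field")
            continue
        expected_type, max_len = definition[field]
        if not isinstance(value, expected_type):
            errors.append(f"{field}: expected {expected_type.__name__}")
        elif len(value) > max_len:
            errors.append(f"{field}: longer than {max_len} characters")
    return errors


ok = validate("customer", {"customer_name": "Acme Supply", "city": "Boston"})
too_long = validate("customer", {"customer_name": "X" * 31})
nonstandard = validate("customer", {"nickname": "Ace"})
print(ok, too_long, nonstandard)
```

A passive dictionary would only store the definitions; the active behavior is the step where a program must pass this kind of check before its data are written.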
                                  Developing a large database of complex data types can be a complicated task. Data-
                              base administrators and database design analysts work with end users and systems
                              analysts to model business processes and the data they require. Then they determine
                              (1) what data definitions should be included in the database and (2) what structure or
                              relationships should exist among the data elements.

Data Planning and             As Figure 5.10 illustrates, database development may start with a top-down data plan-
Database Design               ning process. Database administrators and designers work with corporate and end
                              user management to develop an enterprise model that defines the basic business processes
                              of the enterprise. Then they define the information needs of end users in a business
                              process, such as the purchasing/receiving process that all businesses have.
                                  Next, end users must identify the key data elements that are needed to perform
                              their specific business activities. This frequently involves developing entity relationship
                              diagrams (ERDs) that model the relationships among the many entities involved in
                              business processes. For example, Figure 5.11 illustrates some of the relationships
                              in a purchasing/receiving process. ERDs are simply graphical models of the various
                              files and their relationships contained within a database system. End users and data-
                              base designers could use database management or business modeling software
                              to help them develop ERD models for the purchasing/receiving process. This would
                              help identify what supplier and product data are required to automate their purchasing/
                              receiving and other business processes using enterprise resource management (ERM)
                              or supply chain management (SCM) software. You will learn about ERDs and other
                              data modeling tools in much greater detail if you ever take a course in systems analysis
                              and design.


FIGURE 5.10  Database development involves data planning and database design activities.
Data models that support business processes are used to develop databases that meet the
information needs of users.

    1. Data Planning
       Develops a model of business processes.
       Result: Enterprise model of business processes with documentation.

    2. Requirements Specification
       Defines information needs of end users in a business process.
       Result: Description of users’ needs, which may be represented in natural language
       or using the tools of a particular design methodology.

    3. Conceptual Design
       Expresses all information requirements in the form of a high-level model.
       Result: Conceptual data models, often expressed as entity relationship models.

    4. Logical Design
       Translates the conceptual models into the data model of a DBMS.
       Result: Logical data models, e.g., relational, network, hierarchical,
       multidimensional, or object-oriented models.

    5. Physical Design
       Determines the data storage structures and access methods.
       Result: Physical data models (storage representations and access methods).




                                   Such user views are a major part of a data modeling process where the relation-
                               ships between data elements are identified. Each data model defines the logical rela-
                               tionships among the data elements needed to support a basic business process. For
                               example, can a supplier provide more than one type of product to us? Can a customer
                               have more than one type of account with us? Can an employee have several pay rates
                               or be assigned to several project workgroups?
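
The answer to a question such as "Can a supplier provide more than one type of product?" changes the shape of the data model. If the answer is yes, and a product can also come from several suppliers, the relationship is many-to-many and is commonly represented in a relational design by a linking table. The sketch below assumes that answer; it uses Python's built-in sqlite3 module, and the table names, column names, and data are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE supplier (supplier_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE product  (product_id  INTEGER PRIMARY KEY, name TEXT);

    -- Linking table: one row for each (supplier, product) pairing.
    CREATE TABLE supplies (
        supplier_id INTEGER REFERENCES supplier(supplier_id),
        product_id  INTEGER REFERENCES product(product_id),
        PRIMARY KEY (supplier_id, product_id)
    );

    INSERT INTO supplier VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO product  VALUES (10, 'Camera'), (11, 'TV');
    INSERT INTO supplies VALUES (1, 10), (1, 11), (2, 11);
""")

# One supplier, several products.
acme_products = [name for (name,) in conn.execute("""
    SELECT p.name FROM product AS p
    JOIN supplies AS s ON s.product_id = p.product_id
    WHERE s.supplier_id = 1
    ORDER BY p.name
""")]

# One product, several suppliers.
tv_suppliers = [name for (name,) in conn.execute("""
    SELECT sp.name FROM supplier AS sp
    JOIN supplies AS s ON s.supplier_id = sp.supplier_id
    WHERE s.product_id = 11
    ORDER BY sp.name
""")]

print(acme_products, tv_suppliers)
```

Had the answer been that each product comes from exactly one supplier, a single supplier_id column in the product table would have been enough, which is why such questions need to be settled during data modeling.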
                                   Answering such questions will identify data relationships that have to be repre-
                               sented in a data model that supports a business process. These data models then serve
                               as logical frameworks (called schemas and subschemas) on which to base the physical de-
                               sign of databases and the development of application programs to support the business
                               processes of the organization. A schema is an overall logical view of the relationships
                               among the data elements in a database, while the subschema is a logical view of the
                               data relationships needed to support specific end user application programs that will
                               access that database.

FIGURE 5.11  This entity relationship diagram illustrates some of the relationships among
the entities (product, supplier, warehouse, etc.) in a purchasing/receiving business
process.

    Entity                   Relationship     Entity
    Purchase Order           Contains         Purchase Order Item
    Purchase Order Item      Ordered on       Product
    Supplier                 Supplies         Product
    Product                  Stocked as       Product Stock
    Warehouse                Holds            Product Stock

FIGURE 5.12  Example of the logical and physical database views and the software interface
of a banking services information system.

    Applications
        Checking Application, Savings Application, Installment Loan Application

    Logical User Views
        Checking and Savings Data Model; Installment Loan Data Model
        Data elements and relationships (the subschemas) needed for checking, savings,
        or installment loan processing

    Banking Services Data Model
        Data elements and relationships (the schema) needed for the support of all
        bank services

    Software Interface
        Database Management System
        The DBMS provides access to the bank’s databases

    Physical Data Views
        Bank Databases
        Organization and location of data on the storage media
                                   Remember that data models represent logical views of the data and relationships of
                               the database. Physical database design takes a physical view of the data (also called the
                               internal view) that describes how data are to be physically stored and accessed on the
                               storage devices of a computer system. For example, Figure 5.12 illustrates these dif-
                               ferent database views and the software interface of a bank database processing system.
                               In this example, checking, savings, and installment lending are the business processes
                               whose data models are part of a banking services data model that serves as a logical
                               data framework for all bank services.
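
In a relational DBMS, one common way to give each application its own logical view is a database view defined over the shared tables. The sketch below assumes a deliberately simplified banking schema with a single account table; the table, the view names, and the data are hypothetical, and Python's built-in sqlite3 module stands in for a production DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Schema: one logical description that covers all bank services.
    CREATE TABLE account (
        account_id INTEGER PRIMARY KEY,
        customer   TEXT,
        kind       TEXT,   -- 'checking', 'savings', or 'loan'
        balance    REAL
    );
    INSERT INTO account VALUES
        (1, 'Dana', 'checking',  250.0),
        (2, 'Dana', 'savings',  1000.0),
        (3, 'Eli',  'loan',     5000.0);

    -- Subschemas: each application sees only the part it needs.
    CREATE VIEW checking_view AS
        SELECT account_id, customer, balance
        FROM account WHERE kind = 'checking';
    CREATE VIEW loan_view AS
        SELECT account_id, customer, balance
        FROM account WHERE kind = 'loan';
""")

checking_rows = conn.execute("SELECT * FROM checking_view").fetchall()
loan_rows = conn.execute("SELECT * FROM loan_view").fetchall()
print(checking_rows, loan_rows)
```

Neither view says anything about how or where the rows are stored. That physical view is handled by the DBMS, so the storage arrangement can change without changing what the checking or loan application sees.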



 Aetna: Insuring               On a daily basis the operational services central support area at Aetna Inc. is
 Tons of Data                  responsible for 21.8 tons of data (174.6 terabytes [TB]). Over 119.2TB reside
                               on mainframe-connected disk drives, while the remaining 55.4TB sit on disks
                               attached to midrange computers. Almost all of these data are located in the
                               company’s headquarters in Hartford, Connecticut, with most of the information in
                               relational databases. To make matters even more interesting, outside customers
                               have access to about 20TB of the information. Four interconnected data centers
                               containing 14 mainframes and more than 1,000 midrange servers process the
                               data. It takes more than 4,100 direct-access storage devices to hold Aetna’s key
                               databases.
Chapter 5 / Data Resource Management ● 163



    Most of Aetna’s ever-growing mountain of data is health care information. The
insurance company maintains records for both health maintenance organization
participants and customers covered by insurance policies. Aetna has detailed
records of providers, such as doctors, hospitals, dentists, and pharmacies, and it
keeps track of all the claims it has processed. Some of Aetna’s larger customers send
tapes containing insured employee data; the firm is moving toward using the Internet
to collect such data.
    If managing gigabytes of data is like flying a hang glider, managing multiple
terabytes of data is like piloting a space shuttle: a thousand times more complex.
You can’t just extrapolate from experiences with small and medium data stores to
understand how to successfully manage tons of data. Even an otherwise mundane
operation such as backing up a database can be daunting if the time needed to finish
copying the data exceeds the time available.
    Data integrity, backup, security, and availability are collectively the Holy
Grail of dealing with large data stores. The sheer volume of data makes these
goals a challenge, and a highly decentralized environment complicates matters
even more. Developing and adhering to standardized data maintenance procedures
always provide an organization with the best return on its data dollar
investment [9, 11].
164 ● Module II / Information Technologies


  SECTION II                  Managing Data Resources

Data Resource                 Data are a vital organizational resource that needs to be managed like other important
Management                    business assets. Today’s business enterprises cannot survive or succeed without quality
                              data about their internal operations and external environment.
                                  With each online mouse click, either a fresh bit of data is created or already-stored data are
                                  retrieved from all those business websites. All that’s on top of the heavy demand for indus-
                                  trial-strength data storage already in use by scores of big corporations. What’s driving the
                                  growth is a crushing imperative for corporations to analyze every bit of information they can
                                  extract from their huge data warehouses for competitive advantage. That has turned the
                                  data storage and management function into a key strategic role of the information age [8].
                                 That’s why organizations and their managers need to practice data resource man-
                              agement, a managerial activity that applies information systems technologies like data-
                              base management, data warehousing, and other data management tools to the task of
                              managing an organization’s data resources to meet the information needs of their busi-
                              ness stakeholders. This chapter will show you the managerial implications of using
                              data resource management technologies and methods to manage an organization’s data
                              assets to meet business information requirements.
                                  Read the Real World Case on data administration. We can learn a lot from this case
                              about the challenges of managing the data within an organization. See Figure 5.13.


Types of                      Continuing developments in information technology and its business applications
Databases                     have resulted in the evolution of several major types of databases. Figure 5.14
                              illustrates several major conceptual categories of databases that may be found in many
                              organizations. Let’s take a brief look at some of them now.

Operational                   Operational databases store detailed data needed to support the business processes
Databases                     and operations of a company. They are also called subject area databases (SADB), trans-
                              action databases, and production databases. Examples are a customer database, human re-
                              source database, inventory database, and other databases containing data generated by
                              business operations. For example, a human resource database like that shown earlier in
                              Figure 5.2 would include data identifying each employee and his or her time worked,
                              compensation, benefits, performance appraisals, training and development status, and
                              other related human resource data. Figure 5.15 illustrates some of the common oper-
                              ational databases that can be created and managed for a small business using Microsoft
                              Access database management software.

Distributed                   Many organizations replicate and distribute copies or parts of databases to network
Databases                     servers at a variety of sites. These distributed databases can reside on network servers
                              on the World Wide Web, on corporate intranets or extranets, or on other company
                              networks. Distributed databases may be copies of operational or analytical databases,
                              hypermedia or discussion databases, or any other type of database. Replication and dis-
                              tribution of databases are done to improve database performance at end user worksites.
                              Ensuring that the data in an organization’s distributed databases are consistently and
                              concurrently updated is a major challenge of distributed database management.
                                  Distributed databases have both advantages and disadvantages. One primary ad-
                              vantage of a distributed database lies with the protection of valuable data. If all of an
                              organization’s data reside in a single physical location, any catastrophic event like a fire
                              or damage to the media holding the data would result in an equally catastrophic loss
                              of use of that data. By having databases distributed in multiple locations, the negative
                              impact of such an event can be minimized.




REAL WORLD CASE 2
Emerson and Sanofi: Data Stewards Seek Data Conformity

FIGURE 5.13
Source: Flying Colours Ltd./Digital Vision/Getty Images

    A customer is a customer is a customer, right? Actually, it’s not that simple. Just ask
Emerson Process Management, an Emerson Electric Co. unit in Austin that supplies process
automation products. In 2000 the company attempted to build a data warehouse to store
customer information from over 85 countries. The effort failed in large part because the
structure of the warehouse couldn’t accommodate the many variations on customers’ names.
    For instance, different users in different parts of the world might identify Exxon as
Exxon, Mobil, Esso, or ExxonMobil, to name a few variations. The warehouse would see them as
separate customers, and that would lead to inaccurate results when business users performed
queries.
    That’s when the company hired Nancy Rybeck as data administrator. Rybeck is now leading a
renewed data warehouse project that ensures not only the standardization of customer names,
but also the quality and accuracy of customer data, including postal addresses, shipping
addresses, and province codes.
    To accomplish this, Emerson has done something unusual: It has started to build a
department with 6 to 10 full-time “data stewards” dedicated to establishing and maintaining
the quality of data entered into the operational systems that feed the data warehouse.
    The practice of having formal data stewards is uncommon. Most companies recognize the
importance of data quality, but many treat it as a “find-and-fix” effort, to be conducted at
the end of a project by someone in IT. Others casually assign the job to the business users
who deal with the data head-on. Still others may throw resources at improving data only when
a major problem occurs.
    “It’s usually a seesaw effect,” says Chris Enger, formerly manager of information
management at Philip Morris USA Inc. “When something goes wrong, they put someone in charge
of data quality, and when things get better, they pull those resources away.”
    Creating a data quality team requires gathering people with an unusual mix of business,
technology, and diplomatic skills. It’s even difficult to agree on a job title. In Rybeck’s
department, they’re called “data analysts,” but titles at other companies include “data
quality control supervisor,” “data coordinator,” or “data quality manager.”
    “When you say you want a data analyst, they’ll come back with a DBA [database
administrator]. But it’s not the same at all,” Rybeck says. “It’s not the data structure,
it’s the content.”
    At Emerson, data analysts in each business unit review data and correct errors before
it’s put into the operational systems. They also research customer relationships, locations,
and corporate hierarchies; train overseas workers to fix data in their native languages; and
serve as the main contact with the data administrator and database architect for new
requirements and bug fixes.
    As the leader of the group, Rybeck plays a role that includes establishing and
communicating data standards, ensuring data integrity is maintained during database
conversions, and doing the logical design for the data warehouse tables.
    The stewards have their work cut out for them. Bringing together customer records from
the 75 business units yielded a 75 percent duplication rate, misspellings, and fields with
incorrect or missing data.
    “Most of the divisions would have sworn they had great processes and standards in
place,” Rybeck says. “But when you show them they entered the customer name 17 different
ways, or someone had entered, ‘Loading dock open 8:00–4:00’ into the address field, they
realize it’s not as clean as they thought.”
    Although the data steward may report to IT, as is the case at Emerson and at
pharmaceuticals company Sanofi-Synthelabo Inc., it’s not a job for someone steeped in
technical knowledge. Yet it’s not right for a businessperson who’s a technophobe, either.
    Seth Cohen is the first data quality control supervisor at Sanofi in New York. He was
hired in 2003 to help design automated processes to ensure the data quality of the customer
knowledge base that Sanofi was beginning to build.
    Data stewards at Sanofi need to have business knowledge because they need to make
frequent judgment calls, Cohen says. Indeed, judgment is a big part of the data steward’s
job, including the ability to determine where you don’t need 100 percent perfection.
    Cohen says that task is one of the biggest challenges of the job. “One-hundred percent
accuracy is just not achievable,” he says. “Some things you’re just going to have to let go
or you’d have a data warehouse with only 15 to 20 records.”
    A good example is when Sanofi purchases data on doctors that includes their birth dates,
Cohen says. If a birth date is given as February 31 or the number of the month is listed as
13 but the rest of the data are good, do you throw out all of the data or just figure the
birth date isn’t all that important?
    It comes down to knowing how much it costs to fix the data versus the payback. “You can
pay millions of dollars a year to get it perfect, but if the returns are in the hundreds of
thousands, is it worth it?” asks Chuck Kelley, senior advisory consultant at Navigator
Systems Inc., a corporate performance management consultancy in Addison, Texas.
    Data stewards also need to be politically astute, diplomatic, and good at conflict
resolution, in part because the environment isn’t always friendly. When Cohen joined Sanofi,
some questioned why he was there. In particular, IT didn’t see why he was “causing them so
many headaches and adding several extra steps to the process,” he says.
    There are many political traps as well. Take the issue of defining “customer address.”
If data comes from a variety of sources, you’re likely to get different types of coding
schemes, some of which overlap.
    People may also argue about how data should be produced, he says. Should field
representatives enter it from their laptops? Or should it first be independently checked for
quality? Should it be uploaded hourly or weekly?
    Most of all, data stewards need to understand that data quality is a journey, not a
destination. “It’s not a one-shot deal; it’s ongoing,” Rybeck of Emerson says. “You can’t
quit after the first task.”

Source: Adapted from Mary Brandel, “Data Stewards Seek Data Conformity,” Computerworld,
March 15, 2004. Copyright © 2004 by Computerworld Inc., Framingham, MA 01701. All rights
reserved.




         CASE STUDY QUESTIONS
 1. Why is the role of a data steward considered to be innovative? Explain.
 2. What are the business benefits associated with the data steward program at Emerson?
 3. How does effective data resource management contribute to the strategic goals of an
    organization? Provide examples from Emerson and others.

         REAL WORLD ACTIVITIES
 1. As discussed in the case, the role of data steward is relatively new, and its creation
    is motivated by the desire to protect the valuable data assets of the firm. There are
    many job descriptions in the modern organization associated with the strategic
    management of data resources. Using the Internet, see if you can find evidence of other
    job roles that are focused on the management of an organization’s data. How might a
    person train for these new jobs?
 2. As more and more data are collected, stored, processed, and disseminated by
    organizations, new and innovative ways to manage them must be developed. Break into
    small groups with your classmates, and discuss how the data resource management methods
    of today will need to evolve as more types of data emerge. Will we ever get to the point
    where we can manage our data in a completely automated manner?


FIGURE 5.14 Examples of some of the major types of databases used by organizations and end users.
[Diagram: a client PC and a network server shown with external databases on the Internet and
online services, distributed databases on intranets and other networks, operational databases
of the organization, end user databases, a data warehouse, and data marts.]




                                Another advantage of distributed databases is found in their storage requirements.
                            Often, a large database system may be distributed into smaller databases based on
                            some logical relationship between the data and the location. For example, a company
                            with several branch operations may distribute its data so that each branch operation
                            location is also the location of its branch database. Because multiple databases in a
                            distributed system can be joined together, each location has control of its local data
                            while all other locations can access any database in the company if so desired.
FIGURE 5.15 Examples of operational databases that can be created and managed for a small
business by microcomputer database management software like Microsoft Access.
Source: Courtesy of Microsoft Corp.

                                Distributed databases are not without some challenges, however. The primary
                            challenge is the maintenance of data accuracy. If a company distributes its database to
                            multiple locations, any change to the data in one location must somehow be updated in
                            all other locations. This can be accomplished in one of two ways: replication or duplication.
                                   Updating a distributed database using replication involves using a specialized soft-
                              ware application that looks at each distributed database and then finds the changes
                              made to it. Once these changes have been identified, the replication process makes all
                              of the distributed databases look the same by making the appropriate changes to each
                              one. The replication process is very complex and, depending upon the number and
                              size of the distributed databases, can consume a lot of time and computer resources.
                                   The duplication process, in contrast, is much less complicated. It identifies
                              one database as a master and then duplicates that database at a prescribed time after
                              hours so that each distributed location has the same data. One drawback to the
                              duplication process is that changes can be made only to the master database; a change
                              made to any other copy would be overwritten during the next duplication.
                              Nonetheless, properly used, duplication and replication can keep all distributed
                              locations current with the latest data.
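
The contrast between the two update methods can be sketched in a few lines of Python. This is a simplified illustration, not how any particular DBMS product works: each database is modeled as a plain dictionary, the function names are hypothetical, and the replication sketch assumes records are only added or updated (never deleted) and that the last site processed wins when two sites change the same record.

```python
import copy


def duplicate(master, sites):
    """Duplication: overwrite every site with a full copy of the master."""
    for name in sites:
        sites[name] = copy.deepcopy(master)


def replicate(sites, baseline):
    """Replication: find each site's changes since the baseline, then apply
    the combined changes to every site so that all copies look the same."""
    merged = dict(baseline)
    for name in sorted(sites):
        for key, value in sites[name].items():
            if baseline.get(key) != value:  # record added or updated here
                merged[key] = value
    for name in sites:
        sites[name] = dict(merged)
    return merged
```

With replication, an insert made at one site and an update made at another both survive and reach every copy. With duplication, anything changed outside the master is lost at the next run, which is why local changes must be avoided.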
                                   One additional challenge associated with distributed databases is the extra com-
                              puting power and bandwidth necessary to access multiple databases in multiple loca-
                              tions. We will look more closely at the issue of bandwidth in Chapter 6 when we focus
                              on telecommunications and networks.

External Databases            Access to a wealth of information from external databases is available for a fee from
                              commercial online services, and with or without charge from many sources on the
                              World Wide Web. Websites provide an endless variety of hyperlinked pages of multi-
                              media documents in hypermedia databases for you to access. Data are available in the
                              form of statistics on economic and demographic activity from statistical databanks. Or
                              you can view or download abstracts or complete copies of hundreds of newspapers,
                              magazines, newsletters, research papers, and other periodicals and published
                              material from bibliographic and full text databases. Whenever you use a search engine like
                              Google or Yahoo to look up something on the Internet, you are using an external
                              database—a very, very large one!

Hypermedia                    The rapid growth of websites on the Internet and corporate intranets and extranets has
Databases                     dramatically increased the use of databases of hypertext and hypermedia documents.
                              A website stores such information in a hypermedia database consisting of hyper-
                              linked pages of multimedia (text, graphic, and photographic images, video clips, audio
                              segments, and so on). That is, from a database management point of view, the set of
                              interconnected multimedia pages at a website is a database of interrelated hypermedia
                              page elements, rather than interrelated data records [2].
                                  Figure 5.16 shows how you might use a Web browser on your client PC to connect
                              with a Web network server. This server runs Web server software to access and transfer the
                              Web pages you request. The website illustrated in Figure 5.16 uses a hypermedia database
                              consisting of Web page content described by HTML (Hypertext Markup Language)
                              code or XML (Extensible Markup Language) labels, image files, video files, and audio files.
                              The Web server software acts as a database management system to manage the transfer of
                              hypermedia files for downloading by the multimedia plug-ins of your Web browser.

FIGURE 5.16 The components of a Web-based information system include Web browsers,
servers, and hypermedia databases.
[Diagram: Web browsers on client PCs; the Internet, intranets, and extranets; Web server
software on a network server; and a hypermedia database holding HTML and XML Web pages,
image files, video files, and audio files.]

FIGURE 5.17 The components of a complete data warehouse system.
[Diagram: labeled components are operational, external, and other databases; data acquisition
(capture, clean, transform, transport, load/apply); data management (analytical data store,
enterprise warehouse, data marts); metadata management (metadata directory, metadata
repository); data analysis (query, report, analyze, mine, deliver); Web information systems;
and warehouse design.]
Source: Adapted courtesy of Hewlett-Packard.


Data                             A data warehouse stores data that have been extracted from the various operational,
Warehouses                       external, and other databases of an organization. It is a central source of the data that
and Data                         have been cleaned, transformed, and cataloged so they can be used by managers and
Mining                           other business professionals for data mining, online analytical processing, and other
                                 forms of business analysis, market research, and decision support. (We’ll talk in depth
                                 about all of these activities in Chapter 9.) Data warehouses may be subdivided into
                                 data marts, which hold subsets of data from the warehouse that focus on specific
                                 aspects of a company, such as a department or a business process.
                                     Figure 5.17 illustrates the components of a complete data warehouse system. No-
                                 tice how data from various operational and external databases are captured, cleaned,
                                 and transformed into data that can be better used for analysis. This acquisition process
                                 might include activities like consolidating data from several sources, filtering out un-
                                 wanted data, correcting incorrect data, converting data to new data elements, and
                                 aggregating data into new data subsets.
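
The acquisition activities just listed can be sketched as one small Python function. This is a minimal illustration under stated assumptions, not the process of any real warehouse product: the record layout (`customer`, `amount`), the function name `acquire`, and the `NAME_FIXES` lookup table are all hypothetical.

```python
from collections import defaultdict

# Hypothetical lookup used to correct known customer-name variants.
NAME_FIXES = {"Exxon": "ExxonMobil", "Esso": "ExxonMobil", "Mobil": "ExxonMobil"}


def acquire(*sources):
    """Consolidate, filter, correct, convert, and aggregate sales records."""
    # Consolidate: combine the records from several sources.
    records = [row for source in sources for row in source]
    # Filter: drop unwanted records (here, rows with no amount).
    records = [r for r in records if r.get("amount") is not None]
    cleaned = []
    for r in records:
        # Correct: map known name variants to one standard name.
        name = r["customer"].strip()
        name = NAME_FIXES.get(name, name)
        # Convert: turn amounts stored as text into numbers.
        cleaned.append({"customer": name, "amount": float(r["amount"])})
    # Aggregate: total the amounts per customer into a new data subset.
    totals = defaultdict(float)
    for r in cleaned:
        totals[r["customer"]] += r["amount"]
    return dict(totals)
```

Each step maps to one activity named in the paragraph above. In practice every step would be driven by business rules agreed on by the people responsible for the data, not by a hard-coded table.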
                                     These data are then stored in the enterprise data warehouse, from where they can be moved
                                 into data marts or to an analytical data store that holds data in a more useful form for cer-
                                 tain types of analysis. Metadata (data that defines the data in the data warehouse) is stored
                                 in a metadata repository and cataloged by a metadata directory. Finally, a variety of ana-
                                 lytical software tools can be provided to query, report, mine, and analyze the data for
                                 delivery via Internet and intranet Web systems to business end users. See Figure 5.18.


FIGURE 5.18 A data warehouse and its data mart subsets hold data that have been extracted from various operational databases for business analysis, market research, decision support, and data mining applications.
[Figure labels: Applications (ERP, Inventory control, Logistics, Shipping, Purchasing, CRM); Data Warehouse; Data Marts (Finance, Marketing, Sales, Accounting, Management reporting)]

 Revenue: Closing the Gap with a Data Warehouse
                              In the late 1990s the state of Iowa had a tax gap, a polite way of describing
                              companies and individuals who either didn’t file state tax returns or who
                              underreported their earnings. To identify noncompliant taxpayers, the Iowa
                              Department of Revenue and Finance (IDRF) relied on a jumble of nonintegrated
                              mainframe applications, file extracts, and over 20 disparate stand-alone
                              systems (databases, mainframe data, and information on individual
                              spreadsheets, to name a few).
                              The real problem was that none of these systems could communicate with each
                              other. What was needed was a central data warehouse to pull together information
                              from all those systems for analysis. But getting funding from the state for such a
                              large-scale project wasn’t an option.
                                  So the IDRF came up with a plan the Iowa Legislature couldn’t help but
                              approve. The plan was simple: Build a data warehouse that would be entirely funded
                              using the additional tax revenue it generated by catching tax scofflaws.
                                  Development of the data warehouse began in November 1999, and it became
                              operational five months later. The system combines data from the department’s
                              own tax and accounts receivable systems, tax files shared by the federal Internal
                              Revenue Service, the Iowa Workforce Development Agency, and a number of other
                              sources. Revenue- and finance-department employees analyze the data using
                              commercially available reporting software.
                                  In the three years since it went live, the IDRF data warehouse has generated $28
                              million in tax revenue and is expected to generate $10 million each year from now
                              on. There’s no question the project has paid for itself many times over, and the state
                              of Iowa is sold on the value of data warehousing. The next step is to use the data
                              warehouse to better understand why taxpayers might be in noncompliance. That will
                              involve analyzing taxpayer demographics and changes in tax laws and policies. This
                              phase of the project is also expected to generate revenues for the state while
                              simultaneously helping to improve the tax laws for the citizens of Iowa [12, 13].
Lectura Capitulo 5. Sistemas de Información Gerencial, James O´Brien
  • 1. Management Challenges CHAPTER 5 Business Applications Module II Information Technologies Development Foundation Processes Concepts DATA RESOURCE MANAGEMENT Chapter Highlights Learning Objectives Section I After reading and studying this chapter, you should Technical Foundations of Database Management be able to: Real World Case: Harrah’s Entertainment and Others: 1. Explain the business value of implementing data Protecting the Data Jewels resource management processes and technologies Database Management in an organization. Fundamental Data Concepts 2. Outline the advantages of a database management Database Structures approach to managing the data resources of a Database Development business, compared to a file processing approach. Section II 3. Explain how database management software helps Managing Data Resources business professionals and supports the operations Real World Case: Emerson and Sanofi: Data Stewards and management of a business. Seek Data Conformity Data Resource Management 4. Provide examples to illustrate each of the Types of Databases following concepts: Data Warehouses and Data Mining a. Major types of databases. Traditional File Processing b. Data warehouses and data mining. The Database Management Approach c. Logical data elements. Real World Case: Acxiom Corporation: Data d. Fundamental database structures. Demands Respect e. Database development. 149
  • 2. 150 ● Module II / Information Technologies SECTION I Technical Foundations of Database Management Database Just imagine how difficult it would be to get any information from an information sys- tem if data were stored in an unorganized way, or if there were no systematic way to Management retrieve them. Therefore, in all information systems, data resources must be organized and structured in some logical manner so that they can be accessed easily, processed ef- ficiently, retrieved quickly, and managed effectively. Data structures and access meth- ods ranging from simple to complex have been devised to efficiently organize and access data stored by information systems. In this chapter, we will explore these con- cepts, as well as the managerial implications and value of data resource management. See Figure 5.1. Read the Real World Case on data resources in the casino gaming and hospitality industry. We can learn a lot from this case about the importance of protecting the data resources of the organization. Fundamental Before we go any further, let’s discuss some fundamental concepts about how data are organized in information systems. A conceptual framework of several levels of data has Data Concepts been devised that differentiates between different groupings, or elements, of data. Thus, data may be logically organized into characters, fields, records, files, and data- bases, just as writing can be organized in letters, words, sentences, paragraphs, and documents. Examples of these logical data elements are shown in Figure 5.2. Character The most basic logical data element is the character, which consists of a single alpha- betic, numeric, or other symbol. You might argue that the bit or byte is a more ele- mentary data element, but remember that those terms refer to the physical storage elements provided by the computer hardware, discussed in Chapter 3. 
Using that understanding, one way to think of a character is that it is a byte used to represent a particular character. From a user’s point of view (that is, from a logical as opposed to a physical or hardware view of data), a character is the most basic element of data that can be observed and manipulated.

Field

The next higher level of data is the field, or data item. A field consists of a grouping of related characters. For example, the grouping of alphabetic characters in a person’s name may form a name field (or typically, last name, first name, and middle initial fields), and the grouping of numbers in a sales amount forms a sales amount field. Specifically, a data field represents an attribute (a characteristic or quality) of some entity (object, person, place, or event). For example, an employee’s salary is an attribute that is a typical data field used to describe an entity who is an employee of a business. Generally speaking, fields are organized such that they represent some logical order, for example: last_name, first_name, address, city, state, zipcode, and so on.

Record

All of the fields used to describe the attributes of an entity are grouped to form a record. Thus, a record represents a collection of attributes that describe an entity. An example is a person’s payroll record, which consists of data fields describing attributes such as the person’s name, Social Security number, and rate of pay. Fixed-length records contain a fixed number of fixed-length data fields. Variable-length records contain a variable number of fields and field lengths. Another way of looking at a record is that it represents a single instance of an entity. Each record in an employee file describes one specific employee.

File

A group of related records is a data file, or table. Thus, an employee file would contain the records of the employees of a firm. Files are frequently classified by the application
  • 3. Chapter 5 / Data Resource Management

REAL WORLD CASE 1
Harrah’s Entertainment and Others: Protecting the Data Jewels

In the casino industry, one of the most valuable assets is the dossier that casinos keep on their affluent customers, the high rollers. But in 2003, casino operator Harrah’s Entertainment Inc. filed a lawsuit in Placer County, California, Superior Court charging that a former employee had copied the records of up to 450 wealthy customers before leaving the company to work at competitor Thunder Valley Casino in Lincoln, California.

The complaint said the employee was seen printing the list, which included names, contact information, and credit and account histories, from a Harrah’s database. It also alleged that he tried to lure those players to Thunder Valley. The employee denies the charge of stealing Harrah’s trade secrets, and the case was still pending at this writing, but many similar cases have been filed in the past 20 years, legal experts say.

While savvy companies are using business intelligence and customer relationship management systems to identify their most profitable customers, there’s a genuine danger of that information falling into the wrong hands. Broader access to those applications and the trend toward employees switching jobs more frequently have made protecting customer lists an even greater priority.

Fortunately, there are managerial, legal, and technological steps you can take to help prevent, or at least discourage, departing employees from walking out the door with this vital information.

For starters, organizations should make sure that certain employees, particularly those with frequent access to customer information, sign nondisclosure, noncompete, and nonsolicitation agreements that specifically mention customer lists. Through these documents, employees “acknowledge that they will be introduced to this information and agree not to disclose it on departure from the company,” says Suzanne Labrit, a partner at law firm Shutts & Bowen LLP in West Palm Beach, Florida.

Although most states have enacted trade-secrets laws, Labrit says they have different attitudes about enforcing these laws with regard to customer lists. “But as a starting point, at least you have this understanding [with employees] that the customer information is being treated as confidential,” Labrit says. Then if an employee leaves to work for a competitor and uses this protected customer data, the employer will more likely be able to take legal action to stop the activity. “If you don’t treat it as confidential information internally,” she says, “the court will not treat it as confidential information, either.”

It’s also important to educate employees about the confidentiality of customer lists, because many people wrongly assume they’re public information, says Tim Headley, a partner at the Houston law firm of Gardere Wynne Sewell LLP. “Most people think they can take the lists with them,” he says. “You have to show that you’ve kept it a secret and told employees it’s a valuable secret. [Customer lists] are at the core of how you bring revenue into the company. These are the decision-makers who are willing to buy your product.”

From a management and process standpoint, organizations should try to limit access to customer lists to only employees, such as sales representatives, who need the information to do their jobs. “If you make it broadly available to employees, then it’s not considered confidential,” says Labrit.

Physical security should also be considered, Labrit says. Visitors such as vendors shouldn’t be permitted to roam free in the hallways or into conference rooms. And security policies, such as a requirement that all computer systems have strong password protection, should be strictly enforced.

Companies should instantly shut down access to computers and networks when employees leave, whether the reason is a layoff or a move to a new job. At the exit interview, the employee should be reminded of any signed agreements and corporate policies regarding customer lists and other confidential information. Employees should be told to turn over anything, including data that belongs to the company.

In addition, employers should track the activities of employees who’ve given notice but will be around for a while. This includes monitoring systems to see if the employee is e-mailing company-owned documents outside the company.

Some organizations rely on technology to help prevent the loss of customer lists and other critical data. Inflow Inc., a Denver-based provider of managed Web hosting services, uses a product from Opsware Inc. in Sunnyvale, California, that lets managers control access to specific systems, such as databases, from a central location.

FIGURE 5.1 While data management is a strategic initiative in every modern organization, those in the gaming industry believe their success lies in the protection and strategic management of their data resources. Source: Jose Luis Palaez, Inc./Corbis.
  • 4. The company also uses an e-mail-scanning service that allows it to analyze messages that it suspects might contain proprietary files, says Lenny Monsour, general manager of application hosting and management. Inflow combines the use of this technology with practices such as monitoring employees who have access to data considered vital to the company.

A major financial services provider is using a firewall from San Francisco-based Vontu Inc. that monitors outbound e-mail, Webmail, Web posts, and instant messages to ensure that no confidential data leave the company. The software includes search algorithms and can be customized to automatically detect specific types of data such as lists on a spreadsheet or even something as granular as a customer’s Social Security number. The firm began using the product after it went through layoffs in 2000 and 2001.

“Losing customer information was a primary concern of ours,” says the firm’s chief information security officer, who asked to not be identified. “We were concerned about people leaving and sending e-mail to their home accounts.” In fact, he says, before using the firewall, the company had trouble with departing employees taking intellectual property and using it in their new jobs at rival firms, which sometimes led to lawsuits.

Vijay Sonty, chief technology officer at advertising firm Foote Cone & Belding Worldwide in New York, says losing customer information to competitors is a growing concern, particularly in industries where companies go after many of the same clients.

“We have a lot of account executives who are very close to the clients and have access to client lists,” Sonty says. “If an account executive leaves to join a competitor, he can take all this confidential information.” The widespread sharing of corporate data, such as customer contact information, has made it easier for people to do their jobs, but it has also increased the risk of losing confidential data, Sonty says.

He says the firm, which mandates that some employees sign noncompete agreements, is looking into policies and guidelines regarding the proper use of customer information, as well as audit trails to see who’s accessing customer lists. “I think it makes good business sense to take precautions and steps to prevent this from happening,” Sonty says. “We could lose a lot of money if key people leave.”

Source: Adapted from Bob Violino, “Protecting the Data Jewels: Valuable Customer Lists,” Computerworld, July 19, 2004. Copyright © 2004 by Computerworld Inc., Framingham, MA 01701. All rights reserved.

CASE STUDY QUESTIONS

1. Why have developments in IT helped to increase the value of the data resources of many companies?
2. How have these capabilities increased the security challenges associated with protecting a company’s data resources?
3. How can companies use IT to meet the challenges of data resource security?

REAL WORLD ACTIVITIES

1. Companies are increasingly adopting a position that data is an asset that must be managed with the same level of attention as that of cash and other capital. Using the Internet, see if you can find examples of how companies treat their data. Does there seem to be any relationship between companies that look at their data as an asset and companies that are highly successful in their respective industries?
2. The Real World Case above illustrates how valuable data resources are to the casino industry. Break into small groups with your classmates, and discuss other industries where their data are clearly their lifeblood. For example, it has been estimated that any firm in the financial industry would have a life expectancy of less than 100 hours if they were placed in a position where they could not access their organizational data. Do you agree with this estimate?
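The logical data elements introduced before the case (character, field, record, file, database) can be sketched in a few lines of Python. This is only an illustration, not how any particular DBMS stores data: the class and variable names are hypothetical, and the sample values are the ones shown in Figure 5.2.

```python
from dataclasses import dataclass


@dataclass
class PayrollRecord:
    """One record: the fields that describe a single employee entity."""
    name: str    # name field
    ss_no: str   # Social Security number field
    salary: int  # salary field


@dataclass
class BenefitsRecord:
    """A record type with a different set of fields."""
    name: str
    ss_no: str
    insurance: int  # insurance field


# A file (or table) is a group of related records.
payroll_file = [
    PayrollRecord("Jones T. A.", "275-32-3874", 20_000),
    PayrollRecord("Klugman J. L.", "349-88-7913", 28_000),
]
benefits_file = [
    BenefitsRecord("Alvarez J.S.", "542-40-3718", 100_000),
    BenefitsRecord("Porter M.L.", "617-87-7915", 50_000),
]

# A database is an integrated collection of logically related files.
human_resource_database = {
    "payroll": payroll_file,
    "benefits": benefits_file,
}

# A field is a grouping of characters, the most basic logical element.
first_record = human_resource_database["payroll"][0]
characters_in_name = list(first_record.name)

# Applications read fields across the records of a file.
total_payroll = sum(record.salary for record in payroll_file)
```

Each level is built from the one below it: characters form a field value, fields form a record, records form a file, and files form the database.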
  • 5. FIGURE 5.2 Examples of the logical data elements in information systems. Note especially the examples of how data fields, records, files, and databases are related.

Human Resource Database

Payroll File
Record               Name Field       SS No. Field    Salary Field
Employee Record 1    Jones T. A.      275-32-3874     20,000
Employee Record 2    Klugman J. L.    349-88-7913     28,000

Benefits File
Record               Name Field       SS No. Field    Insurance Field
Employee Record 3    Alvarez J.S.     542-40-3718     100,000
Employee Record 4    Porter M.L.      617-87-7915     50,000

for which they are primarily used, such as a payroll file or an inventory file, or the type of data they contain, such as a document file or a graphical image file. Files are also classified by their permanence, for example, a payroll master file versus a payroll weekly transaction file. A transaction file, therefore, would contain records of all transactions occurring during a period and might be used periodically to update the permanent records contained in a master file. A history file is an obsolete transaction or master file retained for backup purposes or for long-term historical storage called archival storage.

Database

A database is an integrated collection of logically related data elements. A database consolidates records previously stored in separate files into a common pool of data elements that provides data for many applications. The data stored in a database are independent of the application programs using them and of the type of storage devices on which they are stored.

FIGURE 5.3 Some of the entities and relationships in a simplified electric utility database. Note a few of the business applications that access the data in the database.

Electric Utility Database
Entities: Customers, meters, bills, payments, meter readings
Relationships: Bills sent to customers, customers make payments, customers use meters, . . .
Business applications: Billing, Payment processing, Meter reading, Service start / stop

Source: Adapted from Michael V. Mannino, Database Application Development and Design (Burr Ridge, IL: McGraw-Hill/Irwin, 2001), p. 6.

Thus, databases contain data elements describing entities and relationships among entities. For example, Figure 5.3 outlines some of the entities and relationships in a
  • 6. database for an electric utility. Also shown are some of the business applications (billing, payment processing) that depend on access to the data elements in the database.

Database Structures

The relationships among the many individual data elements stored in databases are based on one of several logical data structures, or models. Database management system packages are designed to use a specific data structure to provide end users with quick, easy access to information stored in databases. Five fundamental database structures are the hierarchical, network, relational, object-oriented, and multidimensional models. Simplified illustrations of the first three database structures are shown in Figure 5.4.

FIGURE 5.4 Example of three fundamental database structures. They represent three basic ways to develop and express the relationships among the data elements in a database.

Hierarchical Structure (tree levels, top to bottom)
Department Data Element
Project A Data Element, Project B Data Element
Employee 1 Data Element, Employee 2 Data Element

Network Structure (linked elements)
Department A, Department B
Employee 1, Employee 2, Employee 3
Project A, Project B

Relational Structure

Department Table
Deptno    Dname    Dloc    Dmgr
Dept A
Dept B
Dept C

Employee Table
Empno    Ename    Etitle    Esalary    Deptno
Emp 1                                  Dept A
Emp 2                                  Dept A
Emp 3                                  Dept B
Emp 4                                  Dept B
Emp 5                                  Dept C
Emp 6                                  Dept B

Hierarchical Structure

Early mainframe DBMS packages used the hierarchical structure, in which the relationships between records form a hierarchy or treelike structure. In the traditional hierarchical model, all records are dependent and arranged in multilevel structures,
  • 7. consisting of one root record and any number of subordinate levels. Thus, all of the relationships among records are one-to-many, since each data element is related to only one element above it. The data element or record at the highest level of the hierarchy (the department data element in this illustration) is called the root element. Any data element can be accessed by moving progressively downward from a root and along the branches of the tree until the desired record (for example, the employee data element) is located.

Network Structure

The network structure can represent more complex logical relationships and is still used by some mainframe DBMS packages. It allows many-to-many relationships among records; that is, the network model can access a data element by following one of several paths, because any data element or record can be related to any number of other data elements. For example, in Figure 5.4, departmental records can be related to more than one employee record, and employee records can be related to more than one project record. Thus, you could locate all employee records for a particular department, or all project records related to a particular employee.

Relational Structure

The relational model is the most widely used of the three database structures. It is used by most microcomputer DBMS packages, as well as by most midrange and mainframe systems. In the relational model, all data elements within the database are viewed as being stored in the form of simple two-dimensional tables, sometimes referred to as relations. The tables in a relational database have rows and columns. Each row represents a single record in the file, and each column represents a field.

Figure 5.4 illustrates the relational database model with two tables representing some of the relationships among departmental and employee records.
Other tables, or relations, for this organization’s database might represent the data element relationships among projects, divisions, product lines, and so on. Database management system packages based on the relational model can link data elements from various tables to provide information to users. For example, a manager might want to retrieve and display an employee’s name and salary from the employee table in Figure 5.4, and the name of the employee’s department from the department table, by using their common department number field (Deptno) to link or join the two tables. See Figure 5.5. The relational model can relate data in any one file with data in another file if both files share a common data element or field. Because of this, information can be created by retrieving data from multiple files even if they are not all stored in the same physical location.

Relational Operations

Three basic operations can be performed on a relational database to create useful sets of data. The select operation is used to create a subset of records that meet a stated criterion. For example, a select operation might be used on an employee database to create a subset of records that contain all employees who make more than $30,000 per year and who have been with the company more than three years. Another way to think of the select operation is that it temporarily creates a table whose rows have records that meet the selection criteria.

FIGURE 5.5 Joining the Employee and Department tables in a relational database enables you to selectively access data in both tables at the same time.

Department Table
Deptno    Dname    Dloc    Dmgr
Dept A
Dept B
Dept C

Employee Table
Empno    Ename    Etitle    Esalary    Deptno
Emp 1                                  Dept A
Emp 2                                  Dept A
Emp 3                                  Dept B
Emp 4                                  Dept B
Emp 5                                  Dept C
Emp 6                                  Dept B
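The join through the common Deptno field, and the select operation, can be tried with Python's built-in sqlite3 module. This is a minimal sketch, not a depiction of any product named in the chapter: the table layout follows Figure 5.5, but the department names, employee names, and salaries are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (
    deptno TEXT PRIMARY KEY,
    dname  TEXT
);
CREATE TABLE employee (
    empno   TEXT PRIMARY KEY,
    ename   TEXT,
    esalary INTEGER,
    deptno  TEXT REFERENCES department(deptno)
);
-- Department and employee numbers follow Figure 5.5;
-- names and salaries are hypothetical sample data.
INSERT INTO department VALUES
    ('Dept A', 'Accounting'), ('Dept B', 'Billing'), ('Dept C', 'Claims');
INSERT INTO employee VALUES
    ('Emp 1', 'Avery', 28000, 'Dept A'),
    ('Emp 2', 'Baker', 35000, 'Dept A'),
    ('Emp 3', 'Chen',  41000, 'Dept B'),
    ('Emp 4', 'Diaz',  25000, 'Dept B'),
    ('Emp 5', 'Evans', 52000, 'Dept C'),
    ('Emp 6', 'Ford',  30000, 'Dept B');
""")

# Join: link the two tables through their common Deptno field to show
# each employee's name and salary next to the department name.
joined = conn.execute("""
    SELECT e.ename, e.esalary, d.dname
    FROM employee AS e
    JOIN department AS d ON e.deptno = d.deptno
    ORDER BY e.empno
""").fetchall()

# Select: keep only the rows that meet a stated criterion.
selected = conn.execute("""
    SELECT empno FROM employee
    WHERE esalary > 30000
    ORDER BY empno
""").fetchall()
```

Neither query changes the stored tables; each one produces a temporary result table, which matches the description of these operations in the text.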
  • 8. The join operation can be used to temporarily combine two or more tables so that a user can see relevant data in a form that looks like it is all in one big table. Using this operation, a user can ask for data to be retrieved from multiple files or databases without having to go to each one separately.

Finally, the project operation is used to create a subset of the columns contained in the temporary tables created by the select and join operations. Just as the select operation creates a subset of records that meet stated criteria, the project operation creates a subset of the columns, or fields, that the user wants to see. Using a project operation, the user can decide not to view all of the columns in the table but only those that have data necessary to answer a particular question or to construct a specific report.

Because of the widespread use of the relational model, an abundance of commercial products exists to create and manage relational databases. Leading mainframe relational database applications include Oracle 10g from Oracle Corp. and DB2 from IBM. A very popular midrange database application is SQL Server from Microsoft. The most commonly used database application for the PC is Microsoft Access.

Multidimensional Structure

The multidimensional model is a variation of the relational model that uses multidimensional structures to organize data and express the relationships between data. You can visualize multidimensional structures as cubes of data and cubes within cubes of data. Each side of the cube is considered a dimension of the data. Figure 5.6 is an example that shows that each dimension can represent a different category, such as product type, region, sales channel, and time [5].

FIGURE 5.6 An example of the different dimensions of a multidimensional database.
[Figure 5.6: cube diagrams of multidimensional data. The recoverable axis labels are location (Denver, Los Angeles, San Francisco; East, West, South, Total), measure (Sales, COGS, Margin, Total Expenses, Profit), scenario (Actual, Budget, Forecast, Variance), product (Camera, TV, VCR, Audio), and time (January, February, March, April, Qtr 1).]
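One way to picture such a cube in code is a table of cells, each addressed by one value per dimension. The sketch below is an assumption-laden illustration, not the storage scheme of any OLAP product: the four dimensions are the ones named in the text (product type, region, sales channel, time), while the sales figures and the `roll_up` helper are hypothetical.

```python
from collections import defaultdict

# The order of the dimensions in every cell key.
DIMENSIONS = ("product", "region", "channel", "month")

# Each cell holds a sales total for one combination of dimension values.
# All figures are invented sample data.
cube = {
    ("TV", "East", "Retail", "January"): 120,
    ("TV", "East", "Retail", "February"): 135,
    ("TV", "West", "Online", "January"): 90,
    ("VCR", "East", "Retail", "January"): 60,
    ("VCR", "West", "Retail", "February"): 75,
}


def roll_up(cells, keep):
    """Sum cell values over every dimension that is not named in `keep`."""
    positions = [DIMENSIONS.index(name) for name in keep]
    totals = defaultdict(int)
    for key, value in cells.items():
        totals[tuple(key[i] for i in positions)] += value
    return dict(totals)


# Collapse the cube to a single dimension, or to a pair of dimensions.
sales_by_product = roll_up(cube, ["product"])
sales_by_region_month = roll_up(cube, ["region", "month"])
```

Here `sales_by_product` maps `("TV",)` to 345 and `("VCR",)` to 135. The same cells can be regrouped along any other combination of dimensions without changing how they are stored.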
  • 9. Each cell within a multidimensional structure contains aggregated data related to elements along each of its dimensions. For example, a single cell may contain the total sales for a product in a region for a specific sales channel in a single month. A major benefit of multidimensional databases is that they are a compact and easy-to-understand way to visualize and manipulate data elements that have many interrelationships. So multidimensional databases have become the most popular database structure for the analytical databases that support online analytical processing (OLAP) applications, in which fast answers to complex business queries are expected. We discuss OLAP applications in Chapter 9.

Object-Oriented Structure

The object-oriented model is considered to be one of the key technologies of a new generation of multimedia Web-based applications. As Figure 5.7 illustrates, an object consists of data values describing the attributes of an entity, plus the operations that can be performed upon the data.

FIGURE 5.7 The checking and savings account objects can inherit common attributes and operations from the bank account object.

Bank Account Object
  Attributes: Customer, Balance, Interest
  Operations: Deposit (amount), Withdraw (amount), Get owner

Checking Account Object (inherits from Bank Account Object)
  Attributes: Credit line, Monthly statement
  Operations: Calculate interest owed, Print monthly statement

Savings Account Object (inherits from Bank Account Object)
  Attributes: Number of withdrawals, Quarterly statement
  Operations: Calculate interest paid, Print quarterly statement

Source: Adapted from Ivar Jacobsen, Maria Ericsson, and Ageneta Jacobsen, The Object Advantage: Business Process Reengineering with Object Technology (New York: ACM Press, 1995), p. 65. Copyright © 1995, Association for Computing Machinery. By permission.
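The objects in Figure 5.7 map naturally onto classes in an object-oriented programming language. The following Python sketch shows the idea rather than the interface of an actual OODBMS: attribute and operation names come from the figure, while the method bodies, the interest formula, and the sample values are assumptions.

```python
class BankAccount:
    """Parent object: attributes and operations common to all accounts."""

    def __init__(self, customer, balance=0.0, interest=0.0):
        self.customer = customer
        self.balance = balance
        self.interest = interest  # assumed to be a rate such as 0.02

    def deposit(self, amount):
        self.balance += amount

    def withdraw(self, amount):
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount

    def get_owner(self):
        return self.customer


class CheckingAccount(BankAccount):
    """Inherits everything from BankAccount and adds a credit line."""

    def __init__(self, customer, balance=0.0, interest=0.0, credit_line=0.0):
        super().__init__(customer, balance, interest)
        self.credit_line = credit_line

    def print_monthly_statement(self):
        return f"{self.customer}: balance {self.balance:.2f}"


class SavingsAccount(BankAccount):
    """Inherits everything from BankAccount and counts withdrawals."""

    def __init__(self, customer, balance=0.0, interest=0.0):
        super().__init__(customer, balance, interest)
        self.number_of_withdrawals = 0

    def withdraw(self, amount):
        super().withdraw(amount)  # reuse the parent operation
        self.number_of_withdrawals += 1

    def calculate_interest_paid(self):
        # Hypothetical formula: simple interest on the current balance.
        return self.balance * self.interest


savings = SavingsAccount("Porter M.L.", balance=1000.0, interest=0.02)
savings.deposit(500.0)   # operation inherited from BankAccount
savings.withdraw(200.0)  # operation specialized by SavingsAccount
```

Both child classes receive `deposit`, `withdraw`, and `get_owner` without redefining them, which is the inheritance relationship the figure caption describes.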
This encapsulation capability allows the object-oriented model to more easily handle complex types of data (graphics, pictures, voice, text) than other database structures. The object-oriented model also supports inheritance; that is, new objects can be automatically created by replicating some or all of the characteristics of one or more parent objects. Thus, in Figure 5.7, the checking and savings account objects can both inherit the common attributes and operations of the parent bank account object. Such capabilities have made object-oriented database management systems (OODBMS) popular in computer-aided design (CAD) and in a growing number of applications. For example, object technology allows designers to develop product designs, store them as objects in an object-oriented database, and replicate and modify them to create new product designs. In addition, multimedia Web-based applications for the Internet and corporate intranets and extranets have become a major application area for object technology.

Object technology proponents argue that an object-oriented DBMS can work with complex data types such as document and graphic images, video clips, audio segments, and other subsets of Web pages much more efficiently than relational database management systems. However, major relational DBMS vendors have countered by
  • 10. adding object-oriented modules to their relational software. Examples include multimedia object extensions to IBM’s DB2, and Oracle’s object-based “cartridges” for Oracle 10g. See Figure 5.8.

FIGURE 5.8 This claims analysis graphics display provided by the CleverPath enterprise portal is powered by the Jasmine ii object-oriented database management system of Computer Associates. Source: Courtesy of Computer Associates.

Evaluation of Database Structures

The hierarchical data structure was a natural model for the databases used for the structured, routine types of transaction processing characteristic of many business operations in the early years of data processing and computing. Data for these operations can easily be represented by groups of records in a hierarchical relationship. However, as time progressed, there were many cases where information was needed about records that did not have hierarchical relationships. For example, in some organizations, employees from more than one department can work on more than one project (refer back to Figure 5.4). A network data structure could easily handle this many-to-many relationship, whereas a hierarchical model could not. As such, the more flexible network structure became popular for these types of business operations. However, like the hierarchical structure, because its relationships must be specified in advance, the network model was unable to easily handle ad hoc requests for information, thus pointing out the need for the relational model.

Relational databases allow an end user to easily receive information in response to ad hoc requests. That’s because not all of the relationships between the data elements in a relationally organized database need to be specified when the database is created.
Database management software (such as Oracle 10g, DB2, Access, and Approach) creates new tables of data relationships by using parts of the data from several tables. Thus, relational databases are easier for programmers to work with and easier to maintain than the hierarchical and network models.

The major limitation of the relational model is that relational database management systems cannot process large amounts of business transactions as quickly and efficiently as those based on the hierarchical and network models, or process complex, high-volume applications as well as the object-oriented model. This performance gap has narrowed with the development of advanced relational database software with object-oriented extensions. The use of database management software based on the object-oriented and multidimensional models is growing steadily, as these technologies are playing a greater role for OLAP and Web-based applications.
  • 11. Experian Automotive: The Business Value of Relational Database Management

Experian Inc. (www.experian.com), a unit of London-based GUS PLC, runs one of the largest credit reporting agencies in the United States. But Experian wanted to expand its business beyond credit checks for automobile loans. If it could collect vehicle data from the various motor-vehicle departments in the United States and blend that with other data, such as change-of-address records, then its Experian Automotive division could sell the enhanced data to a variety of customers. For example, car dealers could use the data to make sure their inventory matches local buying preferences. And toll collectors could match license plates to addresses to find motorists who sail past tollbooths without paying.

But to offer new services, Experian first needed a way to extract, transfer, and load data from the systems of 50 different U.S. state departments of motor vehicles (DMVs), plus Puerto Rico, into a single database. That was a big challenge. “Unlike the credit industry that writes to a common format, the DMVs do not,” says Ken Kauppila, vice president of IT at Experian Automotive in Costa Mesa, California. Of course, Experian didn’t want to replicate the hodgepodge of file formats it inherited when the project began in January 1999: 175 formats among 18,000 files. So Kauppila decided to transform and map the data to a common relational database format.

Fortunately, off-the-shelf software tools for extracting, transforming, and loading data (called ETL tools) make it economical to combine very large data repositories. Using ETL Extract from Evolutionary Technologies, Experian created a database that can incorporate vehicle information within 48 hours of its entry into any of the nation’s DMV computers. This is one of the areas in which data management software tools can excel, says Guy Creese, analyst at Aberdeen Group in Boston.
“It can simplify the mechanics of multiple data feeds, and it can add to data quality, making fixes possible before errors are propagated to data warehouses,” he says.

Using the ETL extraction and transformation tools along with IBM’s DB2 database system, Experian Automotive created a database that processes 175 million transactions per month and has created a variety of profitable new revenue streams. Experian’s automotive database is the 10th largest database in the world, now with up to 16 billion rows of data. But the company says the relational database is managed by just three IT professionals. Experian says this demonstrates how efficiently database software like DB2 and the ETL tools can work with a large database to handle vast amounts of data quickly.

Database Development

Database management packages like Microsoft Access or Lotus Approach allow end users to easily develop the databases they need. See Figure 5.9. However, large organizations usually place control of enterprisewide database development in the hands of database administrators (DBAs) and other database specialists. This improves the integrity and security of organizational databases. Database developers use the data definition language (DDL) in database management systems like Oracle 10g or IBM’s DB2 to develop and specify the data contents, relationships, and structure of each database, and to modify these database specifications when necessary. Such information is cataloged and stored in a database of data definitions and specifications called a data dictionary, or metadata repository, which is managed by the database management software and maintained by the DBA.

A data dictionary is a database management catalog or directory containing metadata, that is, data about data.
A data dictionary relies on a specialized database software component to manage a database of data definitions, that is, metadata about the structure, data elements, and other characteristics of an organization’s databases. For example, it contains the names and descriptions of all types of data records and their interrelationships, as well as information outlining requirements for end users’ access and use of application programs, and database maintenance and security.
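A small-scale version of this idea can be tried with Python's built-in sqlite3 module. The sketch below uses SQLite only because it ships with Python; it is not Oracle or DB2, and the `customer` table and its columns are hypothetical. DDL statements define and later modify a table, and the definitions are then read back from SQLite's own catalog, which plays the role of a very simple data dictionary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: specify the data contents and structure of a table.
conn.execute("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        city        TEXT
    )
""")

# DDL again: modify the specification when necessary.
conn.execute("ALTER TABLE customer ADD COLUMN state TEXT")

# Metadata: SQLite keeps every table definition in the sqlite_master catalog.
ddl_text = conn.execute(
    "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = 'customer'"
).fetchone()[0]

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk).
columns = conn.execute("PRAGMA table_info(customer)").fetchall()
column_names = [row[1] for row in columns]
required_columns = [row[1] for row in columns if row[3] == 1]
```

The queries at the end return data about the data (column names, types, and constraints) rather than any customer rows, which is the distinction the text draws between metadata and the database contents.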
  • 12. FIGURE 5.9 Creating a database table using the Table Wizard of Microsoft Access. Source: Courtesy of Microsoft Corp.

Data dictionaries can be queried by the database administrator to report the status of any aspect of a firm’s metadata. The administrator can then make changes to the definitions of selected data elements. Some active (versus passive) data dictionaries automatically enforce standard data element definitions whenever end users and application programs access an organization’s databases. For example, an active data dictionary would not allow a data entry program to use a nonstandard definition of a customer record, nor would it allow an employee to enter a name of a customer that exceeded the defined size of that data element.

Developing a large database of complex data types can be a complicated task. Database administrators and database design analysts work with end users and systems analysts to model business processes and the data they require. Then they determine (1) what data definitions should be included in the database and (2) what structure or relationships should exist among the data elements.

Data Planning and Database Design

As Figure 5.10 illustrates, database development may start with a top-down data planning process. Database administrators and designers work with corporate and end user management to develop an enterprise model that defines the basic business process of the enterprise. Then they define the information needs of end users in a business process, such as the purchasing/receiving process that all businesses have.

Next, end users must identify the key data elements that are needed to perform their specific business activities. This frequently involves developing entity relationship diagrams (ERDs) that model the relationships among the many entities involved in business processes.
For example, Figure 5.11 illustrates some of the relationships in a purchasing/receiving process. ERDs are simply graphical models of the various files and their relationships contained within a database system. End users and database designers could use database management or business modeling software to help them develop ERD models for the purchasing/receiving process. This would help identify what supplier and product data are required to automate their purchasing/receiving and other business processes using enterprise resource management (ERM) or supply chain management (SCM) software. You will learn about ERDs and other data modeling tools in much greater detail if you ever take a course in systems analysis and design.
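The step from an ERD to a working database can be sketched as follows. This is only an illustration using Python's built-in sqlite3 module: the column names, the sample rows, and the way the relationships connect the entities are assumptions made for the example, not a definitive reading of the purchasing/receiving model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces relationships only when enabled

# Each entity becomes a table; each relationship becomes a foreign key (assumed mapping).
conn.executescript("""
CREATE TABLE supplier (
    supplier_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE product (
    product_id  INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    supplier_id INTEGER NOT NULL REFERENCES supplier(supplier_id)  -- supplier supplies product
);
CREATE TABLE purchase_order (
    order_id    INTEGER PRIMARY KEY,
    order_date  TEXT NOT NULL
);
CREATE TABLE purchase_order_item (
    order_id    INTEGER NOT NULL REFERENCES purchase_order(order_id),  -- order contains item
    product_id  INTEGER NOT NULL REFERENCES product(product_id),       -- product ordered on item
    quantity    INTEGER NOT NULL,
    PRIMARY KEY (order_id, product_id)
);
""")

# Hypothetical sample data.
conn.execute("INSERT INTO supplier VALUES (1, 'Acme Supply')")
conn.execute("INSERT INTO product VALUES (10, 'Widget', 1)")
conn.execute("INSERT INTO purchase_order VALUES (100, '2004-03-15')")
conn.execute("INSERT INTO purchase_order_item VALUES (100, 10, 25)")

# Follow the relationships from order item to product to supplier.
rows = conn.execute("""
    SELECT s.name, p.name, i.quantity
    FROM purchase_order_item AS i
    JOIN product  AS p ON p.product_id  = i.product_id
    JOIN supplier AS s ON s.supplier_id = p.supplier_id
""").fetchall()

# An order item for a product that does not exist violates the relationship.
try:
    conn.execute("INSERT INTO purchase_order_item VALUES (100, 999, 5)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(rows, rejected)
```

Modeling the relationships as foreign keys lets the DBMS, rather than each application program, reject records that refer to a nonexistent product or order.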
FIGURE 5.10 Database development involves data planning and database design activities. Data models that support business processes are used to develop databases that meet the information needs of users.
1. Data Planning: Develops a model of business processes. Produces: Enterprise model of business processes with documentation.
2. Requirements Specification: Defines information needs of end users in a business process. Produces: Description of users’ needs, which may be represented in natural language or using the tools of a particular design methodology.
3. Conceptual Design: Expresses all information requirements in the form of a high-level model. Produces: Conceptual Data Models, often expressed as entity relationship models.
4. Logical Design: Translates the conceptual models into the data model of a DBMS. Produces: Logical Data Models, e.g., relational, network, hierarchical, multidimensional, or object-oriented models.
5. Physical Design: Determines the data storage structures and access methods. Produces: Physical Data Models (storage representations and access methods).
Such user views are a major part of a data modeling process where the relationships between data elements are identified. Each data model defines the logical relationships among the data elements needed to support a basic business process. For example, can a supplier provide more than one type of product to us? Can a customer have more than one type of account with us? Can an employee have several pay rates or be assigned to several project workgroups?
Answering such questions will identify data relationships that have to be represented in a data model that supports a business process. These data models then serve as logical frameworks (called schemas and subschemas) on which to base the physical design of databases and the development of application programs to support the business processes of the organization.
A schema is an overall logical view of the relationships among the data elements in a database, while the subschema is a logical view of the data relationships needed to support specific end user application programs that will access that database.
FIGURE 5.11 This entity relationship diagram illustrates some of the relationships among the entities (product, supplier, warehouse, etc.) in a purchasing/receiving business process. Entities shown: Supplier, Product, Purchase Order, Purchase Order Item, Product Stock, Warehouse. Relationships shown: Supplies, Ordered on, Contains, Stocked as, Holds.
FIGURE 5.12 Example of the logical and physical database views and the software interface of a banking services information system.
Logical User Views (Checking Application, Savings Application, Installment Loan Application; Checking and Savings Data Model, Installment Loan Data Model): Data elements and relationships (the subschemas) needed for checking, savings, or installment loan processing.
Banking Services Data Model: Data elements and relationships (the schema) needed for the support of all bank services.
Software Interface (Database Management System): The DBMS provides access to the bank’s databases.
Physical Data Views (Bank Databases): Organization and location of data on the storage media.
Remember that data models represent logical views of the data and relationships of the database. Physical database design takes a physical view of the data (also called the internal view) that describes how data are to be physically stored and accessed on the storage devices of a computer system. For example, Figure 5.12 illustrates these different database views and the software interface of a bank database processing system. In this example, checking, savings, and installment lending are the business processes whose data models are part of a banking services data model that serves as a logical data framework for all bank services.
Aetna: Insuring Tons of Data
On a daily basis the operational services central support area at Aetna Inc. is responsible for 21.8 tons of data (174.6 terabytes [TB]). Over 119.2TB reside on mainframe-connected disk drives, while the remaining 55.4TB sit on disks attached to midrange computers.
Almost all of these data are located in the company’s headquarters in Hartford, Connecticut, with most of the information in relational databases. To make matters even more interesting, outside customers have access to about 20TB of the information. Four interconnected data centers containing 14 mainframes and more than 1,000 midrange servers process the data. It takes more than 4,100 direct-access storage devices to hold Aetna’s key databases.
Most of Aetna’s ever-growing mountain of data is health care information. The insurance company maintains records for both health maintenance organization participants and customers covered by insurance policies. Aetna has detailed records of providers, such as doctors, hospitals, dentists, and pharmacies, and it keeps track of all the claims it has processed. Some of Aetna’s larger customers send tapes containing insured employee data; the firm is moving toward using the Internet to collect such data.
If managing gigabytes of data is like flying a hang glider, managing multiple terabytes of data is like piloting a space shuttle: a thousand times more complex. You can’t just extrapolate from experiences with small and medium data stores to understand how to successfully manage tons of data. Even an otherwise mundane operation such as backing up a database can be daunting if the time needed to finish copying the data exceeds the time available.
Data integrity, backup, security, and availability are collectively the Holy Grail of dealing with large data stores. The sheer volume of data makes these goals a challenge, and a highly decentralized environment complicates matters even more. Developing and adhering to standardized data maintenance procedures always provide an organization with the best return on its data dollar investment [9, 11].
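The backup problem mentioned in the Aetna example comes down to simple arithmetic, sketched below. Only the 174.6TB data volume comes from the case. The copy rate of 1 gigabyte per second, the 8-hour backup window, and the use of decimal units (1TB = 10^12 bytes) are assumptions chosen purely for illustration.

```python
def backup_hours(data_terabytes, throughput_gb_per_second):
    """Hours needed to copy the data once at a steady rate (decimal units assumed)."""
    total_gigabytes = data_terabytes * 1000      # 1 TB = 1,000 GB
    seconds = total_gigabytes / throughput_gb_per_second
    return seconds / 3600

DATA_TB = 174.6        # from the Aetna example
THROUGHPUT_GBPS = 1.0  # hypothetical aggregate copy rate
WINDOW_HOURS = 8.0     # hypothetical nightly backup window

hours_needed = backup_hours(DATA_TB, THROUGHPUT_GBPS)
fits_in_window = hours_needed <= WINDOW_HOURS
print(f"{hours_needed:.1f} hours needed; fits in window: {fits_in_window}")
```

Under these assumed figures a single full copy takes about 48.5 hours, roughly six times the assumed window, which is why large data stores usually call for faster hardware, parallel copies, or backups of changes only.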
SECTION II Managing Data Resources
Data Resource Management
Data are a vital organizational resource that needs to be managed like other important business assets. Today’s business enterprises cannot survive or succeed without quality data about their internal operations and external environment.
With each online mouse click, either a fresh bit of data is created or already-stored data are retrieved from all those business websites. All that’s on top of the heavy demand for industrial-strength data storage already in use by scores of big corporations. What’s driving the growth is a crushing imperative for corporations to analyze every bit of information they can extract from their huge data warehouses for competitive advantage. That has turned the data storage and management function into a key strategic role of the information age [8].
That’s why organizations and their managers need to practice data resource management, a managerial activity that applies information systems technologies like database management, data warehousing, and other data management tools to the task of managing an organization’s data resources to meet the information needs of their business stakeholders. This chapter will show you the managerial implications of using data resource management technologies and methods to manage an organization’s data assets to meet business information requirements.
Read the Real World Case on data administration. We can learn a lot from this case about the challenges of managing the data within an organization. See Figure 5.13.
Types of Databases
Continuing developments in information technology and its business applications have resulted in the evolution of several major types of databases. Figure 5.14 illustrates several major conceptual categories of databases that may be found in many organizations. Let’s take a brief look at some of them now.
Operational Databases
Operational databases store detailed data needed to support the business processes and operations of a company. They are also called subject area databases (SADB), transaction databases, and production databases. Examples are a customer database, human resource database, inventory database, and other databases containing data generated by business operations. For example, a human resource database like that shown earlier in Figure 5.2 would include data identifying each employee and his or her time worked, compensation, benefits, performance appraisals, training and development status, and other related human resource data. Figure 5.15 illustrates some of the common operational databases that can be created and managed for a small business using Microsoft Access database management software.
Distributed Databases
Many organizations replicate and distribute copies or parts of databases to network servers at a variety of sites. These distributed databases can reside on network servers on the World Wide Web, on corporate intranets or extranets, or on other company networks. Distributed databases may be copies of operational or analytical databases, hypermedia or discussion databases, or any other type of database. Replication and distribution of databases are done to improve database performance at end user worksites. Ensuring that the data in an organization’s distributed databases are consistently and concurrently updated is a major challenge of distributed database management.
Distributed databases have both advantages and disadvantages. One primary advantage of a distributed database lies with the protection of valuable data. If all of an organization’s data reside in a single physical location, any catastrophic event like a fire or damage to the media holding the data would result in an equally catastrophic loss of use of that data.
By having databases distributed in multiple locations, the negative impact of such an event can be minimized.
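The protective effect of keeping copies at more than one site can be sketched as follows. This is a minimal illustration, assuming in-memory SQLite databases stand in for separate physical locations. The site names and the inventory table are hypothetical.

```python
import sqlite3

def copy_to_sites(master, site_names):
    """Copy the entire master database into a fresh database for each site."""
    sites = {}
    for name in site_names:
        site = sqlite3.connect(":memory:")
        master.backup(site)  # sqlite3.Connection.backup is available in Python 3.7+
        sites[name] = site
    return sites

master = sqlite3.connect(":memory:")
master.execute("CREATE TABLE inventory (item TEXT PRIMARY KEY, on_hand INTEGER)")
master.executemany("INSERT INTO inventory VALUES (?, ?)",
                   [("widget", 40), ("gasket", 15)])
master.commit()

sites = copy_to_sites(master, ["site_a", "site_b"])
master.close()  # simulate the loss of the original location

# The data are still available from any surviving site.
surviving = sites["site_b"].execute(
    "SELECT item, on_hand FROM inventory ORDER BY item").fetchall()
print(surviving)
```

Any change made at one location after the copy is not reflected at the others until they are brought back in step, which is the consistency challenge noted above.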
REAL WORLD CASE 2
Emerson and Sanofi: Data Stewards Seek Data Conformity
FIGURE 5.13 Source: Flying Colours Ltd./Digital Vision/Getty Images
A customer is a customer is a customer, right? Actually, it’s not that simple. Just ask Emerson Process Management, an Emerson Electric Co. unit in Austin that supplies process automation products. In 2000 the company attempted to build a data warehouse to store customer information from over 85 countries. The effort failed in large part because the structure of the warehouse couldn’t accommodate the many variations on customers’ names.
For instance, different users in different parts of the world might identify Exxon as Exxon, Mobil, Esso, or ExxonMobil, to name a few variations. The warehouse would see them as separate customers, and that would lead to inaccurate results when business users performed queries.
That’s when the company hired Nancy Rybeck as data administrator. Rybeck is now leading a renewed data warehouse project that ensures not only the standardization of customer names, but also the quality and accuracy of customer data, including postal addresses, shipping addresses, and province codes.
To accomplish this, Emerson has done something unusual: It has started to build a department with 6 to 10 full-time “data stewards” dedicated to establishing and maintaining the quality of data entered into the operational systems that feed the data warehouse.
The practice of having formal data stewards is uncommon. Most companies recognize the importance of data quality, but many treat it as a “find-and-fix” effort, to be conducted at the end of a project by someone in IT. Others casually assign the job to the business users who deal with the data head-on. Still others may throw resources at improving data only when a major problem occurs.
“It’s usually a seesaw effect,” says Chris Enger, formerly manager of information management at Philip Morris USA Inc. “When something goes wrong, they put someone in charge of data quality, and when things get better, they pull those resources away.”
Creating a data quality team requires gathering people with an unusual mix of business, technology, and diplomatic skills. It’s even difficult to agree on a job title. In Rybeck’s department, they’re called “data analysts,” but titles at other companies include “data quality control supervisor,” “data coordinator,” or “data quality manager.”
“When you say you want a data analyst, they’ll come back with a DBA [database administrator]. But it’s not the same at all,” Rybeck says. “It’s not the data structure, it’s the content.”
At Emerson, data analysts in each business unit review data and correct errors before it’s put into the operational systems. They also research customer relationships, locations, and corporate hierarchies; train overseas workers to fix data in their native languages; and serve as the main contact with the data administrator and database architect for new requirements and bug fixes.
As the leader of the group, Rybeck plays a role that includes establishing and communicating data standards, ensuring data integrity is maintained during database conversions, and doing the logical design for the data warehouse tables.
The stewards have their work cut out for them. Bringing together customer records from the 75 business units yielded a 75 percent duplication rate, misspellings, and fields with incorrect or missing data.
“Most of the divisions would have sworn they had great processes and standards in place,” Rybeck says. “But when you show them they entered the customer name 17 different ways, or someone had entered, ‘Loading dock open 8:00–4:00’ into the address field, they realize it’s not as clean as they thought.”
Although the data steward may report to IT, as is the case at Emerson and at pharmaceuticals company Sanofi-Synthelabo Inc., it’s not a job for someone steeped in technical knowledge. Yet it’s not right for a businessperson who’s a technophobe, either.
Seth Cohen is the first data quality control supervisor at Sanofi in New York. He was hired in 2003 to help design automated processes to ensure the data quality of the customer knowledge base that Sanofi was beginning to build.
Data stewards at Sanofi need to have business knowledge because they need to make frequent judgment calls, Cohen says. Indeed, judgment is a big part of the data steward’s job, including the ability to determine where you don’t need 100 percent perfection.
Cohen says that task is one of the biggest challenges of the job. “One-hundred percent accuracy is just not achievable,” he says. “Some things you’re just going to have to let go or you’d have a data warehouse with only 15 to 20 records.”
A good example is when Sanofi purchases data on doctors that includes their birth dates, Cohen says. If a birth date is given as February 31 or the number of the month is listed as 13 but the rest of the data are good, do you throw out all of the data or just figure the birth date isn’t all that important?
It comes down to knowing how much it costs to fix the data versus the payback. “You can pay millions of dollars a year to get it perfect, but if the returns are in the hundreds of thousands, is it worth it?” asks Chuck Kelley, senior advisory consultant at Navigator Systems Inc., a corporate performance management consultancy in Addison, Texas.
Data stewards also need to be politically astute, diplomatic, and good at conflict resolution, in part because the environment isn’t always friendly. When Cohen joined Sanofi, some questioned why he was there. In particular, IT didn’t see why he was “causing them so many headaches and adding several extra steps to the process,” he says.
There are many political traps as well. Take the issue of defining “customer address.” If data comes from a variety of sources, you’re likely to get different types of coding schemes, some of which overlap.
People may also argue about how data should be produced, he says. Should field representatives enter it from their laptops? Or should it first be independently checked for quality? Should it be uploaded hourly or weekly?
Most of all, data stewards need to understand that data quality is a journey, not a destination. “It’s not a one-shot deal; it’s ongoing,” Rybeck of Emerson says. “You can’t quit after the first task.”
Source: Adapted from Mary Brandel, “Data Stewards Seek Data Conformity,” Computerworld, March 15, 2004. Copyright © 2004 by Computerworld Inc., Framingham, MA 01701. All rights reserved.
CASE STUDY QUESTIONS
1. Why is the role of a data steward considered to be innovative? Explain.
2. What are the business benefits associated with the data steward program at Emerson?
3. How does effective data resource management contribute to the strategic goals of an organization? Provide examples from Emerson and others.
REAL WORLD ACTIVITIES
1. As discussed in the case, the role of data steward is relatively new, and its creation is motivated by the desire to protect the valuable data assets of the firm. There are many job descriptions in the modern organization associated with the strategic management of data resources. Using the Internet, see if you can find evidence of other job roles that are focused on the management of an organization’s data. How might a person train for these new jobs?
2. As more and more data are collected, stored, processed, and disseminated by organizations, new and innovative ways to manage them must be developed. Break into small groups with your classmates, and discuss how the data resource management methods of today will need to evolve as more types of data emerge. Will we ever get to the point where we can manage our data in a completely automated manner?
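Two of the data quality problems described in the case, customer name variants and impossible birth dates, lend themselves to simple automated checks. The sketch below is illustrative only: the alias table is hypothetical, and deciding which names really belong to the same customer is exactly the kind of judgment call the case assigns to data stewards.

```python
from datetime import date

# Hypothetical alias table; in practice it would be maintained by data stewards.
CANONICAL_NAMES = {
    "exxon": "ExxonMobil",
    "mobil": "ExxonMobil",
    "esso": "ExxonMobil",
    "exxonmobil": "ExxonMobil",
}

def standardize_name(raw):
    """Map a known variant to its standard name; leave unknown names unchanged."""
    cleaned = raw.strip()
    return CANONICAL_NAMES.get(cleaned.lower(), cleaned)

def is_valid_date(year, month, day):
    """Return True only if the three values form a real calendar date."""
    try:
        date(year, month, day)
        return True
    except ValueError:
        return False

entered = ["Exxon", " Mobil ", "ESSO", "ExxonMobil", "Acme"]
distinct_customers = sorted({standardize_name(name) for name in entered})
print(distinct_customers)

print(is_valid_date(1960, 2, 31))   # February 31 does not exist
print(is_valid_date(1960, 13, 1))   # there is no month 13
```

A check like this only flags the bad birth date. Whether to discard the whole record or keep the rest of it remains a cost-versus-payback decision.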
FIGURE 5.14 Examples of some of the major types of databases used by organizations and end users. Labels shown: External Databases on the Internet and Online Services; Client PC; Network Server; Distributed Databases on Intranets and Other Networks; Operational Databases of the Organization; End User Databases; Data Warehouse; Data Marts.
Another advantage of distributed databases is found in their storage requirements. Often, a large database system may be distributed into smaller databases based on some logical relationship between the data and the location. For example, a company with several branch operations may distribute its data so that each branch operation location is also the location of its branch database. Because multiple databases in a distributed system can be joined together, each location has control of its local data while all other locations can access any database in the company if so desired.
Distributed databases are not without some challenges, however. The primary challenge is the maintenance of data accuracy. If a company distributes its database to multiple locations, any change to the data in one location must somehow be updated in all other locations. This can be accomplished in one of two ways: replication or duplication.
FIGURE 5.15 Examples of operational databases that can be created and managed for a small business by microcomputer database management software like Microsoft Access. Source: Courtesy of Microsoft Corp.
Updating a distributed database using replication involves using a specialized software application that looks at each distributed database and then finds the changes made to it. Once these changes have been identified, the replication process makes all of the distributed databases look the same by making the appropriate changes to each one. The replication process is very complex and, depending upon the number and size of the distributed databases, can consume a lot of time and computer resources.
The duplication process, in contrast, is much less complicated. It basically identifies one database as a master and then duplicates that database at a prescribed time after hours so that each distributed location has the same data. One drawback to the duplication process is that no changes can ever be made to any database other than the master, to avoid having local changes overwritten during the duplication process. Nonetheless, properly used, duplication and replication can keep all distributed locations current with the latest data.
One additional challenge associated with distributed databases is the extra computing power and bandwidth necessary to access multiple databases in multiple locations. We will look more closely at the issue of bandwidth in Chapter 6 when we focus on telecommunications and networks.
External Databases
Access to a wealth of information from external databases is available for a fee from commercial online services, and with or without charge from many sources on the World Wide Web. Websites provide an endless variety of hyperlinked pages of multimedia documents in hypermedia databases for you to access.
Data are available in the form of statistics on economic and demographic activity from statistical databanks. Or you can view or download abstracts or complete copies of hundreds of newspapers, magazines, newsletters, research papers, and other published material and periodicals from bibliographic and full text databases. Whenever you use a search engine like Google or Yahoo to look up something on the Internet, you are using an external database: a very, very large one!
Hypermedia Databases
The rapid growth of websites on the Internet and corporate intranets and extranets has dramatically increased the use of databases of hypertext and hypermedia documents. A website stores such information in a hypermedia database consisting of hyperlinked pages of multimedia (text, graphic, and photographic images, video clips, audio segments, and so on). That is, from a database management point of view, the set of interconnected multimedia pages at a website is a database of interrelated hypermedia page elements, rather than interrelated data records [2].
Figure 5.16 shows how you might use a Web browser on your client PC to connect with a Web network server. This server runs Web server software to access and transfer the Web pages you request. The website illustrated in Figure 5.16 uses a hypermedia database consisting of Web page content described by HTML (Hypertext Markup Language) code or XML (Extensible Markup Language) labels, image files, video files, and audio. The Web server software acts as a database management system to manage the transfer of hypermedia files for downloading by the multimedia plug-ins of your Web browser.
FIGURE 5.16 The components of a Web-based information system include Web browsers, servers, and hypermedia databases. Labels shown: Client PCs (Web Browser); The Internet, Intranets, Extranets; Network Server (Web Server Software); Hypermedia Database (HTML and XML Web Pages, Image Files, Video Files, Audio Files).
FIGURE 5.17 The components of a complete data warehouse system. Labels shown: Operational, External, and Other Databases; Data Acquisition (capture, clean, transform, transport, load/apply); Data Management; Enterprise Warehouse; Analytical Data Store; Data Marts; Data Analysis (query, report, analyze, mine, deliver); Metadata Management; Metadata Directory; Metadata Repository; Warehouse Design; Web Information Systems. Source: Adapted courtesy of Hewlett-Packard.
Data Warehouses and Data Mining
A data warehouse stores data that have been extracted from the various operational, external, and other databases of an organization. It is a central source of the data that have been cleaned, transformed, and cataloged so they can be used by managers and other business professionals for data mining, online analytical processing, and other forms of business analysis, market research, and decision support. (We’ll talk in depth about all of these activities in Chapter 9.) Data warehouses may be subdivided into data marts, which hold subsets of data from the warehouse that focus on specific aspects of a company, such as a department or a business process.
Figure 5.17 illustrates the components of a complete data warehouse system. Notice how data from various operational and external databases are captured, cleaned, and transformed into data that can be better used for analysis.
This acquisition process might include activities like consolidating data from several sources, filtering out unwanted data, correcting incorrect data, converting data to new data elements, and aggregating data into new data subsets.
These data are then stored in the enterprise data warehouse, from where they can be moved into data marts or to an analytical data store that holds data in a more useful form for certain types of analysis. Metadata (data that defines the data in the data warehouse) is stored in a metadata repository and cataloged by a metadata directory. Finally, a variety of analytical software tools can be provided to query, report, mine, and analyze the data for delivery via Internet and intranet Web systems to business end users. See Figure 5.18.
FIGURE 5.18 A data warehouse and its data mart subsets hold data that have been extracted from various operational databases for business analysis, market research, decision support, and data mining applications. Labels shown: Applications (ERP, Inventory control, Logistics, Shipping, Purchasing, CRM); Data Warehouse; Data Marts (Finance, Marketing, Sales, Accounting, Management reporting).
Revenue: Closing the Gap with a Data Warehouse
In the late 1990s the state of Iowa had a tax gap, a polite way of describing companies and individuals who either didn’t file state tax returns or who underreported their earnings. To identify noncompliant taxpayers, the Iowa Department of Revenue and Finance (IDRF) relied on a jumble of nonintegrated mainframe applications, file extracts, and over 20 disparate stand-alone systems (databases, mainframe data, and information on individual spreadsheets, to name a few). The real problem was that none of these systems could communicate with each other. What was needed was a central data warehouse to pull together information from all those systems for analysis. But getting funding from the state for such a large-scale project wasn’t an option.
So the IDRF came up with a plan the Iowa Legislature couldn’t help but approve. The plan was simple: Build a data warehouse that would be entirely funded using the additional tax revenue it generated by catching tax scofflaws.
Development of the data warehouse began in November 1999, and it became operational five months later. The system combines data from the department’s own tax and accounts receivable systems, tax files shared by the federal Internal Revenue Service, the Iowa Workforce Development Agency, and a number of other sources. Revenue- and finance-department employees analyze the data using commercially available reporting software.
In the three years since it went live, the IDRF data warehouse has generated $28 million in tax revenue and is expected to generate $10 million each year from now on. There’s no question the project has paid for itself many times over, and the state of Iowa is sold on the value of data warehousing. The next step is to use the data warehouse to better understand why taxpayers might be in noncompliance. That will involve analyzing taxpayer demographics and changes in tax laws and policies.
This phase of the project is also expected to generate revenues for the state while simultaneously helping to improve the tax laws for the citizens of Iowa [12, 13].
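The data acquisition activities described earlier for a data warehouse (consolidating, filtering, correcting, converting, and aggregating data) can be sketched in a few lines. This is a simplified illustration in plain Python, not a description of any particular warehouse product: the two source extracts, the field names, and the cleaning rules are all hypothetical.

```python
from collections import defaultdict

# Hypothetical extracts from two operational sources.
sales_east = [
    {"region": "east", "product": "widget", "amount": "120.50"},
    {"region": "east", "product": "gasket", "amount": "-1"},      # invalid amount
]
sales_west = [
    {"region": "West ", "product": "widget", "amount": "80.00"},  # inconsistent region code
    {"region": "west", "product": "widget", "amount": "19.50"},
]

def acquire(*sources):
    """Consolidate, filter, correct, and convert source records into warehouse rows."""
    warehouse = []
    for source in sources:                        # consolidate several sources
        for record in source:
            amount = float(record["amount"])      # convert text to a numeric element
            if amount < 0:                        # filter out unwanted data
                continue
            warehouse.append({
                "region": record["region"].strip().lower(),  # correct inconsistent codes
                "product": record["product"],
                "amount": amount,
            })
    return warehouse

def aggregate_by_region(rows):
    """Aggregate detailed rows into a new summary data subset."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)

warehouse = acquire(sales_east, sales_west)
totals = aggregate_by_region(warehouse)

# A data mart holds the subset of warehouse data for one area of the business.
west_mart = [row for row in warehouse if row["region"] == "west"]

print(totals)
print(len(west_mart))
```

Keeping the cleaning rules in one acquisition step means every data mart and analysis tool downstream works from the same corrected data.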