  2. • In some data warehouse implementations, a data mart is a miniature data warehouse. • In others, it is just one segment of the data warehouse. • Data marts are often used to provide information to functional segments of the organization.
  3. When a Data Mart Is Appropriate • Data marts are sometimes designed as complete individual data warehouses and contribute to the overall organization as members of a distributed data warehouse. • In other designs, data marts receive data from a master data warehouse through periodic updates, in which case the data mart functionality is often limited to presentation services for clients. • Data marts are created for the following reasons: – To speed up work by reducing the volume of data scanned – To structure data for a user access tool – To partition data in order to impose access control strategies – To segment data onto different hardware platforms
  4. DESIGN OF DATA MART • Regardless of the functionality provided by data marts, they must be designed as components of the master data warehouse so that data organization, format, and schemas are consistent throughout the data warehouse. • Inconsistent table designs, update mechanisms, or dimension hierarchies can prevent data from being reused throughout the data warehouse, and they can result in inconsistent reports from the same data. • Example: – It is unlikely that summary reports produced from a finance department data mart that organizes the sales force by management reporting structure will agree with summary reports produced from a sales department data mart that organizes the same sales force by geographical region. • Before designing a data mart, we must confirm that a data mart is an appropriate solution: – Identify whether there is a natural functional split within the organization – Identify whether there is a natural split of data – Data marts should be designed from the perspective that they are components of the data warehouse regardless of their individual functionality or construction – This provides consistency and usability of information throughout the organization.
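The finance-versus-sales example above can be illustrated with a small sketch. The representative names, managers, regions, and amounts are all invented for illustration; the only point is that the same fact rows, rolled up through two different dimension hierarchies, give summary lines that cannot be compared one to one.

```python
from collections import defaultdict

# Hypothetical sales facts: (sales_rep, amount). All values are invented.
sales = [("ann", 100), ("bob", 250), ("cy", 75), ("dee", 300)]

# Two hypothetical hierarchies over the same sales force.
by_manager = {"ann": "mgr_1", "bob": "mgr_1", "cy": "mgr_2", "dee": "mgr_2"}
by_region = {"ann": "north", "bob": "south", "cy": "north", "dee": "south"}


def summarize(facts, hierarchy):
    """Roll up amounts to the parent level of the given hierarchy."""
    totals = defaultdict(int)
    for rep, amount in facts:
        totals[hierarchy[rep]] += amount
    return dict(totals)


# Finance mart groups by management structure, sales mart by region.
finance_report = summarize(sales, by_manager)
sales_report = summarize(sales, by_region)

print(finance_report)
print(sales_report)
```

Both reports add up to the same grand total, but no individual line in one report matches a line in the other, which is why a shared hierarchy design across data marts matters.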
  5. IDENTIFY FUNCTIONAL SPLIT • We must see whether the split will benefit the organisation. • Example – Consider retail sales in an organisation in which a merchant is responsible for sales. Their brief could be to maximize sales by ensuring adequate sales. • In practice, the following information would be of value: – Sales transactions at a daily level, to monitor actual sales – Sales forecast on a weekly basis – Stock position on a daily basis – Stock movement on a daily basis.
  6. Importance of Data Mart • Easy access to frequently needed data • Creates collective view by a group of users • Improves end-user response time • Ease of creation • Lower cost than implementing a full Data warehouse • Potential users are more clearly defined than in a full Data warehouse
  7. META DATA • Metadata is loosely defined as data about data. • Metadata is a concept that applies mainly to electronically archived or presented data and is used to describe the – a) definition, – b) structure and – c) administration of data files with all contents in context to ease the use of the captured and archived data for further use. – example: a web page may include metadata specifying what language it's written in, what tools were used to create it, where to go for more on the subject and so on
  8. What is Meta data • Metadata (meta data, or sometimes metainformation) is "data about other data", of any sort in any media. An item of metadata may describe an individual datum, or content item, or a collection of data including multiple content items and hierarchical levels, such as a database schema. In data processing, metadata provides information about, or documentation of, other data managed within an application or environment. This commonly defines the structure or schema of the primary data. – metadata would document data about data elements or attributes, (name, size, data type, etc) and data about records or data structures (length, fields, columns, etc) and data about data (where it is located, how it is associated, ownership, etc.). Metadata may include descriptive information about the context, quality and condition, or characteristics of the data. It may be recorded with high or low granularity Definition: Metadata contains information about that data or other data Metadata is structured, encoded data that describe characteristics of information-bearing entities to aid in the identification, discovery, assessment, and management of the described entities
  9. Why Metadata is important • Assume that the project team has successfully completed the development of the first data mart. The user may still have several questions in mind: – Are there predefined queries I can look at? – What are the various elements in the data warehouse? – Is there information about unit sales and unit costs by product? – How can I browse and see what is available? – Where did the data for the data warehouse come from? From which source system? – How old is the data in the data warehouse? – When was the last time fresh data was brought in? – Are there summaries by month and product?
  10. • We can define metadata in terms of data warehousing as: – Data about data – Table of contents for data – Catalog for data – Data warehouse roadmap – Data warehouse directory
  11. Applications of Metadata • Libraries • Metadata has been used in various forms as a means of cataloging archived information. • Photographs • Metadata may be written into a digital photo file that will identify who owns it, copyright & contact information, what camera created the file, along with exposure information and descriptive information such as keywords about the photo, making the file searchable on the computer and/or the Internet • Web pages • Web pages often include metadata in the form of meta tags. Description and keywords meta tags are commonly used to describe the Web page's content. Most search engines use this data when adding pages to their search index.
  12. Critical Need of Data warehouse • Metadata is an absolute need in building a data warehouse, i.e. – For using the data warehouse: • To run ad hoc queries and format reports, users need to know about the data in the data warehouse. • Users should gain the maximum from the data warehouse, and ignorance of the data should not lead them to wrong conclusions. – For building the data warehouse: • For data extraction we must know the source system • Structures and content will help in determining mapping • The DBA needs to know the metadata for physical loading and staging. – Data Administration • Data administration is not possible without knowing the metadata • Metadata is absolutely necessary for building a data warehouse
  13. Data warehouse Metadata • Metadata systems in a data warehouse are sometimes separated into two sections: 1. back room metadata, used for extract, transform, load (ETL) functions to get OLTP data into a data warehouse 2. front room metadata, used to label screens and create reports
  14. Business Intelligence metadata • Business Intelligence is the process of analyzing large amounts of corporate data, usually stored in large databases such as a Data Warehouse, tracking business performance, detecting patterns and trends, and helping enterprise business users make better decisions. Business Intelligence metadata describes how data is queried, filtered, analyzed, and displayed in Business Intelligence software tools, such as Reporting tools, OLAP tools, Data Mining tools. – Examples: • Data Mining metadata: The descriptions and structures of Data Sets, Algorithms, Queries • OLAP metadata: The descriptions and structures of Dimensions, Cubes, Measures (Metrics), Hierarchies, Levels, Drill Paths • Reporting metadata: The descriptions and structures of Reports, Charts, Queries, Data Sets, Filters, Variables, Expressions
  15. Building the data warehouse • To extract the data needed for the data warehouse, the programmer needs to know – The source system and data structure – The data content – How to handle the data • For the DBA – Incremental loading – Last compared data – Populating tables
  16. Administering the Data warehouse • How to add a new summary table • How to expand storage • How to add information delivery for the users • When to schedule backups • How to maintain the security system • How to keep data definitions up to date • How to verify external data on an ongoing basis
  17. Metadata used for Transformation and Load • Metadata may be used during data transformation and load to describe the source data and any changes made to it. • The greater the differences between sources, the greater the requirement for metadata. • The advantage of storing metadata is that any transformation that takes place as source data changes can be captured by the metadata. • For source data the following information is required – Source field (needs to be uniquely identified) • Unique identifier • Name • Type • Location – System – Object
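The source-field attributes listed above (unique identifier, name, type, and a location made up of system and object) map naturally onto a small record type. The following is a minimal sketch under stated assumptions: the class names, the `SRC-001` identifier, and the `billing`/`CUSTOMER` source are hypothetical, and a real ETL tool would keep this in its own repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceLocation:
    system: str       # source system that holds the field
    object_name: str  # table, file, or other object within that system


@dataclass(frozen=True)
class SourceField:
    unique_id: str
    name: str
    data_type: str
    location: SourceLocation


class SourceFieldCatalog:
    """Holds source-field metadata and rejects duplicate identifiers."""

    def __init__(self):
        self._fields = {}

    def register(self, source_field):
        if source_field.unique_id in self._fields:
            raise ValueError(f"duplicate source field id: {source_field.unique_id}")
        self._fields[source_field.unique_id] = source_field

    def lookup(self, unique_id):
        return self._fields[unique_id]


# Hypothetical example entry.
catalog = SourceFieldCatalog()
catalog.register(
    SourceField("SRC-001", "cust_nm", "CHAR(40)",
                SourceLocation("billing", "CUSTOMER"))
)
print(catalog.lookup("SRC-001"))
```

Rejecting a second registration under the same identifier is one simple way to enforce the "needs to be uniquely identified" requirement.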
  18. Data management • Metadata is required to describe the data as it resides in the data warehouse. • This is needed for the warehouse manager to track and control all data movement. • Metadata is needed for all of the following: – Tables • Columns • Name • Type – Indexes • Columns – Name – Type – Views • Columns – Name – Type – Constraints (name, type, tables)
  19. Data management • For each table the information stored is: – Table name (should be the name in the data dictionary) – Columns • Column name • Reference identifier • Aggregations need to be stored in the same way as tables, with the aggregation name and columns. • Similarly, partitions also need information such as the partition key and the data range inside the table
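Most database engines already hold the table, column, index, and view metadata described on the two Data management slides in their own catalog. As a hedged illustration, the sketch below uses Python's built-in `sqlite3` module as a stand-in for a warehouse DBMS; the `sales` table, its index, and the `store_totals` view are hypothetical, and a real warehouse would expose a different catalog interface.

```python
import sqlite3

# In-memory database with a hypothetical table, index, and view.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, store TEXT, amount REAL);
    CREATE INDEX idx_sales_store ON sales (store);
    CREATE VIEW store_totals AS
        SELECT store, SUM(amount) AS total FROM sales GROUP BY store;
""")


def describe(connection):
    """Collect table, view, and index metadata from SQLite's catalog."""
    meta = {"tables": {}, "views": [], "indexes": {}}
    rows = connection.execute(
        "SELECT type, name FROM sqlite_master ORDER BY name"
    ).fetchall()
    for obj_type, name in rows:
        if obj_type == "table":
            # table_info rows: (cid, name, type, notnull, dflt_value, pk)
            cols = connection.execute(f"PRAGMA table_info({name})").fetchall()
            meta["tables"][name] = [(c[1], c[2]) for c in cols]
        elif obj_type == "view":
            meta["views"].append(name)
        elif obj_type == "index":
            # index_info rows: (seqno, cid, name)
            cols = connection.execute(f"PRAGMA index_info({name})").fetchall()
            meta["indexes"][name] = [c[2] for c in cols]
    return meta


metadata = describe(conn)
print(metadata)
```

The same dictionary shape could be extended with constraints, aggregations, and partition information where the underlying engine reports them.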
  20. Data ETL • How to handle data changes • How to include new sources • Where to cleanse the data • How to change the data cleansing method • How to switch to a new data transformation technique • How to add a new external data source • How to drop an external data source • How mergers and acquisitions are handled Data Warehouse • How to add a new summary table • How to expand storage • How to add new information tools for users • How to continue ongoing training • How to improve ad hoc queries • When to schedule backups • How to maintain security systems • How to monitor load distribution
  21. Why Metadata Is Vital for End Users • Metadata helps users understand the complexity of the data and how it should be transformed into information. • In a company, when a business analyst analyses the reasons for a loss or profit, he looks at the following things: • Are the sales stored as individual transactions or as summary totals? • Can sales be analyzed by product, promotion, store, and month? • Can the current month's sales be compared with the previous month's sales? • Where do the sales figures come from, and what is the source system? • How old is the sales data, and how does it get updated? – If the analyst is not sure of the data, he cannot analyze it properly. – It would be ideal for an analyst to have a complete road map of the metadata.
  22. Metadata Vital for End users • Data Content • Summary Data • Business Dimensions • Business metrics • Navigation paths • Source systems • External data • Last update data • Report formats • OLAP data
  23. Who needs Metadata • Information discovery – IT professionals: databases, tables, columns, server – Power users: databases, tables, columns – Casual users: queries, reports • Meaning of data – IT professionals: data structures, data definitions, cleansing functions – Power users: cleansing functions, transformation rules – Casual users: data owners, filters • Information access – IT professionals: SQL, 3GL, 4GL – Power users: query tools – Casual users: authorization requests, information retrieval
  24. Query Generation • Metadata is required by the query manager to generate queries. • The query manager generates metadata about the queries it has run. • This metadata can be used to build a history of all queries run and to generate a query profile.
  25. Query Generation • The metadata that is required for each query are: – Query • Tables accessed – Columns accessed » Name » Reference identifier • Restriction applied – Column name – Table name – Reference identifier – Restriction • Join criteria applied – Column name – Table name – Reference identifier
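The per-query metadata listed on the two Query Generation slides, and the query profile built from it, can be sketched as follows. This is a simplified illustration, not a query manager's real format: the `QueryRecord` class, the `(table, column)` pairs standing in for reference identifiers, and the sample queries are all assumptions.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class QueryRecord:
    query_id: str
    columns_accessed: list  # (table, column) pairs
    restrictions: list = field(default_factory=list)  # (table, column, restriction)
    joins: list = field(default_factory=list)  # ((table, column), (table, column))

    def tables_accessed(self):
        """Tables referenced by the accessed columns."""
        return {table for table, _ in self.columns_accessed}


def table_profile(history):
    """Count in how many recorded queries each table was accessed."""
    counts = Counter()
    for record in history:
        counts.update(record.tables_accessed())
    return counts


# Hypothetical query history.
history = [
    QueryRecord(
        "q1",
        [("sales", "amount"), ("store", "region")],
        restrictions=[("store", "region", "= 'north'")],
        joins=[(("sales", "store_id"), ("store", "store_id"))],
    ),
    QueryRecord("q2", [("sales", "amount")]),
]

profile = table_profile(history)
print(profile.most_common())
```

A profile like this is the kind of information a warehouse administrator might use when deciding which tables deserve new indexes or summary tables.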
  26. Why Metadata is essential for IT • From data extraction through to information delivery, metadata is crucial. • IT needs metadata about the following to process data: – Source data structures – Source platforms – Data extraction methods – External data – Data transformation rules – Data cleansing rules – Staging area structures – Dimensional models – OLAP systems – Query/report design
  27. Automation of data warehouse tasks • Tools perform the major functions of a data warehouse. • Tools enable data movement, transformation, and so on. • When designing a data warehouse, we must plan from the beginning for a tool to manage metadata. • In back-end processes each tool records its own metadata: – Source data structure definition – Data extraction – Initial reformatting/merging – Preliminary data cleansing – Data transformation – Validation – Data warehouse structure definition – Load Merge creation
  28. Classification of Metadata types • Classification of metadata types by functional areas: – Data acquisition – Data storage – Information delivery
  29. • Acquisition process: – Data extraction – Data transformation – Data cleansing – Data integration – Data staging • Metadata types: – Source system platforms – Source structure definition – Data extraction method – Data transformation rules – Data cleansing rules – External data structures – External data definition – Summarization rules – Target physical and logical models
  30. Data Storage • The metadata recorded by the processes in the data storage area is used for development, for administration, and by users. • Users would like to see when data was last loaded. • The DBA will use the metadata for backup processes and incremental loads.
  31. Information Delivery • Information delivery – Report generation – Query processing – Complex Analysis • Metadata types: – Source systems – Source data definitions – Data extraction tools – Query templates – Preformatted reports – OLAP content
  32. Technical Metadata • Technical Metadata: – Data about the processes, the tool sets, the repositories, and the physical layers of data under the covers. Data about run-times, performance averages, table structures, indexes, and constraints; data about relationships, sources and targets, uptime, system failure ratios, system resource utilization ratios, and performance numbers
  33. Technical Metadata • List of questions technical metadata can answer – What databases and tables exist? – What are the columns for each table? – What are the keys and indexes? – What are the physical files? – What are the load and refresh schedules? – What types of aggregations are available? – What is the source-to-target mapping in the data warehouse?
  34. Business Metadata • Business metadata is better understood by looking at a list of examples: – Source systems – Source to target mapping – Data transformation business rules – Data transformation – Attributes and business definition – Query reporting tools – Predefined tools – Predefined reports – Report distribution information – Currency OLAP Report – Rules for analysis using OLAP report
  35. Behaviour of Business Metadata • How can I sign on to the metadata? • Which parts of the data warehouse can I access? • Which definitions do I need for my query? • What types of aggregation are available for my metrics? • How old is the OLAP data? Should I wait for the next update? • Beneficiaries: – Managers – Business analysts – Regular users
  36. Business Metadata: In IT, business metadata means adding descriptive text or statements around a particular term so that the data gains value. Business metadata is about creating definitions and business rules. For example, when tables and columns are created, the following business metadata is useful for generating reports for functional and technical teams. The advantage of this business metadata is that everybody, technical or non-technical, can understand what is going on within the organization. Table's Metadata: While creating a table, metadata for the definition of the table, the source system name, the source entity names, the business rules to transform the source table, and the usage of the table in reports should be added so that it is available for metadata reports. Column's Metadata: Similarly for columns, the source column name (mapping), the business rules to transform the source column, and the usage of the column in reports should be added for metadata reports.
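The table-level and column-level business metadata described above can be captured in two small record types and rendered as a plain-text metadata report. This is only a sketch under assumptions: the class names, the `monthly_sales` table, the `orders` source system, and the rule wording are hypothetical examples, not part of any particular tool.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnBusinessMetadata:
    name: str
    source_column: str        # mapping back to the source column
    transformation_rule: str  # business rule applied to the source column
    used_in_reports: list = field(default_factory=list)


@dataclass
class TableBusinessMetadata:
    name: str
    definition: str
    source_system: str
    source_entities: list
    transformation_rule: str
    used_in_reports: list = field(default_factory=list)
    columns: list = field(default_factory=list)


def metadata_report(table):
    """Render one table's business metadata as readable text."""
    lines = [
        f"Table: {table.name}",
        f"  Definition: {table.definition}",
        f"  Source: {table.source_system} ({', '.join(table.source_entities)})",
        f"  Rule: {table.transformation_rule}",
        f"  Reports: {', '.join(table.used_in_reports)}",
    ]
    for col in table.columns:
        lines.append(
            f"  Column {col.name} <- {col.source_column}: {col.transformation_rule}"
        )
    return "\n".join(lines)


# Hypothetical example table.
monthly_sales = TableBusinessMetadata(
    name="monthly_sales",
    definition="Sales totals per store and month",
    source_system="orders",
    source_entities=["ORDER_HEADER", "ORDER_LINE"],
    transformation_rule="Sum order lines by store and calendar month",
    used_in_reports=["Store performance"],
    columns=[
        ColumnBusinessMetadata("total_amount", "AMT", "Sum of AMT in local currency"),
    ],
)

report = metadata_report(monthly_sales)
print(report)
```

Keeping the definitions in plain business language is what lets non-technical readers use the same report as the technical team.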
  37. Business rules in the data warehouse • In the course of designing and populating a data warehouse, some key questions must be answered about the data being incorporated in the warehouse. More often than not, many of these answers are not known at the outset of the project, but must be established if the data warehouse is to succeed. Interestingly, these for the most part represent the same contextual information about the data that business users of the warehouse will need to know to be able to fully understand the information provided, and to trust in its reliability. The questions include: • What are the valid values for the attributes of the data warehouse? • What are the valid data sources for the data warehouse? • At what point in the data's life cycle, in the operational world, should it be captured and sent to the data warehouse? • What are the "cleansing rules" for the source data? • What are the transformation rules to move the source data to the target database? • How was the data calculated in the operational database?
  38. Example of business metadata •
  39. Difference between Technical metadata and business metadata • Metadata is divided into technical metadata (the tool-specific metadata used by IT and vendors) and business metadata (what a businessperson needs to know about what data represents). • The technology person thinks about a data column: how it's defined in a database, represented in a data model, mapped and transformed in the ETL tool and defined in the BI report. All of this, however, is very much related to how the tools store and process the data. The primary challenge is gathering and integrating the metadata across tools. • The businessperson thinks about where the data came from, its associated data quality level, how it was filtered from its source and what types of business rules and algorithms were applied to it. Most of this metadata is either not stored in the tools or needs some serious translation from technical terms to business language.
  40. METADATA MANAGEMENT • The requirements for metadata management are: – Capturing and storing business • Changes in algorithms and methodology occur when data is stored for several years. • Versioning must be maintained – Variety of metadata sources • Metadata is available from different sources – Metadata integration • Metadata must be unified and merged to give meaning to the end user. – Metadata standardization • All metadata should be stored in the same manner – Rippling through revisions • Revisions will occur as business rules change – Metadata exchange • End users should be able to exchange one set of metadata with another. – Support for end users • Metadata must provide simple graphical and tabular representations to make it easy to browse.
  41. Challenges • Major challenges for metadata management are: – Each software tool has its own proprietary metadata. If we are using several tools, how can we reconcile them? – No industry-wide accepted standards exist for metadata formats. – Preserving uniform metadata version control across the data warehouse is very difficult. – Unifying data sources is very difficult, since we have to deal with conflicting standards, formats, data naming conventions, units, and measures.
  42. META DATA REPOSITORY • A metadata repository may be thought of as serving two distinct kinds of information query: – Technical Metadata – Business Metadata
