1. What are Critical Success Factors?
Key areas of activity in which favorable results are necessary for a company to reach its goals.
There are four basic types of CSFs which are:
Industry CSFs
Strategy CSFs
Environmental CSFs
Temporal CSFs
2. What is data cube technology used for?
Data cubes are commonly used for easy interpretation of data. A cube represents measures of the
business along several dimensions, where each dimension of the cube represents some attribute of
the database, e.g. profit per day, per month, or per year.
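As an illustration, many relational databases support cube-style aggregation directly in SQL. A minimal sketch, assuming a hypothetical sales table with sale_year, sale_month, sale_day, and profit columns:

-- CUBE produces subtotals for every combination of the listed columns,
-- i.e. profit per day, per month, per year, and the grand total.
SELECT sale_year, sale_month, sale_day, SUM(profit) AS total_profit
FROM sales
GROUP BY CUBE (sale_year, sale_month, sale_day);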
3. What is data cleaning?
Data cleaning is also known as data scrubbing.
Data cleaning is a process that ensures a set of data is correct and accurate. Data accuracy,
consistency, and integration are checked during data cleaning. Data cleaning can be applied to a
single set of records or to multiple sets of data that need to be merged.
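A minimal sketch of such cleaning in SQL, assuming a hypothetical stg_customers staging table with a row_id column (exact syntax varies by database):

-- Standardize values: trim stray whitespace and normalize case.
UPDATE stg_customers
SET customer_name = UPPER(TRIM(customer_name));

-- Remove duplicate rows, keeping the lowest row_id per customer.
DELETE FROM stg_customers
WHERE row_id NOT IN (
    SELECT MIN(row_id)
    FROM stg_customers
    GROUP BY customer_id
);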
4. Explain how to mine an OLAP cube.
Data mining extensions can be used to slice the data of the source cube within the discovered
data mining model. When a cube is mined, the case table is a dimension.
5. What are different stages of “Data mining”?
Data mining is a logical process of searching through large amounts of information to find
important data.
Stage 1: Exploration. One will want to explore and prepare the data. The goal of the exploration
stage is to find the important variables and determine their nature.
Stage 2: Pattern identification. Searching for patterns and choosing the one which allows making
the best prediction is the primary action in this stage.
Stage 3: Deployment. This stage cannot be reached until a consistent, highly predictive pattern
is found in stage 2. The pattern found in stage 2 is then applied to see whether the desired
outcome is achieved or not.
6. What are the different problems that “Data mining” can solve?
Data mining can be used in a variety of fields/industries like marketing of products and services,
AI, government intelligence.
The US FBI, for example, uses data mining to screen security and intelligence data and to
identify illegal and incriminating e-information distributed over the internet.
7. What is Data purging?
Deleting data from a data warehouse is known as data purging. Usually junk data, such as rows
with null values or spaces, is cleaned up; data purging is the process of removing this kind of
junk value.
8. What is BUS schema?
A BUS schema identifies the common dimensions across business processes, i.e., the conformed
dimensions. It consists of conformed dimensions and standardized definitions of facts.
9. Define non-additive facts?
Non-additive facts are facts that cannot be summed up across any of the dimensions present in the
fact table, e.g., ratios or percentages. These columns cannot be added to produce meaningful results.
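A profit-margin percentage is a typical example: summing it over rows gives a meaningless number. A sketch, assuming a hypothetical sales_fact table with revenue, profit, and margin_pct columns:

-- WRONG: margin_pct is non-additive, so this total means nothing.
-- SELECT SUM(margin_pct) FROM sales_fact;

-- RIGHT: recompute the ratio from the additive facts it is derived from.
SELECT SUM(profit) / SUM(revenue) AS overall_margin
FROM sales_fact;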
10. What is conformed fact? What is conformed dimensions used for?
A conformed fact is a fact that is allowed to have the same name in separate tables, so that the
facts can be compared and combined mathematically. Conformed dimensions can be used across
multiple data marts; they have a static structure. Any dimension table that is used by multiple
fact tables can be a conformed dimension.
11. What is real time data-warehousing?
In real-time data warehousing, the warehouse is updated every time the system performs a
transaction, so it reflects the real-time business data. This means that when a query is fired
against the warehouse, the state of the business at that moment is returned.
Explain the use of lookup tables and aggregate tables?
An aggregate table contains summarized view of data.
Lookup tables, using the primary key of the target, allow updating of records based on the
lookup condition.
Define slowly changing dimensions (SCD)?
SCDs are dimensions whose data changes very slowly, e.g., a customer's city or an employee's
designation. When a change occurs, the row in the dimension can be replaced completely, with no
track of the old record (Type 1); a new row can be inserted (Type 2); or the change can be
tracked in an extra column of the same row (Type 3).
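A minimal sketch of the Type 2 case in SQL, assuming a hypothetical customer_dim table with start_date, end_date, and current_flag columns; the old row is closed off and a new row is inserted, so history is preserved:

-- Close the existing current row for the changed customer.
UPDATE customer_dim
SET end_date = CURRENT_DATE, current_flag = 'N'
WHERE customer_id = 1001 AND current_flag = 'Y';

-- Insert a new row carrying the changed attribute (here, a new city).
INSERT INTO customer_dim
    (customer_id, customer_name, city, start_date, end_date, current_flag)
VALUES
    (1001, 'John Smith', 'Boston', CURRENT_DATE, NULL, 'Y');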
What is cube grouping?
A set of similar cubes built by a transformer is known as a cube group. Cube groups are generally
used to create smaller cubes that are based on the data at a particular level of a dimension.
What is Data Warehousing?
A data warehouse can be considered as a storage area where relevant data is stored irrespective
of the source.
Data warehousing merges data from multiple sources into an easy and complete form.
What is Virtual Data Warehousing?
A virtual data warehouse provides a collective view of the completed data. It can be considered a
logical data model of the underlying metadata.
What is active data warehousing?
An active data warehouse represents a single state of the business. It considers the analytic
perspectives of customers and suppliers, and it helps to deliver updated data through reports.
What is data modeling and data mining?
Data modeling is a technique used to define and analyze the data requirements that support an
organization's business processes. In simple terms, it is used to analyze data objects in order
to identify the relationships among these data objects in any business.
Data Mining is a technique used to analyze datasets to derive useful insights/information. It is
mainly used in retail, consumer goods, telecommunication and financial organizations that have
a strong consumer orientation in order to determine the impact on sales, customer satisfaction
and profitability.
What is the difference between data warehousing and business intelligence?
Data warehousing relates to all aspects of data management, starting from the development,
implementation, and operation of the data sets. It is a back-up of all data relevant to the
business (a data store).
Business intelligence is used to analyze the data from a business point of view, to measure an
organization's success.
Factors such as sales, profitability, marketing campaign effectiveness, market share, and
operational efficiency are analyzed using business intelligence tools such as Cognos,
Informatica, etc.
What is snapshot in a data warehouse?
Snapshot refers to a complete visualization of data at the time of extraction. It occupies less
space and can be used to back up and restore data quickly.
What is ETL process in data warehousing?
ETL stands for Extraction, Transformation, and Loading: extracting data from different sources
such as flat files, databases, or XML data; transforming this data depending on the application's
needs; and loading the data into a data warehouse.
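A hedged sketch of the transform-and-load step in SQL, assuming hypothetical stg_orders (staging) and fact_orders (warehouse) tables:

-- Transform rows from staging and load them into the warehouse fact table.
INSERT INTO fact_orders (order_id, order_date, amount)
SELECT
    order_id,
    CAST(order_date_text AS DATE),   -- transformation: string to date
    quantity * unit_price            -- transformation: derive the measure
FROM stg_orders
WHERE order_id IS NOT NULL;          -- basic cleansing rule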
Explain the difference between data mining and data warehousing?
Data mining is a method of examining large amounts of data for the purpose of finding patterns.
It is normally used for modeling and forecasting.
Data warehousing is the central repository for the data of several business systems in an
enterprise. Data from various sources is extracted and organized in the data warehouse
selectively for analysis and accessibility.
What is an OLTP system and OLAP system?
OLTP = OnLine Transaction Processing.
Applications that support and manage transactions involving high volumes of data are backed by an
OLTP system. OLTP is based on client-server architecture and supports transactions across
networks.
OLAP = OnLine Analytical Processing.
Business data analysis and complex calculations on low volumes of data are performed by OLAP.
With the support of OLAP, a user can gain insight into data coming from various sources.
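The difference shows up clearly in the shape of typical queries. A sketch, with hypothetical orders and fact_sales tables:

-- Typical OLTP query: touches one row, fast, transactional.
SELECT status FROM orders WHERE order_id = 98213;

-- Typical OLAP query: scans and aggregates many rows for analysis.
SELECT region, SUM(amount) AS total_sales
FROM fact_sales
GROUP BY region
ORDER BY total_sales DESC;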
What are cubes?
Multidimensional data is logically represented by cubes in data warehousing. OLAP environments
view the data in the form of a hierarchical cube. A data cube stores data in a summarized
version, which helps in faster analysis of the data; the data is stored in such a way that it
allows easy reporting.
What is analysis service?
An analysis service provides a combined view of the data used in OLAP or data mining.
Explain sequence clustering algorithm?
The sequence clustering algorithm collects similar or related paths, i.e., sequences of data
containing events.
Explain time series algorithm in data mining?
The time series algorithm can be used to predict continuous values of data. Once the algorithm is
trained to predict a series of data, it can predict the outcome of other series, e.g.,
forecasting the profit.
What is XMLA?
XMLA stands for XML for Analysis. It is an industry standard for accessing data in analytical
systems, such as OLAP.
What is surrogate key? Explain it with an example.
A surrogate key is a unique identifier in a database, either for an entity in the modeled world
or for an object in the database. A surrogate key is generated internally by the current system
and is invisible to the user. Because natural keys can change or repeat across the objects in the
database, the surrogate key, rather than the natural key, is typically used as the primary key of
a warehouse table.
Eg: a sequential number can be a surrogate key.
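A minimal sketch, assuming a hypothetical customer dimension; the identity syntax shown is the standard SQL form, while some databases use AUTO_INCREMENT or sequences instead:

-- customer_key is the surrogate key: generated by the system and meaningless
-- to the business; customer_id is the natural key from the source system.
CREATE TABLE customer_dim (
    customer_key  INTEGER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    customer_id   VARCHAR(20) NOT NULL,
    customer_name VARCHAR(100),
    city          VARCHAR(50)
);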
What is the purpose of Factless Fact Table?
A factless fact table can be used to track a process or collect status. It contains no numeric
measures that can be aggregated, only foreign keys to the dimensions.
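Event tracking is the classic illustration, e.g., student attendance. A sketch with hypothetical names:

-- Factless fact table: only foreign keys to dimensions, no numeric measures.
CREATE TABLE attendance_fact (
    date_key    INTEGER NOT NULL,
    student_key INTEGER NOT NULL,
    class_key   INTEGER NOT NULL
);

-- Analysis is done by counting rows, e.g., attendance per class:
SELECT class_key, COUNT(*) AS attendance_count
FROM attendance_fact
GROUP BY class_key;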
What is a level of Granularity of a fact table?
Granularity is the lowest level of information stored in the fact table; the depth of the data
level is known as its granularity.
E.g., in a date dimension, the level of granularity could be year, quarter, month, period, week,
or day.
The process consists of the following two steps:
- Determining the dimensions that are to be included
- Determining the location to place the hierarchy of each dimension of information
Difference between star and snowflake schema.
A snowflake schema is a more normalized form of a star schema. In a star schema, one fact table
is stored with a number of dimension tables. In a snowflake schema, one dimension table can have
multiple sub-dimensions; in a star schema, by contrast, each dimension table is independent,
without any sub-dimensions.
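The structural difference in a sketch, using hypothetical product tables: the star version keeps the category inline, while the snowflake version normalizes it into a sub-dimension.

-- Star schema: one denormalized dimension table.
CREATE TABLE product_dim_star (
    product_key   INTEGER PRIMARY KEY,
    product_name  VARCHAR(100),
    category_name VARCHAR(50)   -- category stored inline
);

-- Snowflake schema: the category becomes its own sub-dimension table.
CREATE TABLE category_dim (
    category_key  INTEGER PRIMARY KEY,
    category_name VARCHAR(50)
);
CREATE TABLE product_dim_snowflake (
    product_key  INTEGER PRIMARY KEY,
    product_name VARCHAR(100),
    category_key INTEGER REFERENCES category_dim (category_key)
);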
What is the difference between view and materialized view?
View:
• A tailored representation of the data is provided by a view, to access data from its tables.
• Has a logical structure only; it does not occupy storage space.
• Changes in the underlying tables are reflected in the view.
Materialized view:
• Pre-calculated data persists in it.
• Occupies physical storage space.
• Changes in the underlying tables are not reflected until the view is refreshed.
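A minimal sketch of both, assuming a hypothetical sales table; the materialized-view syntax shown is the Oracle/PostgreSQL form, and other databases differ:

-- Ordinary view: stores only the query; data is read from sales at query time.
CREATE VIEW region_sales_v AS
SELECT region, SUM(amount) AS total_amount
FROM sales
GROUP BY region;

-- Materialized view: stores the precomputed result; it must be refreshed
-- to pick up later changes in the underlying table.
CREATE MATERIALIZED VIEW region_sales_mv AS
SELECT region, SUM(amount) AS total_amount
FROM sales
GROUP BY region;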
What is Linked Cube with reference to data warehouse?
Linked cubes are cubes that are linked to one another so that the data remains consistent.
1. What is the difference between OLAP and OLTP?
2. Tell me about your ETL workflow process?
3. What is the difference between Operational Database and Warehouse?
4. What type of approach do you follow in your project?
5. What is the difference between a Data Mart and a data warehouse?
6. Which type of database are you using in your project, and how much space does it occupy?
7. Explain the test case template?
8. What is the difference between Severity and Priority?
9. What is the difference between SDLC and STLC?
10. What is the difference between Issue Log and Clarification Log?
11. What type of bugs you have faced in your project?
12. What is Banking?
13. Explain what are the types of Banking?
14. What is the difference between Dimension table and Fact table?
15. Explain SCDs and their types. How are they used?
16. Explain Bug reporting?
17. Are you using any models in SDLC?
18. Which process is used in ETL Testing?
19. What is unit testing? Who performs it?
20. What's the difference between an Incremental Load and an Initial Load?
21. Which document did you use to carry out your project?
22. Are you using Requirement tab in QC?
Types of ETL Bugs
1. User interface bugs/cosmetic bugs:-
Related to the GUI of the application
Navigation, spelling mistakes, font style, font size, colors, alignment.
2. BVA Related bug:-
Minimum and maximum values
3. ECP Related bug:-
Valid and invalid type
4. Input/output bugs:-
Valid values not accepted
Invalid values accepted
5. Calculation bugs:-
Mathematical errors
Final output is wrong
6. Load condition bugs:-
Does not allow multiple users
Does not allow the customer-expected load
7. Race condition bugs:-
System crash & hang
System cannot run on client platforms
8. Version control bugs:-
No logo matching
No version information available
This usually occurs in regression testing
9. H/W bugs:-
Device is not responding to the application
10. Source bugs:-
Mistakes in help documents
Types of ETL Testing :-
1) Constraint Testing:
In the constraint testing phase, the test engineer identifies whether the data is mapped from
source to target or not.
The test engineer checks the following scenarios in the ETL testing process (example queries
follow the list below):
a) NOT NULL
b) UNIQUE
c) Primary Key
d) Foreign key
e) Check
f) Default
g) NULL
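Hedged examples of such checks in SQL, assuming hypothetical emp_target and dept_target tables with eno and deptno columns:

-- NOT NULL check: any rows violating the rule?
SELECT COUNT(*) FROM emp_target WHERE eno IS NULL;

-- UNIQUE / primary key check: any key values occurring more than once?
SELECT eno, COUNT(*)
FROM emp_target
GROUP BY eno
HAVING COUNT(*) > 1;

-- Foreign key check: any rows pointing at a missing parent?
SELECT t.eno
FROM emp_target t
LEFT JOIN dept_target d ON t.deptno = d.deptno
WHERE d.deptno IS NULL;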
2) Source to Target Count Testing:
Here the tester checks whether the record counts in source and target match. Whether the data is
in ascending or descending order does not matter; only the count is required.
When time is short, a tester can fall back on this type of testing.
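A typical count comparison, sketched with hypothetical source and target table names:

-- The two counts should match after the load.
SELECT COUNT(*) AS source_count FROM src_emp;
SELECT COUNT(*) AS target_count FROM tgt_emp;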
3) Source to Target Data Validation Testing:
In this testing, the tester validates each and every data point from source to target. In most
financial projects, the tester pays particular attention to the decimal factors.
4) Threshold/Data Integrated Testing:
In this testing, the test engineer validates the ranges of the data, typically for population
calculations and share-market and business finance analysis (quarterly, half-yearly, yearly).
For example:
MIN: 4, MAX: 10, RANGE: 6
5) Field to Field Testing:
In field-to-field testing, the test engineer checks how much space is occupied in the database
and that the data is consistent with the table's data types.
NOTE: Check the order of the columns and that each source column maps to the correct target column.
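Where the database exposes the standard information schema, column metadata can be compared directly. A sketch with hypothetical table names:

-- Compare column names, data types, and lengths of source and target tables.
SELECT table_name, column_name, data_type, character_maximum_length
FROM information_schema.columns
WHERE table_name IN ('src_emp', 'tgt_emp')
ORDER BY table_name, ordinal_position;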
6) Duplicate Check Testing:
In this phase of ETL testing, a tester faces duplicate values very frequently, so the tester
relies on database queries, because a huge amount of data is present in the source and target
tables. For example:
SELECT ENO, ENAME, SAL, COUNT(*)
FROM EMP
GROUP BY ENO, ENAME, SAL
HAVING COUNT(*) > 1;
Note:
1) If there are mistakes in the primary key, or no primary key is allotted, duplicates may arise.
2) Sometimes a developer makes mistakes while transferring the data from source to target, and at
that time duplicates may arise.
3) Duplicates also arise due to environment mistakes (improper plugins in the tool).
7) Error/Exception Logical Testing:
1) The delimiter is present in valid tables
2) The delimiter is not present in invalid tables (exception tables)
8) Incremental and Historical Process Testing:
This testing verifies that incremental loads do not corrupt the historical data; when the
historical data is corrupted, that is the condition in which bugs are raised.
9) Control Columns and Defect Values Testing:
This type of testing was introduced by IBM.
10) Navigation Testing:
Navigation testing is testing from the end user's point of view. If an end user cannot navigate
the application easily, that navigation is called bad or poor navigation.
During testing, the tester identifies such navigation scenarios so that unnecessary navigation
can be avoided.
11) Initialization testing:
Testing the application against the combination of hardware and software installed on the
platform is called initialization testing.
12) Transformation Testing:
If, at the time of mapping from the source table to the target table, the transformation does not
match the mapping condition, the test engineer raises a bug.
13) Regression Testing:
Code is modified to fix a bug or to implement new functionality, and such changes can introduce
errors.
These introduced errors are called regressions. Testing for the regression effect is called
regression testing.
14) Retesting:
Re-executing the failed test cases after the bug is fixed.
15) System Integration Testing:
Integration testing: after the programming process is complete, the developer integrates the
modules. There are three models:
a) Top-down
b) Bottom-up
c) Hybrid
Project
Here I am taking the emp table as an example. For this table I will write test scenarios and test
cases; that is, we are testing the emp table. Sample queries follow the checklist.
Check List or Test Scenarios:-
1. To validate the data in table (emp)
2. To validate the table structure.
3. To validate the null values of the table.
4. To validate the null values of every attribute.
5. To check the duplicate values of the table.
6. To check the duplicate values of each attribute of the table
7. To check the field value or space (the length of the field)
8. To check the constraints (foreign key, primary key)
9. To check the names of the employees who have not earned any commission
10. To check all the employees who work in a given department (Accounts dept, Sales dept)
11. To check the row count of each attribute.
12. To check the row count of the table.
13. To check the max salary from emp table.
14. To check the min salary from emp table.
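Sample queries for a few of the scenarios above, assuming the classic emp table columns (empno, ename, sal, comm, deptno); adjust the names to the actual schema:

-- 12. Row count of the table.
SELECT COUNT(*) FROM emp;

-- 4. Null values of an attribute.
SELECT * FROM emp WHERE comm IS NULL;

-- 6. Duplicate values of an attribute.
SELECT ename, COUNT(*) FROM emp GROUP BY ename HAVING COUNT(*) > 1;

-- 9. Employees who have not earned any commission.
SELECT ename FROM emp WHERE comm IS NULL OR comm = 0;

-- 13 and 14. Max and min salary.
SELECT MAX(sal) AS max_sal, MIN(sal) AS min_sal FROM emp;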
http://etltestingguide.blogspot.com/p/sql.html
What is the difference between an ODS and a Staging Area?
ODS :- Operational Data Store, which contains data. The ODS comes after the staging area.
E.g.: suppose we have day-level granularity in the OLTP system and year-level granularity in the
data warehouse. If the business (a manager) asks for week-level granularity, we would have to go
back to the OLTP system and summarize the day level up to the week level, which would be
painstaking. So instead we maintain week-level granularity in the ODS, for about 30 to 90 days of
data.
Note: ODS information contains cleansed data only, i.e., data that has come through the staging
area.
Staging Area :-
The staging area is where data lands during the ETL process, before it reaches the ODS. It
consists of:
1. Metadata.
2. The work area where we apply our complex business rules.
3. A place to hold the data and do calculations.
In other words, it is a temporary work area.
The full form of ODS is Operational Data Store. The ODS is a layer between the source and target
databases; it is used to store recent data.
The staging layer is also a layer between the source and target databases; it is used for
cleansing purposes and to store the data periodically.
The ODS (Operational Data Store) is the first point in the data warehouse. It stores the
real-time data of daily transactions as the first instance of the data.
The staging area is the later part, which comes after the ODS. Here the data is cleansed and
temporarily stored before being loaded into the data warehouse.
The ODS (Operational Data Store) contains real-time data (because changes must be applied to
real-time data). The real-time data is dumped into the ODS, also called the landing area; later
the data moves into the staging area, which is where we do all the transformation.