1. What are Critical Success Factors?
Key areas of activity in which favorable results are necessary for a company to reach its goals.
There are four basic types of CSFs which are:
Industry CSFs
Strategy CSFs
Environmental CSFs
Temporal CSFs
2. What is data cube technology used for?
Data cubes are commonly used for easy interpretation of data. A cube represents measures of the
business along several dimensions, where each dimension of the cube represents some attribute of
the database, e.g. profit per day, per month, or per year.
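As an illustration, many relational databases support cube-style aggregation directly in SQL. A minimal sketch, assuming a hypothetical sales table with sale_year, sale_month, sale_day, and profit columns:

-- CUBE produces subtotals for every combination of the listed columns,
-- i.e. profit per day, per month, per year, and the grand total.
SELECT sale_year, sale_month, sale_day, SUM(profit) AS total_profit
FROM sales
GROUP BY CUBE (sale_year, sale_month, sale_day);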
3. What is data cleaning?
Data cleaning is also known as data scrubbing.
Data cleaning is a process that ensures a set of data is correct and accurate. Data accuracy,
consistency, and integration are checked during data cleaning. Data cleaning can be applied to a
single set of records or to multiple sets of data that need to be merged.
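A minimal sketch of such cleaning in SQL, assuming a hypothetical stg_customers staging table with a row_id column (exact syntax varies by database):

-- Standardize values: trim stray whitespace and normalize case.
UPDATE stg_customers
SET customer_name = UPPER(TRIM(customer_name));

-- Remove duplicate rows, keeping the lowest row_id per customer.
DELETE FROM stg_customers
WHERE row_id NOT IN (
    SELECT MIN(row_id)
    FROM stg_customers
    GROUP BY customer_id
);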
4. Explain how to mine an OLAP cube.
Data mining extensions can be used to slice the data of the source cube within the discovered
data mining model. When a cube is mined, the case table is a dimension.
5. What are different stages of “Data mining”?
Data mining is a logical process of searching through large amounts of information to find
important data.
Stage 1: Exploration. One will want to explore and prepare the data. The goal of the exploration
stage is to find the important variables and determine their nature.
Stage 2: Pattern identification. Searching for patterns and choosing the one which allows making
the best prediction is the primary action in this stage.
Stage 3: Deployment. This stage cannot be reached until a consistent, highly predictive pattern
is found in stage 2. The pattern found in stage 2 is then applied to see whether the desired
outcome is achieved or not.
6. What are the different problems that “Data mining” can solve?
Data mining can be used in a variety of fields/industries like marketing of products and services,
AI, government intelligence.
The US FBI, for example, uses data mining to screen security and intelligence data and to
identify illegal and incriminating e-information distributed over the internet.
7. What is Data purging?
Deleting data from a data warehouse is known as data purging. Usually junk data, such as rows
with null values or spaces, is cleaned up; data purging is the process of removing this kind of
junk value.
8. What is BUS schema?
A BUS schema identifies the common dimensions across business processes, i.e., the conformed
dimensions. It consists of conformed dimensions and standardized definitions of facts.
9. Define non-additive facts?
Non-additive facts are facts that cannot be summed up across any of the dimensions present in the
fact table, e.g., ratios or percentages. These columns cannot be added to produce meaningful results.
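A profit-margin percentage is a typical example: summing it over rows gives a meaningless number. A sketch, assuming a hypothetical sales_fact table with revenue, profit, and margin_pct columns:

-- WRONG: margin_pct is non-additive, so this total means nothing.
-- SELECT SUM(margin_pct) FROM sales_fact;

-- RIGHT: recompute the ratio from the additive facts it is derived from.
SELECT SUM(profit) / SUM(revenue) AS overall_margin
FROM sales_fact;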
10. What is conformed fact? What is conformed dimensions used for?
A conformed fact is a fact that is allowed to have the same name in separate tables, so that the
facts can be compared and combined mathematically. Conformed dimensions can be used across
multiple data marts; they have a static structure. Any dimension table that is used by multiple
fact tables can be a conformed dimension.
11. What is real time data-warehousing?
In real-time data warehousing, the warehouse is updated every time the system performs a
transaction, so it reflects the real-time business data. This means that when a query is fired
against the warehouse, the state of the business at that moment is returned.
Explain the use of lookup tables and aggregate tables?
An aggregate table contains summarized view of data.
Lookup tables, using the primary key of the target, allow updating of records based on the
lookup condition.
Define slowly changing dimensions (SCD)?
SCDs are dimensions whose data changes very slowly, e.g., a customer's city or an employee's
designation. When a change occurs, the row in the dimension can be replaced completely, with no
track of the old record (Type 1); a new row can be inserted (Type 2); or the change can be
tracked in an extra column of the same row (Type 3).
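A minimal sketch of the Type 2 case in SQL, assuming a hypothetical customer_dim table with start_date, end_date, and current_flag columns; the old row is closed off and a new row is inserted, so history is preserved:

-- Close the existing current row for the changed customer.
UPDATE customer_dim
SET end_date = CURRENT_DATE, current_flag = 'N'
WHERE customer_id = 1001 AND current_flag = 'Y';

-- Insert a new row carrying the changed attribute (here, a new city).
INSERT INTO customer_dim
    (customer_id, customer_name, city, start_date, end_date, current_flag)
VALUES
    (1001, 'John Smith', 'Boston', CURRENT_DATE, NULL, 'Y');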
What is cube grouping?
A set of similar cubes built by a transformer is known as a cube group. Cube groups are generally
used to create smaller cubes that are based on the data at a particular level of a dimension.
What is Data Warehousing?
A data warehouse can be considered as a storage area where relevant data is stored irrespective
of the source.
Data warehousing merges data from multiple sources into an easy and complete form.
What is Virtual Data Warehousing?
A virtual data warehouse provides a collective view of the completed data. It can be considered a
logical data model of the underlying metadata.
What is active data warehousing?
An active data warehouse represents a single state of the business. It considers the analytic
perspectives of customers and suppliers, and it helps to deliver updated data through reports.
What is data modeling and data mining?
Data modeling is a technique used to define and analyze the data requirements that support an
organization's business processes. In simple terms, it is used to analyze data objects in order
to identify the relationships among these data objects in any business.
Data Mining is a technique used to analyze datasets to derive useful insights/information. It is
mainly used in retail, consumer goods, telecommunication and financial organizations that have
a strong consumer orientation in order to determine the impact on sales, customer satisfaction
and profitability.
What is the difference between data warehousing and business intelligence?
Data warehousing relates to all aspects of data management, starting from the development,
implementation, and operation of the data sets. It is a back-up of all data relevant to the
business (a data store).
Business intelligence is used to analyze the data from a business point of view, to measure an
organization's success.
Factors such as sales, profitability, marketing campaign effectiveness, market share, and
operational efficiency are analyzed using business intelligence tools such as Cognos,
Informatica, etc.
What is snapshot in a data warehouse?
Snapshot refers to a complete visualization of data at the time of extraction. It occupies less
space and can be used to back up and restore data quickly.
What is ETL process in data warehousing?
ETL stands for Extraction, Transformation, and Loading: extracting data from different sources
such as flat files, databases, or XML data; transforming this data depending on the application's
needs; and loading the data into a data warehouse.
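A hedged sketch of the transform-and-load step in SQL, assuming hypothetical stg_orders (staging) and fact_orders (warehouse) tables:

-- Transform rows from staging and load them into the warehouse fact table.
INSERT INTO fact_orders (order_id, order_date, amount)
SELECT
    order_id,
    CAST(order_date_text AS DATE),   -- transformation: string to date
    quantity * unit_price            -- transformation: derive the measure
FROM stg_orders
WHERE order_id IS NOT NULL;          -- basic cleansing rule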
Explain the difference between data mining and data warehousing?
Data mining is a method of examining large amounts of data for the purpose of finding patterns.
It is normally used for modeling and forecasting.
Data warehousing is the central repository for the data of several business systems in an
enterprise. Data from various sources is extracted and organized in the data warehouse
selectively for analysis and accessibility.
What is an OLTP system and OLAP system?
OLTP = OnLine Transaction Processing.
Applications that support and manage transactions involving high volumes of data are backed by an
OLTP system. OLTP is based on client-server architecture and supports transactions across
networks.
OLAP = OnLine Analytical Processing.
Business data analysis and complex calculations on low volumes of data are performed by OLAP.
With the support of OLAP, a user can gain insight into data coming from various sources.
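The difference shows up clearly in the shape of typical queries. A sketch, with hypothetical orders and fact_sales tables:

-- Typical OLTP query: touches one row, fast, transactional.
SELECT status FROM orders WHERE order_id = 98213;

-- Typical OLAP query: scans and aggregates many rows for analysis.
SELECT region, SUM(amount) AS total_sales
FROM fact_sales
GROUP BY region
ORDER BY total_sales DESC;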
What are cubes?
Multidimensional data is logically represented by cubes in data warehousing. OLAP environments
view the data in the form of a hierarchical cube. A data cube stores data in a summarized
version, which helps in faster analysis of the data; the data is stored in such a way that it
allows easy reporting.
What is analysis service?
An analysis service provides a combined view of the data used in OLAP or data mining.
Explain sequence clustering algorithm?
The sequence clustering algorithm collects similar or related paths, i.e., sequences of data
containing events.
Explain time series algorithm in data mining?
The time series algorithm can be used to predict continuous values of data. Once the algorithm is
trained to predict a series of data, it can predict the outcome of other series, e.g.,
forecasting the profit.
What is XMLA?
XMLA stands for XML for Analysis. It is an industry standard for accessing data in analytical
systems, such as OLAP.
What is surrogate key? Explain it with an example.
A surrogate key is a unique identifier in a database, either for an entity in the modeled world
or for an object in the database. A surrogate key is generated internally by the current system
and is invisible to the user. Because natural keys can change or repeat across the objects in the
database, the surrogate key, rather than the natural key, is typically used as the primary key of
a warehouse table.
Eg: a sequential number can be a surrogate key.
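A minimal sketch, assuming a hypothetical customer dimension; the identity syntax shown is the standard SQL form, while some databases use AUTO_INCREMENT or sequences instead:

-- customer_key is the surrogate key: generated by the system and meaningless
-- to the business; customer_id is the natural key from the source system.
CREATE TABLE customer_dim (
    customer_key  INTEGER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    customer_id   VARCHAR(20) NOT NULL,
    customer_name VARCHAR(100),
    city          VARCHAR(50)
);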
What is the purpose of Factless Fact Table?
A factless fact table can be used to track a process or collect status. It contains no numeric
measures that can be aggregated, only foreign keys to the dimensions.
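Event tracking is the classic illustration, e.g., student attendance. A sketch with hypothetical names:

-- Factless fact table: only foreign keys to dimensions, no numeric measures.
CREATE TABLE attendance_fact (
    date_key    INTEGER NOT NULL,
    student_key INTEGER NOT NULL,
    class_key   INTEGER NOT NULL
);

-- Analysis is done by counting rows, e.g., attendance per class:
SELECT class_key, COUNT(*) AS attendance_count
FROM attendance_fact
GROUP BY class_key;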
What is a level of Granularity of a fact table?
Granularity is the lowest level of information stored in the fact table; the depth of the data
level is known as its granularity.
E.g., in a date dimension, the level of granularity could be year, quarter, month, period, week,
or day.
The process consists of the following two steps:
- Determining the dimensions that are to be included
- Determining the location to place the hierarchy of each dimension of information
Difference between star and snowflake schema.
A snowflake schema is a more normalized form of a star schema. In a star schema, one fact table
is stored with a number of dimension tables. In a snowflake schema, one dimension table can have
multiple sub-dimensions; in a star schema, by contrast, each dimension table is independent,
without any sub-dimensions.
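The structural difference in a sketch, using hypothetical product tables: the star version keeps the category inline, while the snowflake version normalizes it into a sub-dimension.

-- Star schema: one denormalized dimension table.
CREATE TABLE product_dim_star (
    product_key   INTEGER PRIMARY KEY,
    product_name  VARCHAR(100),
    category_name VARCHAR(50)   -- category stored inline
);

-- Snowflake schema: the category becomes its own sub-dimension table.
CREATE TABLE category_dim (
    category_key  INTEGER PRIMARY KEY,
    category_name VARCHAR(50)
);
CREATE TABLE product_dim_snowflake (
    product_key  INTEGER PRIMARY KEY,
    product_name VARCHAR(100),
    category_key INTEGER REFERENCES category_dim (category_key)
);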
What is the difference between view and materialized view?
View:
• A tailored representation of the data is provided by a view, to access data from its tables.
• Has a logical structure only; it does not occupy storage space.
• Changes in the underlying tables are reflected in the view.
Materialized view:
• Pre-calculated data persists in it.
• Occupies physical storage space.
• Changes in the underlying tables are not reflected until the view is refreshed.
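A minimal sketch of both, assuming a hypothetical sales table; the materialized-view syntax shown is the Oracle/PostgreSQL form, and other databases differ:

-- Ordinary view: stores only the query; data is read from sales at query time.
CREATE VIEW region_sales_v AS
SELECT region, SUM(amount) AS total_amount
FROM sales
GROUP BY region;

-- Materialized view: stores the precomputed result; it must be refreshed
-- to pick up later changes in the underlying table.
CREATE MATERIALIZED VIEW region_sales_mv AS
SELECT region, SUM(amount) AS total_amount
FROM sales
GROUP BY region;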
What is Linked Cube with reference to data warehouse?
Linked cubes are cubes that are linked to one another so that the data remains consistent.
1. What is the difference between OLAP and OLTP?
2. Tell me about your ETL workflow process?
3. What is the difference between Operational Database and Warehouse?
4. What type of approach do you follow in your project?
5. What is the difference between a Data Mart and a data warehouse?
6. Which type of database are you using in your project, and how much space does it occupy?
7. Explain the test case template?
8. What is the difference between Severity and Priority?
9. What is the difference between SDLC and STLC?
10. What is the difference between Issue Log and Clarification Log?
11. What type of bugs you have faced in your project?
12. What is Banking?
13. Explain what are the types of Banking?
14. What is the difference between Dimension table and Fact table?
15. Explain SCDs and their types. How are they used?
16. Explain Bug reporting?
17. Are you using any models in SDLC?
18. Which process is used in ETL Testing?
19. What is unit testing? Who performs it?
20. What's the difference between an Incremental Load and an Initial Load?
21. Which document did you use to carry out your project?
22. Are you using Requirement tab in QC?
Types of ETL Bugs
1. User interface bugs/cosmetic bugs:-
Related to the GUI of the application
Navigation, spelling mistakes, font style, font size, colors, alignment.
2. BVA Related bug:-
Minimum and maximum values
3. ECP Related bug:-
Valid and invalid type
4. Input/output bugs:-
Valid values not accepted
Invalid values accepted
5. Calculation bugs:-
Mathematical errors
Final output is wrong
6. Load condition bugs:-
Does not allow multiple users
Does not allow the customer-expected load
7. Race condition bugs:-
System crash & hang
System cannot run on client platforms
8. Version control bugs:-
No logo matching
No version information available
This usually occurs in regression testing
9. H/W bugs:-
Device is not responding to the application
10. Source bugs:-
Mistakes in help documents
Types of ETL Testing :-
1) Constraint Testing:
In the constraint testing phase, the test engineer identifies whether the data is mapped from
source to target or not.
The test engineer checks the following scenarios in the ETL testing process (example queries
follow the list below):
a) NOT NULL
b) UNIQUE
c) Primary Key
d) Foreign key
e) Check
f) Default
g) NULL
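Hedged examples of such checks in SQL, assuming hypothetical emp_target and dept_target tables with eno and deptno columns:

-- NOT NULL check: any rows violating the rule?
SELECT COUNT(*) FROM emp_target WHERE eno IS NULL;

-- UNIQUE / primary key check: any key values occurring more than once?
SELECT eno, COUNT(*)
FROM emp_target
GROUP BY eno
HAVING COUNT(*) > 1;

-- Foreign key check: any rows pointing at a missing parent?
SELECT t.eno
FROM emp_target t
LEFT JOIN dept_target d ON t.deptno = d.deptno
WHERE d.deptno IS NULL;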
2) Source to Target Count Testing:
Here the tester checks whether the record counts in source and target match. Whether the data is
in ascending or descending order does not matter; only the count is required.
When time is short, a tester can fall back on this type of testing.
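A typical count comparison, sketched with hypothetical source and target table names:

-- The two counts should match after the load.
SELECT COUNT(*) AS source_count FROM src_emp;
SELECT COUNT(*) AS target_count FROM tgt_emp;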
3) Source to Target Data Validation Testing:
In this testing, the tester validates each and every data point from source to target. In most
financial projects, the tester pays particular attention to the decimal factors.
4) Threshold/Data Integrated Testing:
In this testing, the test engineer validates the ranges of the data, typically for population
calculations and share-market and business finance analysis (quarterly, half-yearly, yearly).
For example:
MIN: 4, MAX: 10, RANGE: 6
5) Field to Field Testing:
In field-to-field testing, the test engineer checks how much space is occupied in the database
and that the data is consistent with the table's data types.
NOTE: Check the order of the columns and that each source column maps to the correct target column.
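Where the database exposes the standard information schema, column metadata can be compared directly. A sketch with hypothetical table names:

-- Compare column names, data types, and lengths of source and target tables.
SELECT table_name, column_name, data_type, character_maximum_length
FROM information_schema.columns
WHERE table_name IN ('src_emp', 'tgt_emp')
ORDER BY table_name, ordinal_position;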
6) Duplicate Check Testing:
In this phase of ETL testing, a tester faces duplicate values very frequently, so the tester
relies on database queries, because a huge amount of data is present in the source and target
tables. For example:
SELECT ENO, ENAME, SAL, COUNT(*)
FROM EMP
GROUP BY ENO, ENAME, SAL
HAVING COUNT(*) > 1;
Note:
1) If there are mistakes in the primary key, or no primary key is allotted, duplicates may arise.
2) Sometimes a developer makes mistakes while transferring the data from source to target, and at
that time duplicates may arise.
3) Duplicates also arise due to environment mistakes (improper plugins in the tool).
7) Error/Exception Logical Testing:
1) The delimiter is present in valid tables
2) The delimiter is not present in invalid tables (exception tables)
8) Incremental and Historical Process Testing:
This testing verifies that incremental loads do not corrupt the historical data; when the
historical data is corrupted, that is the condition in which bugs are raised.
9) Control Columns and Defect Values Testing:
This type of testing was introduced by IBM.
10) Navigation Testing:
Navigation testing is testing from the end user's point of view. If an end user cannot navigate
the application easily, that navigation is called bad or poor navigation.
During testing, the tester identifies such navigation scenarios so that unnecessary navigation
can be avoided.
11) Initialization testing:
Testing the application against the combination of hardware and software installed on the
platform is called initialization testing.
12) Transformation Testing:
If, at the time of mapping from the source table to the target table, the transformation does not
match the mapping condition, the test engineer raises a bug.
13) Regression Testing:
Code is modified to fix a bug or to implement new functionality, and such changes can introduce
errors.
These introduced errors are called regressions. Testing for the regression effect is called
regression testing.
14) Retesting:
Re-executing the failed test cases after the bug is fixed.
15) System Integration Testing:
Integration testing: after the programming process is complete, the developer integrates the
modules. There are three models:
a) Top-down
b) Bottom-up
c) Hybrid
Project
Here I am taking the emp table as an example. For this table I will write test scenarios and test
cases; that is, we are testing the emp table. Sample queries follow the checklist.
Check List or Test Scenarios:-
1. To validate the data in table (emp)
2. To validate the table structure.
3. To validate the null values of the table.
4. To validate the null values of every attribute.
5. To check the duplicate values of the table.
6. To check the duplicate values of each attribute of the table
7. To check the field value or space (the length of the field)
8. To check the constraints (foreign key, primary key)
9. To check the names of the employees who have not earned any commission
10. To check all the employees who work in a given department (Accounts dept, Sales dept)
11. To check the row count of each attribute.
12. To check the row count of the table.
13. To check the max salary from emp table.
14. To check the min salary from emp table.
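Sample queries for a few of the scenarios above, assuming the classic emp table columns (empno, ename, sal, comm, deptno); adjust the names to the actual schema:

-- 12. Row count of the table.
SELECT COUNT(*) FROM emp;

-- 4. Null values of an attribute.
SELECT * FROM emp WHERE comm IS NULL;

-- 6. Duplicate values of an attribute.
SELECT ename, COUNT(*) FROM emp GROUP BY ename HAVING COUNT(*) > 1;

-- 9. Employees who have not earned any commission.
SELECT ename FROM emp WHERE comm IS NULL OR comm = 0;

-- 13 and 14. Max and min salary.
SELECT MAX(sal) AS max_sal, MIN(sal) AS min_sal FROM emp;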
http://etltestingguide.blogspot.com/p/sql.html
What is the difference between an ODS and a Staging Area?
ODS :- Operational Data Store, which contains data. The ODS comes after the staging area.
E.g.: suppose we have day-level granularity in the OLTP system and year-level granularity in the
data warehouse. If the business (a manager) asks for week-level granularity, we would have to go
back to the OLTP system and summarize the day level up to the week level, which would be
painstaking. So instead we maintain week-level granularity in the ODS, for about 30 to 90 days of
data.
Note: ODS information contains cleansed data only, i.e., data that has come through the staging
area.
Staging Area :-
The staging area is where data lands during the ETL process, before it reaches the ODS. It
consists of:
1. Metadata.
2. The work area where we apply our complex business rules.
3. A place to hold the data and do calculations.
In other words, it is a temporary work area.
The full form of ODS is Operational Data Store. The ODS is a layer between the source and target
databases; it is used to store recent data.
The staging layer is also a layer between the source and target databases; it is used for
cleansing purposes and to store the data periodically.
The ODS (Operational Data Store) is the first point in the data warehouse. It stores the
real-time data of daily transactions as the first instance of the data.
The staging area is the later part, which comes after the ODS. Here the data is cleansed and
temporarily stored before being loaded into the data warehouse.
The ODS (Operational Data Store) contains real-time data (because changes must be applied to
real-time data). The real-time data is dumped into the ODS, also called the landing area; later
the data moves into the staging area, which is where we do all the transformation.