WIPRO – APPLYING THOUGHT
2010
Summer Training
Report
Data Warehousing and Business Intelligence
Using Qlikview
Rahul Dubey
HTTP://WWW.WIPRO.COM
DATA WAREHOUSING AND BUSINESS
INTELLIGENCE USING QLIKVIEW
A PROJECT TRAINING REPORT
Submitted By
RAHUL DUBEY
In partial fulfilment for the award of the degree of
Bachelor of Technology
In
COMPUTER SCIENCE AND ENGINEERING
ACKNOWLEDGEMENT
I, RAHUL DUBEY, am grateful to Wipro InfoTech for providing me the opportunity
to work with them and complete my 6 weeks of project training as a part of my
B.Tech (Computer Science) curriculum.
I would also like to express my deep sense of gratitude to my project training
guide, Mr. Nitish Vij, Solution Architect (DWH Practice), for his invaluable
guidance and suggestions during my training tenure. His experience has been of
immense help, as were his efforts in making us understand all the aspects of the
project in a short frame of time and showing us the right way.
It has been a great learning experience for me, as I got a chance to apply my
knowledge in a practical domain. This training and experience has not only
enriched me with technical knowledge but has also instilled the maturity of
thought and vision, attributes required to be a successful software professional.
Last but not the least, I would like to sincerely thank Mr. Ivor Egbert,
Sr. Executive (TED), for offering us this training, and also my teammates for
their support and assistance throughout this period.
DECLARATION BY THE CANDIDATE
I hereby declare that the work being presented in the dissertation entitled
“Data Warehouse and BI using Qlikview”, in partial fulfilment of the
requirement for the award of the degree of B.Tech in Computer Science and
Engineering, Jaypee University of Engg. & Tech., is an authentic record of my
work carried out during the period from 31st May 2010 till 9th July 2010 under
the supervision of Mr. Nitish Vij, Wipro InfoTech.
Place: Wipro InfoTech, Gurgaon Signature of the Candidate
Date :
BONAFIDE CERTIFICATE
This is to certify that the above statements made by the candidate are true to the best
of our knowledge and belief.
Place: Wipro InfoTech, Gurgaon Signature
Date:
ABOUT THE COMPANY
Wipro InfoTech is a leading strategic IT partner for companies across India, the
Middle East and Asia-Pacific, offering integrated IT solutions. It plans,
deploys, sustains and maintains the IT lifecycle through total outsourcing,
consulting services, business solutions and professional services. Wipro
InfoTech helps organisations drive momentum, no matter what domain they are in.
Backed by strong quality processes and rich experience managing global clients
across various business verticals, it aligns IT strategies to business goals.
Along with its best-of-breed technology partners, Wipro InfoTech also helps with
hardware and IT infrastructure needs.
Wipro InfoTech is a part of the USD 5 billion Wipro Limited (NYSE: WIT), with a
market capitalization of USD 24 billion. The various accreditations achieved for
every service offered reflect a commitment to quality assurance. Wipro InfoTech
was the first global software company to achieve SEI-CMM Level 5, the world's
first IT company to achieve Six Sigma, as well as the world's first company to
attain PCMM Level 5. Currently, its presence extends to 9 regional offices in
India, besides offices in the KSA, UAE, Taiwan, Malaysia, Singapore, Australia,
and other regions in Asia-Pacific and the Middle East.
THE SERVICES OFFERED BY THE COMPANY
In today's world, where IT infrastructure plays a key role in determining the
success of a business organisation, Wipro InfoTech helps to derive maximum
value from IT investments. They offer their clients the full array of IT
lifecycle services. From technology optimisation to mitigating risks, there is a
constant demand to evaluate, deploy and manage flexible, responsive and
economical solutions. Outsourcing non-core operations can help transform a
business into a leaner and smarter organisation with greater adaptability to
changing economic and business trends.
In a maturing outsourcing market, where both clients and vendors are becoming
increasingly adept at understanding the fundamentals needed to develop a lasting
relationship, Wipro InfoTech offers a partnership that goes beyond merely
providing a solution. Spurred on by the goal of creating new business processes
and innovative models to help customers gain new levels of efficiency,
differentiation and flexibility, Wipro InfoTech offers Total Outsourcing
Services (TOS).
This powerful service offering ensures dynamic solutions with total process
visibility, resulting in pre-emptive solving of problems and issues before they
can manifest and affect business performance.
Their solutions eschew the immature model of ad hoc offerings that dwell on
pricing, labour arbitrage and granular-level contracts, in favour of tower-group
solutions that tend towards being strategic corporate initiatives. This ensures
delivery of results against service levels, larger-scope relationships that
enable service providers to respond quickly and flexibly, and transfer of
day-to-day responsibilities.
Wipro InfoTech also offers consulting services as part of its advisory
expertise across various domains. Its various consulting practices enable
clients to achieve execution excellence and drive business momentum despite
challenges arising from globalisation and the dynamics of customer loyalty.
By optimising IT resources through these services, they build a strong base to
empower technology operations. This includes identifying pain areas, deploying
the right resources to upgrade or solve them, implementing strategic business
and IT tools, as well as managing the project lifecycle. All of this is achieved
through their focused quality processes, which comply with ISO 9000, Six Sigma,
and SEI CMM & PCMM Level 5 standards.
With over two decades of experience, Wipro InfoTech has a commanding lead in
leveraging critical IT services for clients in India, the Middle East and
Asia-Pacific. Their services are further backed by strategic partnerships with
some of the top global technology corporations, including Oracle, Microsoft,
SAP and IBM. Their service offerings include:
 Consulting: Strategic Cost Reduction, Business Transformation, Security
Governance, Strategy, E-Governance.
 Business Solutions: Enterprise Applications, Solutions for Fast Emerging
Businesses, Application Development and Portals, Applications Maintenance,
Third Party Testing, Data Warehouse / Business Intelligence, Point Solutions.
 Professional Services: System Integration, Availability Services, Managed
Services.
 Total Outsourcing.
DATA WAREHOUSING AND BI PRACTICES AT WIPRO
INFOTECH
Data warehouses are an organization's corporate memory, containing critical
information that is the basis for business management and operations.
Organizations therefore require their data warehouse to be scalable, secure and
stable, with the ability to optimize storage and retrieval of complex sets of
data. Business intelligence systems transform an organization's ability to
convert raw data into information, making online multidimensional transaction
and analytical processing possible. Data warehouse (DW) and business
intelligence (BI) operations together enable organizations to base crucial
business decisions on actual data analyses.
At Wipro InfoTech, the DW/BI offerings provide an organization with direct
access to information analytics that helps it respond quickly to emergent
business opportunities and rapidly changing market trends. With India's largest
dedicated DW/BI team of 2050+ consultants, who bring 4350+ person-years of
experience, the Wipro InfoTech DW/BI solutions framework can be customized to
address domain-specific requirements.
They have extensive experience in the finance and insurance, retail,
manufacturing, energy and utilities, telecom, healthcare and government sectors.
Such varied domain experience, along with alliances with global vendors in the
field and cross-technology competency, drives BI operations from a departmental
to an enterprise-wide initiative. As an end-to-end service provider, they
consult, architect, integrate and manage customers' DW/BI operations to ensure
that they stay ahead in today's competitive business environment.
Wipro’s DW/BI solutions framework includes:
 DW/BI consulting
Their consultants work with clients to define specific DW/BI requirements
through a comprehensive examination of focus areas, deriving a solution that
factors in investment plans and balances cost-efficiency with required business
benefits. The key modules are:
 Preparing business cases for BI/DW
 Business & information analysis
 Preparing BI & DW solution framework
 Arriving at roadmap for implementation
 BI & DW project management
 DW/BI architecture
They formulate a design of the proposed DW/BI solutions by aligning
requirements analyses with the client's goals and existing infrastructure. Key
offerings include:
 Data acquisition from different legacy systems on various
platforms, including Mainframes, AS/400, Unix and Digital OLTP
platforms; databases such as DB2, IMS, IDMS, VSAM, Oracle legacy
applications, Sybase and Informix; and ERP packages such as
PeopleSoft
 Data modelling
 ETL architecture
 Metadata architecture and management
 Security architecture
 DW/BI integration
As a part of the integration phase, the Wipro InfoTech DW/BI team designs and
builds physical databases, ensuring that appropriate disaster recovery plans are
in place. The data mining implementation includes data cleaning, ETL,
visualization and enabling data access. The data mining tool selection and
creation of reporting environments are domain-specific and fulfil operational
requirements such as customer profiling, target marketing, campaign
effectiveness analysis, and fraud detection and management. The reporting
environments that they have developed and deployed are feature-rich and make
multi-dimensional analyses possible across various types of data warehouses.
 DW/BI management
To ensure consistent performance as data warehouses scale in volume and usage,
and to ensure maximum benefits, the DW/BI management offering includes:
 Data warehouse administration, maintenance and support
activities
 Capacity planning
 Data warehousing audit
 Performance tuning
ABSTRACT
Organizations are all looking to increase revenue, lower expenses, and improve
profitability by improving efficiency and effectiveness in their business
processes and overall performance. Business Intelligence (BI) software vendors
claim that they have the technology that can provide this improvement. Vendors
concentrate on selling products or tools that can be used to build these
solutions but rarely concentrate on the problems the customer is trying to
solve. As new requirements are realized, new vendors are brought in, new tools
are purchased and new consultants arrive to make it work. Eventually, the
corporate BI initiative becomes a collection of disjointed point solutions using
a combination of expensive monolithic commercial applications and
difficult-to-maintain custom code. Under this approach, problems must be broken
into pieces and segregated into tasks such as Reporting, Analysis, Data Mining
and Workflow, each handled by the tool designed for it. There is no application
responsible for initiating, managing, verifying or coordinating results; people
and procedures are called upon to make up for these deficiencies.
This report describes the QlikView Business Intelligence platform. QlikView is
a suite of powerful and rapidly deployable business intelligence software that
enables enterprises and their management to effectively and proactively
monitor, manage and optimize their business. QlikView lets companies analyze
all their data quickly and efficiently. It eliminates the need for data
warehouses, data marts, and OLAP cubes; instead it gives users rapid access to
data from multiple sources in an intuitive, dashboard-style interface. With
QlikView, companies can turn data into information, and information into better
decisions.
INTRODUCTION
Businesses have begun to exploit the burgeoning data available online to make
better decisions about their activities. Many of their activities are rather
complicated, however, and certain types of information cannot be extracted
using SQL.
DATABASE APPLICATIONS
Database applications can be broadly classified into two categories:
1. TRANSACTION PROCESSING SYSTEMS: These are systems that record information
about transactions, such as product sales information for companies.
2. DECISION SUPPORT SYSTEMS: These aim to get high-level information out of the
detailed information stored in transaction processing systems and to use that
high-level information to make a variety of decisions.
ISSUES INVOLVED
The storage and retrieval of data for decision support raises several issues:
 Although many decision-support queries can be written in SQL, others either
cannot be expressed in SQL or cannot be easily expressed in SQL.
 Database query languages are not suited to the performance of detailed
statistical analyses of data.
 Large companies have diverse sources of data that they need to use for making
business decisions. The sources may store the data in different schemas. For
performance reasons, the data sources usually will not permit other parts of the
company to retrieve data on demand.
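The aggregation-style queries that decision-support systems rely on can be sketched with ordinary SQL. The following minimal example runs such a rollup through Python's sqlite3 module; the table and column names are purely illustrative, not taken from any real system described in this report.

```python
import sqlite3

# Illustrative schema: a tiny sales table of the kind a transaction
# processing system would accumulate.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("North", "A", 100.0), ("North", "B", 50.0), ("South", "A", 75.0)],
)

# A typical decision-support aggregation: total sales per region.
totals = dict(
    conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
)
print(totals)  # {'North': 150.0, 'South': 75.0}
```

Note that while such rollups are easy in SQL, detailed statistical analyses (regression, forecasting) generally have to be done outside the query language, which is one of the issues listed above.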
INTRODUCTION TO DATA WAREHOUSING
A data warehouse is a repository of information gathered from multiple sources,
stored under a unified schema, at a single site. Once gathered, the data are
stored for a long time, permitting access to historical data. Thus data
warehouses provide the user a single consolidated interface to data, making
decision-support queries easy to write. Moreover, by accessing information for
decision support from a data warehouse, the decision maker ensures that online
transaction processing is not affected by the decision-support workload.
According to Inmon, a data warehouse is a powerful database model that
significantly enhances the user's ability to quickly analyze large,
multidimensional data sets. It cleanses and organizes data to allow users to
make business decisions based on facts. Hence, the data in a data warehouse
must have strong analytical characteristics. Making data analytical requires
that it be:
1. Subject oriented
2. Integrated
3. Time referenced
4. Non-volatile
1. Subject oriented – Data warehouses group data by subject rather than by
activity. In contrast, transaction systems organize data by activities –
payroll processing, shipping products, loan processing. Data organized
around activities cannot answer questions like "How many salaried
employees have a tax deduction of Rs. X from their accounts across all
branches of the company?"; this would require heavy searching and
aggregation of employee and account records of all branches. In a data
warehouse, information is organized around subjects – employee, account,
sales, etc.
2. Integrated – This refers to de-duplicating information and merging it
from many sources into one consistent location.
3. Time referenced – The most important characteristic of analytical data
is its prior state of being; it is time-valued. For example, a user may
ask, "What were the total sales of product A on New Year's Day for the
past three years across region Y?" Answering this requires the sales
figures of the product on New Year's Day in all the branches of that
particular region.
4. Non-volatile – Non-volatile data helps users dig deep into history and
arrive at specific decisions based on facts.
Example: In order to store data over the years, many application designers in
each branch have made their individual decisions as to how an application and
database should be built, so source systems differ in naming conventions,
variable measurements, encoding structures, and physical attributes of data.
Consider a bank that has several branches in several countries, has millions of
customers, and whose lines of business are savings and loans. The following
example explains how data is integrated from source systems to target systems.
Example of Source Data

System Name     | Attribute Name            | Column Name               | Datatype     | Values
Source System 1 | Customer Application Date | CUSTOMER_APPLICATION_DATE | NUMERIC(8,0) | 11012005
Source System 2 | Customer Application Date | CUST_APPLICATION_DATE     | DATE         | 11012005
Source System 3 | Application Date          | APPLICATION_DATE          | DATE         | 01NOV2005
In the aforementioned example, attribute name, column name, datatype and values are
entirely different from one source system to another. This inconsistency in data can be
avoided by integrating the data into a data warehouse with good standards.
Example of Target Data (Data Warehouse)

Target System | Attribute Name            | Column Name               | Datatype | Values
Record #1     | Customer Application Date | CUSTOMER_APPLICATION_DATE | DATE     | 01112005
Record #2     | Customer Application Date | CUSTOMER_APPLICATION_DATE | DATE     | 01112005
Record #3     | Customer Application Date | CUSTOMER_APPLICATION_DATE | DATE     | 01112005
In the above example of target data, attribute names, column names, and data types
are consistent throughout the target system. This is how data from various source
systems is integrated and accurately stored into the data warehouse.
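A minimal sketch of the date integration shown above, assuming the source formats implied by the example values (MMDDYYYY for source systems 1 and 2, DDMONYYYY for source system 3) and a DDMMYYYY warehouse standard. The format assignments and function name are illustrative assumptions, not part of any real system.

```python
from datetime import datetime

# Hypothetical per-source input formats, inferred from the example above:
# sources 1 and 2 store MMDDYYYY (11012005 = 1 Nov 2005),
# source 3 stores DDMONYYYY (01NOV2005).
SOURCE_FORMATS = {
    "source1": "%m%d%Y",
    "source2": "%m%d%Y",
    "source3": "%d%b%Y",
}
TARGET_FORMAT = "%d%m%Y"  # assumed warehouse standard (DDMMYYYY)

def to_target(source: str, value: str) -> str:
    """Parse a source-specific date string and re-emit it in the warehouse format."""
    return datetime.strptime(value, SOURCE_FORMATS[source]).strftime(TARGET_FORMAT)

print(to_target("source1", "11012005"))   # 01112005
print(to_target("source3", "01NOV2005"))  # 01112005
```

All three inconsistent source values converge on the single target value shown in the target table.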
However, the means to retrieve and analyze data, to extract, transform and load data,
and to manage the data dictionary are also considered essential components of a data
warehousing system. Many references to data warehousing use this broader context.
Thus, an expanded definition for data warehousing includes business intelligence
tools, tools to extract, transform, and load data into the repository, and tools to
manage and retrieve metadata.
Data warehousing arises from an organisation's need for reliable, consolidated,
unique and integrated reporting and analysis of its data, at different levels
of aggregation.
The practical reality of most organisations is that their data infrastructure
is made up of a collection of heterogeneous systems. For example, an
organisation might have one system that handles customer relationships, a
system that handles employees, systems that handle sales data or production
data, and yet another system for finance and budgeting data. In practice, these
systems are often poorly integrated, or not integrated at all, and simple
questions like "How much time did sales person A spend on customer C? How much
did we sell to customer C? Was customer C happy with the provided service? Did
customer C pay his bills?" can be very hard to answer, even though the
information is available "somewhere" in the different data systems.
Yet another problem might be that the organisation is, internally, in disagreement
about which data is correct. For example, the sales department might have one view
of its costs, while the finance department has another view of that cost. In such cases
the organisation can spend unlimited time discussing who's got the correct view of the
data.
It is partly the purpose of data warehousing to bridge such problems. It is important to
note that in data warehousing the source data systems are considered as given: Even
though the data source system might have been made in such a manner that it's
difficult to extract integrated information, the "data warehousing answer" is not to
redesign the data source systems but rather to make the data appear consistent,
integrated and consolidated despite the problems in the underlying source systems.
Data warehousing achieves this by employing different data warehousing techniques,
creating one or more new data repositories (i.e. the data warehouse) whose data
model(s) support the needed reporting and analysis.
There are three types of data warehouses:
1. Enterprise Data Warehouse - An enterprise data warehouse provides a central
database for decision support throughout the enterprise.
2. ODS (Operational Data Store) - This has a broad, enterprise-wide scope, but
unlike the real enterprise data warehouse, its data is refreshed in near
real time and used for routine business activity. One typical application
of the ODS is to hold recent data before migration to the data warehouse.
The ODS is not conceptually equivalent to the data warehouse, although it
does store data with a deeper level of history than the OLTP data.
3. Data Mart - A data mart is a subset of a data warehouse that supports a
particular region, business unit or business function. It describes an
approach in which each individual department implements its own management
information system, often based on a relational database or a smaller
multidimensional or spreadsheet-like system. However, these systems, once
in production, are difficult to extend for use by other departments: there
are inherent design limitations in building for a single set of business
needs, and expansion may lead to disruption for existing users.
COMPONENTS OF DATA WAREHOUSE
DWH architecture is a way of representing the data, communication, processing
and presentation that exist for end-user computing within the enterprise. The
architecture of a typical data warehouse consists of parts performing the
following functions:
 Gathering of data
 Storage of data
 Querying and analysis of data
Architecture, in the context of an organization's data warehouse efforts, is a
conceptualization of how the data warehouse is built. There is no right or
wrong architecture; rather, multiple architectures are used to support various
environments and situations. The worthiness of an architecture can be judged by
how the conceptualization aids in the building, maintenance, and usage of the
data warehouse.
One possible simple conceptualization of data warehouse architecture consists
of the following interconnected parts:
 Source system – The goal of data warehousing is to free the information
locked up in the operational systems and to combine it with information
from other, often external, sources of data. Increasingly, large
organizations are acquiring additional data from outside databases. It is
essential to identify the right data sources and determine an efficient
process to collect facts.
 Source data transport layer – This layer handles data trafficking; it
represents the tools and processes involved in transporting data from the
source systems to the enterprise warehouse system. Since the data volume
is huge, the interfaces with the source systems have to be robust and
scalable enough to manage secured data transmission.
 Data quality control and data profiling layer – Often, data quality causes
the most concern in any data warehousing solution. Incomplete and
inaccurate data will jeopardize the success of the data warehouse. It is
essential to measure the quality of source data and take corrective action
even before the information is processed and loaded into the target
warehouse.
 Metadata management layer – Metadata is information about data within the
enterprise; record descriptions in a COBOL program or CREATE statements in
SQL are metadata. For a warehouse to be fully functional, a variety of
metadata must be available.
 Data integration layer – This layer schedules the various tasks that must
be accomplished to integrate data acquired from various sources. A lot of
formatting and cleansing activity happens in this layer so that the data
is consistent.
 Data processing layer – The warehouse is where the dimensionally modelled
data resides. In some cases one can think of the warehouse simply as a
transformed view of the operational data, modelled for analytical
purposes.
 End-user reporting layer – The success of a data warehouse implementation
largely depends upon ease of access to valuable information; in that
sense, the end-user reporting layer is a very critical component.
DATA WAREHOUSE SCHEMAS
Data warehouses typically have schemas designed for data analysis using tools
such as OLAP tools. The data are usually multidimensional, with two types of
attributes:
 Measure attributes – Given a relation used for data analysis, some of the
attributes are identified as measure attributes, since they measure some
value. For example, the attribute NUMBER of the sales relation is a
measure attribute because it measures the number of units sold.
 Dimension attributes – Some or all of the other attributes of the relation
are identified as dimension attributes, since they define the dimensions
on which the measure attributes are viewed.
Tables containing multidimensional data are called fact tables and are usually
very large. To minimize storage requirements, dimension attributes are usually
short identifiers that are foreign keys into other tables called dimension
tables.
TYPES OF SCHEMA
 STAR SCHEMA
The star schema (sometimes referenced as star join schema) is the simplest style of
data warehouse schema. The star schema consists of a few fact tables (possibly only
one, justifying the name) referencing any number of dimension tables. The star
schema is considered an important special case of the snowflake schema.
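A star schema of this shape can be sketched as follows; the table and column names are illustrative, and SQLite (via Python's sqlite3 module) stands in for the warehouse database.

```python
import sqlite3

# A minimal star schema sketch: one fact table with foreign keys into two
# dimension tables. All names here are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_store   (store_id   INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    store_id   INTEGER REFERENCES dim_store(store_id),
    units_sold INTEGER          -- the measure attribute
);
INSERT INTO dim_product VALUES (1, 'Widget');
INSERT INTO dim_store   VALUES (10, 'North');
INSERT INTO fact_sales  VALUES (1, 10, 25);
""")

# Star join: the fact table is joined out to each dimension it references.
row = conn.execute("""
    SELECT p.name, s.region, f.units_sold
    FROM fact_sales f
    JOIN dim_product p ON p.product_id = f.product_id
    JOIN dim_store   s ON s.store_id   = f.store_id
""").fetchone()
print(row)  # ('Widget', 'North', 25)
```

Snowflaking this schema would mean further normalizing the dimension tables (for example, splitting region out of dim_store into its own table), while the fact table stays unchanged.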
 SNOWFLAKE SCHEMA
A snowflake schema is a logical arrangement of tables in a multidimensional
database such that the entity relationship diagram resembles a snowflake in shape.
Closely related to the star schema, the snowflake schema is represented by centralized
fact tables which are connected to multiple dimensions. In the snowflake schema,
however, dimensions are normalized into multiple related tables whereas the star
schema's dimensions are de-normalized with each dimension being represented by a
single table. When the dimensions of a snowflake schema are elaborate, having
multiple levels of relationships, and where child tables have multiple parent tables
("forks in the road"), a complex snowflake shape starts to emerge. The "snow-flaking"
effect only affects the dimension tables and not the fact tables.
 CONSTELLATION SCHEMA
For each star schema or snowflake schema it is possible to construct a fact
constellation schema. This schema is more complex than the star or snowflake
architecture because it contains multiple fact tables, which allows dimension
tables to be shared among them. This solution is very flexible; however, it may
be hard to manage and support. The main disadvantage of the fact constellation
schema is a more complicated design, because many variants of aggregation must
be considered.
BENEFITS OF DATA WAREHOUSE
Some of the benefits that a data warehouse provides are as follows:
 A data warehouse provides a common data model for all data of interest
regardless of the data's source. This makes it easier to report and analyze
information than it would be if multiple data models were used to retrieve
information such as sales invoices, order receipts, general ledger charges, etc.
 Prior to loading data into the data warehouse, inconsistencies are identified
and resolved. This greatly simplifies reporting and analysis.
 Information in the data warehouse is under the control of data warehouse users
so that, even if the source system data is purged over time, the information in
the warehouse can be stored safely for extended periods of time.
 Because they are separate from operational systems, data warehouses provide
retrieval of data without slowing down operational systems.
 Data warehouses can work in conjunction with and, hence, enhance the value
of operational business applications, notably customer relationship
management (CRM) systems.
 Data warehouses facilitate decision support system applications such as trend
reports (e.g., the items with the most sales in a particular area within the last
two years), exception reports, and reports that show actual performance versus
goals.
DISADVANTAGES OF DATA WAREHOUSE
There are also disadvantages to using a data warehouse. Some of them are:
 Data warehouses are not the optimal environment for unstructured data.
 Because data must be extracted, transformed and loaded into the warehouse,
there is an element of latency in data warehouse data.
 Over their life, data warehouses can have high costs.
 Data warehouses can get outdated relatively quickly. There is a cost of
delivering suboptimal information to the organisation.
 There is often a fine line between data warehouses and operational systems.
Duplicate, expensive functionality may be developed. Or, functionality may be
developed in the data warehouse that, in retrospect, should have been
developed in the operational systems.
ETL TOOL IN DATA WAREHOUSE
Extract, transform, and load (ETL) is a process in database usage and especially in
data warehousing that involves:
 Extracting data from outside sources
 Transforming it to fit operational needs (which can include quality levels)
 Loading it into the end target (database or data warehouse).
Extract
The first part of an ETL process involves extracting the data from the source systems.
Most data warehousing projects consolidate data from different source systems. Each
separate system may also use a different data organization format. Common data
source formats are relational databases and flat files, but may include non-relational
database structures such as Information Management System (IMS) or other data
structures such as Virtual Storage Access Method (VSAM) or Indexed Sequential
Access Method (ISAM), or even fetching from outside sources such as through web
spidering or screen-scraping. Extraction converts the data into a format for
transformation processing.
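A toy sketch of the extract step, using a CSV flat file as the source; any of the formats above (relational tables, VSAM, screen-scraping output) could stand in its place. The field names are hypothetical.

```python
import csv
import io

# A stand-in flat file; in practice this would be read from a source system.
raw = io.StringIO("roll_no,age,salary\n101,34,50000\n102,29,\n")

# Extraction converts each record into a uniform dict, ready for the
# transform stage. Note the second record has a missing (null) salary.
records = [dict(r) for r in csv.DictReader(raw)]
print(records[0])  # {'roll_no': '101', 'age': '34', 'salary': '50000'}
```

Whatever the source format, the goal is the same: every record ends up in one common representation for the transform stage to work on.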
Transform
The transform stage applies a series of rules or functions to the extracted data from
the source to derive the data for loading into the end target. Some data sources will
require very little or even no manipulation of data. In other cases, one or more of the
following transformation types may be required to meet the business and technical
needs of the target database:
 Selecting only certain columns to load (or selecting null columns not to load).
For example, if source data has three columns (also called attributes) say roll-
no, age and salary then the extraction may take only roll-no and salary.
Similarly, extraction mechanism may ignore all those records where salary is
not present (salary = null).
 Translating coded values (e.g., if the source system stores 1 for male and 2 for
female, but the warehouse stores M for male and F for female), this calls for
automated data cleansing; no manual cleansing occurs during ETL
 Encoding free-form values (e.g., mapping "Male" to "1" and "Mr" to M)
 Deriving a new calculated value (e.g., sale_amount = qty * unit_price)
 Filtering
 Sorting
 Joining data from multiple sources (e.g., lookup, merge)
 Aggregation (for example, rollup — summarizing multiple rows of data —
total sales for each store, and for each region, etc.)
 Transposing or pivoting (turning multiple columns into multiple rows or vice
versa)
 Splitting a column into multiple columns (e.g., putting a comma-separated list
specified as a string in one column as individual values in different columns)
 Disaggregation of repeating columns into a separate detail table (e.g., moving
a series of addresses in one record into single addresses in a set of records in a
linked address table)
 Lookup and validate the relevant data from tables or referential files for slowly
changing dimensions.
 Applying any form of simple or complex data validation. If validation fails, it
may result in a full, partial or no rejection of the data, and thus none, some or
all the data is handed over to the next step, depending on the rule design and
exception handling. Many of the above transformations may result in
exceptions, for example, when a code translation parses an unknown code in
the extracted data.
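To make the list above concrete, here is a minimal Python sketch combining several of these transformations: column selection, coded-value translation, a derived value, and null filtering. The column names and the code table are illustrative, not taken from any real source system.

```python
# Illustrative sketch of common ETL transformations on extracted rows.
GENDER_CODES = {"1": "M", "2": "F"}  # translate coded values (1 -> M, 2 -> F)

def transform(rows):
    out = []
    for row in rows:
        # Filter: ignore records where salary is missing (salary = null)
        if row.get("salary") is None:
            continue
        out.append({
            "roll_no": row["roll_no"],                      # select only certain columns
            "gender": GENDER_CODES[row["gender"]],          # translate coded value
            "sale_amount": row["qty"] * row["unit_price"],  # derive a calculated value
        })
    return out

rows = [
    {"roll_no": 1, "gender": "1", "salary": 50000, "qty": 3, "unit_price": 9.5},
    {"roll_no": 2, "gender": "2", "salary": None,  "qty": 1, "unit_price": 4.0},
]
print(transform(rows))  # the second row is rejected because salary is null
```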
Load
The load phase loads the data into the end target, usually the data warehouse (DW).
Depending on the requirements of the organization, this process varies widely. Some
data warehouses overwrite existing information with cumulative data, refreshing the
extract on a daily, weekly or monthly basis, while other DWs (or even other parts of
the same DW) may add new data in a historicized form, for example hourly. To
understand this, consider a DW that is required to maintain the sales record of the
last one year. The DW will overwrite any data that is older than a year with newer
data, but the entries within the one-year window are kept in a historicized
manner. The timing and scope to replace or append are strategic design
choices dependent on the time available and the business needs. More complex
systems can maintain a history and audit trail of all changes to the data loaded in the
DW.
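The one-year rolling window described above can be sketched as follows. Here `warehouse` is a hypothetical stand-in for the target table, and the retention period is an assumption for illustration.

```python
from datetime import date, timedelta

RETENTION = timedelta(days=365)  # assumed one-year retention window

def load(warehouse, new_rows, today):
    warehouse.extend(new_rows)   # append new data in historicized form
    cutoff = today - RETENTION
    # overwrite (here: drop) any data older than the one-year window
    return [r for r in warehouse if r["sale_date"] >= cutoff]

warehouse = [{"sale_date": date(2009, 1, 15), "amount": 100}]  # older than a year
new_rows = [{"sale_date": date(2010, 7, 1), "amount": 250}]
warehouse = load(warehouse, new_rows, today=date(2010, 7, 9))
print(warehouse)  # only the row inside the one-year window survives
```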
Examples
For example, a financial institution might have information on a customer in several
departments and each department might have that customer's information listed in a
different way. The membership department might list the customer by name, whereas
the accounting department might list the customer by number. ETL can bundle all this
data and consolidate it into a uniform presentation, such as for storing in a database or
data warehouse.
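The consolidation described above can be sketched in a few lines; the department structures and customer details below are invented for illustration, not drawn from a real institution.

```python
# Two departments hold the same customer's data in different shapes:
membership = {"Alice Smith": {"joined": "2008-03-01"}}            # keyed by name
accounting = {1042: {"name": "Alice Smith", "balance": 310.50}}   # keyed by number

def consolidate(membership, accounting):
    """Bundle departmental views into one uniform record per customer."""
    unified = {}
    for cust_no, acct in accounting.items():
        record = {"customer_no": cust_no, "name": acct["name"],
                  "balance": acct["balance"]}
        record.update(membership.get(acct["name"], {}))  # merge membership view
        unified[cust_no] = record
    return unified

print(consolidate(membership, accounting))
```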
ETL Tools
At present the most popular and widely used ETL tools and applications on the
market are:
# IBM WebSphere DataStage (formerly known as Ascential DataStage and Ardent
DataStage)
# Informatica PowerCenter
# Oracle Warehouse Builder
# Ab Initio
# Pentaho Data Integration - Kettle Project (open source ETL)
# SAS ETL Studio
# Cognos DecisionStream
# Business Objects Data Integrator (BODI)
# Microsoft SQL Server Integration Services (SSIS)
OLTP(Online transaction processing )
Definition: Databases must often allow the real-time processing of SQL transactions
to support e-commerce and other time-critical applications. This type of processing is
known as online transaction processing (OLTP).
Online transaction processing, or OLTP, refers to a class of systems that facilitate
and manage transaction-oriented applications, typically for data entry and retrieval
transaction processing. The term is somewhat ambiguous; some understand a
"transaction" in the context of computer or database transactions, while others (such
as the Transaction Processing Performance Council) define it in terms of business or
commercial transactions. OLTP has also been used to refer to processing in which the
system responds immediately to user requests. An automatic teller machine (ATM)
for a bank is an example of a commercial transaction processing application.
The technology is used in a number of industries, including banking, airlines, mail
order, supermarkets, and manufacturing. Applications include electronic banking,
order processing, employee time clock systems, e-commerce. The most widely used
OLTP system is probably IBM's CICS.
Benefits
Online Transaction Processing has two key benefits: simplicity and efficiency.
Reduced paper trails and faster, more accurate forecasts for revenues and expenses
are examples of how OLTP makes things simpler and more efficient for businesses.
Disadvantages
As with any information processing system, security and reliability are considerations.
Online transaction systems are generally more susceptible to direct attack and abuse
than their offline counterparts. When organizations choose to rely on OLTP,
operations can be severely impacted if the transaction system or database is
unavailable due to data corruption, systems failure, or network availability issues.
Additionally, like many modern online information technology solutions, some
systems require offline maintenance which further affects the cost-benefit analysis.
Contrasting Data warehouse and OLTP
One major difference between the types of system is that data warehouses are not
usually in third normal form (3NF), a type of data normalization common in OLTP
environments.
Data warehouses and OLTP systems have very different requirements. Here are some
examples of differences between typical data warehouses and OLTP systems:
 Workload
Data warehouses are designed to accommodate ad hoc queries. You might not
know the workload of your data warehouse in advance, so a data warehouse
should be optimized to perform well for a wide variety of possible query
operations.
OLTP systems support only predefined operations. Your applications might be
specifically tuned or designed to support only these operations.
 Data modifications
A data warehouse is updated on a regular basis by the ETL process (run
nightly or weekly) using bulk data modification techniques. The end users of a
data warehouse do not directly update the data warehouse.
In OLTP systems, end users routinely issue individual data modification
statements to the database. The OLTP database is always up to date, and
reflects the current state of each business transaction.
 Schema design
Data warehouses often use de-normalized or partially de-normalized schemas
(such as a star schema) to optimize query performance.
OLTP systems often use fully normalized schemas to optimize
update/insert/delete performance, and to guarantee data consistency.
 Typical operations
A typical data warehouse query scans thousands or millions of rows. For
example, "Find the total sales for all customers last month."
A typical OLTP operation accesses only a handful of records. For example,
"Retrieve the current order for this customer."
 Historical data
Data warehouses usually store many months or years of data. This is to
support historical analysis.
OLTP systems usually store data from only a few weeks or months. The
OLTP system stores only historical data as needed to successfully meet the
requirements of the current transaction.
OLAP(online analytical processing)
Online analytical processing, or OLAP, is an approach to swiftly answer multi-
dimensional analytical queries. OLAP is part of the broader category of business
intelligence, which also encompasses relational reporting and data mining. The typical
applications of OLAP are in business reporting for sales, marketing, management
reporting, business process management (BPM), budgeting and forecasting, financial
reporting and similar areas. The term OLAP was created as a slight modification of
the traditional database term OLTP (Online Transaction Processing).
Databases configured for OLAP use a multidimensional data model, allowing for
complex analytical and ad-hoc queries with a rapid execution time. They borrow
aspects of navigational databases and hierarchical databases that are faster than
relational databases.
The output of an OLAP query is typically displayed in a matrix (or pivot) format. The
dimensions form the rows and columns of the matrix; the measures form the values.
At the core of any OLAP system is the concept of an OLAP cube (also called a
multidimensional cube or a hypercube). It consists of numeric facts called measures
which are categorized by dimensions. The cube metadata is typically created from a
star schema or snowflake schema of tables in a relational database. Measures are
derived from the records in the fact table and dimensions are derived from the
dimension tables.
Each measure can be thought of as having a set of labels, or meta-data associated with
it. A dimension is what describes these labels; it provides information about the
measure.
A simple example would be a cube that contains a store's sales as a measure, and
Date/Time as a dimension. Each Sale has a Date/Time label that describes more about
that sale.
Any number of dimensions can be added to the structure such as Store, Cashier, or
Customer by adding a column to the fact table. This allows an analyst to view the
measures along any combination of the dimensions.
For Example:
Sales Fact Table                        Time Dimension
+-------------+---------+               +---------+-------------------+
| sale_amount | time_id |               | time_id | timestamp         |
+-------------+---------+               +---------+-------------------+
|     2008.08 |    1234 |-------------->|    1234 | 20080902 12:35:43 |
+-------------+---------+               +---------+-------------------+
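A fact/dimension pair like the one above can be queried by resolving each measure against its dimension label; a minimal sketch (with a second invented fact row so the aggregation is visible):

```python
# Fact rows carry the measure and a foreign key into the dimension table.
fact = [
    {"sale_amount": 2008.08, "time_id": 1234},
    {"sale_amount": 500.00,  "time_id": 1234},
]
time_dim = {1234: "20080902 12:35:43"}  # dimension: time_id -> timestamp label

def total_sales_by_time(fact, time_dim):
    totals = {}
    for row in fact:
        label = time_dim[row["time_id"]]  # dimension label describing the measure
        totals[label] = totals.get(label, 0) + row["sale_amount"]
    return totals

print(total_sales_by_time(fact, time_dim))
```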
MOLAP(MULTIDIMENSIONAL OLAP)
MOLAP stands for Multidimensional Online Analytical Processing.
MOLAP is an alternative to the ROLAP (Relational OLAP) technology. While both
ROLAP and MOLAP analytic tools are designed to allow analysis of data through the
use of a multidimensional data model, MOLAP differs significantly in that it requires
the pre-computation and storage of information in the cube — the operation known as
processing. MOLAP stores this data in an optimized multidimensional array storage,
rather than in a relational database (i.e. in ROLAP).
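The "processing" step can be sketched as pre-computing the measure for every combination of dimensions, so that querying becomes a lookup rather than a calculation. The store/region data below is made up for illustration.

```python
from itertools import combinations

rows = [
    {"store": "A", "region": "East", "sales": 10},
    {"store": "A", "region": "West", "sales": 20},
    {"store": "B", "region": "East", "sales": 5},
]
dimensions = ("store", "region")

def process_cube(rows, dimensions):
    """Pre-compute total sales for every subset of the dimensions."""
    cube = {}
    # every subset, from the grand total () up to the full (store, region) pair
    for r in range(len(dimensions) + 1):
        for dims in combinations(dimensions, r):
            for row in rows:
                key = (dims, tuple(row[d] for d in dims))
                cube[key] = cube.get(key, 0) + row["sales"]
    return cube

cube = process_cube(rows, dimensions)
# after processing, a query is just a dictionary lookup:
print(cube[(("region",), ("East",))])  # 15
print(cube[((), ())])                  # 35  (grand total)
```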
Advantages of MOLAP
 Fast query performance due to optimized storage, multidimensional indexing
and caching.
 Smaller on-disk size of data compared to data stored in relational database due
to compression techniques.
 Automated computation of higher level aggregates of the data.
 It is very compact for low dimension data sets.
 Array model provides natural indexing.
 Effective data extract achieved through the pre-structuring of aggregated data.
Disadvantages of MOLAP
 The processing step (data load) can be quite lengthy, especially on large data
volumes. This is usually remedied by doing only incremental processing, i.e.,
processing only the data which has changed (usually new data) instead of
reprocessing the entire data set.
 MOLAP tools traditionally have difficulty querying models with dimensions
with very high cardinality (i.e., millions of members).
 Some MOLAP products have difficulty updating and querying models with
more than ten dimensions. This limit differs depending on the complexity and
cardinality of the dimensions in question. It also depends on the number of
facts or measures stored. Other MOLAP products can handle hundreds of
dimensions.
 MOLAP approach introduces data redundancy.
ROLAP(RELATIONAL OLAP)
ROLAP stands for Relational Online Analytical Processing.
ROLAP is an alternative to the MOLAP (Multidimensional OLAP) technology.
While both ROLAP and MOLAP analytic tools are designed to allow analysis of data
through the use of a multidimensional data model, ROLAP differs significantly in that
it does not require the pre-computation and storage of information. Instead, ROLAP
tools access the data in a relational database and generate SQL queries to calculate
information at the appropriate level when an end user requests it. With ROLAP, it is
possible to create additional database tables (summary tables or aggregations) which
summarize the data at any desired combination of dimensions.
While ROLAP uses a relational database source, generally the database must be
carefully designed for ROLAP use. A database which was designed for OLTP will not
function well as a ROLAP database. Therefore, ROLAP still involves creating an
additional copy of the data. However, since it is a database, a variety of technologies
can be used to populate the database.
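The on-demand approach described above can be sketched with SQLite standing in for the relational source; the `sales` table and its columns are hypothetical.

```python
import sqlite3

# The relational source: no cube, no pre-computed aggregates.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (store TEXT, region TEXT, amount REAL)")
con.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                [("A", "East", 10), ("A", "West", 20), ("B", "East", 5)])

def rolap_query(con, dimension):
    """Generate SQL at the requested level only when the user asks."""
    sql = f"SELECT {dimension}, SUM(amount) FROM sales GROUP BY {dimension}"
    return dict(con.execute(sql).fetchall())

print(rolap_query(con, "region"))  # {'East': 15.0, 'West': 20.0}
```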
Advantages of ROLAP
 ROLAP is considered to be more scalable in handling large data volumes,
especially models with dimensions with very high cardinality (i.e. millions of
members).
 With a variety of data loading tools available, and the ability to fine tune the
ETL code to the particular data model, load times are generally much shorter
than with the automated MOLAP loads.
 The data is stored in a standard relational database and can be accessed by any
SQL reporting tool (the tool does not have to be an OLAP tool).
 ROLAP tools are better at handling non-aggregatable facts (e.g. textual
descriptions). MOLAP tools tend to suffer from slow performance when
querying these elements.
 By decoupling the data storage from the multi-dimensional model, it is
possible to successfully model data that would not otherwise fit into a strict
dimensional model.
 The ROLAP approach can leverage database authorization controls such as
row-level security, whereby the query results are filtered depending on preset
criteria applied, for example, to a given user or group of users (SQL WHERE
clause).
Disadvantages of ROLAP
 There is a consensus in the industry that ROLAP tools have slower
performance than MOLAP tools. However, see the discussion below about
ROLAP performance.
 The loading of aggregate tables must be managed by custom ETL code. The
ROLAP tools do not help with this task. This means additional development
time and more code to support.
 When the step of creating aggregate tables is skipped, the query performance
then suffers because the larger detailed tables must be queried. This can be
partially remedied by adding additional aggregate tables, however it is still not
practical to create aggregate tables for all combinations of
dimensions/attributes.
 ROLAP relies on the general purpose database for querying and caching, and
therefore several special techniques employed by MOLAP tools are not
available (such as special hierarchical indexing). However, modern ROLAP
tools take advantage of latest improvements in SQL language such as CUBE
and ROLLUP operators, DB2 Cube Views, as well as other SQL OLAP
extensions. These SQL improvements can mitigate the benefits of the MOLAP
tools.
 Since ROLAP tools rely on SQL for all of the computations, they are not
suitable when the model is heavy on calculations which don't translate well
into SQL. Examples of such models include budgeting, allocations, financial
reporting and other scenarios.
DIFFERENCE BETWEEN OLTP AND OLAP
OLTP                                      OLAP
----                                      ----
Current data;                             Current as well as historical data;
short database transactions               long database transactions
Online update/insert/delete               Batch update/insert/delete
Normalization is promoted                 De-normalization is promoted
High-volume transactions                  Low-volume transactions
Transaction recovery is necessary         Transaction recovery is not necessary
Database size: 100 MB to GB               Database size: 100 GB to TB
BUSINESS INTELLIGENCE
Business Intelligence (BI) refers to computer-based techniques used in spotting,
digging-out, and analyzing business data, such as sales revenue by products and/or
departments or associated costs and incomes
BI technologies provide historical, current, and predictive views of business
operations. Common functions of Business Intelligence technologies are reporting,
online analytical processing, analytics, data mining, business performance
management, benchmarking, text mining, and predictive analytics.
Business Intelligence often aims to support better business decision-making; thus a
BI system can be called a decision support system (DSS). Though the term business
intelligence is often used as a synonym for competitive intelligence, because both
support decision making, BI uses technologies, processes, and applications to
analyze mostly internal, structured data and business processes, while competitive
intelligence gathers, analyzes and disseminates information, with or without support
from technology and applications, and focuses on all-source information and data
(unstructured or structured), mostly external but also internal to a company, to
support decision making.
The five key stages of Business Intelligence:
1. Data Sourcing
2. Data Analysis
3. Situation Awareness
4. Risk Assessment
5. Decision Support
Data sourcing
Business Intelligence is about extracting information from multiple sources of data.
The data might be: text documents - e.g. memos or reports or email messages;
photographs and images; sounds; formatted tables; web pages and URL lists. The key
to data sourcing is to obtain the information in electronic form. So typical sources of
data might include: scanners; digital cameras; database queries; web searches;
computer file access; etcetera.
Data analysis
Business Intelligence is about synthesizing useful knowledge from collections of data.
It is about estimating current trends, integrating and summarising disparate
information, validating models of understanding, and predicting missing information
or future trends. This process of data analysis is also called data mining or knowledge
discovery. Typical analysis tools might use:-
 Probability theory - e.g. classification, clustering and Bayesian networks.
 Statistical methods - e.g. regression.
 Operations research - e.g. queuing and scheduling.
 Artificial intelligence - e.g. neural networks and fuzzy logic.
Situation awareness
Business Intelligence is about filtering out irrelevant information, and setting the
remaining information in the context of the business and its environment. The user
needs the key items of information relevant to his or her needs, and summaries that
are syntheses of all the relevant data (market forces, government policy etc.).
Situation awareness is the grasp of the context in which to understand and make
decisions. Algorithms for situation assessment provide such syntheses automatically.
Risk assessment
Business Intelligence is about discovering what plausible actions might be taken, or
decisions made, at different times. It is about helping you weigh up the current and
future risk, cost or benefit of taking one action over another, or making one decision
versus another. It is about inferring and summarising your best options or choices.
Decision support
Business Intelligence is about using information wisely. It aims to warn you of
important events, such as takeovers, market changes, and poor staff performance, so
that you can take preventative steps. It seeks to help you analyse and
make better business decisions, to improve sales or customer satisfaction or staff
morale. It presents the information you need, when you need it.
MODELLING TECHNIQUES IN DATA WAREHOUSE
CONCEPTUAL DATA MODEL
A conceptual data model identifies the highest-level relationships between the
different entities. Features of conceptual data model include:
* Includes the important entities and the relationships among them.
* No attribute is specified.
* No primary key is specified.
The figure below is an example of a conceptual data model.
From the figure above, we can see that the only information shown via the conceptual
data model is the entities that describe the data and the relationships between those
entities. No other information is shown through the conceptual data model.
LOGICAL DATA MODEL
A logical data model describes the data in as much detail as possible, without regard
to how it will be physically implemented in the database. Features of a logical data
model include:
* Includes all entities and relationships among them.
* All attributes for each entity are specified.
* The primary key for each entity is specified.
* Foreign keys (keys identifying the relationship between different entities) are
specified.
* Normalization occurs at this level.
The steps for designing the logical data model are as follows:
1. Specify primary keys for all entities.
2. Find the relationships between different entities.
3. Find all attributes for each entity.
4. Resolve many-to-many relationships.
5. Normalize the data model.
The figure below is an example of a logical data model.
Comparing the logical data model shown above with the conceptual data model
diagram, we see the main differences between the two:
* In a logical data model, primary keys are present, whereas in a conceptual data
model, no primary key is present.
* In a logical data model, all attributes are specified within an entity. No attributes
are specified in a conceptual data model.
* Relationships between entities are specified using primary keys and foreign keys
in a logical data model. In a conceptual data model, the relationships are simply
stated, not specified, so we simply know that two entities are related, but we do not
specify what attributes are used for this relationship.
PHYSICAL DATA MODEL
A physical data model represents how the model will be built in
the database. A physical database model shows all table structures, including column
name, column data type, column constraints, primary key, foreign key, and
relationships between tables. Features of a physical data model include:
* Specification of all tables and columns.
* Foreign keys are used to identify relationships between tables.
* De-normalization may occur based on user requirements.
* Physical considerations may cause the physical data model to be quite different
from the logical data model.
* The physical data model will differ between RDBMSs. For example, the data
type for a column may be different between MySQL and SQL Server.
The steps for physical data model design are as follows:
1. Convert entities into tables.
2. Convert relationships into foreign keys.
3. Convert attributes into columns.
4. Modify the physical data model based on physical constraints / requirements.
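Steps 1 and 2 above can be illustrated with SQLite standing in for the target RDBMS; the entity and column names are hypothetical.

```python
import sqlite3

# Entities become tables, attributes become typed columns, and the
# relationship between the entities becomes a foreign key.
ddl = """
CREATE TABLE region (
    region_id   INTEGER PRIMARY KEY,
    region_name TEXT NOT NULL
);
CREATE TABLE store (
    store_id   INTEGER PRIMARY KEY,
    store_name TEXT NOT NULL,
    region_id  INTEGER REFERENCES region(region_id)  -- relationship -> FK
);
"""

con = sqlite3.connect(":memory:")
con.executescript(ddl)
tables = [r[0] for r in con.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
print(tables)  # ['region', 'store']
```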
The figure below is an example of a physical data model
Comparing the physical data model shown above with the logical data model diagram,
we see the main differences between the two:
* Entity names are now table names.
* Attributes are now column names.
* Data type for each column is specified. Data types can be different depending on
the actual database being used.
QLIKVIEW – A BUSINESS INTELLIGENCE TOOL
What is QlikView?
QlikView is the flagship product of QlikTech and belongs to the category of next-
generation Business Intelligence tools. In 2007 QlikTech gained the title of the
"coolest" vendor in BI and has been growing faster than any other BI vendor; it was
recognized as a 'Visionary' in Gartner Group's annual Magic Quadrant (2007).
According to Gartner's predictions, "By 2012, 70% of Global 1000 organizations
will load detailed data into memory as the primary method to optimize BI
application performance (0.7 probability)", and QlikView is one of the leaders in
this market. QlikView creates endless possibilities for making ad hoc queries in a
non-hierarchical data structure. This is possible thanks to AQL (Associative Query
Logic), which automatically associates values in the internal QlikView database.
QlikView simplifies analysis for everyone: it makes it possible for anybody to
create useful, accurate KPI and measurement reports and performance dashboards,
and to make accurate, strategic decisions.
QlikView has over 4,500 customers in 58 countries, and adds 11 new customers each
day. In addition to hundreds of small and midsized companies, QlikTech's customers
include large corporations such as Pfizer, AstraZeneca, The Campbell Soup
Company, Top Flite, and 3M. QlikTech is privately held and venture backed by Accel
Partners, Jerusalem Venture Partners, and Industrifonden.
QlikView is easy-to-use, flexible business intelligence software that has been
around since 1993. It allows for interactive analysis and is easy to develop,
implement and train users on.
Other QlikTech products are QlikView Server, which provides analysis of QlikView
data over the web, and QlikView Publisher, which helps control the distribution of
QlikView applications.
QlikView is a suite of powerful and rapidly deployable business intelligence software
that enables enterprises and their management to effectively and proactively monitor,
manage and optimize their business. QlikView lets companies analyze all their
data quickly and efficiently. QlikView eliminates the need for data warehouses, data
marts, and OLAP cubes; instead it gives users rapid access to data from multiple
sources in an intuitive, dashboard-style interface. With QlikView, companies can turn
data into information, and information into better decisions.
QlikView is quick to implement, flexible and powerful to use, and easy to learn. It
provides a rapid return on investment and a low total cost of ownership compared to
traditional OLAP and reporting tools.
QlikView helps California Casualty improve efficiencies
25% improvement in sales conversions, 60% improvement in compliance
response time.
The Business Failure of Traditional Business Intelligence
A recent article in Intelligent Enterprise magazine captured the three major issues of today’s
traditional BI solutions: 1) Data reporting has been an afterthought to using core business
applications on a day-to-day basis, breaking the “link between insight and action.” 2)
Consolidating disparate tools into a suite isn’t enough to give business users the information
they need. Search and semantics need to be integrated. 3) “Build it and they will come”
doesn’t provide insight, just more technology. BI needs to answer the needs of decision
makers, and needs to deliver incremental successes along the way.
These factors, among others, have undoubtedly led to the dismal performance in BI
initiatives. According to a study published in DM Review, a leading business intelligence
publication, the average total implementation time for BI initiatives is 17 months, with five
months to deploy the first usable analytic application. The average total cost of
implementation is a staggering $12.8 million. And at best, according to the survey,
internally built BI/DW systems have a 35% success rate, while purchased operational
analytic applications are considered successful a mere 13% of the time.
How long did it take to implement your business intelligence initiative?
These failed initiatives cost more than money and resources – they hamper business
performance in nearly every way. A summer 2005 survey of 385 finance and IT
executives by CFO Research Services asked respondents to identify the drivers of
poor information quality (IQ). Nearly half the survey respondents – 45 percent – cite
disparate, non-integrated IT systems and the variability of business processes as an
acute problem that constrains management’s ability to work effectively and focus on
high-value activities. Approximately the same number agrees that finance and
business units alike spend too much time developing supplemental reports and
analysis.
A Revolution in Business Intelligence
The OLAP Tradition
Twenty years ago memory was expensive and processors were slow. Faced with these
constraints, developers at the time devised an architecture for delivering results of
multi-dimensional analysis which relied on pre-calculating fixed analyses. Simply put,
they pre-calculated all measures across every possible combination of dimensions. For example, for
total sales by sales person and region, the system would calculate total sales for each sales
person for each region, and for every union of sales person and region. The results of these
calculations were stored and retrieved when an end user requested a particular “analysis.”
This is what is traditionally referred to as “calculating the cube” and the “cube” is the
mechanism which organizes and stores the results. Because the results were pre-calculated,
regardless of how long it took to calculate the results, the response time from the perspective
of the end user was instantaneous.
The Enabling Technology or Change Agent
Today, we have a fundamentally different technology platform available to us on
which to build business intelligence. Specifically three things have happened:
First, Moore’s Law has relentlessly beat its drum – resulting in processors which are
significantly faster today than they were twenty years ago and memory which is
significantly less expensive. The difference in price/performance for both factors is
well over a factor of 1,000 higher today than it was then.
Second, the mainstream availability of 64-bit processors raises the amount of memory
a computer can utilize. A 32-bit processor can use four gigabytes of memory at a
maximum, and a portion of that must be devoted to the operating system. A 64-bit
processor can use 17,179,869,184 gigabytes or 16 Exabyte of RAM – a factor of four
billion more. Of course, the practical limitation of computers available today is much
lower, but machines with 40, 80, or even 120 gigabytes of memory are readily
available for less than $30,000.
Third, hardware manufacturers have shifted from computers with few fast processors
to computers with multiple lower-power, lower-speed processors. The challenge
today is keeping computers operating at a reasonable temperature. Intel's and AMD's
stated strategy for achieving this goal is to equip computers with many lower power
processors working in parallel. Today it is common to find computers with 2, 4, 16,
32 or even 128 processors. In addition, newer processors have multiple “cores”
bundled on a single chip.
QlikView’s Premise: In Memory BI
QlikView was built with a simple architectural premise: all data should be held in
memory, and all calculations should be performed when requested, not prior. Twenty
years ago this would have been impossible. In 1993, when QlikTech was founded, it
was still a pretty crazy idea. But now, the trends in the underlying platform
(referenced in the previous section) have lifted the constraints so that organizations
of all sizes can now benefit.
QlikView’s patented technology is based on an extremely efficient, in memory data
model. High-speed associations occur as the user clicks in the applications and the
display is updated immediately, allowing users to work with millions of cells of data
and still respond to queries in less than a second.
As a result of this design, QlikView removes the need to pre-aggregate data, define
complex dimensional hierarchies and generate cubes. QlikView performs calculations
on the fly, giving the power of multidimensional analysis to every user, not just the
highly trained few. By taking full advantage of 64-bit technology’s memory capacity
QlikView can provide summary level metrics and record level detail on the same
architecture. Companies gain infinitely scalable business analysis solutions that
provide summary KPIs as well as highly granular, detailed analyses.
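A heavily simplified, dict-based sketch of the idea (not QlikView's actual engine): each "click" filters the in-memory records and recalculates the measures on the fly, with no pre-aggregated cube involved. The sample data is invented.

```python
# All records live in memory; nothing is pre-aggregated.
records = [
    {"region": "East", "product": "Soup",  "sales": 120},
    {"region": "East", "product": "Bread", "sales": 80},
    {"region": "West", "product": "Soup",  "sales": 60},
]

def click(records, field, value):
    """Simulate a user click: filter, then recalculate measures on the fly."""
    selected = [r for r in records if r[field] == value]
    total = sum(r["sales"] for r in selected)   # measure recalculated per click
    related = {r["product"] for r in selected}  # values associated with the selection
    return total, related

total, related = click(records, "region", "East")
print(total, sorted(related))  # 200 ['Bread', 'Soup']
```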
In its recent Research Note, leading analysts at Gartner reported on the value of this
approach: “Our research indicates that query performance using this in-memory
method is often just as fast as or faster than traditional aggregate-based architectures.
In-memory technology not only retrieves the data faster, but it also performs
calculations on the query results much faster than disk-based approaches…Therefore,
with in-memory technology, users can freely explore detailed data in an unfettered
manner without the limitations of a cube or aggregate table to receive good
performance.”
A Revolution in Benefits
The QlikView solution, because of its unique integrated components and because it
operates entirely in memory, offers some unique advantages over traditional OLAP:
 Fast Time-to-Value: With traditional OLAP, constructing cubes is time consuming
and requires expert skills. This process can take months, and sometimes over a year.
In addition, the cube must be constructed before it can be calculated, a process which
itself can take hours. And, all this must occur before analysis or reporting can be
performed – before the user even sees answers to his questions. Because the data is
loaded in memory, creating analysis in QlikView takes seconds. There is no pre-
definition of what is a dimension – any data is available as a dimension and any data
is available as a measure. The time spent implementing QlikView goes into locating
data and deciding what analysis is interesting or relevant to solving the business
question. Typically, this process takes only a week or two.
 Easy to Use: The entire end user experience in QlikView is driven by the "click."
End users enjoy using QlikView because it works the way their mind does. Each time
they want to review the data sliced a new way, they simply click on the data they want
to evaluate. Because QlikView operates in memory, with each click all data and
measures are recalculated to reflect the selection. Users can go from high level
aggregates (e.g., roll up of margin on all products in a specific line) to individual
records (e.g., which order was that?) in a click – without pre-defining the path to the
individual record. The QlikView UI uses color coding which provides instant
feedback to queries.
Powerful: Because queries and calculations are performed in memory, they are
extremely quick. In addition, QlikView is not constrained by the speed of the
underlying source. Even if the underlying data is stored in a system which has poor
query performance (for instance, a text file), the performance is always optimal
because the data is loaded in memory.
QlikView also compresses data as it is stored in memory, allowing large amounts of
data to be stored. Typically, there is a 10X reduction in size of the data once it’s in
memory.
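The 10X figure comes largely from storing each distinct field value only once. As a rough illustration in plain Python (a sketch of the dictionary-encoding idea, not QlikView's actual implementation):

```python
def dictionary_encode(column):
    """Store each distinct value once; rows become small integer indexes."""
    symbols = []           # one entry per distinct value
    index_of = {}          # value -> position in symbols
    encoded = []           # per-row indexes into symbols
    for value in column:
        if value not in index_of:
            index_of[value] = len(symbols)
            symbols.append(value)
        encoded.append(index_of[value])
    return symbols, encoded

def dictionary_decode(symbols, encoded):
    """Rebuild the original column from the symbol table and indexes."""
    return [symbols[i] for i in encoded]

# A column with heavy repetition compresses well:
column = ["Delhi", "Noida", "Delhi", "Delhi", "Gurgaon", "Noida"] * 1000
symbols, encoded = dictionary_encode(column)
assert len(symbols) == 3                          # only distinct values stored
assert dictionary_decode(symbols, encoded) == column
```

The more repetitive the field, the greater the saving, which is why transactional data with a few thousand distinct customers or products compresses so dramatically.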
Flexible: One of the major issues with traditional OLAP is that modifying an analysis
requires changing the cube, a process which can take a very long time. In addition,
this process is typically controlled by IT. With QlikView, viewing analysis by a new
dimension or changing a measure can be performed by business professionals in
seconds. Standard interfaces, including ODBC and Web Services, mean that any data
source can be analyzed in QlikView. What’s more, users can do “local” or “desktop”
analysis, using the full data and interactivity of the application on laptops.
Scalable: QlikView is designed to scale easily in both the amount of data it can
handle, and the number of users working with it. It’s simple to deploy to thousands of
users – utilizing all available hardware power across all available processors and cores,
requiring only a web browser.
An Overview of QlikView
QlikView is revolutionizing business intelligence with fast, powerful and visual
analysis that’s simple to use. QlikView’s patented technology offers all of the features
of “traditional” analytics solutions – dashboards and alerts, multi-dimensional
analyses, slice-and-dice of data – without the limitations, cost or complexity of
traditional BI applications. QlikView solutions can be deployed in days, users can be
trained in minutes, and end users
get results instantly.
The QlikView Platform
QlikView offers all of the capabilities that traditionally required a complex and costly
suite of products, on a single unified platform. QlikView provides flexible ad-hoc
analysis capabilities, powerful analytic applications, and simple printable reports. This
allows organizations to deploy QlikView to everyone – highly skilled analysts doing
ad-hoc detailed reporting, executives requiring a dashboard of critical business
information and plant supervisors analyzing output performance. Further, QlikView
allows organizations to eliminate unused paper reports, and replace them with
demand-driven reporting.
QlikView Enterprise – For the developer
QlikView Enterprise is the complete developer’s tool for building QlikView
applications. QlikView Enterprise lets developers load disparate data sources for
access in a single application. The data load script supports over 150 functions for
data cleansing, manipulation and aggregation. An intuitive, wizard-driven interface
allows powerful, visually interactive applications to be developed quickly.
QlikView Publisher – For distribution
QlikView Publisher ensures that the right information reaches the right user at the
right time. As the use of business analysis spreads throughout the organization,
controlling the distribution of analysis becomes increasingly important. QlikView
Publisher allows for complete control of the distribution of a company’s QlikView
applications, automating the data refresh process for QlikView application data. In
addition, it ensures that applications are distributed to the correct users when they
need them.
QlikView Server – For security
QlikView Server is the central source of truth in an organization. With today’s
distributed workforce, QlikView Server provides a simple way for organizations to
ensure that everyone has access to the latest data and analysis regardless of their
location. Regardless of the client chosen – zero-footprint DHTML, Windows,
ActiveX plug-in, or Java – QlikView Server provides access to the latest version of
each QlikView application.
QlikView Professional – For the power user
QlikView Professional lets power-users build, change or modify the layout of existing
QlikView applications. QlikView Professional users can refresh existing data sources,
and can choose to work with either local applications or applications distributed via
QlikView Server. Power users can work with local data, including offline enterprise
applications, with no limitations.
QlikView Analyzer – For the general user
QlikView Analyzer lets end-users connect to server-based QlikView applications.
QlikView Analyzer has a number of deployment options, including Java clients
(supporting Sun and MSFT-Java), plug-in for MSFT IE and AJAX zero footprint
clients. The installed Analyzer EXE client also provides offline analysis and reporting
capabilities.
QlikView Architecture
Most traditional databases are built upon a relational model. Records are broken apart
to reduce redundancy and key fields are used to put the records back together at the
time they are used. Database programmers are required to make tradeoffs between
increased speed at the cost of more space and more time to add or edit records, and
the database user often suffers based on these decisions. QlikView was built with a
simple architectural premise – all data should be held in memory, and all calculations
should be performed when requested and not prior. QlikTech’s goal is to deliver
powerful analytic and reporting solutions in a quarter of the time, at half the cost, and
with twice the value of competing OLAP cube (Online Analytical Processing)-based
products. QlikView is designed so that the entire application (data model included) is
held in RAM – this is what makes it uniquely efficient compared to traditional OLAP
cube-based applications. It creates an in-memory data model as it loads data from a
data source, enabling it to access millions of cells of data and still respond to queries
in less than a second.
High-speed associations occur as the user clicks in the various sheet objects and the
display is updated immediately. QlikView operates much faster and requires
significantly less space than an equivalent relational database because it optimizes the
data as it loads – removing redundant field data and automatically linking tables
together. Indexes are not required, making every field available as a search field
without any performance penalty. Because of this design, QlikView typically requires
1/10th of the space required for the same data represented in a relational model, i.e.
100GB of data fits into 10GB of memory. There is no limit to the number of tables
allowed in an application, or to the number of fields, rows or cells in a single table.
RAM is the only factor that limits the size of an application. QlikView offers three
components in an integrated solution:
Fast Query Engine: Loading the data into memory allows QlikView to query, or
subset, the data instantly to only reveal the data which is relevant to a given user. In
addition, QlikView shows users the data which is excluded by a selection.
On Demand Calculation Engine: Charts, graphs, and tables of all types in
QlikView are multidimensional analyses. That is, they show one or more measures
(e.g., metrics, KPIs, expressions, etc.) across one or more dimensions (for example, total
sales by region). The major difference is that these calculations are performed as the
user clicks, and never prior.
Visually Interactive User Interface (UI): QlikView offers hundreds of possible
chart and table types and varieties; there are list boxes for navigating dimensions;
statistic boxes; and many other UI elements. Every UI element can be clicked on to
query.
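The interplay of these components can be illustrated with a toy associative model (a conceptual Python sketch with invented data; QlikView's internals are proprietary): a selection in one field immediately partitions every other field into possible and excluded values.

```python
# Toy associative model: rows link fields; a selection in one field
# determines which values remain possible in every other field.
rows = [
    {"Zone": "Delhi",   "Retailer": "A", "Year": 2009},
    {"Zone": "Delhi",   "Retailer": "B", "Year": 2010},
    {"Zone": "Noida",   "Retailer": "B", "Year": 2010},
    {"Zone": "Gurgaon", "Retailer": "C", "Year": 2009},
]

def select(rows, field, value):
    """Return (possible, excluded) values per field after one selection."""
    matching = [r for r in rows if r[field] == value]
    result = {}
    for f in rows[0]:
        all_vals = {r[f] for r in rows}
        possible = {r[f] for r in matching}
        result[f] = (possible, all_vals - possible)
    return result

state = select(rows, "Zone", "Delhi")
assert state["Retailer"] == ({"A", "B"}, {"C"})   # retailer C is excluded
assert state["Year"] == ({2009, 2010}, set())     # both years remain possible
```

Showing the excluded set alongside the possible one is what gives users the "what is NOT associated" feedback the UI's color coding conveys.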
QlikView Technical Features
Data and Data Loading
QlikView loads data directly from most data sources (i.e., ODBC and OLE DB
sources, using vendor-specific drivers), from any text or table data file (i.e., delimited text
files, Excel files, XML files, etc.), and from files in other formats, as well as from data
warehouses and data marts (although these are not required). QlikView also offers a
plug-in model for loading custom data sources (web services). QlikView is designed to
handle a remarkable amount of data. There is no limit to the number of tables allowed in an
application. In addition, there is no limit to the number of fields, rows or cells in a
single table – QlikView can handle billions of unique values in a given field.
RAM is the only other factor that limits the size of an application. The maximum size
of a QlikView application is closely tied to the available RAM on the system where
the application will run. However, it is not as easy as looking at the size of a relational
database and comparing that to the RAM on the system to determine if the application
is appropriate for QlikView. As QlikView loads data from a source database, the
data is highly compressed and optimized, typically resulting in a QlikView
application of only 10% of the size of the original source.
Load Script
QlikView can load data that is stored in a variety of formats, as mentioned above.
Data can be loaded from generic tables, cross tables, mapping tables (data cleansing),
and interval matching tables. Tables can be joined, concatenated, sampled and linked
to external information such as other programs, bitmaps, URLs, etc.
In order to pull data from a data source, QlikView executes a load script. The load
script defines the source databases and tables and fields that should be loaded into
QlikView. In addition, you can calculate new variables and records using hundreds of
functions available in the script. In order to help you create a load script, QlikView
includes a wizard that will generate the script.
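As a rough analogy in Python (not QlikView's own script language; the file layout and field names here are invented), a load step pulls rows from a delimited source and can derive new fields on the way in, much as a LOAD statement with a calculated expression would:

```python
import csv, io

# Stand-in for a delimited source file (hypothetical fields).
source = io.StringIO(
    "OrderID,Quantity,UnitPrice\n"
    "1,10,2.5\n"
    "2,4,5.0\n"
)

# "Load" the table, deriving a new field per record, analogous to
# a load script's  Quantity * UnitPrice AS LineTotal  expression.
table = []
for row in csv.DictReader(source):
    row["LineTotal"] = int(row["Quantity"]) * float(row["UnitPrice"])
    table.append(row)

assert table[0]["LineTotal"] == 25.0
assert table[1]["LineTotal"] == 20.0
```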
Visual Basic Script and JavaScript Support
Programmers can develop VBScript or JavaScript macros to add specific functionality
to an application. Macros can be attached to button objects that a user must click to
activate, or the macros can be attached to various QlikView events. For example, a
macro can be automatically invoked whenever an application is opened, when the
load script is executed, or when a selection is made in a list box.
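The event-hook mechanism is essentially a callback registry; a minimal Python sketch of the idea (illustrative only, not QlikView's actual macro API):

```python
# Minimal event registry: "macros" attach to named events and run
# whenever the application fires those events.
handlers = {}

def on(event, fn):
    """Attach a handler function to a named event."""
    handlers.setdefault(event, []).append(fn)

def fire(event, *args):
    """Invoke every handler registered for the event."""
    for fn in handlers.get(event, []):
        fn(*args)

log = []
on("open", lambda: log.append("opened"))
on("select", lambda field, value: log.append(f"{field}={value}"))

fire("open")                      # e.g., application opened
fire("select", "Zone", "Delhi")   # e.g., selection made in a list box
assert log == ["opened", "Zone=Delhi"]
```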
Analysis Engine
As described earlier, QlikView’s In-Memory Data Model forms the basis for every
QlikView application. It holds all loaded data down to the transaction level, and is part
of the QVW file (QlikView’s file format), which is loaded into RAM.
The Platform is optimized to run on every available Windows platform (32 & 64-bit),
and makes use of all available processing power and RAM for each specific platform.
The Selection Engine processes the user’s point-and-click selections and returns the
values associated with that query. It provides sub-second response times on queries
made to the In-Memory Data Model.
The Chart & Table Engine handles the calculations and graphic display of the charts
in the user interface. It calculates multiple “cubes” in real time (one cube for each
graph in the application), and supports selections made directly in graphs.
Clients
Supported clients include an installed Windows EXE client that connects to QlikView
Server; an ActiveX component for integration with other software; an ActiveX plug-in
for Microsoft Internet Explorer; an AJAX zero-footprint client; and a Java client
compatible with Mozilla-based web browsers. An open interface enables automated
integration with QlikView.
Security
The data in a QlikView application is often confidential, so access to it must be
controlled.
Authentication is any process by which you verify that someone is who they claim to
be. QlikView can either let the Windows operating system do the authentication
using the Windows log-on, prompt for a user ID and a password (different from the
Windows user ID and password), or use the QlikView serial number as a simple
authentication method.
Authorization is finding out whether the person, once identified, is permitted to have
the resource. QlikView can let the Windows operating system do the authorization, by
allowing or disallowing a user, a group or a domain access to the entire application.
If finer granularity is needed, e.g. a user is only allowed to see specific records or
fields, QlikView Publisher can be used to automate the creation of a set of
applications, i.e. one application per user group.
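The per-user-group reduction described above amounts to producing one filtered copy of the data per group; a hypothetical Python sketch (group names and data are invented for illustration):

```python
# Hypothetical row-level reduction: one filtered "application" per group.
rows = [
    {"Zone": "Delhi",   "Sales": 100},
    {"Zone": "Noida",   "Sales": 80},
    {"Zone": "Gurgaon", "Sales": 60},
]
group_access = {
    "delhi_team":   {"Delhi"},
    "ncr_managers": {"Delhi", "Noida", "Gurgaon"},
}

def reduce_for(group):
    """Keep only the rows this group is authorized to see."""
    allowed = group_access[group]
    return [r for r in rows if r["Zone"] in allowed]

assert len(reduce_for("delhi_team")) == 1
assert len(reduce_for("ncr_managers")) == 3
```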
QlikView Application and User Interface
The QlikView interface is designed to provide perfect data overview in multiple
dimensions – simplifying analysis and reporting for everyone. It presents data
intuitively, allowing users to question anything and everything, from all types of
objects (e.g., list boxes, graphs, tables) and to any aspect of the underlying data –
regardless of where the data is located in a hierarchy.
Key Elements of the User Interface
Sheets & Tabs
In QlikView, analysis is made on sheets navigated through tabs (similar to Excel).
Each sheet can hold several sheet objects (list boxes, graphs, tables etc) to analyze the
underlying data model. All sheets are interconnected, meaning that selections made
on one sheet affect all other objects on all other sheets.
List Box
The basic building block of a QlikView application is the list box. A list box is a
movable, resizable object that presents the data taken from a single column of a table.
Rather than listing the duplicate values, only unique values are presented. If desired,
the number of occurrences of each distinct value can also be listed.
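The de-duplication a list box performs, optionally with occurrence counts, can be sketched in a few lines of Python:

```python
from collections import Counter

# One column of a table, with duplicate values.
column = ["Delhi", "Noida", "Delhi", "Gurgaon", "Delhi"]

# A list box shows each distinct value once...
distinct = sorted(set(column))
assert distinct == ["Delhi", "Gurgaon", "Noida"]

# ...optionally with its frequency of occurrence.
counts = Counter(column)
assert counts["Delhi"] == 3
```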
Multi Box
The multi box can hold several fields in a single object. Selections can be made
through drop-down lists, by clicking, or by text search and select. The multi box
displays a field’s value only when a single value is selected.
Charts & Gauges
In QlikView, the results of a selection or query can be displayed in a graph. Typically, a
graph holds one or more expressions which are recalculated each time a selection is
made. The result can be displayed as a bar chart, line chart, heat chart, grid chart,
scatter chart, or as speedometer or gauge. All graphs are fully interactive, which
means that you can make selections or queries directly by point-and-click or by
“painting” the area of interest.
Tables
Just as with graphical representation of data (in graphs), the result of an analysis can
be displayed in a table. QlikView provides the ability to display data in powerful
Pivot tables and Straight tables. These tables are fully interactive, which means that
you can make selections directly in the tables or by drop down selection in the graph
dimensions. Using a table box, QlikView can display any combination of fields in a
single object, regardless of what source database table they came from. This feature is
useful when providing listings of any kind. The table box can be sorted by any field or
combination.
Reports & Send to Excel
QlikView has an integrated report editor for ease-of-use of application specific
reports. The reports are dynamically updated as the user makes selections. Power
users can also easily create reports by a simple drag-and-drop procedure. All data
displayed in the GUI is ready to be exported at any time to Excel or other applications
by a simple click of a button.
User Navigation and Analysis
Point-and-Click Queries
Asking and answering questions is a simple matter of point and click. The user forms
a query in QlikView simply by clicking the mouse on a field value or other item of
interest. In a list box, the user clicks on one or more values of interest to select them.
QlikView immediately responds to the mouse click and updates all objects displayed
on the current sheet.
Multiple Sort Options
Since each field of data can be displayed in its own list box, it makes sense that you
would want to sort each list box independently of all others. When you are scrolling
through a list box, you want the values to appear in some sorted order appropriate to
that field. QlikView allows you to sort each list box independently and according to
multiple sort specifications.
One or more of the following algorithms can apply to each list box in either ascending
or descending order:
State: Selected and optional values can be sorted from the top or bottom of the
list box
Expression: Values are sorted by the result of evaluating any entered expression
Frequency: Values are sorted by frequency of occurrence
Numeric Value: Values are sorted according to their numeric value
Text: Values are sorted alphabetically
Load Order: Values are sorted according to the way they occurred in the original
source database
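Several of these sort orders map naturally onto key functions; a Python sketch of three of them, using invented data:

```python
from collections import Counter

values = ["20", "3", "100", "3", "20", "3"]
counts = Counter(values)
distinct = list(dict.fromkeys(values))        # preserves load order

by_text = sorted(distinct)                                  # alphabetical
by_number = sorted(distinct, key=float)                     # numeric value
by_frequency = sorted(distinct, key=lambda v: counts[v], reverse=True)

assert by_text == ["100", "20", "3"]       # string order, not numeric
assert by_number == ["3", "20", "100"]
assert by_frequency == ["3", "20", "100"]  # "3" occurs most often
```

The text/numeric contrast above is exactly why a list box needs a per-field sort choice: "100" sorts before "20" alphabetically but after it numerically.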
Powerful Searching
QlikView allows you to search through a list as simply and quickly as typing on the
keyboard. Select any list box, or open a multi box or drop-down list, and
start typing. QlikView immediately begins searching through the list to find values
matching your criteria. Single character and multi-character wildcards are supported,
as well as greater than and less than symbols to enable searching for numeric and date
ranges.
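The wildcard behaviour can be sketched with Python's fnmatch module (an analogy only; QlikView's search additionally supports numeric and date range operators, which fnmatch does not):

```python
from fnmatch import fnmatch

retailers = ["Agarwal Restaurant", "Anand Stores", "Big Bazaar", "Agro Mart"]

# '*' matches any run of characters, '?' matches a single character.
assert [r for r in retailers if fnmatch(r, "Ag*")] == ["Agarwal Restaurant", "Agro Mart"]
assert [r for r in retailers if fnmatch(r, "?nand*")] == ["Anand Stores"]
```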
Rapid Application Design and Deployment
Simple applications can be created within just a few minutes using QlikView’s
wizards. More complex applications integrating data from various sources and
displaying trend analysis charts and pivot tables may take a little bit longer.
The best way to understand how simple it is to create and use a QlikView application
is to step through the process involved:
Step 1: Locate the Data Source
The first step in creating an application in QlikView is to determine what data you
wish to load. While it is possible to include inline data in the QlikView load script,
application data will almost always come from an existing file, spreadsheet or
database. You may load data from a single source file or database, or you may load
and integrate data from many different sources at the same time.
The source file will typically be arranged with each record of the file containing one
record of data. However, QlikView can work with data in practically any format,
including generic databases, cross-tables, hierarchical databases, multi-dimensional
databases, etc. The first row may or may not contain field labels, although you can
always choose to set or change the labels in the wizard or in the script. If the data will
come from a text file, each file will typically be treated as a single table. When
working with spreadsheets, each tabbed sheet will be treated as a table.
Step 2: Create the Load Script
Once the source data has been determined, a load script must be created to copy the
data from the data source into QlikView’s associative database. Creating the load
script is simplified by the use of wizards that construct script statements for supported
file types.
Step 3: Execute the Load Script
After the load script is complete, the script must be executed either by using the
“Run” button in the Edit Script dialog, or by selecting “Reload,” available on both the
toolbar and the File menu. During the load process, QlikView examines each
statement in the load script and processes it in sequential order. At the completion of
the load script, a copy of all of the data referenced in the load script is loaded and
available in the QlikView application.
Step 4: Place Objects on a Sheet
In order to use the data in the QlikView application, you must place list boxes or other
objects on one or more sheets. The actual objects that should be used and how they
should be grouped into sheets depends on the specific application.
Step 5: Start Using the Application
As soon as the first object is created on a sheet, the application is available for use.
All objects are automatically associated together, and clicking in any object initiates a
query.
Step 6: Add More Sheets and Objects as Required
Finally, continue to add and arrange objects on sheets until the application achieves
the functionality desired. You may wish to add more customization to the load script
by taking advantage of QlikView’s “Expression Engine,” or you may wish to add
macros to automate certain actions.
Main features and benefits of QlikView:
 Use of an in-memory data model
 Allows instant, in memory, manipulation of massive datasets
 Does not require high cost hardware
 Automated data integration and a graphical analytical environment attractive for
customers
 Fast and powerful visualization capabilities
 Ease of use - end users require almost no training
 Highly scalable - near instant response time on very huge data volumes
 Fast implementation - customers are live in less than one month, and most in a week
 Flexible - allows unlimited dimensions and measures and can be modified in
seconds
 Integrated - all in one solution : dashboards, power analysis and simply reporting on
a single architecture
 Low cost - shorter implementations result in cost saving and fast return on
investment
 Risk free - available as a fully-functional free trial download
REPORTS USING
BUSINESS INTELLIGENCE TOOL
“ QLIKVIEW”
A REPORT ON BISLERI
Mineral water under the name 'Bisleri' was first introduced in Mumbai in glass bottles,
in two varieties (bubbly & still), in 1965 by Bisleri Ltd., a company of Italian origin.
The company was started by Signor Felice Bisleri, who first brought the idea of
selling bottled water to India.
Parle bought over Bisleri (India) Ltd. in 1969 & started bottling mineral water in
glass bottles under the brand name 'Bisleri'. Later Parle switched over to PVC non-returnable
bottles & finally advanced to PET containers.
Since 1995, Mr. Ramesh J. Chauhan has been expanding Bisleri operations
substantially; turnover has multiplied more than 20 times over a period of 10 years,
with an average growth rate of around 40% over this period. Bisleri presently has
8 plants & 11 franchisees, with a presence covering the entire span of India, and
plans to put up four more plants in 06-07. The company commands a 60% share of
the organized market. The overwhelming popularity of 'Bisleri', & the fact that it
pioneered bottled water in India, has made the brand synonymous with mineral water
& a household name. When you think of bottled water, you think Bisleri.
Bisleri values its customers & has therefore developed 8 unique pack sizes to suit
the need of every individual: 250ml cups, 250ml bottles, 500ml, 1L, 1.5L and 2L,
which are the non-returnable packs, & 5L and 20L, which are the returnable packs.
Till date the Indian consumer had been offered only Bisleri water; however, in an
effort to bring something refreshingly new, the company has introduced Bisleri
Natural Mountain Water, brought from the foothills of the mountains situated in
Himachal Pradesh. The product range now comprises two variants: Bisleri with
added minerals & Bisleri Mountain Water.
It is Bisleri's commitment to offer every Indian pure & clean drinking water. Bisleri
water is put through multiple stages of purification, ozonised & finally packed for
consumption. Rigorous R&D & stringent quality controls have made it a market
leader in the bottled water segment. Strict hygiene conditions are maintained in all
plants.
In its endeavour to maintain strict quality controls, each unit purchases preforms &
caps only from approved vendors and produces its own bottles in-house. Bisleri has
recently procured the latest world-class, state-of-the-art machinery, putting it at par
with international standards. This has not only helped improve packaging quality but
has also reduced raw material wastage & doubled production capacity. You can rest
assured that you are drinking safe & pure water when you consume Bisleri.
Bisleri is free of impurities & 100% safe. Enjoy the sweet taste of purity!
BISLERI PRODUCTS
Bisleri with added Minerals
Bisleri Mineral Water contains minerals such as magnesium sulphate and potassium
bicarbonate which are essential minerals for healthy living. They not only maintain
the pH balance of the body but also help in keeping you fit and energetic at all times.
Bisleri Mountain Water
Bisleri Natural Mountain Water emanates from natural springs located in Uttaranchal
and Himachal Pradesh, nestled in the vast Shivalik mountain ranges. Lauded as today's
'fountain of youth', Bisleri Natural Mountain Water resonates with the energy and
vibrancy capable of taking you back to nature. It is bottled in two plants, in
Uttaranchal and Himachal Pradesh, and is available in six different pack sizes:
250ml, 500ml, 1 litre, 1.5 litre, 2 litre and 5 litres.
TECHNOLOGICAL ASPECTS
Here we create Excel files as a database, which will be linked to QlikView (the
business intelligence tool) to improve decision support systems.
The Excel sheets created are as follows:
1. ZONE SHEET
In the Zone excel sheet I have taken only 3 regions of NCR, i.e. Gurgaon, Delhi and
Noida. Each region is divided into 100 places within its circle. The sheet also contains
each place's population, its area and its population growth. In short, it provides full
information about the geography.
2. TRANSACTION SHEET
In the Transaction sheet I have provided 5 years of data (month, day) along with
Salesman ID, Transaction ID, Retailer, Retailer ID and, lastly, Sales.
3. SALESMAN
In the Salesman sheet we provide the Salesman ID and Salesman Name, along with
the ID of the Distributor to whom the salesman is attached.
4. RETAILER
The Retailer excel sheet consists of the Retailer ID of each retailer, their name, and
the zone & region with which they are associated.
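Loaded into QlikView, these sheets would link automatically on their shared key fields; the resulting associations can be sketched in Python (field names follow the sheets above, data values are invented):

```python
# Tiny stand-ins for the four sheets, linked by ID fields.
zones = {"Z1": {"Zone": "Delhi", "Region": "Connaught Place"}}
salesmen = {"S1": {"Name": "Ravi", "DistributorID": "D1"}}
retailers = {"R1": {"Name": "Agarwal Restaurant", "ZoneID": "Z1"}}
transactions = [
    {"TransactionID": "T1", "SalesmanID": "S1", "RetailerID": "R1", "Sales": 500},
]

# Following the keys reassembles the full picture of one transaction,
# much as QlikView associates tables that share field names.
t = transactions[0]
retailer = retailers[t["RetailerID"]]
zone = zones[retailer["ZoneID"]]
assert salesmen[t["SalesmanID"]]["Name"] == "Ravi"
assert zone["Zone"] == "Delhi"
```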
DESCRIPTION OF REPORT
The Bisleri report is implemented on the business intelligence tool named “Qlikview”.
The report basically covers 3 regions in NCR, i.e. Gurgaon, Delhi & Noida. I have
taken around 100 places in each region, with their population, population growth &
area. I have designed the excel sheet in such a way that each place is displayed with
its population, population growth & the name of its zone; as this was a small-scale
report, nothing much more was done with it.
Bisleri, like any other mineral water supply company, has distributors in particular
regions, and a number of salesman executives work under them who are responsible
for door-to-door delivery of orders. They keep track of deliveries in their respective
regions and are responsible for sales there. Once a week, on Saturday, they have to
report to the office and submit the sales report for the whole week. For example, if
the Bisleri office is in Delhi, the salesman executives from Noida, Gurgaon and Delhi
report to the office every Saturday with their daily sales report sheets; their
performance and the week's sales are then checked to see whether they are benefiting
the company and whether sales are increasing in their regions.
DATA COLLECTION
Regarding data collection: truly speaking, no company wants to give its data to any
outsider, but somehow I managed to get data from a sales executive named “Rajneesh
Mani Tripathi” who was working in Bisleri itself.
DESCRIPTION OF IMPLEMENTATION ON TOOL
 INTRODUCTION
This is the first page, or home page, of the tool; as the report is on Bisleri, I
planned to use a Bisleri-themed background.
 BACKGROUND
In the background section I have described Bisleri and its distribution, given a small
description of the report and of Qlikview, and provided tabs which redirect to the
Bisleri homepage.
 HOW TO WORK
This tab describes how selections are made and how to work with the software, so
that even someone unfamiliar with computers, or not in the IT field, can easily
learn and use the tool, making it user friendly.
 GEOGRAPHY
The Geography tab gives a description of the area: it shows each zone with its
corresponding regions, and their area and population, along with their growth, are
shown graphically.
If we select the Delhi zone with Connaught Place as its region, its population is
shown in thousands with a graph.
 RETAILER
The Retailer tab lists the retailers with their Retailer IDs, the zone and region in
which they work, and their sales, shown graphically.
If Agarwal Restaurant, with ID 1097, is selected, its corresponding sales are shown
graphically along with its zone and region.
 SALES
The Sales tab shows the zone, region, and year, day and month of sale, with each
salesman's sales shown graphically, and a bookmark showing the retailers with whom
a particular salesman is associated.
 TABLES
There is a table showing the sales of the different salesmen.
CONCLUSION
This Bisleri project was a benchmark for me for designing other reports; using the
concepts of the Bisleri report, it will be more comfortable to make other reports. This
was a practice report, and through it I have learnt how to work on this tool; it was
quite an interesting tool to work with.
Through this report I have learnt how to draw graphs and make different tables, as
well as list boxes, multi boxes and table boxes. I have also got an idea of how mineral
water supply companies work. Last but not the least, working on this project was a
nice experience which I will use to make other reports in the near future.
QLIKVIEW REPORT ON
“LG”
OVERVIEW OF THE COMPANY
LG believes that technological innovation is the key to success in the marketplace.
Founded in 1958, the company has led the way in bringing advanced digital products
and applied technologies to customers. With its commitment to innovation and
assertive global business policies, LG aims to become a worldwide leader in advanced
digital technology. The trajectory of LG Electronics, its growth and diversification,
has always been grounded in the company ethos of making its customers' lives ever
better and easier, happier even, through increased functionality and fun.
Since its founding in 1958, LG Electronics has led the way into an ever-more advanced
digital era. Along the way, its constantly evolving technological expertise has lent
itself to many new products and applied technologies. Moving forward into the 21st
century, LG continues on its path to becoming the finest global electronics
company, bar none.
LG Electronics is pursuing the vision of becoming a true global digital leader,
attracting customers worldwide through its innovative products and design. The
company’s goal is to rank among the top 3 consumer electronics and
telecommunications companies in the world by 2010. To achieve this, LG has
embraced the idea of “Great Company, Great People,” recognizing that only great
people can create a great company.
Facts & Figures
 Established In : Jan 1997
 Managing Director : Mr. Moon B. Shin
 Corporate Office :Plot no51, Udyog Vihar, Surajpur Kasna Road, Greater
Noida (UP)
 Corporate Website : http://www.lgindia.com
 Number of Employees: 3000+
Business Areas & Main Products
Home Entertainment
Plasma Display Panels, LCD TV , Colour TVs, Audios, Home Theater System, DVD
Recorder/Player
Home Appliances
Refrigerators, Washing Machines, Microwaves, Vacuum Cleaners
AC
Split AC, Windows AC, Commercial AC’s
Business Solutions
LCD monitors, CRT monitors, Network Monitors, Graphic Monitor, Optical Storage
Devices, LED Projectors, NAS( Network attached Storage) and Digital signage
GSM
Color Screen GSM Handsets, Camera Phones, Touch Screen Phones, 3G Phones
PERFORMANCE AND GROWTH RATE
TECHNOLOGICAL ASPECTS
Here we create Excel files as a database, which will be linked to QlikView (the
business intelligence tool) to improve decision support systems.
The Excel sheets created are as follows:
1. ZONE EXCEL SHEET:
The Zone excel sheet consists of the names of 20 states, each consisting of 10 regions,
along with a Zone ID; it also comprises the population of the different regions.
2. TRANSACTION EXCEL SHEET:
The Transaction sheet lists every retailer in each region of all 20 states, their
sales for each year from 2005 to 2009, and a total of those sales across the
years.
3. RETAILERS EXCEL SHEET:
The Retailers sheet holds the Retailer ID and the name of each retailer across the
different regions of the country.
4. PROFIT & LOSS EXCEL SHEET:
Each entry in this sheet is given a Profit & Loss ID so that the star schema can be
built easily. The sheet comprises the Profit & Loss ID, the quantity supplied,
inventories for the five years (2005-2009), and a fixed threshold against which
the profit or loss made by each retailer is determined.
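The threshold comparison described for this sheet can be sketched in a few lines of Python. This is illustrative logic only; the names (classify, THRESHOLD, the sample retailer rows) are assumptions for the sketch, not the report's actual field names or data:

```python
# Classify each retailer's year as profit or loss by comparing the
# quantity sold against a fixed threshold (illustrative logic only).
def classify(quantity_sold, threshold):
    """Return 'Profit' if sales meet the threshold, else 'Loss'."""
    return "Profit" if quantity_sold >= threshold else "Loss"

# Hypothetical rows: (retailer, year, quantity sold)
rows = [("Retailer A", 2005, 120), ("Retailer A", 2006, 80)]
THRESHOLD = 100
results = {(r, y): classify(q, THRESHOLD) for r, y, q in rows}
```

A single threshold keeps the rule simple enough to evaluate per retailer per year when the sheet is loaded.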
5. PRODUCT EXCEL SHEET:
It comprises the Product ID and the names of the products with their
specifications, category, series, and model number.
6. OFFERS EXCEL SHEET:
The Offers sheet lists the model number of each product, its original price, the
festival on which the company gives the discount, the discount rate, and the new
price after the deduction.
7. MANAGERS EXCEL SHEET:
The Managers sheet holds the Manager ID, the name of each manager, and a contact
number so that the company can reach them when needed.
8. INVENTORY EXCEL SHEET:
The Inventory sheet holds the Zone ID, Retailer ID, Manager ID, Product ID,
Profit & Loss ID, the quantity supplied per year by each retailer, the threshold,
and the retailers' inventories from 2005 to 2009.
9. LG PICTURES OR GRAPHICS EXCEL SHEET:
The LG Pictures sheet maps the model number of each product to the path of the
graphics folder where the pictures of the items are saved.
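In the star schema outlined above, the Inventory sheet acts as the fact table whose ID columns join to the dimension sheets (Zone, Retailers, and so on). The following Python fragment sketches that lookup with made-up column names and toy data standing in for the Excel rows; in the actual report the joins are resolved by QlikView when the sheets are loaded:

```python
# Dimension sheets modeled as lookup dictionaries keyed by their IDs
# (toy data; the real values come from the Excel files).
zones = {1: {"State": "Delhi", "Region": "North"}}
retailers = {10: {"RetailerName": "Retailer A"}}

# Fact rows from the Inventory sheet carry only the foreign keys.
inventory = [{"ZoneID": 1, "RetailerID": 10, "Quantity": 250}]

def resolve(fact_row):
    """Join one fact row to its dimension records (star-schema lookup)."""
    row = dict(fact_row)
    row.update(zones[fact_row["ZoneID"]])
    row.update(retailers[fact_row["RetailerID"]])
    return row

report = [resolve(r) for r in inventory]
```

Giving every sheet its own ID column is what makes these joins unambiguous, which is why the Profit & Loss sheet was assigned an ID as well.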
DESCRIPTION OF REPORT
The LG report implemented in QlikView operates on a wide scale, covering nearly
all of India. It takes in 20 states, each consisting of 10 regions, which in turn
contain 5-6 retailers each. Every retailer has five years of sales data
(2005-2009), and each retailer's profit or loss is determined from the sales made
in the corresponding years.
Nearly all the products are shown with their series and model numbers, and their
pictures are displayed as well. A manager is appointed to each region so that,
when needed, one can contact the regional manager directly to learn about the
sales and problems in that region; the contact number is given along with each
Manager ID.
Discounts are also given on different offers during festivals and the New Year;
the original price is shown along with the discount rate and the discounted price
for the current selection.
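The discounted price shown alongside each offer is simple arithmetic over the original price and the discount rate. A minimal sketch, assuming the rate is stored as a percentage (the function and example values are illustrative, not taken from the actual Offers sheet):

```python
def discounted_price(original_price, discount_rate_pct):
    """New price after deducting the festival discount from the original."""
    return original_price * (1 - discount_rate_pct / 100)

# e.g. a 10% festival discount on an item originally priced at 20,000
price = discounted_price(20000, 10)
```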
DESCRIPTION OF IMPLEMENTATION ON TOOL
 STARTING PAGE
The starting page shows the LG logo with a Get Started tab that directs you to
the next page.
 BACKGROUND TAB:
The Background tab gives a description of the application and the report made
for LG, along with useful links that redirect you to the QlikView community, the
LG homepage, and a Learn More page; some key points about QlikView are also
described there.
  • 1. 1 WIPRO – APPLYING THOUGHT 2010 Summer Training Report Data Warehousing and Business Intelligence Using Qlikview Rahul Dubey H T T P : / / W W W . W I P R O . C O M
  • 2. DATA WAREHOUSING AND BUSINESS INTELLIGENCE USING QLIKVIEW A PROJECT TRAINING REPORT Submitted By RAHUL DUBEY In partial fulfilment for the award of the degree of Bachelor of Technology In COMPUTER SCIENCE AND ENGINEERING
  • 3. ACKNOWLEDGEMENT I, RAHUL DUBEY am grateful to Wipro InfoTech for providing me the opportunity to work them and complete my 6 weeks of project training as a part of my B-Tech (Comp Sc.) curriculum. Also I would like to express my deep sense of gratitude to my project training guide Mr. Nitish Vij, Solution Architect (DWH Practice) for his invaluable guidance and suggestion during my training tenure. His experience has been immense help as were his efforts in making us understand all the aspect of the project in a small frame of time and showing us the right way. It has been a great learning experience for me as got a chance to apply my knowledge in a practical domain. This training and experience has not only enriched me with technical knowledge but has also infested the maturity of thought and vision, the attributes required to be successful software professional. Last but not the least I would like sincerely thank Mr Ivor Egbert, Sr. Executive(TED) For offering us this training and also my team mates fir their support and assistance throughout this period.
  • 4. DECLARATION BY THE CANDIDATE I hereby declare that the work which is being presented in the dissertation entitled “Data Warehouse and BI using Qlikview” in the partial fulfilment of the requirement for the award of the degree of B-Tech in Computer Science and Engineering, Jaypee University of Engg. & Tech is an authenticated record of my work carried out during the period from 31st May 2010 till 9th July under the supervision of Mr. Nitish Vij, Wipro InfoTech. Place: Wipro InfoTech, Gurgaon Signature of the Candidate Date :
  • 5. BONAFIDE CERTIFICATE This is to certify that the above statements made by the candidate are true to the best of our knowledge and belief. Place: Wipro InfoTech, Gurgaon Signature Date:
  • 6. ABOUT THE COMPANY Wipro InfoTech is the leading strategic IT partner for the companies across India, The Middle East and Asia-Pacific– offering integrated IT solutions. It plans, deploys, sustains and maintains your IT lifecycle through our total outsourcing, consulting service, business solutions and professional services. Wipro InfoTech helps you drive momentum in your organisation – no matter what domain you are in. Backed by the strong quality process and rich experience managing global clients across various business verticals, it aligns IT strategies to your business goals. Along with their best of breed technology partners, Wipro InfoTech also helps you with hardware and IT infrastructure needs. Wipro InfoTech is a part of USD 5 billion Limited (NYSE:WIT) with a market captitalization of USD 24 billion. The various accreditations that we have achieved for every service we offer reflect our commitment towards quality assurance. Wipro InfoTech was the first global software company to achieve Level 5 SEI-CMM, the world first IT company to achieve Six Sigma, as well as the world’s first company to attain Level 5 PCMM. Currently, their presence extends to 9 regional offices in India besides offices in the KSA, UAE, Taiwan, Malaysia, Singapore, Australia, and other regions in Asia-Pacific and the Middle East.
  • 7. THE SERVICES OFFERED BY THE COMPANY In today’s world where IT infrastructure plays a key role in determining the success of your business organisation, Wipro InfoTech helps to derive maximum value from the success of your business organisation, Wipro InfoTech helps to derive maximum value from the IT investments. They offer their clients the full array of IT lifecycle services. From technology optimisation to mitigating risks, there is a constant demand to evaluate, deploy and manage flexible responsive and economical solution. Outsourcing non-core operations can help to transform the business into a leaner and smarter organisation with greater adaptability to changing economic and business trends. In a maturing outsourcing market where both clients and vendors are becoming increasingly adept at understanding the fundamentals needed to develop a lasting relationship, Wipro InfoTech offers a partnership that goes beyond merely providing a solution. Spurred on by the goal of creating new business process and innovative models to help the customers gain new level of efficiency, differentiation, and flexibility, Wipro InfoTech offers a Total Outsourcing Services(TOS). This powerful service offering ensures dynamic solutions that offer total process visibility resulting in pre-emptive solving of problems or issues even before they can manifest and affect the business performance. Their solutions eschew the immature model of offering ad hoc solutions that dwell on pricing, labour arbitrage and granular level contracts within tower group solutions that tend towards being strategic corporate initiative. This ensures delivery of results against services levels, larger scope relationships that enable services providers to respond quickly and flexibly transfer of day-to-day responsibilities. At Wipro InfoTech they also offer consulting services as part of the advisory expertise across various domains. 
Their various consulting practices enable you to achieve execution excellence to help drive your business momentum despite challenges arising from globalisation and the dynamics of customer loyalty. Optimising IT resources through their services, they build a strong base to empower your technology operations. This includes identifying pain areas, deploying the right resources to upgrade or solve them, implementing strategic business and IT tools, as well as managing the project lifecycle. All of these achieved through their focussed their focused quality that complies with ISO 9000, Six Sigma, SEI CMM & PCMM level 5 standards and processes. With over two decades of experience Wipro InfoTech has a commanding lead in leveraging critical IT services for clients in India, the Middle East and Asia-pacific. Their services are further backed with strategic partnership with some of the top global technology corporations – Oracle, Microsoft, SAP, IBM among others. Their service offerings include:
  • 8.  Consulting : Strategic Cost Reduction, Business Transformation, Security Governance, Strategy, E-Governance.  Business Solutions: Enterprise Applications, Solutions for Fast Emerging Businesses, Application Development and Portals, Applications Maintenance, Third Party Testing, Data Warehouse / Business Intelligence, Point Solutions.  Professional Services: System Integration, Availability Services, Managed Services.  Total Outsourcing. DATA WAREHOUSING AND BI PRACTICES AT WIPRO INFOTECH Data warehouses are an organization’s corporate memory , containing critical information that is the basis for their business management and operations. Organizations then require their data warehouse to be scalable, secure and stable with the ability to optimize storage and retrieval of complex sets of data. Business intelligence systems transform an organization’s ability to convert raw data into information that makes online multidimensional transaction and analytical processing possible. Data warehouse (DW) and business intelligence (BI) operations together enable organizations to base crucial business decisions on actual data analyses. At Wipro InfoTech, the DW/BI offerings provide an organization with direct access to information analytics that will help them respond quickly to emergent business opportunities and rapidly changing market trends. With India’s largest dedicated DW/BI team of 2050+ consultants who bring 4350+ person years of experience, Wipro InfoTech DW/BI solutions framework can be customized to address the domain specific requirements. They have extensive experience in the finance and insurance, retail, manufacturing, energy and utilities, telecom, healthcare and government sectors. Such varied domain experience, along with the alliance with global vendors in the field and cross- technology competency drive the BI operations from a departmental to an enterprise- wide initiative. 
As an end-to-end service provider, they consult, architect, integrate and manager’s customer’s DW/BI operations to ensure that they stay ahead in today’s competitive business environment. Wipro’s DW/BI solutions framework includes:  DW/BI consulting Their consultants work with you to define your specific DW/BI requirements through a comprehensive examination of your focus areas. We work to derive a solution that factors in your investment plans and balances cost-efficiency with required business benefits. The key modules are:  Preparing business cases for BI/DW  Business & information analysis  Preparing BI & DW solution framework
  • 9.  Arriving at roadmap for implementation  BI & DW project management  DW/BI architecture They formulate a design of the proposed DW/BI solutions through aligning requirements analyses with your goals and existing infrastructure. Our key offerings include:  Data acquisition from different legacy systems on various platforms including Mainframes, AS/400, Unix, Digital OLTP platforms as DB2, IMS. IDMS, VSAM, Oracle legacy applications, Sybase, Informix and ERP packages as PeopleSoft etc.  Data modelling  ETL architecture  Metadata architecture and management  Security architecture  DW/BI integration As a part of integration phase, Wipro InfoTech DW/BI team designs and builds physical databases ensuring that appropriate disaster recovery plans are in place. Our data mining implementation includes data cleaning, ETL, visualization and enabling data access. The data mining tool selection and creation of reporting environments are domain-specific and fulfil operational requirements such as customer profiling, target marketing, compaign effectiveness analysis and fraud detection and management. The reporting environment that we have developed and deployed are feature-rich and make multi-dimensional analyses possible across various types of data warehouse.  DW/BI management To ensure consistent performance as data warehouse scale in volume and used to ensure maximum benefits, our DW/BI management offering includes:  Data warehouse administration, maintenance and support activities  Capacity planning  Data warehousing audit  Performance tuning
  • 10. ABSTRACT Organizations are all looking to increase revenue, lower expenses, and improve profitability by improving efficiency and effectiveness in their business process and overall performance. Business Intelligence (BI) software vendors claim that they have the technology that can provide this improvement. Vendors concentrate on selling products or tools that can be used to build these solutions but rarely concentrate on the problems the customer is trying to solve. As new requirements are realized, new vendors are brought in, new tools are purchased and new consultants arrive to make it work. Eventually, the corporate BI initiative becomes a collection of disjointed point solutions using a combination of expensive monolithic commercial applications and difficult to maintain custom code .Using this current approach, each tool is designed problems must be broken into pieces and segregated into task like Reporting, Analysis, Data Mining, Workflow etc. There is no application responsible for initiating, managing, verifying or coordinating results. People and procedure are called upon to make up for these deficiencies. This report describes the QlikView Business intelligence Platforms: QlikView, is a suite of powerful and rapidly deployable business intelligence software that enables enterprises and their management to effectively and proactively monitor, manage and optimize their business. QlikViewTM lets companies analyze all their data quickly and efficiently. QlikView eliminates the need for data warehouses, data marts, and OLAP cubes; instead it gives users rapid access to data from multiple sources in an intuitive, dashboard-style interface. With QlikView, companies can turn data into information, and information into better decisions.
  • 11. INTRODUCTION Business have begun to exploit the burgeoning data online to make better decisions about their activities. Many of their activities are rather complicated, however and certain types of information cannot be extracted using SQL. DATABASE APPLICATIONS Database applications can be broadly classified into two broad categories. 1. TRANSACTION PROCESSING SYSTEM: These are the systems that recoed information about transaction, such as product sales information for companies. 2.DECISION SUPPORT SYSTEMS: They aim to get high level information out of the detailed information stored in transaction processing systems and to use the high level information to make a variety of decisions. ISSUES INVOLVED The storage and retrieval of data for decision support raises several issues:  Although many decision-support queries can be written in SQL, others either cannot be expressed in SQL or cannot be easily expressed in SQL.  Database query languages are not suited to the performance of detailed statistical analyses of data.  Large companies have diverse sources of data that they need to use for making business decisions. The sources may store the data in different schemas. For performance reasons, the data sources usually will not permit other parts of the company to retrieve data on demand.
  • 12. INTRODUCTION TO DATA WAREHOUSING A data warehouse is a repository of information gathered from multiple sources, stored under a unified schema, at a single site. Once gathered, the data are stored for a long time, permitting access to historical data. Thus data warehouse provide the user single consolidated interface to data, making decision support queries easy to write. Moreover, by accessing information for decision support from data warehouse, the decision maker ensures that online transaction is not affected by decision support workload. According to Inmon, Data warehouse is a powerful data base model that significantly enhances user’s ability to quickly analyze large, multidimensional data sets. It cleanses and organizes data to allow users to make business decisions based on facts. Hence, the data in data warehouse must have strong analytical characteristics. Creating data to be analytical requires that it must be. 1. Subject oriented. 2. Integrated. 3. Time referenced. 4. Non-volatile. 1. Subject oriented – Data warehouse group data by subject rather by activity. In contrast transaction system allow data by activities – payroll processing, shipping products, loan processing. Data organized around activities cannot answer the questions like “ How many salaried employees are there who have a tax deduction of Rs.”X” from their account across a branches of company”. This would require heavy searching and aggregation of employee and account records of all branches. In data warehouse information are subject oriented like – employee, account, sales etc. 2. Integrated data – It refers to de-duplicating information and merging it from many sources into one consistent location. 3. Time referenced – The most important characteristics of an analytical data is its prior state of being. It refers to time valued characteristic. For e.g. The user may ask that “what were the total sales of the product “A” for the past three years on New year day across region “Y” ? 
”.So we should know the sales figures of the product on New year’s Day in all the branches of that particular region. 4. Non-volatile – Data being non-volatile help users to dig deep in the history and to arrive specific decision making based on facts. Example: In order to store data, over the years, many application designers in each branch have made their individual decisions as to how an application and database should be built. So source systems will be different in naming conventions, variable measurements, encoding structures, and physical attributes of data. Consider a bank that has got several branches in several countries, has millions of customers and the lines of business of the enterprise are savings, and loans. The following example explains how the data is integrated from source systems to target systems.
  • 13. Example of Source Data

System           Attribute Name             Column Name                Datatype      Values
Source System 1  Customer Application Date  CUSTOMER_APPLICATION_DATE  NUMERIC(8,0)  11012005
Source System 2  Customer Application Date  CUST_APPLICATION_DATE      DATE          11012005
Source System 3  Application Date           APPLICATION_DATE           DATE          01NOV2005

In the aforementioned example, attribute name, column name, datatype and values are entirely different from one source system to another. This inconsistency in data can be avoided by integrating the data into a data warehouse with good standards.

Example of Target Data (Data Warehouse)

Target System  Attribute Name             Column Name                Datatype  Values
Record #1      Customer Application Date  CUSTOMER_APPLICATION_DATE  DATE      01112005
Record #2      Customer Application Date  CUSTOMER_APPLICATION_DATE  DATE      01112005
Record #3      Customer Application Date  CUSTOMER_APPLICATION_DATE  DATE      01112005

In the above example of target data, attribute names, column names, and data types are consistent throughout the target system. This is how data from various source systems is integrated and accurately stored in the data warehouse. However, the means to retrieve and analyze data, to extract, transform and load data, and to manage the data dictionary are also considered essential components of a data warehousing system. Many references to data warehousing use this broader context. Thus, an expanded definition for data warehousing includes business intelligence tools, tools to extract, transform, and load data into the repository, and tools to manage and retrieve metadata.
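The date integration shown in the tables above can be sketched in code. Below is a minimal, hypothetical sketch (the format registry, function name, and source-system labels are invented for illustration, not taken from any ETL product) that parses each source system's date layout and re-emits the value in the single target format used by the warehouse:

```python
from datetime import datetime

# Hypothetical registry: each source system's date layout, so every value
# can be converted to the warehouse standard (DDMMYYYY).
SOURCE_FORMATS = {
    "source_system_1": "%m%d%Y",   # NUMERIC(8,0) value 11012005 -> 1 Nov 2005
    "source_system_2": "%m%d%Y",   # DATE value 11012005
    "source_system_3": "%d%b%Y",   # DATE value 01NOV2005
}

def normalize_date(value, source):
    """Parse a source-specific date and re-emit it in the target DDMMYYYY format."""
    parsed = datetime.strptime(str(value), SOURCE_FORMATS[source])
    return parsed.strftime("%d%m%Y")

rows = [(11012005, "source_system_1"),
        ("11012005", "source_system_2"),
        ("01NOV2005", "source_system_3")]
integrated = [normalize_date(v, s) for v, s in rows]
print(integrated)  # ['01112005', '01112005', '01112005']
```

All three inconsistent source representations collapse to the one consistent target value, which is the essence of the integration step.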
  • 14. Data warehousing arises from an organisation's need for reliable, consolidated, unique and integrated reporting and analysis of its data, at different levels of aggregation. The practical reality of most organisations is that their data infrastructure is made up of a collection of heterogeneous systems. For example, an organisation might have one system that handles customer relationships, a system that handles employees, systems that handle sales or production data, yet another system for finance and budgeting data, etc. In practice, these systems are often poorly integrated, if at all, and simple questions like "How much time did salesperson A spend on customer C? How much did we sell to customer C? Was customer C happy with the provided service? Did customer C pay his bills?" can be very hard to answer, even though the information is available "somewhere" in the different data systems. Yet another problem might be that the organisation is internally in disagreement about which data is correct. For example, the sales department might have one view of its costs while the finance department has another view of those costs. In such cases the organisation can spend unlimited time discussing who has the correct view of the data. It is partly the purpose of data warehousing to bridge such problems. It is important to note that in data warehousing the source data systems are considered a given: even though a source system might have been built in a manner that makes it difficult to extract integrated information, the "data warehousing answer" is not to redesign the source systems but rather to make the data appear consistent, integrated and consolidated despite the problems in the underlying source systems. Data warehousing achieves this by employing different data warehousing techniques, creating one or more new data repositories (i.e. the data warehouse) whose data model(s) support the needed reporting and analysis.
There are three types of data warehouses: 1. Enterprise Data Warehouse - An enterprise data warehouse provides a central database for decision support throughout the enterprise. 2. ODS (Operational Data Store) - This has a broad, enterprise-wide scope, but unlike the real enterprise data warehouse, data is refreshed in near real time and used for routine business activity. One typical application of the ODS is to hold recent data before migration to the data warehouse. The ODS is not conceptually equivalent to the data warehouse, although it does store data with a deeper level of history than the OLTP data. 3. Data Mart - A data mart is a subset of a data warehouse that supports a particular region, business unit or business function. It describes an approach in which each individual department implements its own management information system, often based on a relational database, a smaller multidimensional database, or a spreadsheet-like system. Once in production, however, such systems are difficult to extend for use by other departments: first, there are inherent design limitations in building for a single set of business needs, and second, expansion may disrupt existing users.
  • 15. COMPONENTS OF DATA WAREHOUSE DWH architecture is a way of representing the data, communication processing, and presentation that exist for end-user computing within the enterprise. The architecture of a typical data warehouse consists of parts performing the following functions:  Gathering of data.  Storage of data.  Querying of data. Architecture, in the context of an organization's data warehouse efforts, is a conceptualization of how the data warehouse is built. There is no right or wrong architecture; rather, multiple architectures exist to support various environments and situations. The worthiness of an architecture can be judged by how well the conceptualization aids the building, maintenance, and usage of the data warehouse. One possible simple conceptualization of data warehouse architecture consists of the following interconnected parts.  Source system – The goal of data warehousing is to free the information locked up in the operational systems and to combine it with information from other, often external, sources of data. Increasingly, large organizations are acquiring additional data from outside databases. It is essential to identify the right data sources and determine an efficient process to collect facts.  Source data transport layer – This layer largely contributes to data trafficking; it represents the tools and processes involved in transporting data from the source systems to the enterprise warehouse system. Since the data volume is huge, the interfaces with the source systems have to be robust and scalable enough to manage secured data transmission.  Data quality control and data profiling layer – Often, data quality causes the most concern in any data warehousing solution. Incomplete and inaccurate data will jeopardize the success of the data warehouse. It is essential to measure the quality of the source data and take corrective action even before the information is processed and loaded into the target warehouse. 
 Metadata management layer – Metadata is information about the data within the enterprise; record descriptions in a COBOL program are metadata, as are CREATE statements in SQL. So, for a warehouse to be fully functional, it is necessary to have a variety of metadata available.  Data integration layer – This layer is involved in scheduling the various tasks that must be accomplished to integrate data acquired from various sources. A lot of formatting and cleansing activity happens in this layer so that the data is consistent.  Data processing layer – The warehouse is where the dimensionally modelled data resides. In some cases one can think of the warehouse simply as a
  • 16. transformed view of the operational data, but modelled for analytical purposes.  End user reporting layer – The success of a data warehouse implementation largely depends upon ease of access to valuable information; in that sense, the end user reporting layer is a very critical component. DATA WAREHOUSE SCHEMAS Data warehouses typically have schemas that are designed for data analysis using tools such as OLAP tools. Thus the data are usually multidimensional, with two types of attributes.  Measure attributes – Given a relation used for data analysis, some of its attributes are identified as measure attributes, since they measure some value. For example, the attribute NUMBER of the sales relation is a measure attribute because it measures the number of units sold.  Dimension attributes – Some or all of the other attributes of the relation are identified as dimension attributes, since they define the dimensions on which measure attributes are viewed. Tables containing multidimensional data are called fact tables and are usually very large. To minimize storage requirements, dimension attributes are usually short identifiers that are foreign keys into other tables called dimension tables.
  • 17. TYPES OF SCHEMA  STAR SCHEMA The star schema (sometimes referenced as star join schema) is the simplest style of data warehouse schema. The star schema consists of a few fact tables (possibly only one, justifying the name) referencing any number of dimension tables. The star schema is considered an important special case of the snowflake schema.
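As a rough sketch of a star schema (the table and column names here are invented for illustration), the following uses SQLite to build one fact table whose foreign keys reference de-normalized dimension tables, and then runs a typical star-join query that aggregates the measure along a dimension attribute:

```python
import sqlite3

# A minimal star schema: one fact table referencing two dimension tables.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE dim_store   (store_id   INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        store_id   INTEGER REFERENCES dim_store(store_id),
        units_sold INTEGER
    );
""")
con.executemany("INSERT INTO dim_product VALUES (?,?,?)",
                [(1, "Widget", "Hardware"), (2, "Gadget", "Hardware")])
con.executemany("INSERT INTO dim_store VALUES (?,?)",
                [(10, "Delhi"), (20, "Mumbai")])
con.executemany("INSERT INTO fact_sales VALUES (?,?,?)",
                [(1, 10, 5), (2, 10, 3), (1, 20, 7)])

# A typical star-join: aggregate the fact measure by a dimension attribute.
rows = con.execute("""
    SELECT s.city, SUM(f.units_sold)
    FROM fact_sales f JOIN dim_store s ON f.store_id = s.store_id
    GROUP BY s.city ORDER BY s.city
""").fetchall()
print(rows)  # [('Delhi', 8), ('Mumbai', 7)]
```

Because each dimension is a single de-normalized table, every analytical query is a simple join from the large fact table out to small dimension tables.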
  • 18.  SNOWFLAKE SCHEMA A snowflake schema is a logical arrangement of tables in a multidimensional database such that the entity relationship diagram resembles a snowflake in shape. Closely related to the star schema, the snowflake schema is represented by centralized fact tables which are connected to multiple dimensions. In the snowflake schema, however, dimensions are normalized into multiple related tables, whereas the star schema's dimensions are de-normalized, with each dimension represented by a single table. When the dimensions of a snowflake schema are elaborate, having multiple levels of relationships, and where child tables have multiple parent tables ("forks in the road"), a complex snowflake shape starts to emerge. The "snowflaking" effect only affects the dimension tables, not the fact tables.  CONSTELLATION SCHEMA For each star schema or snowflake schema it is possible to construct a fact constellation schema. This schema is more complex than the star or snowflake architecture because it contains multiple fact tables, which allows dimension tables to be shared amongst many fact tables. That solution is very flexible; however, it may be hard to manage and support. The main disadvantage of the fact constellation schema is its more complicated design, because many variants of aggregation must be considered.
  • 19. BENEFITS OF DATA WAREHOUSE Some of the benefits that a data warehouse provides are as follows:  A data warehouse provides a common data model for all data of interest regardless of the data's source. This makes it easier to report and analyze information than it would be if multiple data models were used to retrieve information such as sales invoices, order receipts, general ledger charges, etc.  Prior to loading data into the data warehouse, inconsistencies are identified and resolved. This greatly simplifies reporting and analysis.  Information in the data warehouse is under the control of data warehouse users so that, even if the source system data is purged over time, the information in the warehouse can be stored safely for extended periods of time.  Because they are separate from operational systems, data warehouses provide retrieval of data without slowing down operational systems.  Data warehouses can work in conjunction with and, hence, enhance the value of operational business applications, notably customer relationship management (CRM) systems.  Data warehouses facilitate decision support system applications such as trend reports (e.g., the items with the most sales in a particular area within the last two years), exception reports, and reports that show actual performance versus goals.
  • 20. DISADVANTAGES OF DATA WAREHOUSE There are also disadvantages to using a data warehouse. Some of them are:  Data warehouses are not the optimal environment for unstructured data.  Because data must be extracted, transformed and loaded into the warehouse, there is an element of latency in data warehouse data.  Over their life, data warehouses can have high costs.  Data warehouses can get outdated relatively quickly. There is a cost of delivering suboptimal information to the organisation.  There is often a fine line between data warehouses and operational systems. Duplicate, expensive functionality may be developed. Or, functionality may be developed in the data warehouse that, in retrospect, should have been developed in the operational systems. ETL TOOL IN DATA WAREHOUSE Extract, transform, and load (ETL) is a process in database usage and especially in data warehousing that involves:  Extracting data from outside sources  Transforming it to fit operational needs (which can include quality levels)  Loading it into the end target (database or data warehouse). Extract The first part of an ETL process involves extracting the data from the source systems. Most data warehousing projects consolidate data from different source systems. Each separate system may also use a different data organization format. Common data source formats are relational databases and flat files, but may include non-relational database structures such as Information Management System (IMS) or other data structures such as Virtual Storage Access Method (VSAM) or Indexed Sequential Access Method (ISAM), or even fetching from outside sources such as through web spidering or screen-scraping. Extraction converts the data into a format for transformation processing. Transform The transform stage applies a series of rules or functions to the extracted data from the source to derive the data for loading into the end target. 
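The transform stage just described can be sketched as a pipeline of small rule functions (an illustrative sketch only; the records, columns and rules are invented, not taken from any ETL tool):

```python
# Illustrative transform pipeline: each rule takes a record dict and returns it.
def translate_gender(rec):
    # Translating coded values: a source stores 1/2, the warehouse stores M/F.
    rec["gender"] = {1: "M", 2: "F"}[rec["gender"]]
    return rec

def derive_sale_amount(rec):
    # Deriving a new calculated value from existing columns.
    rec["sale_amount"] = rec["qty"] * rec["unit_price"]
    return rec

def transform(records, rules):
    """Apply every rule, in order, to every extracted record."""
    out = []
    for rec in records:
        for rule in rules:
            rec = rule(rec)
        out.append(rec)
    return out

source = [{"gender": 1, "qty": 3, "unit_price": 50},
          {"gender": 2, "qty": 2, "unit_price": 40}]
loaded = transform(source, [translate_gender, derive_sale_amount])
print(loaded[0])  # {'gender': 'M', 'qty': 3, 'unit_price': 50, 'sale_amount': 150}
```

Real ETL tools chain many such rules (filtering, joining, aggregating, and so on, as listed below), but the shape is the same: extracted records flow through ordered transformations before loading.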
Some data sources will require very little or even no manipulation of data. In other cases, one or more of the following transformation types may be required to meet the business and technical needs of the target database:  Selecting only certain columns to load (or selecting null columns not to load). For example, if the source data has three columns (also called attributes), say roll-no, age and salary, then the extraction may take only roll-no and salary.
Similarly, the extraction mechanism may ignore all those records where salary is not present (salary = null).  Translating coded values (e.g., if the source system stores 1 for male and 2 for female, but the warehouse stores M for male and F for female); this calls for automated data cleansing, as no manual cleansing occurs during ETL  Encoding free-form values (e.g., mapping "Male" to "1" and "Mr" to "M")  Deriving a new calculated value (e.g., sale_amount = qty * unit_price)  Filtering  Sorting  Joining data from multiple sources (e.g., lookup, merge)  Aggregation (for example, rollup: summarizing multiple rows of data, such as total sales for each store and for each region)  Transposing or pivoting (turning multiple columns into multiple rows or vice versa)  Splitting a column into multiple columns (e.g., putting a comma-separated list specified as a string in one column into individual values in different columns)  Disaggregation of repeating columns into a separate detail table (e.g., moving a series of addresses in one record into single addresses in a set of records in a linked address table)  Looking up and validating the relevant data from tables or referential files for slowly changing dimensions.  Applying any form of simple or complex data validation. If validation fails, it may result in a full, partial or no rejection of the data, and thus none, some or all of the data is handed over to the next step, depending on the rule design and exception handling. Many of the above transformations may result in exceptions, for example, when a code translation parses an unknown code in the extracted data. Load The load phase loads the data into the end target, usually the data warehouse (DW). Depending on the requirements of the organization, this process varies widely. Some data warehouses may overwrite existing information with cumulative data, refreshing the extract daily, weekly or monthly,
while other DWs (or even other parts of the same DW) may add new data in a historicized form, for example, hourly. To understand this, consider a DW that is required to maintain the sales records of the last year. The DW will overwrite any data older than a year with newer data, but entries within the one-year window will be made in a historicized manner. The timing and scope of replacing or appending are strategic design choices that depend on the time available and the business needs. More complex systems can maintain a history and audit trail of all changes to the data loaded in the DW. Examples For example, a financial institution might have information on a customer in several departments, and each department might have that customer's information listed in a different way. The membership department might list the customer by name, whereas the accounting department might list the customer by number. ETL can bundle all this
  • 22. data and consolidate it into a uniform presentation, such as for storing in a database or data warehouse. ETL Tools At present the most popular and widely used ETL tools and applications on the market are: # IBM WebSphere DataStage (formerly known as Ascential DataStage and Ardent DataStage) # Informatica PowerCenter # Oracle Warehouse Builder # Ab Initio # Pentaho Data Integration - Kettle Project (open source ETL) # SAS ETL Studio # Cognos DecisionStream # Business Objects Data Integrator (BODI) # Microsoft SQL Server Integration Services (SSIS) OLTP (Online Transaction Processing) Definition: Databases must often allow the real-time processing of SQL transactions to support e-commerce and other time-critical applications. This type of processing is known as online transaction processing (OLTP). Online transaction processing refers to a class of systems that facilitate and manage transaction-oriented applications, typically for data entry and retrieval transaction processing. The term is somewhat ambiguous; some understand a "transaction" in the context of computer or database transactions, while others (such as the Transaction Processing Performance Council) define it in terms of business or commercial transactions. OLTP has also been used to refer to processing in which the system responds immediately to user requests. An automatic teller machine (ATM) for a bank is an example of a commercial transaction processing application. The technology is used in a number of industries, including banking, airlines, mail order, supermarkets, and manufacturing. Applications include electronic banking, order processing, employee time clock systems, and e-commerce. The most widely used OLTP system is probably IBM's CICS. Benefits Online transaction processing has two key benefits: simplicity and efficiency. 
Reduced paper trails and the faster, more accurate forecasts for revenues and expenses are both examples of how OLTP makes things simpler for businesses. Disadvantages As with any information processing system, security and reliability are considerations. Online transaction systems are generally more susceptible to direct attack and abuse than their offline counterparts. When organizations choose to rely on OLTP,
  • 23. operations can be severely impacted if the transaction system or database is unavailable due to data corruption, systems failure, or network availability issues. Additionally, like many modern online information technology solutions, some systems require offline maintenance which further affects the cost-benefit analysis. Contrasting Data warehouse and OLTP One major difference between the types of system is that data warehouses are not usually in third normal form (3NF), a type of data normalization common in OLTP environments. Data warehouses and OLTP systems have very different requirements. Here are some examples of differences between typical data warehouses and OLTP systems:  Workload Data warehouses are designed to accommodate ad hoc queries. You might not know the workload of your data warehouse in advance, so a data warehouse should be optimized to perform well for a wide variety of possible query operations. OLTP systems support only predefined operations. Your applications might be specifically tuned or designed to support only these operations.  Data modifications A data warehouse is updated on a regular basis by the ETL process (run nightly or weekly) using bulk data modification techniques. The end users of a data warehouse do not directly update the data warehouse.
  • 24. In OLTP systems, end users routinely issue individual data modification statements to the database. The OLTP database is always up to date, and reflects the current state of each business transaction.  Schema design Data warehouses often use de-normalized or partially de-normalized schemas (such as a star schema) to optimize query performance. OLTP systems often use fully normalized schemas to optimize update/insert/delete performance, and to guarantee data consistency.  Typical operations A typical data warehouse query scans thousands or millions of rows. For example, "Find the total sales for all customers last month." A typical OLTP operation accesses only a handful of records. For example, "Retrieve the current order for this customer."  Historical data Data warehouses usually store many months or years of data. This is to support historical analysis. OLTP systems usually store data from only a few weeks or months. The OLTP system stores only historical data as needed to successfully meet the requirements of the current transaction. OLAP(online analytical processing) Online analytical processing, or OLAP is an approach to swiftly answer multi- dimensional analytical queries. OLAP is part of the broader category of business intelligence, which also encompasses relational reporting and data mining. The typical applications of OLAP are in business reporting for sales, marketing, management reporting, business process management (BPM), budgeting and forecasting, financial reporting and similar areas. The term OLAP was created as a slight modification of the traditional database term OLTP (Online Transaction Processing). Databases configured for OLAP use a multidimensional data model, allowing for complex analytical and ad-hoc queries with a rapid execution time. They borrow aspects of navigational databases and hierarchical databases that are faster than relational databases. 
The output of an OLAP query is typically displayed in a matrix (or pivot) format. The dimensions form the rows and columns of the matrix; the measures form the values.
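That matrix layout can be sketched as follows (a toy illustration, not any vendor's API): one dimension forms the rows, another the columns, and the aggregated measure fills the cells:

```python
from collections import defaultdict

# Toy facts: (region, quarter) dimension coordinates and a sales measure.
facts = [
    ("North", "Q1", 100), ("North", "Q2", 120),
    ("South", "Q1", 80),  ("South", "Q2", 90),
    ("North", "Q1", 30),  # a second fact row landing in the same cell
]

def pivot(facts):
    """Aggregate the measure into a {row_dim: {col_dim: total}} matrix."""
    matrix = defaultdict(lambda: defaultdict(int))
    for region, quarter, sales in facts:
        matrix[region][quarter] += sales
    return {r: dict(cols) for r, cols in matrix.items()}

table = pivot(facts)
print(table)  # {'North': {'Q1': 130, 'Q2': 120}, 'South': {'Q1': 80, 'Q2': 90}}
```

Note how two fact rows with the same dimension coordinates collapse into one cell; the cell value is always an aggregate of the measure, never an individual fact.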
  • 25. At the core of any OLAP system is the concept of an OLAP cube (also called a multidimensional cube or a hypercube). It consists of numeric facts called measures which are categorized by dimensions. The cube metadata is typically created from a star schema or snowflake schema of tables in a relational database. Measures are derived from the records in the fact table and dimensions are derived from the dimension tables. Each measure can be thought of as having a set of labels, or meta-data associated with it. A dimension is what describes these labels; it provides information about the measure. A simple example would be a cube that contains a store's sales as a measure, and Date/Time as a dimension. Each Sale has a Date/Time label that describes more about that sale. Any number of dimensions can be added to the structure such as Store, Cashier, or Customer by adding a column to the fact table. This allows an analyst to view the measures along any combination of the dimensions. For Example:

Sales Fact Table                  Time Dimension
+-------------+---------+         +---------+-------------------+
| sale_amount | time_id |         | time_id | timestamp         |
+-------------+---------+         +---------+-------------------+
|     2008.08 |    1234 |-------->|    1234 | 20080902 12:35:43 |
+-------------+---------+         +---------+-------------------+

MOLAP (MULTIDIMENSIONAL OLAP) MOLAP stands for Multidimensional Online Analytical Processing. MOLAP is an alternative to the ROLAP (Relational OLAP) technology. While both ROLAP and MOLAP analytic tools are designed to allow analysis of data through the use of a multidimensional data model, MOLAP differs significantly in that it requires the pre-computation and storage of information in the cube (the operation known as processing). MOLAP stores this data in an optimized multidimensional array storage, rather than in a relational database (i.e. in ROLAP). 
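The pre-computation ("processing") step that distinguishes MOLAP can be sketched like this (a purely illustrative toy, not a real MOLAP engine): aggregates for every combination of dimension levels are computed once up front, so later queries become simple look-ups instead of scans:

```python
# Facts keyed by their full dimension coordinates: (store, product) -> units.
facts = {("Delhi", "Widget"): 5, ("Delhi", "Gadget"): 3, ("Mumbai", "Widget"): 7}

def precompute_cube(facts):
    """Pre-aggregate every subset of dimensions, MOLAP-style.
    The marker 'ALL' means a dimension has been rolled up to its total."""
    cube = {}
    for (store, product), units in facts.items():
        for key in ((store, product), (store, "ALL"),
                    ("ALL", product), ("ALL", "ALL")):
            cube[key] = cube.get(key, 0) + units
    return cube

cube = precompute_cube(facts)     # the lengthy "processing" step happens once
print(cube[("Delhi", "ALL")])     # 8: total units sold in Delhi
print(cube[("ALL", "Widget")])    # 12: total Widget units across all stores
```

This also makes the trade-offs listed below concrete: queries are fast dictionary look-ups, but the cube stores redundant aggregates and must be reprocessed (or incrementally updated) when new facts arrive.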
Advantages of MOLAP  Fast query performance due to optimized storage, multidimensional indexing and caching.  Smaller on-disk size of data compared to data stored in relational database due to compression techniques.  Automated computation of higher level aggregates of the data.  It is very compact for low dimension data sets.
  • 26.  Array model provides natural indexing  Effective data extract achieved through the pre-structuring of aggregated data. Disadvantages of MOLAP  The processing step (data load) can be quite lengthy, especially on large data volumes. This is usually remedied by doing only incremental processing, i.e., processing only the data which has changed (usually new data) instead of reprocessing the entire data set.  MOLAP tools traditionally have difficulty querying models with dimensions with very high cardinality (i.e., millions of members).  Some MOLAP products have difficulty updating and querying models with more than ten dimensions. This limit differs depending on the complexity and cardinality of the dimensions in question. It also depends on the number of facts or measures stored. Other MOLAP products can handle hundreds of dimensions.  MOLAP approach introduces data redundancy. ROLAP(RELATIONAL OLAP) ROLAP stands for Relational Online Analytical Processing. ROLAP is an alternative to the MOLAP (Multidimensional OLAP) technology. While both ROLAP and MOLAP analytic tools are designed to allow analysis of data through the use of a multidimensional data model, ROLAP differs significantly in that it does not require the pre-computation and storage of information. Instead, ROLAP tools access the data in a relational database and generate SQL queries to calculate information at the appropriate level when an end user requests it. With ROLAP, it is possible to create additional database tables (summary tables or aggregations) which summarize the data at any desired combination of dimensions. While ROLAP uses a relational database source, generally the database must be carefully designed for ROLAP use. A database which was designed for OLTP will not function well as a ROLAP database. Therefore, ROLAP still involves creating an additional copy of the data. However, since it is a database, a variety of technologies can be used to populate the database. 
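The ROLAP pattern described above, generating SQL against relational tables at query time and optionally materializing a summary (aggregate) table, can be sketched with SQLite (the schema and helper function are invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
con.executemany("INSERT INTO sales VALUES (?,?,?)",
                [("East", "A", 10.0), ("East", "B", 5.0), ("West", "A", 7.5)])

def rolap_query(dimensions):
    """Generate GROUP BY SQL for whatever dimension level the user requests."""
    cols = ", ".join(dimensions)
    return (f"SELECT {cols}, SUM(amount) FROM sales "
            f"GROUP BY {cols} ORDER BY {cols}")

# Aggregates are computed at query time, not pre-stored:
by_region = con.execute(rolap_query(["region"])).fetchall()
print(by_region)  # [('East', 15.0), ('West', 7.5)]

# Optionally materialize a summary table so the aggregate can be reused:
con.execute("CREATE TABLE sales_by_region AS " + rolap_query(["region"]))
```

Unlike the MOLAP approach, nothing is pre-computed unless a summary table is explicitly created, which is exactly the trade-off the advantages and disadvantages below describe.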
Advantages of ROLAP  ROLAP is considered to be more scalable in handling large data volumes, especially models with dimensions with very high cardinality (i.e. millions of members).  With a variety of data loading tools available, and the ability to fine tune the ETL code to the particular data model, load times are generally much shorter than with the automated MOLAP loads.  The data is stored in a standard relational database and can be accessed by any SQL reporting tool (the tool does not have to be an OLAP tool).
  • 27.  ROLAP tools are better at handling non-aggregatable facts (e.g. textual descriptions). MOLAP tools tend to suffer from slow performance when querying these elements.  By decoupling the data storage from the multi-dimensional model, it is possible to successfully model data that would not otherwise fit into a strict dimensional model.  The ROLAP approach can leverage database authorization controls such as row-level security, whereby the query results are filtered depending on preset criteria applied, for example, to a given user or group of users (SQL WHERE clause). Disadvantages of ROLAP  There is a consensus in the industry that ROLAP tools have slower performance than MOLAP tools. However, see the discussion below about ROLAP performance.  The loading of aggregate tables must be managed by custom ETL code. The ROLAP tools do not help with this task. This means additional development time and more code to support.  When the step of creating aggregate tables is skipped, the query performance then suffers because the larger detailed tables must be queried. This can be partially remedied by adding additional aggregate tables, however it is still not practical to create aggregate tables for all combinations of dimensions/attributes.  ROLAP relies on the general purpose database for querying and caching, and therefore several special techniques employed by MOLAP tools are not available (such as special hierarchical indexing). However, modern ROLAP tools take advantage of latest improvements in SQL language such as CUBE and ROLLUP operators, DB2 Cube Views, as well as other SQL OLAP extensions. These SQL improvements can mitigate the benefits of the MOLAP tools.  Since ROLAP tools rely on SQL for all of the computations, they are not suitable when the model is heavy on calculations which don't translate well into SQL. Examples of such models include budgeting, allocations, financial reporting and other scenarios.
  • 28. DIFFERENCE BETWEEN OLTP AND OLAP

OLTP                                        | OLAP
--------------------------------------------|----------------------------------------------------------------
Current data & short database transactions  | Current as well as historical data & long database transactions
Online update/insert/delete                 | Batch update/insert/delete
Normalization is promoted                   | De-normalization is promoted
High volume transactions                    | Low volume transactions
Transaction recovery is necessary           | Transaction recovery is not necessary
Database size: 100 MB to GB                 | Database size: 100 GB to TB
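The contrast in typical operations can be illustrated in a few lines (an illustrative sketch with a made-up table): an OLTP-style request touches a single record, while an OLAP-style query scans and aggregates the whole table:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (order_id INTEGER, customer TEXT, amount REAL)")
con.executemany("INSERT INTO orders VALUES (?,?,?)",
                [(1, "C1", 99.0), (2, "C2", 45.0), (3, "C1", 60.0)])

# OLTP-style: retrieve the current order for one customer transaction.
current = con.execute(
    "SELECT amount FROM orders WHERE order_id = ?", (3,)).fetchone()

# OLAP-style: scan and aggregate many rows for analysis.
total = con.execute("SELECT SUM(amount) FROM orders").fetchone()

print(current, total)  # (60.0,) (204.0,)
```

The first query is the kind of indexed, single-row access an OLTP system is tuned for; the second is the full-table aggregation typical of a warehouse workload.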
  • 29. BUSINESS INTELLIGENCE Business Intelligence (BI) refers to computer-based techniques used in spotting, digging out, and analyzing business data, such as sales revenue by products and/or departments, or associated costs and incomes. BI technologies provide historical, current, and predictive views of business operations. Common functions of Business Intelligence technologies are reporting, online analytical processing, analytics, data mining, business performance management, benchmarking, text mining, and predictive analytics. Business Intelligence often aims to support better business decision-making; thus a BI system can be called a decision support system (DSS). Though the term business intelligence is often used as a synonym for competitive intelligence, because both support decision making, BI uses technologies, processes, and applications to analyze mostly internal, structured data and business processes, while competitive intelligence is done by gathering, analyzing and disseminating information with or without support from technology and applications, and focuses on all-source information and data (unstructured or structured), mostly external to, but also internal to, a company, to support decision making. The five key stages of Business Intelligence: 1. Data Sourcing 2. Data Analysis 3. Situation Awareness 4. Risk Assessment 5. Decision Support Data sourcing Business Intelligence is about extracting information from multiple sources of data. The data might be: text documents - e.g. memos, reports or email messages; photographs and images; sounds; formatted tables; web pages and URL lists. The key to data sourcing is to obtain the information in electronic form. So typical sources of data might include: scanners; digital cameras; database queries; web searches; computer file access; etcetera. Data analysis Business Intelligence is about synthesizing useful knowledge from collections of data. 
It is about estimating current trends, integrating and summarising disparate information, validating models of understanding, and predicting missing information or future trends. This process of data analysis is also called data mining or knowledge discovery. Typical analysis tools might use:  Probability theory - e.g. classification, clustering and Bayesian networks.
  • 30. • Statistical methods - e.g. regression.
• Operations research - e.g. queuing and scheduling.
• Artificial intelligence - e.g. neural networks and fuzzy logic.
Situation awareness
Business Intelligence is about filtering out irrelevant information and setting the remaining information in the context of the business and its environment. The user needs the key items of information relevant to his or her needs, and summaries that are syntheses of all the relevant data (market forces, government policy, etc.). Situation awareness is the grasp of the context in which to understand and make decisions. Algorithms for situation assessment provide such syntheses automatically.
Risk assessment
Business Intelligence is about discovering what plausible actions might be taken, or decisions made, at different times. It is about helping you weigh up the current and future risk, cost or benefit of taking one action over another, or making one decision versus another. It is about inferring and summarising your best options or choices.
Decision support
Business Intelligence is about using information wisely. It aims to warn you of important events, such as takeovers, market changes, and poor staff performance, so that you can take preventative steps. It seeks to help you analyse and make better business decisions, to improve sales, customer satisfaction or staff morale. It presents the information you need, when you need it.
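The "statistical methods" item above can be made concrete with a few lines of code. The sketch below fits a least-squares trend line to a sales series and extrapolates the next value; the figures are illustrative, not taken from the report.

```python
# A minimal sketch of the "statistical methods" analysis step: fitting a
# least-squares trend line to monthly sales figures (illustrative numbers).
def fit_trend(ys):
    """Return slope and intercept of the least-squares line through (i, ys[i])."""
    n = len(ys)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
            sum((x - mean_x) ** 2 for x in xs)
    intercept = mean_y - slope * mean_x
    return slope, intercept

sales = [100, 110, 125, 130, 145, 150]        # six months of sales
slope, intercept = fit_trend(sales)
next_month = slope * len(sales) + intercept   # predicted value for month 7
```

The same "predicting future trends" idea scales up to the regression and clustering tools named above; the closed-form fit is just the simplest member of that family.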
  • 31. MODELLING TECHNIQUES IN DATA WAREHOUSE
CONCEPTUAL DATA MODEL
A conceptual data model identifies the highest-level relationships between the different entities. Features of a conceptual data model include:
* Includes the important entities and the relationships among them.
* No attribute is specified.
* No primary key is specified.
The figure below is an example of a conceptual data model. From the figure above, we can see that the only information shown via the conceptual data model is the entities that describe the data and the relationships between those entities. No other information is shown through the conceptual data model.
LOGICAL DATA MODEL
A logical data model describes the data in as much detail as possible, without regard to how it will be physically implemented in the database. Features of a logical data model include:
* Includes all entities and relationships among them.
* All attributes for each entity are specified.
* The primary key for each entity is specified.
* Foreign keys (keys identifying the relationship between different entities) are specified.
* Normalization occurs at this level.
The steps for designing the logical data model are as follows:
  • 32. 1. Specify primary keys for all entities.
2. Find the relationships between different entities.
3. Find all attributes for each entity.
4. Resolve many-to-many relationships.
5. Normalization.
The figure below is an example of a logical data model. Comparing the logical data model shown above with the conceptual data model diagram, we see the main differences between the two:
* In a logical data model, primary keys are present, whereas in a conceptual data model, no primary key is present.
* In a logical data model, all attributes are specified within an entity. No attributes are specified in a conceptual data model.
* Relationships between entities are specified using primary keys and foreign keys in a logical data model. In a conceptual data model, the relationships are simply stated, not specified, so we simply know that two entities are related, but we do not specify what attributes are used for this relationship.
PHYSICAL DATA MODEL
A physical data model represents how the model will be built in the database. A physical database model shows all table structures, including column name, column data type, column constraints, primary key, foreign key, and relationships between tables. Features of a physical data model include:
  • 33. * Specifies all tables and columns.
* Foreign keys are used to identify relationships between tables.
* De-normalization may occur based on user requirements.
* Physical considerations may cause the physical data model to be quite different from the logical data model.
* The physical data model will be different for different RDBMSs. For example, the data type for a column may be different between MySQL and SQL Server.
The steps for physical data model design are as follows:
1. Convert entities into tables.
2. Convert relationships into foreign keys.
3. Convert attributes into columns.
4. Modify the physical data model based on physical constraints / requirements.
The figure below is an example of a physical data model. Comparing the physical data model shown above with the logical data model diagram, we see the main differences between the two:
* Entity names are now table names.
* Attributes are now column names.
* Data type for each column is specified. Data types can be different depending on the actual database being used.
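The physical-model steps above can be sketched directly in SQL. The example below uses SQLite via Python's sqlite3 module; the Customer and Order entities are hypothetical examples, not taken from the report's figures.

```python
# Sketch of the physical data model steps: entities become tables, attributes
# become typed columns, and the relationship becomes a foreign key.
# The customer/order schema here is an invented illustration.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces FKs only when asked

conn.execute("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        first_name  TEXT NOT NULL,
        last_name   TEXT NOT NULL
    )""")
conn.execute("""
    CREATE TABLE customer_order (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        amount      REAL NOT NULL
    )""")

conn.execute("INSERT INTO customer VALUES (1, 'Asha', 'Rao')")
conn.execute("INSERT INTO customer_order VALUES (10, 1, 250.0)")
# The foreign key now enforces the relationship: inserting an order for a
# non-existent customer raises sqlite3.IntegrityError.
```

Note that the data types (`INTEGER`, `TEXT`, `REAL`) are SQLite-specific, which is exactly the RDBMS-dependence the last feature point describes.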
  • 34. QLIKVIEW – A BUSINESS INTELLIGENCE TOOL What is QlikView? QlikView is the flagship product of QlikTech and can be classified among the Business Intelligence tools of the future. In 2007 QlikTech gained the title of the "coolest" vendor in BI and has been growing faster than any other BI vendor; it was recognized as a 'Visionary' in Gartner Group's annual Magic Quadrant (2007). According to Gartner's predictions, "By 2012, 70% of Global 1000 organizations will load detailed data into memory as the primary method to optimize BI application performance (0.7 probability)", and QlikView is one of the leaders in this market. QlikView creates endless possibilities for making ad hoc queries in a non-hierarchical data structure. This is possible thanks to AQL (Associative Query Logic), which automatically associates values in the internal QlikView database. QlikView simplifies analysis for everyone. It makes it possible for anybody to create very useful, accurate KPI and measurement reports and performance dashboards, and to make accurate, strategic decisions. QlikView has over 4,500 customers in 58 countries, and adds 11 new customers each day. In addition to hundreds of small and midsized companies, QlikTech's customers include large corporations such as Pfizer, AstraZeneca, The Campbell Soup Company, Top Flite, and 3M. QlikTech is privately held and venture backed by Accel Partners, Jerusalem Venture Partners, and Industrifonden. QlikView is an easy to use and flexible business intelligence software that has been around since 1993. It allows for interactive analysis and is easy to develop, implement and train users on. Other QlikTech products are QlikView Server, which provides analysis of QlikView data over the web, and QlikView Publisher, which helps control the distribution of QlikView applications.
QlikView is a suite of powerful and rapidly deployable business intelligence software that enables enterprises and their management to effectively and proactively monitor, manage and optimize their business. QlikView lets companies analyze all their data quickly and efficiently. QlikView eliminates the need for data warehouses, data marts, and OLAP cubes; instead it gives users rapid access to data from multiple sources in an intuitive, dashboard-style interface. With QlikView, companies can turn data into information, and information into better decisions. QlikView is quick to implement, flexible and powerful to use, and easy to learn. It provides a rapid return on investment and a low total cost of ownership compared to traditional OLAP and reporting tools.
  • 35. QlikView helps California Casualty improve efficiencies: a 25% improvement in sales conversions and a 60% improvement in compliance response time. The Business Failure of Traditional Business Intelligence A recent article in Intelligent Enterprise magazine captured the three major issues of today’s traditional BI solutions:
1) Data reporting has been an afterthought to using core business applications on a day-to-day basis, breaking the “link between insight and action.”
2) Consolidating disparate tools into a suite isn’t enough to give business users the information they need. Search and semantics need to be integrated.
3) “Build it and they will come” doesn’t provide insight, just more technology. BI needs to answer the needs of decision makers, and needs to deliver incremental successes along the way.
These factors, among others, have undoubtedly led to the dismal performance of BI initiatives. According to a study published in DM Review, a leading business intelligence publication, the average total implementation time for BI initiatives is 17 months, with five months to deploy the first usable analytic application. The average total cost of implementation is a staggering $12.8 million. And at best, according to the survey, internally built BI/DW systems have a 35% success rate, while purchased operational analytic applications are considered successful a mere 13% of the time. (Survey chart: How long did it take to implement your business intelligence initiative?) These failed initiatives cost more than money and resources: they hamper business performance in nearly every way. A summer 2005 survey of 385 finance and IT executives by CFO Research Services asked respondents to identify the drivers of poor information quality (IQ). Nearly half the survey respondents (45 percent) cite disparate, non-integrated IT systems and the variability of business processes as an
  • 36. acute problem that constrains management’s ability to work effectively and focus on high-value activities. Approximately the same number agree that finance and business units alike spend too much time developing supplemental reports and analysis. A Revolution in Business Intelligence The OLAP Tradition Twenty years ago memory was expensive and processors were slow. Faced with these constraints, developers at the time devised an architecture for delivering the results of multidimensional analysis which relied on pre-calculating fixed analyses. Simply put, they pre-calculated all measures across every possible combination of dimensions. For example, for total sales by sales person and region, the system would calculate total sales for each sales person, for each region, and for every combination of sales person and region. The results of these calculations were stored and retrieved when an end user requested a particular “analysis.” This is what is traditionally referred to as “calculating the cube,” and the “cube” is the mechanism which organizes and stores the results. Because the results were pre-calculated, regardless of how long it took to calculate them, the response time from the perspective of the end user was instantaneous. The Enabling Technology or Change Agent Today, we have a fundamentally different technology platform available to us on which to build business intelligence. Specifically, three things have happened: First, Moore’s Law has relentlessly beat its drum, resulting in processors which are significantly faster today than they were twenty years ago and memory which is significantly less expensive. The difference in price/performance for both factors is well over a factor of 1,000 higher today than it was then. Second, the mainstream availability of 64-bit processors raises the amount of memory a computer can utilize.
A 32-bit processor can use four gigabytes of memory at a maximum, and a portion of that must be devoted to the operating system. A 64-bit processor can use 17,179,869,184 gigabytes, or 16 exabytes, of RAM – a factor of four billion more. Of course, the practical limitation of computers available today is much lower, but machines with 40, 80, or even 120 gigabytes of memory are readily available for less than $30,000. Third, hardware manufacturers have shifted from computers with a few fast processors to computers with multiple lower-power, lower-speed processors. The challenge today is keeping computers operating at a reasonable temperature. Intel’s and AMD’s stated strategy for achieving this goal is to equip computers with many lower-power processors working in parallel. Today it is common to find computers with 2, 4, 16, 32 or even 128 processors. In addition, newer processors have multiple “cores” bundled on a single chip.
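The pre-calculation described under "The OLAP Tradition" can be sketched in a few lines: for one measure (total sales) and two dimensions (sales person and region), a cube build computes the aggregate for every subset of the dimensions, including the grand total. The names and figures below are illustrative.

```python
# Sketch of classic OLAP pre-aggregation: total sales is computed in advance
# for every combination of the dimensions, so a later "query" is only a lookup.
from itertools import combinations
from collections import defaultdict

rows = [
    {"person": "Ann", "region": "East", "sales": 100},
    {"person": "Ann", "region": "West", "sales": 150},
    {"person": "Bob", "region": "East", "sales": 200},
]
dimensions = ["person", "region"]

cube = defaultdict(float)
for r in range(len(dimensions) + 1):
    for dims in combinations(dimensions, r):      # every subset of dimensions
        for row in rows:
            key = (dims, tuple(row[d] for d in dims))
            cube[key] += row["sales"]

# Pre-computed answers are now instant lookups:
ann_total = cube[(("person",), ("Ann",))]         # 250.0
grand_total = cube[((), ())]                      # 450.0
```

With many dimensions the number of subsets (and stored aggregates) grows exponentially, which is exactly why cube builds took hours and why the in-memory alternative described next dispenses with them.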
  • 37. QlikView’s Premise: In-Memory BI QlikView was built with a simple architectural premise – all data should be held in memory, and all calculations should be performed when requested and not prior. Twenty years ago this would have been impossible. In 1993, when QlikTech was founded, it was still a pretty crazy idea. But now, the trends in the underlying platform (referenced in the previous section) have lifted the constraints so that organizations of all sizes can benefit. QlikView’s patented technology is based on an extremely efficient in-memory data model. High-speed associations occur as the user clicks in the applications and the display is updated immediately, allowing users to work with millions of cells of data and still receive responses to queries in less than a second. As a result of this design, QlikView removes the need to pre-aggregate data, define complex dimensional hierarchies and generate cubes. QlikView performs calculations on the fly, giving the power of multidimensional analysis to every user, not just the highly trained few. By taking full advantage of 64-bit technology’s memory capacity, QlikView can provide summary-level metrics and record-level detail on the same architecture. Companies gain highly scalable business analysis solutions that provide summary KPIs as well as highly granular, detailed analyses. In its recent Research Note, leading analysts at Gartner reported on the value of this approach: “Our research indicates that query performance using this in-memory
  • 38. method is often just as fast as or faster than traditional aggregate-based architectures. In-memory technology not only retrieves the data faster, but it also performs calculations on the query results much faster than disk-based approaches…Therefore, with in-memory technology, users can freely explore detailed data in an unfettered manner without the limitations of a cube or aggregate table to receive good performance.” A Revolution in Benefits The QlikView solution, because of its unique integrated components and because it operates entirely in memory, offers some unique advantages over traditional OLAP:
• Fast Time-to-Value: With traditional OLAP, constructing cubes is time consuming and requires expert skills. This process can take months, and sometimes over a year. In addition, the cube must be constructed before it can be calculated, a process which itself can take hours. And all this must occur before analysis or reporting can be performed – before the user even sees answers to his questions. Because the data is loaded in memory, creating analysis in QlikView takes seconds. There is no pre-definition of what is a dimension – any data is available as a dimension and any data is available as a measure. The time implementing QlikView is spent locating data and deciding what analysis is interesting or relevant to solving the business question. Typically, this process takes only a week or two.
• Easy to Use: The entire end user experience in QlikView is driven by the “click.” End users enjoy using QlikView because it works the way their mind does. Each time they want to review the data sliced a new way, they simply click on the data they want to evaluate. Because QlikView operates in memory, with each click all data and measures are recalculated to reflect the selection. Users can go from high-level aggregates (e.g., roll up of margin on all products in a specific line) to individual records (e.g., which order was that?)
in a click – without pre-defining the path to the individual record. The QlikView UI uses color coding which provides instant feedback to queries.
• Powerful: Because queries and calculations are performed in memory, they are extremely quick. In addition, QlikView is not constrained by the speed of the underlying source. Even if the underlying data is stored in a system which has poor query performance (for instance, a text file), the performance is always optimal because the data is loaded in memory. QlikView also compresses data as it is stored in memory, allowing large amounts of data to be stored. Typically, there is a 10X reduction in the size of the data once it’s in memory.
• Flexible: One of the major issues with traditional OLAP is that modifying an analysis requires changing the cube, a process which can take a very long time. In addition, this process is typically controlled by IT. With QlikView, viewing analysis by a new dimension or changing a measure can be performed by business professionals in seconds. Standard interfaces, including ODBC and Web Services, mean that any data
  • 39. source can be analyzed in QlikView. What’s more, users can do “local” or “desktop” analysis, using the full data and interactivity of the application on laptops.
• Scalable: QlikView is designed to scale easily in both the amount of data it can handle and the number of users working with it. It’s simple to deploy to thousands of users – utilizing all available hardware power across all available processors and cores, and requiring only a web browser.
An Overview of QlikView
QlikView is revolutionizing business intelligence with fast, powerful and visual analysis that’s simple to use. QlikView’s patented technology offers all of the features of “traditional” analytics solutions – dashboards and alerts, multi-dimensional analyses, slice-and-dice of data – without the limitations, cost or complexity of traditional BI applications. QlikView solutions can be deployed in days, users can be trained in minutes, and end users get results instantly.
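The "calculate when requested and not prior" premise described above is the mirror image of the cube-build tradition: a click is just a filter, and the measure is recomputed over the in-memory rows at that moment. A toy sketch, with illustrative data:

```python
# Sketch of on-the-fly calculation: each "click" filters the in-memory rows
# and the aggregate is recomputed immediately; nothing is pre-computed.
rows = [
    {"person": "Ann", "region": "East", "sales": 100},
    {"person": "Ann", "region": "West", "sales": 150},
    {"person": "Bob", "region": "East", "sales": 200},
]

def total_sales(selection):
    """Aggregate sales over rows matching every field in `selection`."""
    return sum(r["sales"] for r in rows
               if all(r[k] == v for k, v in selection.items()))

east_total = total_sales({"region": "East"})                  # one click
ann_east = total_sales({"region": "East", "person": "Ann"})   # a second click
```

Because any field can appear in the selection, any field acts as a dimension, which is the point made above about there being no pre-definition of dimensions.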
  • 40. The QlikView Platform
QlikView offers all of the capabilities that traditionally required a complex and costly suite of products, on a single unified platform. QlikView provides flexible ad-hoc analysis capabilities, powerful analytic applications, and simple printable reports. This allows organizations to deploy QlikView to everyone – highly skilled analysts doing ad-hoc detailed reporting, executives requiring a dashboard of critical business information, and plant supervisors analyzing output performance. Further, QlikView allows organizations to eliminate unused paper reports and replace them with demand-driven reporting.
QlikView Enterprise – For the developer
QlikView Enterprise is the complete developer’s tool for building QlikView applications. QlikView Enterprise lets developers load disparate data sources for access in a single application. The data load script supports over 150 functions for data cleansing, manipulation and aggregation. An intuitive, wizard-driven interface allows powerful, visually interactive applications to be developed quickly.
QlikView Publisher – For distribution
QlikView Publisher ensures that the right information reaches the right user at the right time. As the use of business analysis spreads throughout the organization, controlling the distribution of analysis becomes increasingly important. QlikView Publisher allows for complete control of the distribution of a company’s QlikView applications, automating the data refresh process for QlikView application data. In addition, it ensures that applications are distributed to the correct users when they need them.
QlikView Server – For security
QlikView Server is the central source of truth in an organization. With today’s distributed workforce, QlikView Server provides a simple way for organizations to
  • 41. ensure that everyone has access to the latest data and analysis regardless of their location. Regardless of the client chosen – zero-footprint DHTML, Windows, ActiveX plug-in, or Java – QlikView Server provides access to the latest version of each QlikView application.
QlikView Professional – For the power user
QlikView Professional lets power users build, change or modify the layout of existing QlikView applications. QlikView Professional users can refresh existing data sources, and can choose to work with either local applications or applications distributed via QlikView Server. Power users can work with local data, including offline enterprise applications, with no limitations.
QlikView Analyzer – For the general user
QlikView Analyzer lets end users connect to server-based QlikView applications. QlikView Analyzer has a number of deployment options, including Java clients (supporting Sun and MSFT Java), a plug-in for MSFT IE, and an AJAX zero-footprint client. The installed Analyzer EXE client also provides offline analysis and reporting capabilities.
QlikView Architecture
Most traditional databases are built upon a relational model. Records are broken apart to reduce redundancy and key fields are used to put the records back together at the time they are used. Database programmers are required to make tradeoffs between increased speed at the cost of more space and more time to add or edit records, and the database user often suffers based on these decisions. QlikView was built with a simple architectural premise – all data should be held in memory, and all calculations should be performed when requested and not prior. QlikTech’s goal is to deliver powerful analytic and reporting solutions in a quarter of the time, at half the cost, and with twice the value of competing OLAP cube (Online Analytical Processing)-based products.
QlikView is designed so that the entire application (data model included) is held in RAM – this is what makes it uniquely efficient compared to traditional OLAP cube-based applications. It creates an in-memory data model as it loads data from a data source, enabling it to access millions of cells of data and still respond to queries in less than a second. High-speed associations occur as the user clicks in the various sheet objects and the display is updated immediately. QlikView operates much faster and requires significantly less space than an equivalent relational database because it optimizes the data as it loads – removing redundant field data and automatically linking tables together. Indexes are not required, making every field available as a search field without any performance penalty. Because of this design, QlikView typically requires 1/10th of the space required for the same data represented in a relational model, i.e. 100GB of data fits into 10GB of memory. There is no limit to the number of tables allowed in an application, or to the number of fields, rows or cells in a single table. RAM is the only factor that limits the size of an application. QlikView offers three components in an integrated solution:
  • 42. • Fast Query Engine: Loading the data into memory allows QlikView to query, or subset, the data instantly to reveal only the data which is relevant to a given user. In addition, QlikView shows users the data which is excluded by a selection.
• On-Demand Calculation Engine: Charts, graphs, and tables of all types in QlikView are multidimensional analyses. That is, they show one or more measures (e.g., metrics, KPIs, expressions, etc.) across one or more dimensions (example: total sales by region). The major difference is that these calculations are performed as the user clicks and never prior.
• Visually Interactive User Interface (UI): QlikView offers hundreds of possible chart and table types and varieties; there are list boxes for navigating dimensions, statistic boxes, and many other UI elements. Every UI element can be clicked on to query.
QlikView Technical Features
Data and Data Loading
QlikView loads data directly from most data sources (i.e., ODBC and OLE DB sources, using vendor-specific drivers), any text or table data file (i.e., delimited text files, Excel files, XML files, etc.), and other formats, as well as data warehouses and data marts (although these are not required). QlikView also offers a plug-in model for loading custom data sources (web services). QlikView is designed to handle a remarkable amount of data. There is no limit to the number of tables allowed in an application. In addition, there is no limit to the number of fields, rows or cells in a single table – QlikView can handle billions of unique values in a given field. RAM is the only other factor that limits the size of an application. The maximum size of a QlikView application is closely tied to the available RAM on the system where
  • 43. the application will run. However, it is not as easy as looking at the size of a relational database and comparing that to the RAM on the system to determine if the application is appropriate for QlikView. As QlikView loads data from a source database, the data is highly compressed and optimized, typically resulting in a QlikView application of only 10% of the size of the original source.
Load Script
QlikView can load data that is stored in a variety of formats, as mentioned above. Data can be loaded from generic tables, cross tables, mapping tables (data cleansing), and interval-matching tables. Tables can be joined, concatenated, sampled and linked to external information such as other programs, bitmaps, URLs, etc. In order to pull data from a data source, QlikView executes a load script. The load script defines the source databases and the tables and fields that should be loaded into QlikView. In addition, you can calculate new variables and records using hundreds of functions available in the script. To help you create a load script, QlikView includes a wizard that will generate the script.
Visual Basic Script and JavaScript Support
Programmers can develop VBScript or JavaScript macros to add specific functionality to an application. Macros can be attached to button objects that a user must click to activate, or the macros can be attached to various QlikView events. For example, a macro can be automatically invoked whenever an application is opened, when the load script is executed, or when a selection is made in a list box.
Analysis Engine
As described earlier, QlikView’s In-Memory Data Model forms the basis for every QlikView application. It holds all loaded data down to the transaction level, and is part of the QVW file (QlikView file format), which is loaded into RAM. The platform is optimized to run on every available Windows platform (32- and 64-bit), and makes use of all available processing power and RAM for each specific platform.
The Selection Engine processes the user’s point-and-click and returns the values associated with that query. It provides sub-second response times on queries made to the In-Memory Data Model. The Chart & Table Engine handles the calculations and graphic display of the charts in the user interface. It calculates multiple “cubes” in real time (one cube for each graph in the application), and reflects user selections directly in graphs.
Clients
Supported clients include an installed Windows EXE client that connects to QlikView Server, and an ActiveX component that integrates with other software. The platform also allows for
  • 44. an ActiveX plug-in for Microsoft Internet Explorer and an AJAX zero-footprint client, and is Java-client compatible with Mozilla-based web browsers. An open interface enables automated integration with QlikView.
Security
The data in a QlikView application is often confidential, so access to it must be controlled. Authentication is any process by which you verify that someone is who they claim to be. QlikView can either let the Windows operating system do the authentication using the Windows log-on, prompt for a user ID and a password (different from the Windows user ID and password), or use the QlikView serial number as a simple authentication method. Authorization is finding out whether the person, once identified, is permitted to have the resource. QlikView can let the Windows operating system do the authorization, by allowing or disallowing a user, a group or a domain access to the entire application. If finer granularity is needed, e.g. the user is only allowed to see specific records or fields, QlikView Publisher can be used to automate the creation of a set of applications, i.e. one application per user group.
QlikView Application and User Interface
The QlikView interface is designed to provide a clear data overview in multiple dimensions – simplifying analysis and reporting for everyone. It presents data intuitively, allowing users to question anything and everything, from all types of objects (e.g., list boxes, graphs, tables) and to any aspect of the underlying data – regardless of where the data is located in a hierarchy.
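The authorization model described under Security above, where a user, group or domain is allowed or disallowed access to an entire application, can be pictured with a small sketch. The rule structure, application name and principals below are invented for illustration; QlikView's actual access-control configuration is not shown in the report.

```python
# A minimal sketch of per-application authorization: access is granted or
# denied by user, group, or domain. Rule format and names are hypothetical.
rules = {
    "app_sales.qvw": {"allow": {"group:finance", "user:asha"},
                      "deny":  {"user:guest"}},
}

def authorized(app, principals):
    """Grant access if any principal is allowed and none is denied."""
    rule = rules.get(app, {"allow": set(), "deny": set()})
    if rule["deny"] & set(principals):     # explicit deny wins
        return False
    return bool(rule["allow"] & set(principals))

authorized("app_sales.qvw", ["user:asha", "group:staff"])     # True
authorized("app_sales.qvw", ["user:guest", "group:finance"])  # False (denied)
```

Record- or field-level restrictions, as the text notes, sit outside this coarse model and are handled by generating per-group applications with QlikView Publisher.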
  • 45. Key Elements of the User Interface
Sheets & Tabs
In QlikView, analysis is made on sheets navigated through tabs (similar to Excel). Each sheet can hold several sheet objects (list boxes, graphs, tables etc.) to analyze the underlying data model. All sheets are interconnected, meaning that selections made on one sheet affect all other objects on all other sheets.
List Box
The basic building block of a QlikView application is the list box. A list box is a movable, resizable object that presents the data taken from a single column of a table. Rather than listing duplicate values, only unique values are presented. If desired, the number of occurrences of each distinct value can also be listed.
Multi Box
The multi box can hold several fields in a single object. Selections can be made through dropdown lists by clicking, or by text search and select. The multi box displays values only in single-selected mode.
Charts & Gauges
In QlikView, the results of a selection or query can be displayed in a graph. Typically, a graph holds one or more expressions which are recalculated each time a selection is made. The result can be displayed as a bar chart, line chart, heat chart, grid chart, scatter chart, or as a speedometer or gauge. All graphs are fully interactive, which
  • 46. means that you can make selections or queries directly by point-and-click or by “painting” the area of interest.
Tables
Just as with graphical representation of data (in graphs), the result of an analysis can be displayed in a table. QlikView provides the ability to display data in powerful pivot tables and straight tables. These tables are fully interactive, which means that you can make selections directly in the tables or by drop-down selection in the graph dimensions. Using a table box, QlikView can display any combination of fields in a single object, regardless of what source database table they came from. This feature is useful when providing listings of any kind. The table box can be sorted by any field or combination of fields.
Reports & Send to Excel
QlikView has an integrated report editor for easy creation of application-specific reports. The reports are dynamically updated as the user makes selections. Power users can also easily create reports by a simple drag-and-drop procedure. All data displayed in the GUI is ready to be exported at any time to Excel or other applications by a simple click of a button.
User Navigation and Analysis
Point-and-Click Queries
Asking and answering questions is a simple matter of point and click. The user forms a query in QlikView simply by clicking the mouse on a field value or other item of interest. In a list box, the user clicks on one or more values of interest to select them. QlikView immediately responds to the mouse click and updates all objects displayed on the current sheet.
Multiple Sort Options
Since each field of data can be displayed in its own list box, it makes sense that you would want to sort each list box independently of all others. When you are scrolling through a list box, you want the values to appear in some sorted order appropriate to that field. QlikView allows you to sort each list box independently and according to multiple sort specifications.
One or more of the following algorithms can apply to each list box, in either ascending or descending order:
• State: Selected and optional values can be sorted from the top or bottom of the list box.
• Expression: Values are sorted by the result of evaluating any entered expression.
• Frequency: Values are sorted by frequency of occurrence.
• Numeric Value: Values are sorted according to their numeric value.
• Text: Values are sorted alphabetically.
• Load Order: Values are sorted according to the order in which they occurred in the original source database.
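A few of the sort orders above are easy to picture with Python's built-in `sorted`; the sketch below shows Text, Frequency, Numeric Value and Load Order over the distinct values of a field (illustrative data).

```python
# Sketch of several list-box sort orders applied to a field's distinct values.
from collections import Counter

values = ["Berlin", "Oslo", "Berlin", "Madrid", "Oslo", "Berlin"]
distinct = list(dict.fromkeys(values))   # Load Order: first-seen order kept

by_text = sorted(distinct)               # Text: alphabetical
freq = Counter(values)                   # Frequency: most common first
by_frequency = sorted(distinct, key=lambda v: freq[v], reverse=True)

numbers = [10, 2, 33]
by_numeric = sorted(numbers)             # Numeric Value: by magnitude
```

State and Expression sorting depend on the current selection and on a user-entered expression respectively, so they have no direct one-line analogue here.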
  • 47. Powerful Searching

QlikView allows you to search through a list as simply and quickly as typing on the keyboard. Select any list box, or open a multi box or drop-down list, and start typing. QlikView immediately begins searching through the list to find values matching your criteria. Single-character and multi-character wildcards are supported, as are greater-than and less-than symbols, enabling searches over numeric and date ranges.

Rapid Application Design and Deployment

Simple applications can be created within just a few minutes using QlikView's wizards. More complex applications integrating data from various sources and displaying trend-analysis charts and pivot tables may take a little longer. The best way to understand how simple it is to create and use a QlikView application is to step through the process involved:

Step 1: Locate the Data Source

The first step in creating an application in QlikView is to determine what data you wish to load. While it is possible to include inline data in the QlikView load script, application data will almost always come from an existing file, spreadsheet or database. You may load data from a single source file or database, or you may load and integrate data from many different sources at the same time. The source file will typically be arranged with each record of the file containing one record of data. However, QlikView can work with data in practically any format, including generic databases, cross-tables, hierarchical databases, multi-dimensional databases, etc. The first row may or may not contain field labels, although you can always choose to set or change the labels in the wizard or in the script. If the data comes from a text file, each file will typically be treated as a single table. When working with spreadsheets, each tabbed sheet is treated as a table.
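The wildcard search described above behaves much like shell-style pattern matching. A hedged Python sketch using the standard-library `fnmatch` module (retailer names and figures are invented, and this approximates rather than reproduces QlikView's search):

```python
import fnmatch

# Invented list-box values for illustration.
retailers = ["Agarwal Restaurant", "Agarwal Stores", "Bisleri Depot"]

# "*" matches any run of characters and "?" a single character,
# mirroring QlikView's multi- and single-character wildcards.
matches = fnmatch.filter(retailers, "Agarwal*")
print(matches)

# A numeric range search (">" / "<" symbols in QlikView) can be
# pictured as a simple filter over the field's values:
sales = [150, 420, 975]
in_range = [s for s in sales if 200 < s < 1000]
print(in_range)
```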
Step 2: Create the Load Script

Once the source data has been determined, a load script must be created to copy the data from the data source into QlikView's associative database. Creating the load script is simplified by wizards that construct script statements for supported file types.

Step 3: Execute the Load Script

After the load script is complete, it must be executed, either by using the “Run” button in the Edit Script dialog or by selecting “Reload,” available on both the toolbar and the File menu. During the load process, QlikView examines each statement in the load script and processes it in sequential order. At the completion of the load script, a copy of all the data referenced in the script has been loaded and is available in the QlikView application.
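Steps 2 and 3 can be pictured as reading each source in order and keeping a copy of the rows in an in-memory table. A simplified Python stand-in (the field names and values are hypothetical; a real QlikView load script would reference the actual source file instead):

```python
import csv
import io

# Hypothetical source data standing in for an Excel/CSV file.
source = io.StringIO("SalesmanID,Sales\n1,500\n2,300\n")

# "Reload": each statement is processed in sequential order, and a
# copy of the referenced data ends up in the in-memory model.
model = {}
model["Transaction"] = list(csv.DictReader(source))

print(model["Transaction"])
```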
  • 48. Step 4: Place Objects on a Sheet

In order to use the data in the QlikView application, you must place list boxes or other objects on one or more sheets. Which objects should be used, and how they should be grouped into sheets, depends on the specific application.

Step 5: Start Using the Application

As soon as the first object is created on a sheet, the application is available for use. All objects are automatically associated together, and clicking in any object initiates a query.

Step 6: Add More Sheets and Objects as Required

Finally, continue to add and arrange objects on sheets until the application achieves the desired functionality. You may wish to further customize the load script by taking advantage of QlikView's “Expression Engine,” or you may wish to add macros to automate certain actions.

Main features and benefits of QlikView:
 Use of an in-memory data model
 Allows instant, in-memory manipulation of massive datasets
 Does not require high-cost hardware
 Automated data integration and a graphical analytical environment attractive to customers
 Fast and powerful visualization capabilities
 Ease of use: end users require almost no training
 Highly scalable: near-instant response times on very large data volumes
 Fast implementation: customers are live in less than one month, most within a week
 Flexible: allows unlimited dimensions and measures and can be modified in seconds
 Integrated: an all-in-one solution with dashboards, power analysis and simple reporting on a single architecture
 Low cost: shorter implementations result in cost savings and a fast return on investment
 Risk free: available as a fully functional free trial download
  • 49. REPORTS USING THE BUSINESS INTELLIGENCE TOOL “QLIKVIEW”
  • 50. A REPORT ON BISLERI

Mineral water under the name 'Bisleri' was first introduced in Mumbai in glass bottles, in two varieties (bubbly & still), in 1965 by Bisleri Ltd., a company of Italian origin. The company was started by Signor Felice Bisleri, who first brought the idea of selling bottled water to India. Parle bought over Bisleri (India) Ltd. in 1969 and started bottling mineral water in glass bottles under the brand name 'Bisleri'. Later Parle switched over to PVC non-returnable bottles and finally advanced to PET containers.

Since 1995, Mr. Ramesh J. Chauhan has been expanding Bisleri's operations substantially: turnover has multiplied more than 20 times over a period of 10 years, and the average growth rate has been around 40% over this period. The company presently has 8 plants and 11 franchisees all over India, with a presence covering the entire span of the country, and plans to put up four more plants in 2006-07. Bisleri commands a 60% share of the organized market. The overwhelming popularity of 'Bisleri', and the fact that it pioneered bottled water in India, has made the brand synonymous with mineral water and a household name. When you think of bottled water, you think Bisleri.

Bisleri values its customers and has therefore developed 8 unique pack sizes to suit the needs of every individual: 250ml cups, 250ml bottles, 500ml, 1L, 1.5L and 2L, which are the non-returnable packs, and 5L and 20L, which are the returnable packs. Until now the Indian consumer has been offered Bisleri water; however, in an effort to bring something refreshingly new, the company has introduced Bisleri Natural Mountain Water, brought from the foothills of the mountains of Himachal Pradesh. Hence the product range now comprises two variants: Bisleri with added minerals and Bisleri Mountain Water. It is the company's commitment to offer every Indian pure and clean drinking water.
Bisleri water is put through multiple stages of purification, ozonised and finally packed for consumption. Rigorous R&D and stringent quality controls have made it a market leader in the bottled-water segment. Strict hygiene conditions are maintained in all plants. In their endeavour to maintain strict quality control, each unit purchases preforms and caps only from approved vendors, and the bottles themselves are produced in-house. The company has recently procured world-class, state-of-the-art machinery that puts it at par with international standards. This has not only helped improve packaging quality but has also reduced raw-material wastage and doubled production capacity. You can rest assured that you are drinking safe and pure water when you consume Bisleri. Bisleri is free of impurities and 100% safe. Enjoy the sweet taste of purity!
  • 51. BISLERI PRODUCTS

Bisleri values its customers and has therefore developed 8 unique pack sizes to suit the needs of every individual: 250ml cups, 250ml bottles, 500ml, 1L, 1.5L and 2L, which are the non-returnable packs, and 5L and 20L, which are the returnable packs.

Bisleri with added Minerals

Bisleri Mineral Water contains minerals such as magnesium sulphate and potassium bicarbonate, which are essential for healthy living. They not only maintain the pH balance of the body but also help in keeping you fit and energetic at all times.

Bisleri Mountain Water

Bisleri Natural Mountain Water emanates from natural springs located in Uttaranchal and Himachal Pradesh, nestled in the vast Shivalik mountain ranges. Lauded as today's 'fountain of youth', Bisleri Natural Mountain Water resonates with the energy and vibrancy capable of taking you back to nature. It is bottled in the company's two plants in Uttaranchal and Himachal Pradesh and is available in six pack sizes: 250ml, 500ml, 1 litre, 1.5 litre, 2 litre and 5 litres.
  • 52. TECHNOLOGICAL ASPECTS

Here we create Excel files as a database, which will be linked to QlikView (the business intelligence tool) to support decision support systems. The Excel sheets created are as follows:

1. ZONE SHEET: In the Zone sheet I have taken only 3 regions of the NCR, i.e. Gurgaon, Delhi and Noida. Each region is divided into 100 places within its circle. The sheet also records each place's population, area and population growth; in short, it provides full information about the geography.

2. TRANSACTION SHEET: In the Transaction sheet I have provided 5 years of data (month, day) along with Salesman ID, Transaction ID, Retailer, Retailer ID and, lastly, Sales.

3. SALESMAN SHEET: The Salesman sheet provides the Salesman ID and Salesman Name, along with the Distributor ID of the distributor the salesman is attached to.

4. RETAILER SHEET: The Retailer sheet consists of the Retailer ID of each retailer, their name, and the zone and region they are associated with.
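The four sheets above form a simple star around the transaction data, linked by their ID fields. A rough Python sketch of how one transaction resolves through those keys (all sample rows and values are invented for illustration):

```python
# Invented sample rows mirroring the four Excel sheets.
salesmen = {101: {"Name": "Ravi", "DistributorID": 9}}
retailers = {1097: {"Name": "Agarwal Restaurant",
                    "Zone": "Delhi", "Region": "Connaught Place"}}
zones = {("Delhi", "Connaught Place"): {"Population": 25000}}
transactions = [
    {"TransactionID": 1, "SalesmanID": 101, "RetailerID": 1097, "Sales": 450},
]

# Resolve one transaction through its keys, roughly as the
# associative model would when a selection is made.
t = transactions[0]
row = {
    "Salesman": salesmen[t["SalesmanID"]]["Name"],
    "Retailer": retailers[t["RetailerID"]]["Name"],
    "Zone": retailers[t["RetailerID"]]["Zone"],
    "Sales": t["Sales"],
}
print(row)
```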
  • 53. DESCRIPTION OF REPORT

The Bisleri report is implemented on the business intelligence tool QlikView. It basically covers 3 zones in the NCR, i.e. Gurgaon, Delhi and Noida. I have taken around 100 regions in each zone, with their population, population growth and area. I have designed the Excel sheet in such a way that each region is displayed with its population, population growth and the name of its zone; as this was a small-scale report, nothing more was done with it.

Bisleri, like any other mineral-water supply company, has distributors in particular regions, and a number of sales executives work under them who are responsible for delivering orders from home to home. They keep track of deliveries in their respective regions and are responsible for sales there. Once a week, on Saturday, they have to report to the office and submit their sales report for the whole week. For example, if the Bisleri office is in Delhi, sales executives from Noida, Gurgaon and Delhi report to the office every Saturday with their daily sales sheets; these are checked to assess their performance and weekly sales, i.e. whether they are benefiting the company and whether sales are increasing in their regions.

DATA COLLECTION

Regarding data collection, truthfully no company wants to give its data to an outsider, but I managed to get data from a sales executive named “Rajneesh Mani Tripathi” who was working in Bisleri itself.
  • 54. DESCRIPTION OF IMPLEMENTATION ON TOOL

 INTRODUCTION: This is the first page, or home page, of the application; as the report was on Bisleri, I planned to use a Bisleri background.
  • 55.  BACKGROUND: In the background section I have described Bisleri, its distribution and the report itself; it also gives a small description of QlikView and tabs which redirect to the Bisleri homepage.

 HOW TO WORK: This tab describes how selections are made and how to work with the software, so that even someone unfamiliar with computers or outside the IT field can easily learn and use the tool, making it user-friendly.
  • 56.  GEOGRAPHY: The Geography tab gives a description of the area; it shows each zone with its corresponding regions, and their area and population along with their growth are shown graphically. If we select the Delhi zone with Connaught Place as its region, its population is shown in thousands with a graph.
  • 57.  RETAILER: The Retailer tab lists the retailers with their Retailer IDs, the zone and region in which they work, and their sales shown graphically. If Agarwal Restaurant (ID 1097) is selected, its corresponding sales are shown graphically with its zone and region.
  • 58.  SALES: The Sales tab shows zone, region, year, day and month of sale, with sales shown graphically per salesman; a bookmark shows which retailers a particular salesman is associated with.

 TABLES: There is a table showing the sales of the different salesmen.
  • 59. CONCLUSION

This Bisleri project was a benchmark for me for designing other reports; using the concepts of the Bisleri report, it will be more comfortable to make others. It was a practice report: through this small report I have learnt how to work on the tool, and it was quite an interesting tool to work with. I have learnt how to draw graphs and create different tables as well as list boxes, multi boxes and table boxes. Through this report I have also got an idea of how mineral-water supply companies work. Last but not least, working on this project was a nice experience which I will use to make other reports in the near future.
  • 61. OVERVIEW OF THE COMPANY

LG believes that technological innovation is the key to success in the marketplace. Founded in 1958, it has led the way in bringing advanced digital products and applied technologies to customers. With a commitment to innovation and assertive global business policies, it aims to become a worldwide leader in advanced digital technology. The trajectory of LG Electronics, its growth and diversification, has always been grounded in the company ethos of making customers' lives ever better and easier, happier even, through increased functionality and fun. Since its founding in 1958, LG Electronics has led the way to an ever more advanced digital era. Along the way, its constantly evolving technological expertise has lent itself to many new products and applied technologies. Moving forward into the 21st century, LG continues on its path to becoming the finest global electronics company, bar none.

LG Electronics is pursuing the vision of becoming a true global digital leader, attracting customers worldwide through its innovative products and design. The company's goal is to rank among the top 3 consumer electronics and telecommunications companies in the world by 2010. To achieve this, it has embraced the idea of “Great Company, Great People,” recognizing that only great people can create a great company.

Facts & Figures
 Established In: Jan 1997
 Managing Director: Mr. Moon B. Shin
 Corporate Office: Plot no. 51, Udyog Vihar, Surajpur Kasna Road, Greater Noida (UP)
 Corporate Website: http://www.lgindia.com
 Number of Employees: 3000+

Business Areas & Main Products

Home Entertainment: Plasma Display Panels, LCD TVs, Colour TVs, Audio, Home Theatre Systems, DVD Recorders/Players
Home Appliances: Refrigerators, Washing Machines, Microwaves, Vacuum Cleaners
AC: Split ACs, Window ACs, Commercial ACs
  • 62. Business Solutions: LCD Monitors, CRT Monitors, Network Monitors, Graphic Monitors, Optical Storage Devices, LED Projectors, NAS (Network Attached Storage) and Digital Signage
GSM: Colour-Screen GSM Handsets, Camera Phones, Touch-Screen Phones, 3G Phones

PERFORMANCE AND GROWTH RATE
  • 63. TECHNOLOGICAL ASPECTS

Here we create Excel files as a database, which will be linked to QlikView (the business intelligence tool) to support decision support systems. The Excel sheets created are as follows:

1. ZONE EXCEL SHEET: The Zone sheet consists of the names of 20 states, each consisting of 10 regions, along with a Zone ID; it also comprises the population of the different regions.

2. TRANSACTION EXCEL SHEET: The Transaction sheet consists of the names of all the retailers in the different regions of all 20 states and their sales for each year from 2005-2009, with a total of all sales as well.

3. RETAILERS EXCEL SHEET: The Retailer sheet consists of the Retailer ID and the name of each retailer in the different regions across the country.

4. PROFIT & LOSS EXCEL SHEET: In this sheet I have given a Profit & Loss ID so that the star schema can be built easily. It comprises the Profit & Loss ID, quantity supplied, inventories for 5 years (2005-2009), and a threshold I have decided on so that we can determine the profit or loss made by the different retailers.

5. PRODUCT EXCEL SHEET: This comprises the Product ID and the names of the products with their specifications, category, series and model number.

6. OFFERS EXCEL SHEET: There is an Offers sheet as well, which consists of the model numbers of the different products, their original price, the festival on which the discount is given by the company, the discount rate, and the new price after the deduction.

7. MANAGERS EXCEL SHEET: The Manager sheet consists of the Manager ID, the names of the managers and their contact numbers so that they can be called by the company when needed.
  • 64. 8. INVENTORY EXCEL SHEET: The Inventory sheet consists of Zone ID, Retailer ID, Manager ID, Product ID, Profit & Loss ID, the quantity supplied per year by the retailers, the threshold, and the inventories retailers have held from 2005-2009.

9. LG PICTURES OR GRAPHICS EXCEL SHEET: In the LG Pictures sheet I have given the model numbers of the products and the path to the graphics folder in which pictures of the different items are saved.
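The profit-or-loss rule hinted at in the sheets above, comparing quantities against a decided threshold, might look roughly like this in Python. The threshold value, retailer names and figures are my assumptions for illustration, not the report's exact rule:

```python
# Hypothetical sketch: a retailer is in profit for a year when its
# quantity exceeds the threshold, otherwise in loss.
THRESHOLD = 1000  # assumed cut-off, not taken from the report

# Invented inventory figures per retailer per year.
inventory = {
    "Retailer A": {"2005": 1200, "2006": 800},
    "Retailer B": {"2005": 950,  "2006": 1500},
}

def profit_or_loss(retailer):
    """Classify each year for a retailer against the threshold."""
    return {year: ("Profit" if qty > THRESHOLD else "Loss")
            for year, qty in inventory[retailer].items()}

print(profit_or_loss("Retailer A"))
```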
  • 65. DESCRIPTION OF REPORT

The LG report implemented on QlikView is on a wider scale; it covers nearly all of India. In this report I have taken 20 states, each consisting of 10 regions, which in turn have 5-6 retailers each; every retailer has 5 years of sales data (2005-2009), and their profit and loss is determined on the basis of the sales they made in the corresponding years. Nearly all the products are shown with their series and model numbers, and their pictures are shown as well.

A manager is also appointed to each region so that, if needed, one can contact the regional manager directly to learn about the sales and problems in that region; the managers' contact numbers are given along with their Manager IDs.

There are also discounts offered on different products on different festivals and at New Year; the original price, the discount rate and the discounted price are shown along with the current selection.
  • 66. DESCRIPTION OF IMPLEMENTATION ON TOOL

 STARTING PAGE: The starting page shows the LG logo with a “Get Started” tab which directs you to the next page.

 BACKGROUND TAB: The Background tab gives a description of the application, i.e. the report made for LG: its specifications, useful links that redirect you to the QlikView community, the LG homepage and a learn-more page, and some points about QlikView.