Unit 2.pptx

Unit 2-CORE SKILLS FOR
VISUAL ANALYSIS
DR. V. NIRMALA
DEPARTMENT OF AI & DS
EASWARI ENGINEERING COLLEGE

Information Visualization
It is a subset of data visualization.
“ The use of computer – supported, interactive, visual representation of abstract data to amplify
cognition.
• Computer supported: the visualization is displayed on a computer usually screen
•Interactive: the visualization are manipulated simply and directly in a free flowing manner., including
action such as filtering the data, drilling down to focus on the details.
•Visula representation: the information is displayed visually using attributes such s location, length,,
shape ,colour and size of object to form a picture, there by allows us o see patterns, trends, etc.
•Abstract data: information such as quantitative data, processes and relationship is considered
abstract data. Since abstract doesn’t have any physical form, information visualization connects the
data to visual characteristics such s shape and size, to represent in a meaningful way.
•Amplify Cognition: interaction with visualization extends our ability to think about information by
assisting memory and representing data in a way that our brain can easily comprehend.

Effective Data Analysis
Traits of Data analyst
• curiosity
◦ One telltale sign of all adept analysts is that they are a curious bunch. This almost childlike interest in how things work and why they work the
way they do can be a major asset since it makes the job more enjoyable and fun given the “investigative” nature of the analyst position.
◦ Deep down, analysts are problem solvers and pattern finders. For naturally curious people, finding patterns from a big data set is not a chore
but an exciting puzzle to be solved. The main motivation for the job comes from the task itself; everything else is just an added bonus. The best
analysts I’ve met are very passionate about their profession, and in my opinion you cannot have passion for data analysis if you are not a
naturally curious person.
• Critical thinking
• A critical mind not only helps you be objective in your analyses, it also makes you aware of your own biases and limitations. Anyone who has
read Daniel Kahneman’s book “Thinking, Fast and Slow” will know that the human mind is inherently biased, and that it requires a lot of
cognitive effort to think statistically. Being aware of these biases helps tremendously in your day-to-day work as an analyst.
• Understanding your data
• In order to understand your data, you need to be competent in several fields. First and foremost, you need to be good at maths – especially
statistics. Saying that you need to be good with numbers might be a bit too obvious, I know, but the importance of statistics cannot be
emphasized enough. People who are “good with numbers” are not just skilled mathematicians, they are also able to apply their maths skills to
various kinds of business problems.
• Having a robust knowledge of statistical concepts, e.g. sample size, variance and significance, is a fundamental requirement for any sort of
quantitative analysis, and any analyst worth their salt should know about the complicated relationship between correlation and causation.

Traits of Data analyst
• High attention to detail
• Attention to detail is a good trait to have in almost any profession, but for data analysts this is one of the main requirements. Rushing to complete a task and then delivering false results might have dire
consequences for the organization, and ultimately for the analyst. Don’t get me wrong – everyone makes mistakes every now and then – but with the kind of work an analyst does, it would generally be difficult
for someone else to spot a mistake before a business decision is made based on an analyst’s work.
• Mastering technologies and tools
◦ It’s certainly a good time to be a data analyst, with countless tech stacks and tools available for you to choose from. Be it Python or R, Adobe or Google Analytics, Tableau
or Power BI – you have a world of choice and you don’t have to master every possible tech and tool.
◦ What matters is that you fully master the tools and technologies at your disposal, and that you keep an eye out for their latest developments. A good analyst also won’t
pigeonhole themselves into too restrictive a tech stack, and is able to (or even eager to) learn new data gathering and analysis methods if need be – more on that in the
final trait.
• Ability to explain your results in simple terms
◦ No matter how clever you are or how elegant your analysis methods might be, you have to be able to communicate your results to your stakeholders to be seen and
appreciated as a top analyst. An analyst, preferably a team of them, can have an enormous impact on any business, but only if the insights they produce are understood by
the decision makers.
◦ When communicating your results, you should stick to the bottom line; make sure you adjust your terminology to your audience’s level of knowledge and emphasize the
business implications of your findings. The ability to visualize data is a very valuable trait as it is often the easiest and most effective way of communicating the results of a
complex data set.
• Continuous learning
◦ The final trait listed here ties most of the other traits together, and it is the desire, or perhaps even the need, to continuously improve at your job; to understand that even
a whole lifetime will never be enough to learn everything there is to know about data analysis. The innate natural curiosity listed in the first trait should be more than
enough to keep a great analyst yearning to learn more, fine-tune their methods, and discover new tools and technologies.
◦ The best way to keep learning is to maintain an active exchange with other like-minded people, share ideas and learn from your experienced peers. This is also the main
reason for this website, so please be sure to leave comments or questions below if you have any!

Traits of meaningful Data
The process of data analysis to be fruitful, the data ought to have certain attributes; low quality data will give us
scarce insight. The higher the quality of the data, the more noteworthy the potential for revelation.
1. High Volume
The more information that is available to us, the more likely it is that we will have what we need while pursuing
specific questions or just searching for patterns that are important.
2. Historical
When choosing data, much insight is gained from examining how information has changed through time. The
more historical information that is available, the more we can make sense of the present by seeing the evolving
pattern. Even when we focus on what is going on right now with the data, to know its background story help the
analyst gain more insight from the data.
3. Consistent
Things change over time, and when they do, data also changes with the situation. A good example of this is ever
changing data of stock market. If data such as revenue have not been adjusted to reflect these changes, an
examination of data will be complicated and incomprehensible. It is usually best to constrained the data which
reflects the purpose of the problem definition.

Traits of meaningful Data..
4. Multivariate
We can examine two types of data- quantitative and qualitative variables. Variables is an aspect of something that changes (meaning vary).
Variables are two types- quantitative — expressed as numbers and categorical — expressed as words. When trying to figure the answer for our
proposed question, we need to expand the number of variable we are examining. The more variable we have as a data, the richer our opportunity
to make sense of data.
5. Atomic
Most of the study includes information that has been aggregated at a far more summerized or generalized level. At times, however, we need
information at the finest level of detail possible. For eg, if we are a text analyst, we spend much of our time analyzing the data, translating
sentences to number formats, but we forget about the emotional component of sentences. Therefore, atomic means to specify the data down to
the lowest level, so that we understand what's going on.
6. Clean
The quality of our research can never be higher than the quality of our data. We cannot draw a reliable conclusion that depends on unformatted,
dirty data. Successful business decisions cannot be made with inaccurate, incomplete or misleading data. People need data that they can trust to be
reliable and clean so that business goals and objectives can be further explored.
7. Dimensionally structured
I have tried to understand the data that is expressed in unfamiliar dimensions. It is frustrating, discouraging, and sometime waste of time, if data is
not well structured. Human senses are constrained to view the world in three dimensional perspective, the tools and graph which we have
implemented so far is only comprehensible when it is presented in three dimensional structure. When data is structured, it is easier to understand
or make the software understand it.

Visual perception
What is visual perception?
Wikipedia defines Visual perception as the ability to interpret the surrounding environment by processing information that is contained
in visible light. The resulting perception is also known as eyesight, sight, or vision.
How does visual perception affect data visualization?
The main purpose of data visualization is to aid in good decision making. To make good decisions, we need to be able to understand
trends, patterns, and relationships from a visual. This is also known as drawing insights from data. Now here is the tricky part, we don’t
see images with our eyes; we see them with our brains. The experience of visual perception is in fact what goes on inside our brains
when we see a visual.
Let’s understand a little bit more about visual perception. There are 3 key points to note:
Visual perception is selective. As you can imagine, if we tune our awareness to everything, we will be very soon overwhelmed. So we
selectively pay attention to things that catch our attention.
Our eyes are drawn to familiar patterns. We see what we expect to see. Hence visualization must take into account what people know
and expect.
Our working memory is very limited. We will go in depth about memory in a bit, but just understand that we can hold a very limited
amount of information in our memory when looking at a visual.
Data visualization is in many some ways an external aid to support our working memory.

Visual perception ..
The power of data visualization
Remember how some visuals give you an “Aha moment” instantly? These visuals correspond
naturally to the workings of visual perception and cognition. What does that mean? Ok, let’s
break this down.
Visual perception is the act of seeing a visual or an image. This is handled by visual cortex
located at the rear of the brain. The visual cortex is extremely fast and efficient.
Cognition is the act of thinking, of processing information, making comparisons and examining
relationships. This is handled by the cerebral cortex located at the front of the brain. The
cerebral cortex is much slower and less efficient.
Here is where the magic happens. Data visualization shifts the balance between perception
and cognition to use our brain’s capabilities to its advantage. This means more use of visual
perception and lesser use of cognition.

Building blocks of information
visualization

Analytical Interaction
• The effectiveness of information visualization hinges on two things: its ability to clearly and
accurately represent information and our ability to interact with it to figure out what the
information means.
• Several ways of interacting with data are especially useful
• Comparing • Sorting • Adding variables • Filtering • Highlighting • Aggregating • Re-
expressing • Re-visualizing • Zooming and panning • Re-scaling • Accessing details on
demand • Annotating • Bookmarking

Comparing
• No interaction is more frequent, useful, and central to
the analytical process than comparing values and
patterns
• Comparison is the beating heart of data analysis.
• In fact, what we do when we compare data really
encompasses both comparing (looking for similarities)
and contrasting (looking for differences).
• Comparing magnitudes- for example, this is greater or
less than that and by what amount- is a fundamental
version of this activity.
• The following graph supports this activity, making it
easy to compare the performance of salespeople to
one another.
0 10,000 20,000 30,000 40,000 50,000 60,000 70,000 80,000 90,000
R. Marsh
G. Freeman
J. Gutierrez
M. Bogue
S. Jackson
R. Kipling
M. Chris
M. Elston
D. Johnson
C. Moore
B. Knox
Sales by salesperson

Comparing…
A few typical magnitude comparisons are:
Type- Nominal
Description- Comparing values that have no particular order
Type-Ranking
Description -Comparing values that are arranged by magnitude,
from low to high or high to low
Type-Part-to-Whole
Description - Comparing values that when combined make up parts
of a whole
0
100
200
300
North South East West
Sales in Region
no of employees Sales in region
0
50
100
150
200
250
Employees Per Department
no of employees Departments

Sorting
• Don't underestimate the power of a simple sort.
• It's amazing how much more meaning surfaces when values are sorted from low to high or high to low
• Take a look at the following graphs, which displays employee compensation per state:
• With the states in alphabetical order, the only thing we can do with ease is look up employee
compensation for a particular state. It is difficult to see any meaningful relationships among the values.
Now take a look at the same data, this time sorted from the state with the highest employee
compensation to the one with the lowest.

Adding Variables
• We don't always know in advance every element
of a data set that we'll need during the process
of .analyzing it. This is natural.
• Data analysis involves looking for interesting
attributes and examining them in various ways,
which always leads to questions that we didn't
think to ask when we first began.
• This is how the process works because this is
how thinking works. We might be examining
sales revenues per product when we begin to
wonder how profits relate to what we're seeing.
• We might at that point want to shift between a
graph such as the one below on the left, to a
richer graph such as the one on the right, which
adds the profit variable.

Filtering
• Filtering is the act of reducing the data that we're viewing to a subset of what's currently there.
• From a database perspective, this involves removing particular data records from view. This is usually
done by selecting particular items within a categorical variable
• The purpose of filtering is simple: to get any information we don't need at the moment out of the way
because it is distracting us from the task at hand

Highlighting
• Sometimes, rather than filtering out data we aren't
interested in at the moment, we want to cause
particular data to stand out without causing all
other data to go away.
• Highlighting makes it possible to focus on a subset
of data while still seeing it in context of the whole.
• In the following example, I have highlighted data
points in red belonging to customers in their 20s
who purchased products, without throwing out the
other age groups. In this particular case,
highlighting rather than the filtering allows us to
see the relationship between the total number of
purchases (along the X-axis) and the amount spent
on groceries (along theY-axis) by people in their
20s, in fairly good isolation from other age groups
while still being able to see how their shopping
habits compare to those of customers overall.

Aggregating
 When we aggregate or disaggregate information, we are not changing the amount of information but rather the level of detail
at which we're viewing it.
 We aggregate data to view it at a higher level of summarization or generalization; we disaggregate to view it at a lower level
of detail.
 Consider the process of sales analysis. At its lowest level of detail, sales usually consist of line items on an order. A single
order at a grocery store might consist of one wedge of pecorino cheese, three jars of the same pasta sauce, and two boxes of
the same pasta.
 If we're analyzing sales that occurred during a particular month, most of our effort would not require knowing how many jars
of the same pasta sauce were sold; we would look at the data at much higher levels than order line items. At times we might
examine sales by region. At others, we might shift to sales by large groupings of products such as all pasta, grains, and rice
products.
 Any time that a particular item looks interesting, however, we might dive down to a lower level of detail, perhaps sales per
day, per individual product, or even per individual shopper. Moving up and down through various levels of generality and
specificity is part and parcel of the analytical process.

Re-expressing
 Sometimes quantitative values can be expressed in multiple ways, and each expression can lead to different insights.
 By the term re-expressing, I mean that we sometimes change the way we delineate quantitative values that we're
examining.
 The most common example involves changing the unit of measure, from some natural unit, such as U.S. dollars for
sales revenues, to another, such as percentages. Examining each product type's percentage of total sales might lead
to insights that did not come to light when we were viewing the same values expressed as dollars.

Re-visualizing
 This activity pertains only to visual forms of analysis. It involves changing
the visual representation in some fundamental way, such as switching
from one type of graph to another. Being able to do this quickly and easily
is essential
 No single way of visualizing data can serve every analytical need. Different
types of visualization have different strengths. If we don't have the ability
to switch from one to another as fast as we recognize the need, our data
analysis will be fragmented and slow, and we will probably end the process
prematurely in frustration, missing the full range of possible insights
standing in the wings.
 Imagine that we're comparing actual expenses to the expense budget for a
year's worth of data using a bar graph. Bars nicely support magnitude
comparisons of individual values, such as actual expenses to budgeted
expenses.
 Before long, however, we want to see how the variation between actual
and budgeted expenses changed through the year, which will be much
easier if we switch from a bar to a line graph with a single line that
expresses the difference between actual and budgeted expenses

Zooming and Panning
 When exploring and analyzing data visually, we sometimes
want to take a closer look at a specific section of a graph. We
can accomplish this by zooming in on the contents of a
visualization, which enlarges the portion of the display that
we wish to see more closely.
 If we become particularly interested in what's happening
during the specific period from February 14 through 20 while
viewing the first graph below, we might want to zoom in on
that portion, resulting in the bottom graph below.

Analytical Navigation
• The visual analysis process involves many steps and
potential paths to get us from where we begin-in the
dark- to where we wish to be-in the light (enlightened).
Some methods of navigating through data are more
effective than others.
• There is no one correct way to navigate through
information analytically, but some navigational
strategies are helpful general guidelines within which
we can learn to improvise as our expertise grows.
• Directed vs. Exploratory Navigation
• At a fundamental level, analytical navigation can be
divided into two approaches: directed or exploratory.
Directed analysis begins with a specific question that we
hope to answer, searches for an answer to that question
(perhaps a particular pattern), and then produces an
answer.
• With exploratory analysis, however, we begin by simply
looking at data without predetermining what we might
find; then, when we notice something that seems
interesting and ask a question about it, we proceed in a
directed fashion to find an answer to that question.

Hierarchical Navigation
• It's frequently useful to navigate through
information from a high-level view into
progressively lower levels along a defined
hierarchical structure and back up again.
• A typical example involves sales analysis by
region along a defined geographical hierarchy,
such as continents at the highest level, then
countries, followed by states or provinces, and
perhaps down to cities at the lowest level.

Optimal Quantitative Scales
•Quantitative scaling refers to the process of assigning numerical values to data in order to represent
the magnitude or intensity of a particular variable
•It allows for the comparison of values across different data sets, and the ability to identify patterns in
data.
•By scaling data, we can make sense of large data sets and draw meaningful insights from them
•Quantitative scaling can be used to create graphs, charts, and maps that are easy to interpret and
visually appealing.
•A good starting point for choosing a scale is to use a range that includes the minimum and maximum
values of the data.
•The scale that's optimal depends on the nature of the information, the kind of graph we're using to
view it, and what we're trying to discover and understand.
•To choosing an appropriate scale,the importance of labeling axes clearly and accurately, and using
appropriate units of measurement.

Optimal Quantitative Scales..
The scale that's optimal depends on the nature of the information, the kind of graph we're using to view it, and what we're trying to discover and understand.
The basic rules of thumb are simple:
• When using a bar graph, begin the scale at zero, and end the scale a little above the highest value.
• With every type of graph other than a bar graph, begin the scale a little below the lowest value and end it a little above the highest
value.
• Begin and end the scale at round numbers, and make the intervals round numbers as well.
The following graph presents a visual lie because the heights of the bars cannot be compared to accurately determine differences in
value.

Optimal Quantitative Scales..
Information visualization software should support optimal quantitative scales in the following ways:
● Make it difficult or impossible to remove zero from the quantitative scale of a bar graph.
● Automatically set the quantitative scale for dot plots, line graphs,scatterplots, and box plots to begin a little below the lowest value
in the data set and end a little above the highest value, based on round numbers.
● Provide a means to easily adjust the quantitative scale as needed
● Colors can be used to highlight specific data points or patterns, and can also be used to create contrast between different types of
data.
● Labels can help to provide context and explanation for the data, making it easier for the audience to understand and interpret.

Reference lines and regions
 Comparisons are intimately interwoven into the analytical process. Therefore, anything that makes comparisons easier, such as including reference lines and
reference regions, is worth doing. Imagine that we want to see how well our manufacturing process has been going relative to standards that have been
established for an acceptable number of defects.
Let's say that it's unacceptable when defects exceed 1% of the products manufactured on any given day. We could look at manufacturing quality for the current
month using the following graph:
We could accomplish our task with the graph above, but notice how much easier and faster we could do this using the next graph, which has a reference line
indicating the threshold for an acceptable percentage of defects:
With the addition of the reference line, the days when the number of defects ventured into the unacceptable range pop out, and it's much easier to see the
degree to which we exceeded the limits on those occasions. All we've done to create this clarity is mark the threshold with a reference line.

Reference lines and regions …
Information visualization software should support reference lines and regions in the following
ways:
• Provide a means to include reference lines in graphs based on setting a specific value (for
example, $10,000), an ad hoc calculation (for example, 1% of the number of manufactured
products), or a statistical calculation.
• Provide automated calculations for the following statistical calculations: mean, median,
standard deviation, specified percentiles, minimum, and maximum.
• Provide a means to base calculated reference lines either on the values that appear in the
graph only or on a larger set of values. (I'll explain this with an example in a moment.)
• Provide a means to label reference lines to clearly indicate what the lines represent.
• Provide a means to format reference lines as needed, including a choice of hue, color intensity,
line weight, and line style (solid, dashed, etc.).

Reference lines and regions …
In this next example, the value marked by the reference lines is the same in each graph; it represents
average sales revenues for products in all regions, not just the region represented in the particular graph

Trellises and Crosstabs
•By splitting the data into multiple graphs that appear on the screen at the same time in close
proximity to one another, we can examine the data in any one graph more easily, and we can
compare values and patterns among graphs with relative ease
•Trellis displays should exhibit the following characteristics:
• Individual graphs only differ in terms of the data that they display. Each graph displays a subset of a
single larger set of data, divided according to some categorical variable, such as by region or
department.
• Every graph is the same type, shape, and size, and shares the same categorical and quantitative
scales. Quantitative scales in each graph begin and end with the same values (otherwise values in
different graphs cannot be accurately compared).
• Graphs can be arranged horizontally (side by side), vertically (one above another), or both (as a
matrix of columns and rows).
• Graphs are sequenced in a meaningful order, usually based on the values that are featured in the
graphs (for e?Cample, sales revenues).

Trellises and Crosstabs..
• How we arrange the graphs-horizontally, vertically, or as a matrix-depends on the number of graphs that we're trying to squeeze into the space available on the
screen as well as the types of comparisons that we're making among the graphs. In the following example, notice that it's easy to track a specific region, such as
the east, through all the graphs because the bars that represent that region are aligned with one another across the page.
• We can easily isolate the east region as we take in the full set of graphs. But if we want to accurately compare the magnitudes of the four bars that encode the
east region's sales values, we could do that more easily using graphs arranged as illustrated below where quantitative scales are aligned with one another down
the page:

When we can't display all the graphs in either a horizontal or vertical arrangement, we can shift to a matrix. Here are 15
graphs, one per department, that we can use to compare departmental expenses:

Information visualization software should support trellis and visual crosstab displays in the following ways:
• Provide a means to automatically arrange data as a trellis display simply by indicating the categorical variable on
which the individual graphs should be based, the arrangement of the graphs (horizontally, vertically, or as a
matrix), and the sorted order of the graphs (for example, by sales amounts in descending order).
• Automatically set the quantitative scales that appear in all graphs in a trellis display to be consistent with one
another.
• Provide a means to automatically arrange data as a visual crosstab display by indicating one or more categorical
variables on which columns of graphs should be based, one or more categorical variables on which rows of graphs
should be based, and one or more quantitative variables that should be displayed in the graphs.
• Provide a means to display more than one quantitative variable in separate columns or rows of a visual crosstab
display.
• Automatically set the quantitative scales that appear in all graphs in a visual crosstab display to be consistent
with one another except when multiple quantitative variables are included in separate columns or rows of graphs,
in which case each quantitative variable should be scaled independently.

Multiple Concurrent Views and Brushing
Multiple concurrent views refer to the ability to view and work with different perspectives or aspects of a system, process, or problem
simultaneously. This can be achieved through various means, such as using multiple screens, split-screen modes, or virtual desktops.
In general, having multiple concurrent views can help individuals better understand and manage complex systems or problems by
providing them with a more comprehensive and holistic view. However, it is essential to ensure that the different views are coherent and
aligned with each other to avoid confusion or errors.
Information visualization software should support multiple concurrent views in the following ways:
• Provide a means to easily create, co-locate, and tightly couple multiple tables and graphs based on a shared set of data on a single
screen.
• Provide a means to easily place particular tables and graphs anywhere and to size them as necessary.
• Provide a means to easily filter all tightly-coupled tables and graphs together by a single action.
• Provide a means to directly brush (that is, select) any subset of data in a single view (table or graph) to highlight it and have that same
subset of data automatically highlighted in all the other tables and graphs that are tightly coupled to that view.
• When a subset of data is brushed in a particular table or graph, and that subset is associated with a bar or box in a graph, highlight only
that portion of the bar of box that is made up of that subset.
Here's a display that combines several different views of the same data set on a single screen.

Focus and Context Together
Focus and context are two important concepts in information visualization and design that are often used together to provide users with a
better understanding of complex data sets or information.
Focus refers to the part of the data or information that a user is currently interested in or working with. This can be a specific data point, a
chart, a table, or any other component of the visualization that the user needs to focus on to accomplish their task.
Information visualization software should support concurrent focus and context views in the following ways:
• Provide a means, while viewing a subset of a larger set of data, to simultaneously see a visual representation of the whole that
indicates the subset of data as part of the whole.
• Provide a way for the larger context view to be removed to free up screen space when it isn't needed.
One common example of focus and context together is in the design of maps. Maps provide users with a visual representation of
geographic locations, and the focus and context design approach helps users to navigate and explore these locations.
In a map, the focus may be on a particular location or area, such as a city or neighbourhood, while the context provides a broader view of
the surrounding area. For example, in a digital map, the user may zoom in on a specific street, which becomes the focus, while the
surrounding streets, neighbourhood s, and landmarks provide the context.
Another example of focus and context together is in data visualization, particularly in time-series data. In a line chart or graph, the focus
may be on a specific time period, such as a week or a month, while the context provides a broader view of the overall trend.

Details on Demand
Details on demand is a design approach in information visualization and user interface design that allows users to
access additional information or details about a particular item or element on demand. This means that users can
choose to see more information about a specific data point or item by interacting with the visualization or user
interface, rather than being overloaded with information all at once.
Details on demand can be implemented in various ways, such as through tooltips, pop-ups, or expandable
sections. When a user hovers over or clicks on an item, additional information or details about that item can be
displayed in a small window or pop-up.
This approach has several benefits, including:
Reducing information overload: By providing additional details on demand, users can focus on the essential
information and only see additional details when they need them.
Improving user experience: Details on demand can make a user interface more interactive and engaging, allowing
users to explore data or information in a more flexible and intuitive way.
Saving screen space: Rather than displaying all information at once, details on demand can save screen space and
keep the user interface clean and organized.

Over-Plotting Reduction
•In some graphs, especially those that use data points or lines to encode data, multiple objects can end up sharing the
same space, positioned on top of one another. This makes it difficult or impossible to see the individual values, which in
turn makes analysis of the data difficult. This problem is called over-plotting.
•When it gets in the way, we need some way to eliminate or at least reduce the problem. The information visualization
research community has worked hard to come up with methods to do this.
• We'll take a look at the following seven methods:
• Reducing the size of data objects
• Removing fill color from data objects
• Changing the shape of data objects
• Jittering data objects
• Making data objects transparent
• Encoding the density of values
• Reducing the number of values

Over-Plotting Reduction..
Reducing the Size of Data Objects
Consider the following scatterplot filled with data. Notice that, in some areas,- multiple data points fight for the same location and as a
result sit on top of one another.
Sometimes the problem can be adequately resolved simply by reducing the size of the objects that encode the data, in this case the dots.
Here is the same data set with the size of the dots reduced:

Removing Fill Color from Data Objects
Another simple method to reduce over-plotting involves removing the fill color from the objects that encode the
data, which allows us to see better how the objects overlap. In this next example, the dots are slightly enlarged
and the fill color is removed. The color is also changed so that it that stands out more clearly against the white
background even when the amount of color in each dot has been reduced.
Changing the Shape of Data Objects
Another simple way to address over-plotting is to change the shape of the data objects. We can change shapes
from the circles above, which function like containers with an interior and require a fair amount of space, to
shapes that are not container-like, such as plus signs or X's.
jittering Data Objects
One of the best ways to reduce over-plotting when multiple data objects have the same exact values is to change
something about the data rather than changing the appearance of the object. Jittering slightly alters the actual
values so they are no longer precisely the same, moving them to slightly different positions. In the scatterplot
below, we can now see more detail than we could in the previous versions above

Making Data Objects Transparent
A newer method, which in many cases works even better than jittering and does not entail
altering the data values or changing the shape of the data objects, makes the objects partially
transparent. The proper degree of transparency allows us to see through the objects to perceive
differences in the amount of over-plotting as variations in color intensity. The following
scatterplot allows us to easily detect differences between the dense center of the cluster, which
is intensely blue, and surrounding areas of progressively less concentration (less intensely blue).
Using a slider control to vary the degree of transparency, we can quickly and easily adjust a
display that suffers from over-plotting to reveal nuance in the midst of clutter.
Encoding the Density of Values
Another approach encodes the density of the overlapping data points located in each region of
the graph. Consider the scatterplot on the next page, which suffers from a great deal of over-
plotting, prior to any attempt to reduce it.

Reducing the Number of Values
The remaining methods for reducing over-plotting don't involve changes to the objects that encode the data or to
the values; they involve reductions in the number of values that are displayed. The four most useful methods of
this type are:
• Aggregating the data. This can be done when we really don't need to view the data at its current level of detail
and can accomplish our analytical objectives by viewing the data at a more general or summarized level.
• Filtering the data. This is a simple solution that can be used to remove unnecessary values in the graph if there
are any.
• Breaking the data up into a series of separate graphs. When we cannot aggregate or filter the data any further
without losing important information, we can sometimes reduce over-plotting by breaking the data into
individual graphs in a trellis or visual crosstab display.
• Statistically sampling the data. This technique involves reducing the total data set using statistical sampling
techniques to produce a subset that represents the whole. This is a promising method for the reduction of over-
plotting, but it is relatively new and still under development. If it is successfully refined, it could become a useful
standard feature of visual analysis software in the near future.

PATTERN ANALYSIS AND PATTERN
EXAMPLES
Pattern analysis:
Pattern analysis in data visualization involves identifying and understanding the patterns that exist within the data. This can be done by
using various data visualization techniques to visually represent the data in a way that makes it easier to identify and analyze patterns.
Techniques used:
1. Time-series analysis: This technique involves analyzing patterns in data over time. Line charts are often used to display time-series data,
allowing users to easily identify trends, seasonal patterns, and other patterns that emerge over time.
2. Clustering analysis: Clustering is a technique used to group similar data points together based on their characteristics. Cluster analysis
can be used to identify patterns and relationships between different data points.
3. Correlation analysis: Correlation analysis involves measuring the relationship between two or more variables. Scatter plots are
commonly used to display correlation data, making it easier to identify patterns and relationships between variables.
4. Frequency analysis: Frequency analysis involves analyzing how often certain events or values occur within a data set. Histograms and
bar charts are often used to display frequency data, allowing users to quickly identify patterns in the distribution of values.
5. Geographic analysis: Geographic analysis involves visualizing data on a map. This can be useful for identifying geographic patterns, such
as regional variations in data or spatial relationships between different data points.
6. Network analysis: Network analysis involves visualizing relationships between different entities within a network. This can be useful for
identifying patterns in social networks, transportation networks, or other types of networks.

PATTERN ANALYSIS AND PATTERN
EXAMPLES
By using these techniques and others, data analysts and visualization experts can identify and analyze patterns within the data, making it
easier to draw insights and make informed decisions based on the data.
Steps: There are several steps involved in pattern analysis:
1. Data Preparation: The first step in pattern analysis involves
preparing the data. This includes cleaning and transforming the data to ensure that it is in a format that can be easily analyzed.
2. Exploratory Data Analysis: The next step involves exploring the data to identify any patterns or trends that may exist. This can be done
using visualizations such as scatter plots, histograms, or box plots.
3. Pattern Identification: Once the data has been explored, the next step is to identify any patterns that have been observed. This involves
looking for trends, seasonality, outliers, clusters, and correlations within the data.
4. Pattern Interpretation: After patterns have been identified, the next step is to interpret them. This involves understanding the
underlying causes of the patterns and what they may mean in terms of the data being analyzed.
5. Pattern Communication: Finally, the results of the pattern analysis need to be communicated to stakeholders. This can be done through
data visualizations, reports, or presentations. Pattern analysis can be used in many different fields, including finance, marketing,
healthcare, and manufacturing. By identifying patterns within a dataset, organizations can gain valuable insights that can inform decision-
making and drive business success.

Unit 2.pptx

Empfohlen

Empfohlen

Weitere ähnliche Inhalte

Was ist angesagt?

Was ist angesagt? (20)

Ähnlich wie Unit 2.pptx

Ähnlich wie Unit 2.pptx (20)

Kürzlich hochgeladen

Kürzlich hochgeladen (20)

Unit 2.pptx