1. A DATA SCIENCE COMPANY
We handle terabyte-size data via non-traditional analytics and visualise it in real-time.
Gramener visualises
your data
Gramener transforms your data into concise dashboards
that make your business problem & solution visually obvious.
We help you find insights quickly, based on cognitive research,
and our visualisations guide you towards actionable decisions.
2. S ANAND, GRAMENER
HOW YOU CAN GET
INSIGHTS FROM DATA
OVERCOMING COMMON OBJECTIONS ON READINESS
13. Portfolio Performance Visual
Worldwide$288.0mn
A: Accelerate$68.9mn
B: Build$77.2mn
C: Cut down$141.9mn
Worldwide:
$288 mn
The visualization shows the market
opportunities across various countries to
identify areas of focus. This chart has
been built as an interactive app to
present the key findings, while letting
users click through and drill down to a
custom view across 4 different levels.
16. Billing fraud at an energy utility
This plot shows the frequency of all meter readings from Apr-
2010 to Mar-2011. An unusually large number of readings are
aligned with the slab boundaries.
Below is a simple histogram (or frequency distribution) of usage levels.
Each bar represents the number of customers with a
specific bill amount (in units, or kWh).
Tariffs are based on the usage slab. Someone with 101 units is billed in
full at a higher tariff than someone with 100 units. So people have a
strong incentive to stay at or within a slab boundary.
An energy utility (with over 50 million
subscribers) had 10 years’ worth of
customer billing data available.
Most fraud detection software failed to
load the data, and sampled data
revealed little or no insight.
This can happen in one of two ways.
First, people may be monitoring their
usage very carefully, and turning off their
lights and fans the instant their usage
hits the slab boundary.
Or, more realistically, there’s probably some level of corruption
involved, where customers pay a small sum to the meter reading staff
to ensure that it stays exactly at the slab boundary, giving them the
advantage of a lower price.
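The check described above can be sketched in a few lines: count how many readings sit exactly on a slab boundary and compare that with the average count at nearby usage values. This is only an illustration. The function name, the boundary at 100 units, the window size and the sample readings are all assumptions, not the utility's data or method.

```python
from collections import Counter

def boundary_spike_ratios(readings, boundaries, window=5):
    """Compare the count of readings exactly at each slab boundary with the
    average count at the `window` values on either side of it."""
    counts = Counter(readings)
    ratios = {}
    for b in boundaries:
        neighbours = [counts[b + d] for d in range(-window, window + 1) if d != 0]
        avg = sum(neighbours) / len(neighbours)
        # None means there were no nearby readings to compare against
        ratios[b] = counts[b] / avg if avg else None
    return ratios

# Hypothetical readings: two customers at every value from 90 to 110 units,
# plus eight extra readings sitting exactly on the 100-unit boundary
readings = list(range(90, 111)) * 2 + [100] * 8
ratios = boundary_spike_ratios(readings, boundaries=[100])
print(ratios)
```

A ratio well above 1 at a boundary is the kind of spike the histogram makes visible. It does not by itself say which of the two explanations applies.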
17. This is a dataset (1975 – 1990) that has
been around for several years, and has
been studied extensively. Yet, a
visualization can reveal patterns that
are neither obvious nor well known.
For example,
• Are birthdays uniformly distributed?
• Do doctors or parents exercise the C-section option to move dates?
• Is there any day of the month that has unusually high or low births?
• Are there any months with relatively high or low births?
More births Fewer births … on average, for each day of the year (from 1975 to 1990)
LET’S LOOK AT 15 YEARS OF US BIRTH DATA
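A calendar heatmap like this one rests on a simple aggregation: the mean number of births for each (month, day) across all years. The sketch below assumes the input is one (date, births) record per calendar date. The function name and the sample counts are hypothetical and are not taken from the US dataset.

```python
from collections import defaultdict
from datetime import date

def average_births_by_day(records):
    """records: iterable of (date, births), one record per calendar date.
    Returns {(month, day): mean births across the years present}."""
    totals = defaultdict(int)
    years = defaultdict(int)
    for d, births in records:
        key = (d.month, d.day)
        totals[key] += births
        years[key] += 1
    return {key: totals[key] / years[key] for key in totals}

# Hypothetical counts, for illustration only
records = [
    (date(1975, 7, 4), 8000),
    (date(1976, 7, 4), 9000),
    (date(1975, 7, 5), 10000),
    (date(1976, 7, 5), 11000),
]
avg = average_births_by_day(records)
print(avg)
```

Each resulting value would then be mapped to a colour, with darker cells for days that have more births than average.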
18. THE PATTERN IN INDIA IS QUITE DIFFERENT
This is a birth date dataset that’s
obtained from school admission data
for over 10 million children. When we
compare this with births in the US, we
see none of the same patterns.
For example,
• Is there an aversion to the 13th or is there a local cultural nuance?
• Are holidays avoided for births?
• Which months have a higher propensity for births, and why?
• Are there any patterns not found in the US data?
More births Fewer births … on average, for each day of the year (from 2007 to 2013)
19. THIS ADVERSELY IMPACTS CHILDREN’S MARKS
It’s a well established fact that older
children tend to do better at school in
most activities. Since many children
have had their birth dates brought
forward, these younger children suffer.
Children “born” on the 1st, 5th, 10th, 15th etc. of the
month tend to score lower marks on average.
Higher marks Lower marks … on average, for children born on a given day of the year (from 2007 to 2013)
24. “We have internal
information. Getting
information from outside is
our challenge. There’s no way
of doing that.”
– Senior Editor
Leading Media Company
28. AUGMENT YOUR
DATA
SOURCES
DATA IS
EVERYWHERE
COMMON COMPLAINT #1
WE DON’T HAVE DATA
COMMON COMPLAINT #2
THE DATA ISN’T STRUCTURED
CRM DATA
SALES DATA
PRICING DATA
CALL RECORDS
WEB LOG DATA
VENDOR INVOICES
SOCIAL MEDIA DATA
CLICKTHROUGH DATA
COMPETITOR RESEARCH
CUSTOMER TRANSACTIONS
…
CENSUS DATA
E-COMMERCE PRICES
COMMODITY PRICES
STOCK MARKET DATA
FINANCIAL REPORTING
SOCIAL MEDIA DATA
MOBILE PENETRATION
AADHAR DATA
COURT CASE BRIEFS
SHAPE FILES
…
29. Recruiting top quality developers is always a problem. We decided to use an
algorithmic approach and pulled out the social network of developers on
GitHub (a social network for open source code).
In this visualisation, each circle is a person. The size of the circle
represents the number of followers. Larger circles have more
followers (but not in proportion – it’s a log scale.)
The circle’s colour represents the city the
programmers live in. This visual is a slice showing the
tale of two cities: Bangalore and Singapore.
Two people are connected if one
follows the other. This leads to a
clustering of people in the form of a
network.
Here, you can see that Bangalore and
Singapore are reasonably well
connected cities. Bangalore has more
developers, but Singapore has more
popular ones (larger circles).
However, the interactions between
Bangalore and Singapore are few and
far between, except for a few people
across both cities, like:
… etc.
Sudar, Yahoo!
Anand C, Consultant
Kiran, Hasgeek
Anand S, Gramener
Mugunth, Steinlogic
Honcheng, buUuk
Sau Sheong, HP Labs
Lim Chee Aung
Bangalore
Singapore
1 follower
100 followers
A follows B (or)
B follows A
Most followed in
Bangalore
Most followed in
Singapore
Ciju Cherian
Lin Junjie
Amudhi Sebastian
There are, of course, a number of smaller
independent circles – people who are not connected
to others in the same city. (They may be connected to
people in other cities.)
Apart from this, there are a few small networks of
connected people – often people within the same
company or start-up – who form a community of their
own.
THE SOCIAL TALE OF TWO CITIES: BANGALORE & SINGAPORE
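The encoding described on this slide (log-scaled circle size, a link when either person follows the other, and few links across cities) can be sketched as below. The names `circle_size`, `undirected_links` and `cross_city_links`, the `log10(followers + 1)` formula and the sample people are assumptions for illustration. The slide only says that size is on a log scale.

```python
import math

def circle_size(followers, scale=10.0):
    """Log-scaled size, so 100 followers is not drawn 100x larger than 1.
    The exact formula is an assumption."""
    return scale * math.log10(followers + 1)

def undirected_links(follows):
    """follows: iterable of (follower, followed) pairs.
    Two people are linked if either one follows the other."""
    return {frozenset(pair) for pair in follows if pair[0] != pair[1]}

def cross_city_links(links, city_of):
    """Links whose two endpoints live in different cities."""
    return [link for link in links if len({city_of[p] for p in link}) > 1]

# Hypothetical people and follow relations
city_of = {"a": "Bangalore", "b": "Bangalore", "c": "Singapore", "d": "Singapore"}
follows = [("a", "b"), ("b", "a"), ("c", "d"), ("b", "c")]

links = undirected_links(follows)
bridges = cross_city_links(links, city_of)
print(len(links), len(bridges))
```

Here a mutual follow between "a" and "b" collapses into a single link, and only the link between "b" and "c" crosses cities.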
30. Tata Teleservices
Tata Consultancy Services
Tata Business Support Services
Tata Global Beverages
Tata Infotech (merged)
Tata Toyo Radiator
Honeywell Automation India
Tata Communications
A G C Networks
Tata Technologies
Tata Projects
Tata Power
Tata Finance
Idea Cellular
Tata Motors
Tata Sons
Tata Steel
Tayo Rolls
Tata Securities
Tata Coffee
Tata Investment Corp
A J Engineer
H H Malgham
H K Sethna
Keshub Mahindra
Ravi Kant
Russi Mody
Sujit Gupta
A S Bam
Amal Ganguli
D B Engineer
D N Ghosh
M N Bhagwat
N N Kampani
U M Rao
B Muthuraman
Ishaat Hussain
J J Irani
N A Palkhivala
N A Soonawala
R Gopalakrishnan
Ratan Tata
S Ramadorai
S Ramakrishnan
DIRECTORSHIPS AT THE TATAS
Every person who was a Director at the Tata Group
is shown here as an orange circle. The size of the circle
is based on the number of directorship positions held
over their lifetime.
Every company in the Tata Group is shown
here as a blue circle. The size of the circle is
based on the number of directors the
company has had over time.
Every directorship relation is shown by a
line. If a person has held a directorship
position at a company, the two are
connected by a line.
The group appears to be divided into
two clusters based on the network of
directorship roles.
Prominent leaders
bridge the groups
Second group of companies
First group of companies
Some directors are
mainly associated with
the first group of
companies
Some directors are
mainly associated with
the second group of
companies
We’ve used network diagrams to detect terrorism, corporate fraud, product
affinities and behavioural customer segmentation
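The structure behind this diagram is a two-sided network: people on one side, companies on the other, and a line for each directorship. A minimal sketch follows, assuming the data is a list of (person, company) pairs. The identifiers `P1`, `C1` and so on are placeholders rather than real directors or companies, and the two groups are supplied by hand because the slide does not say how the clusters were derived.

```python
from collections import defaultdict

def directorship_network(directorships):
    """directorships: iterable of (person, company) pairs.
    Returns (companies_of, directors_of); circle sizes are the set sizes."""
    companies_of = defaultdict(set)
    directors_of = defaultdict(set)
    for person, company in directorships:
        companies_of[person].add(company)
        directors_of[company].add(person)
    return companies_of, directors_of

def bridging_directors(companies_of, group_a, group_b):
    """People who have held directorships in both groups of companies."""
    return sorted(p for p, cs in companies_of.items()
                  if cs & group_a and cs & group_b)

# Placeholder data, for illustration only
pairs = [("P1", "C1"), ("P1", "C2"), ("P2", "C3"), ("P3", "C1"), ("P3", "C3")]
companies_of, directors_of = directorship_network(pairs)
bridges = bridging_directors(companies_of, {"C1", "C2"}, {"C3"})
print(bridges)
```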
32. How does the Mahabharata, one of the largest epics with 1.8
million words, lend itself to text analytics?
Can this ‘unstructured data’ be processed to extract
analytical insights?
What does sentiment analysis of this tome convey?
Is there a better way to explore relations between
characters?
How can closeness of characters be analysed & visualized?
VISUALISING THE
MAHABHARATA
33. DATA IS
EVERYWHERE
EXTRACT THE
META DATA
AUGMENT YOUR
DATA
SOURCES
COMMON COMPLAINT #2
THE DATA ISN’T STRUCTURED
COMMON: WHO, WHAT, WHEN, WHERE
TEXT: TEXT KEYWORDS, SENTIMENT
IMAGE: VISUAL RECOGNITION
AUDIO / CALLS: TRANSCRIPTS, MOOD ANALYSIS
34. THE CAPABILITIES ARE
IN YOUR REACH TODAY
EXPLORE THE ART OF DATA
S ANAND s.anand@gramener.com
CEO, GRAMENER 9741552552
Editor's notes
Gramener is a data analytics and visualisation company. We handle large-scale data via non-traditional analytics (by which we mean programmatic analysis) and visualize the results in real-time.
The visualizations are our key differentiator.
We transform your data into concise dashboards that make it easy for you to find the problems as well as the solution.
We help you find these insights quickly, based on our work in cognitive research, and our visualizations guide you towards actionable decisions.
In other words, we make enterprise data consumption very easy.
https://flic.kr/p/aCqg7w
We said, let’s take all of the players who’ve ever played one day internationals. Each box is one player. The size of the rectangle is proportional to the number of runs they’ve scored.
So you can see that Tendulkar has scored the most runs, followed by Ganguly, then Dravid, and so on. The size of the entire visual is representative of the total runs ever scored by India in one-day internationals.
Colour is based on the strike rate. The greener the rectangle, the faster the score. You can see that Sehwag has done a fairly good job. So has Yusuf Pathan, one of the smaller green boxes. But given that that box is just about 1/10th the size of Sehwag’s, you could say that Yusuf Pathan has a long way to go before he can be considered on par.
You’ll also find that many of the players who have a lower strike rate – like Ravi Shastri, Dilip Vengsarkar, Sunil Gavaskar, Mohinder Amarnath, etc. – were playing in a different era, a time when a score of 200 was considered a rather good score. Today, 300 would be a respectable target.
It turns out that strike rates increase by around 3.5% every decade. If we adjust for that and re-plot the strike rates, it emerges that Kapil Dev’s adjusted strike rate is almost exactly the same as Sehwag’s, and between them, we have the two fastest players India has had.
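One way to make this adjustment concrete is to scale an older strike rate up to a common reference year. The sketch below assumes the 3.5% per decade compounds and that each player can be assigned a single representative year. Both are assumptions, as are the function name, the reference year and the sample figures. The notes do not state the exact method used.

```python
def adjusted_strike_rate(strike_rate, era_year, reference_year=2010,
                         growth_per_decade=0.035):
    """Scale a strike rate from era_year to reference_year, assuming strike
    rates grow by growth_per_decade (compounded) every ten years."""
    decades = (reference_year - era_year) / 10
    return strike_rate * (1 + growth_per_decade) ** decades

# Hypothetical figures: a strike rate of 100 in 2000 and one of 80 in 1980
print(adjusted_strike_rate(100, 2000))
print(adjusted_strike_rate(80, 1980))
```

A player whose era matches the reference year is left unchanged, so adjusted values are directly comparable with current strike rates.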
So, what we did was put a variant of this visual together. On the right, you have a series of currencies like the Australian dollar, the Euro, the British pound, etc; some commodities like silver and gold; and some stock indices like Sensex, FTSE, and S&P.
The cells here have a number inside that indicates the pairwise correlation between a pair of securities. For example, the number 68 on the top left indicates a 68% correlation between the Australian dollar and the Euro. To the left of the Euro and just below the dollar (diagonally opposite to the 68), there’s a scatter plot that shows the daily prices of both these currencies. Each dot is one day’s data. The x-axis shows the Australian dollar value. The y-axis shows the Euro value. This helps identify what the pattern of movements of any two currencies is. From this, you can easily see visually that the Australian dollar and the Euro both tend to move together. Or, where there are strong correlations like the FTSE & S&P, the pattern is almost a straight line.
In some cases there are negative correlations. For instance, if you take the Sensex against the Japanese Yen, the correlation is -79%. The cells are coloured based on their correlation values. Greens indicate strong positive correlation. Reds indicate strong negative correlation.
These are also grouped hierarchically. On the left, we have a series of lines indicating clusters. The most similar securities are grouped together. So FTSE and S&P with a 98% correlation are very close. The ones that are less correlated are kept further away based on a tree-structure.
This leads to clustering of securities. For example, there is a green block in the center which has SGD, JPY, XAU, CHF and CNY. All of these are fairly well correlated. When any one currency in this block goes up, all the others go up as well. When any one goes down, all others go down as well.
Similarly, you have another block to its top left: S&P, FTSE, Sensex and to a certain extent, the Pakistani Rupee. These move together as a block as well.
But when this block goes up, all the currencies in the other block go down, as indicated by the red negative correlations between these two blocks.
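The numbers in the cells can be reproduced with a plain Pearson correlation over each pair of daily price series. The following is a minimal sketch using only the standard library. The series names and prices are made up, and it does not attempt the hierarchical grouping shown on the left of the visual.

```python
import math
from itertools import combinations

def pearson(xs, ys):
    """Pearson correlation of two equal-length series.
    Assumes neither series is constant (otherwise this divides by zero)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def correlation_table(series):
    """series: {name: [daily prices]}.
    Returns {(a, b): correlation as a whole-number percentage}."""
    return {(a, b): round(100 * pearson(series[a], series[b]))
            for a, b in combinations(sorted(series), 2)}

# Hypothetical daily prices for three securities
prices = {
    "X": [1, 2, 3, 4, 5],
    "Y": [2, 4, 6, 8, 10],
    "Z": [5, 4, 3, 2, 1],
}
table = correlation_table(prices)
print(table)
```

Colouring each cell by its value (green for positive, red for negative) then gives the blocks described above.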
This can be used very easily for decision making. For example, one client who was trading with Singapore and Japan looked at the strong correlation and decided to consolidate their holdings in Japanese Yen. They then moved up and down this column to find a good hedge. FTSE looked like a good hedge – it was the most negatively correlated with JPY at that time -- and they decided to place a third of their portfolio in FTSE.
A sheet like this improves people’s understanding of relatively complex data, and results in significantly increased trade volumes.
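Scanning a column for a hedge, as the client did, amounts to picking the security with the most negative correlation against the holding. A small sketch, assuming the correlations are already available as percentages keyed by pair. The -79 for Sensex against the Yen is quoted in these notes, while the other two values and the function name are hypothetical.

```python
def best_hedge(correlations, holding):
    """correlations: {(a, b): percentage}. Returns the security most
    negatively correlated with `holding`."""
    candidates = {}
    for (a, b), pct in correlations.items():
        if a == holding:
            candidates[b] = pct
        elif b == holding:
            candidates[a] = pct
    return min(candidates, key=candidates.get)

correlations = {
    ("JPY", "SGD"): 85,      # hypothetical value
    ("FTSE", "JPY"): -82,    # hypothetical value
    ("JPY", "SENSEX"): -79,  # value quoted in the notes
}
hedge = best_hedge(correlations, "JPY")
print(hedge)
```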
We were working with a restaurant that had 7 months’ worth of sales data and asked what we could do with it. It was a fairly open-ended problem.
Among other things, we looked at the various product categories they sold, such as starters, breads, desserts, etc. and the pairwise correlations between each of these.
The number in each cell shows the pairwise correlation between any two products. The 17 on the top left, for example, indicates a 17% correlation between side dishes and meals. The scatter plots diagonally opposite show the correlations between these visually as well. These are colour coded based on the correlation. The redder it is, the more negative the correlation. The greener it is, the more positive the correlation.
There are a few patterns that emerge. For example: desserts are positively correlated with every product. The row and column are green right through, indicating that it doesn’t matter what people eat – they usually have desserts at the end.
Starters are an interesting category. They were introduced 4 years ago as a loss-leader, with the aim of increasing the restaurant’s menu variety and bringing in footfall. As a result, they were priced at cost. You can see from this that starters sell well with breads (rotis, naans, etc). They sell well with desserts, but then, everything sells well with desserts. But they reduce the sales of every other product!
What’s been happening is that since starters were so attractive, people were coming in, ordering starters and desserts, and leaving. As a result, this initiative had been a net loss in profit, though it had not been spotted for nearly four years.