For the Impetus White Papers archive, visit http://www.impetus.com/whitepaper
Using Big Data Technologies for Social Media Analytics- Impetus White Paper
Using Big Data technologies to enable social media analytics
WHITE PAPER
Abstract
In this white paper, Impetus talks about the need for
building a social analytics platform based on Big Data
technologies for better business insight. The paper also
focuses on why social media analytics is important in
today’s world and how 3-D data sources (internal,
external and social data) can be utilized to build
a data warehouse based on Big Data technologies.
Impetus also shares its recommended solution in this
white paper, and explains how Big Data technologies can
be used to optimize costs and handle exponential
increases in data over time.
Impetus Technologies Inc.
www.impetus.com
Table of Contents
Introduction
The benefits of Social Analytics
Data sources that facilitate Social Media Analytics
Technical tenets of Social Media Analytics
Using Big Data technologies to enable Social Media Analytics
Building a Big Data warehouse
A step-by-step approach to creating the Big Data EDW
The Impetus solution
  The iLaDaP high level architecture
Summary
Introduction
Social Media Analytics is a discipline that helps organizations measure, assess
and explain the performance of their social media initiatives.
There are four stages of analyzing social media data:
Step 1: collecting the data. This facilitates the compiling of reports and statistics
that are to be shared with the management or the internal and external
stakeholders.
Step 2: measuring the data. This helps in Sentiment Analysis and gauging which
products are well received in the marketplace.
Step 3: analysis. Here, data is presented in a visual and interactive manner to
the management, as well as the sales and marketing teams to provide better
insights.
Step 4: innovation. Based on the insights and analysis, there is a move towards
innovation, where organizations determine the new products and ideas they are
going to pursue in response to customer requirements. Innovation also helps
unearth cross-sell or up-sell opportunities that were not visible before.
Social Analytics opens up a host of new opportunities and perspectives.
Category-wise analysis of customer data, for instance, enables demographic
profiling of customers and helps determine their usage patterns. Similarly, with
Feature analysis, it is possible to figure out which forums, platforms or sources
of data are more active as compared to others.
Product Growth Analysis, which focuses on the data generated for a specific
product, helps understand the response of users to that product. There is also a
Recommendation Engine, which helps zero in on what is missing or lacking in a
product range.
Finally, Social Analytics enables Third Party Analysis, which is purely focused on
what the public social media platforms, such as Twitter, Facebook, MySpace,
etc. have to say about the product.
The benefits of Social Analytics
Social Analytics is an outcome-based approach that creates visible
Return on Investment (RoI).
• It helps organizations retain customers by addressing their concerns
upfront, rather than being slaves to processes. The results of the
analytics help organizations retain brand preference in a fickle
consumer world.
• It improves customer service and brings down the cost of operations.
• It enables organizations to add new customers by understanding and
addressing their requirements.
• Social Analytics helps companies keep an eye on their competition. With
easy access to social media data, it is simple to track and counter the
moves of competitors.
• It helps companies remain proactive. The turnaround time for gathering
customer feedback is reduced drastically. Moreover, the reactions of
customers and their subsequent actions can be predicted more
accurately, enabling organizations to take appropriate measures.
Social Media Analytics effectively converges on-site, social media and third party
data to extract useful information. Considering these factors, and the fact that it
enables enterprises to leverage the colossal data that is continuously generated
through social media interactions, Social Media Analytics should be made an
integral part of the marketing and research strategies of enterprises.
Data sources that facilitate Social Media
Analytics
Data sources include internal data, such as the purchase history of customers,
their transactions, and profiles in the enterprise database. It also encompasses
website traffic analysis, covering internal CSR logs, customer queries,
automated agent discussions, complaints and resolutions, and employee
insights.
Data sources can also be the social activities and profile updates of customers
on public social media platforms such as Twitter, Facebook, MySpace and
LinkedIn.
External data sources can additionally be used, and customers analyzed by
factoring in industry sources of information and market research reports.
Technical tenets of Social Media Analytics
Here’s a look at what Social Media Analytics entails and enables:
Clustering: Clustering is about capturing and analyzing various comments,
demands, and questions that customers share with like-minded friends and
groups, over social media platforms. It helps identify the appropriate response
and behavioral anomalies.
Classification: Having captured data on the activities of customers and their
comments, it is possible to perform natural language processing on it to evolve
patterns. These patterns can then be categorized and understood for
appropriate responses. Organizations can use Classification to address the
concerns of customers and approach them with products and offerings that
really meet their needs.
Sequential classification: This enables organizations to identify the subsequent
steps and actions that customers might take, based on their recent experiences.
Entity Extraction: Organizations can identify the concerns and issues that
dissatisfied customers are struggling with through Entity Extraction. They can
then take appropriate measures to ease the situation and retain customers on
the verge of switching to other suppliers or vendors.
Event Extraction: Event Extraction enables
companies to unearth the sequence of events leading up to customer
defections, or why people moved on to other providers.
Communication Graphs: Once organizations have all the data nicely sliced and
diced, they can draw Communication Graphs. These graphs can help analyze
and identify the top influencers, and active members in various groups. They
can also help companies gain a better understanding of where the messages
originate, and how they travel through the network. Knowing this, organizations
can target the top influencers and most active members in the network,
projecting a positive image of the brand or product in the community.
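As a rough illustration of the Communication Graphs idea, the following minimal Python sketch ranks members by how many distinct people they reach directly and by how many messages they send. The member names and the (sender, recipient) pairs are made up for the example, and direct reach is only one simple stand-in for influence; it is not the measure used by any particular platform.

```python
from collections import Counter, defaultdict

# Hypothetical (sender, recipient) pairs, e.g. replies or mentions.
messages = [
    ("ann", "bob"),
    ("ann", "cara"),
    ("bob", "cara"),
    ("dave", "ann"),
    ("ann", "dave"),
]

def build_graph(edges):
    """Return adjacency sets: the people each member has sent messages to."""
    graph = defaultdict(set)
    for sender, recipient in edges:
        graph[sender].add(recipient)
    return graph

def top_influencers(edges, n=3):
    """Rank members by the number of distinct people they reach directly."""
    graph = build_graph(edges)
    reach = Counter({member: len(targets) for member, targets in graph.items()})
    return reach.most_common(n)

def most_active(edges, n=3):
    """Rank members by the raw number of messages they sent."""
    return Counter(sender for sender, _ in edges).most_common(n)

influencers = top_influencers(messages, n=1)
active = most_active(messages, n=1)
```

In a real network the graph would be far larger and would typically be computed as a distributed job; the ranking logic stays the same.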
Using Big Data technologies to enable Social
Media Analytics
One of the biggest challenges that organizations face with their social media
data is its sheer size.
Existing Enterprise Data Warehousing (EDW) environments, designed decades
ago, simply lack the ability to capture and process social media data within a
reasonable time. Moreover, these traditional EDWs have limited capabilities
when it comes to analyzing the behavioral data of users. Traditional solutions
cannot help companies manage the complex and unstructured data generated
by social media interactions, or handle multimedia data.
Using Big Data technologies is their best bet in this scenario. Big Data
technologies can help organizations handle large volumes of complex,
unstructured data from social sources, of the order of terabytes and petabytes,
gain insights into customers and trends, store images and videos, and save
hundreds of thousands of dollars per terabyte per year.
Take the instance of a Big Data Social Analytics Platform which has to deal with
information from various data sources such as Social Media sites and web 2.0
enabled websites. The Platform can also pull historical bulk data lying around in
existing systems using appropriate connectors.
The connectors enable the conversion of the data from all kinds of data sources
into a Hadoop-based data warehouse. After collecting this data, Apache’s
Mahout, a scalable machine learning and data mining solution, can be used to
categorize the data and store it in accordance with the categories for later use.
It is also possible to run Map-Reduce jobs that use the Natural Language Toolkit
(NLTK) to perform natural language processing of the comments and feedback
from the social data sources.
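The Map-Reduce style of processing can be sketched on a single machine as follows. This is only an illustration of the map and reduce phases: a simple regular expression stands in for an NLTK tokenizer, the stopword list and sample comments are invented, and a real job would run these phases across a Hadoop cluster.

```python
import re
from collections import defaultdict
from itertools import chain

# Tiny illustrative stopword list (an assumption, not NLTK's list).
STOPWORDS = {"the", "a", "is", "this", "and", "to", "of"}

def map_phase(comment):
    """Emit a (token, 1) pair for each non-stopword token in a comment."""
    for token in re.findall(r"[a-z']+", comment.lower()):
        if token not in STOPWORDS:
            yield token, 1

def reduce_phase(pairs):
    """Sum the counts emitted for each token."""
    totals = defaultdict(int)
    for token, count in pairs:
        totals[token] += count
    return dict(totals)

# Hypothetical comments collected from social sources.
comments = [
    "The gift card is great",
    "Great free offer",
    "This offer is confusing",
]

term_counts = reduce_phase(chain.from_iterable(map_phase(c) for c in comments))
```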
The aptly massaged and categorized data can then be used to draw graphs and
analyze market sentiment about a product. The data can also be used for MIS and
to compile regulatory reports that need to be produced on a regular basis, with
Sqoop exporting it to the reporting systems.
Since the Big Data Social Analytics Platform is powered by Hadoop, it can scale
linearly to thousands of nodes using commodity hardware. This spells a
significant cost advantage for organizations in the long run.
Since it is important for businesses to track down, and take advantage of
opportunities quickly, this platform can enable them to react to the events as
they happen.
Building a Big Data warehouse
In order to build a Big Data warehouse that extracts data from the sources
discussed earlier and draws pertinent insights from it, organizations must begin
by grabbing social media data from various public social media platforms. The
historical master data and transactional data about customers can be taken
from existing systems. Sqoop can come in handy for pulling this data out of the
RDBMS systems that are already in place.
Text        User       Location  Source
Gift card   TweetUser  USA, NY   Twitter
Free offer  FaceUser   USA, GA   Facebook
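The sample rows above suggest a common record layout (text, user, location, source) that every connector converts its input into. A minimal sketch of that normalization is shown here. The raw field names (`body`, `handle`, `place`, `message`, `author`, `region`) are hypothetical and do not reflect the real Twitter or Facebook API schemas.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SocialRecord:
    """Common layout shared by records from every source."""
    text: str
    user: str
    location: str
    source: str

def from_tweet(raw):
    # Field names are illustrative, not the real Twitter API schema.
    return SocialRecord(raw["body"], raw["handle"], raw["place"], "Twitter")

def from_facebook_post(raw):
    # Field names are illustrative, not the real Facebook API schema.
    return SocialRecord(raw["message"], raw["author"], raw["region"], "Facebook")

records = [
    from_tweet({"body": "Gift card", "handle": "TweetUser", "place": "USA, NY"}),
    from_facebook_post(
        {"message": "Free offer", "author": "FaceUser", "region": "USA, GA"}
    ),
]
```

Keeping one record type per warehouse row means downstream jobs do not need to know which platform a comment came from, beyond the `source` field.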
For natural language processing, NLTK is a good open source option.
Data preparation and mashups can be accomplished by running Map-Reduce jobs
over the collected data and massaging it.
Apache Mahout’s k-means algorithm can be used for clustering, while its Naïve
Bayes algorithm can be used for classification and sentiment analysis of the
comments and tweets from social media data sources, and for identifying patterns.
The item-based similarity algorithm of Mahout can be used for collaborative
filtering and recommendations. When the data is ready for analytical reporting
and deep mining, Hive or Pig can be used.
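To make the clustering step more concrete, here is a small single-machine sketch of the k-means procedure (Lloyd's algorithm) in plain Python. It is not Mahout's implementation, which runs distributed over Hadoop. The two features per customer and the starting centroids are assumptions chosen for the example.

```python
def kmeans(points, centroids, iterations=10):
    """Plain k-means on 2-D points, starting from the given centroids."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: (p[0] - centroids[i][0]) ** 2
                + (p[1] - centroids[i][1]) ** 2,
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                mean_x = sum(x for x, _ in cluster) / len(cluster)
                mean_y = sum(y for _, y in cluster) / len(cluster)
                new_centroids.append((mean_x, mean_y))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: assignments will no longer change
        centroids = new_centroids
    return centroids, clusters

# Hypothetical features per customer: (posts per week, complaints per month).
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
```

With this sample data the points separate into a low-activity group and a high-activity group of three customers each.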
A step-by-step approach to creating the Big
Data EDW
Step 1: The first step is to create and run training data through Mahout to help
it understand how to classify social data feeds. Next, the feeds have to be
collected from public social media platforms. This can be accomplished by
performing keyword based searches and streaming in the result sets on a
continuous basis. It is possible now to search on the basis of a brand name,
product make and model, category, industry terminology, product segment,
special offers and marketing buzzwords, using the various APIs offered by social
media platforms. This classified data can then be dumped into an HBase-based
data warehouse continuously.
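The keyword-based collection described in Step 1 can be sketched as a filter over an incoming stream of posts. The brand name "Acme", the keywords and the sample posts are invented for illustration, and the sketch filters text locally rather than calling any real platform API.

```python
import re

def compile_keywords(keywords):
    """Build one case-insensitive pattern that matches any whole keyword."""
    # Longer phrases first, so "gift card" is preferred over a shorter overlap.
    escaped = sorted((re.escape(k) for k in keywords), key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)

def filter_feed(posts, pattern):
    """Yield (post, matched_keywords) for posts mentioning any keyword."""
    for post in posts:
        hits = sorted({m.group(0).lower() for m in pattern.finditer(post)})
        if hits:
            yield post, hits

# Hypothetical brand name, offer terms and incoming posts.
pattern = compile_keywords(["Acme", "gift card", "free offer"])
posts = [
    "Got an ACME gift card today",
    "Nothing to see here",
    "Is the free offer still on?",
]
matched = list(filter_feed(posts, pattern))
```

Because `filter_feed` is a generator, it can consume a continuous feed and hand each matching post to the warehouse loader as it arrives.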
The data from existing systems can also be imported into the HBase-based Big
Data warehouse. Online content can be crawled and dumped into the HBase
database. Connectors are available for classification of online pages. Lucene
and Solr are very suitable for this purpose.
Step 2: At this stage, quantitative analytics can be performed on the collected
data. It is possible to draw comparisons between ‘Total tweets’ versus ‘Our
product specific tweets.’ This is accomplished by using Mahout algorithms over
a Hadoop cluster. Organizations can also publish a daily trend watch. This may
contain the ‘total number of comments about the products of their
competitors,’ versus the ‘total number of comments about their own products.’
With customers increasingly using devices for connecting to social media
platforms, it is now possible to perform location-based trend analysis.
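The quantitative comparisons in Step 2 come down to counting. The sketch below computes a daily trend (all posts versus own-product and competitor posts) and a location breakdown. The brand names, dates and posts are made up, and plain substring matching is a simplification of real product matching.

```python
from collections import Counter

def daily_trend(posts, own_terms, competitor_terms):
    """Per day, count all posts, own-product posts and competitor posts."""
    trend = {}
    for post in posts:
        day = trend.setdefault(post["date"], Counter())
        text = post["text"].lower()
        day["total"] += 1
        if any(term in text for term in own_terms):
            day["own"] += 1
        if any(term in text for term in competitor_terms):
            day["competitor"] += 1
    return trend

def by_location(posts, terms):
    """Count posts mentioning any of the terms, grouped by location."""
    return Counter(
        p["location"] for p in posts
        if any(term in p["text"].lower() for term in terms)
    )

# Hypothetical posts; "acme" is our brand and "globex" a competitor.
posts = [
    {"date": "2012-03-01", "text": "Love my Acme phone", "location": "USA, NY"},
    {"date": "2012-03-01", "text": "Globex tablet is slow", "location": "USA, GA"},
    {"date": "2012-03-01", "text": "Lunch time", "location": "USA, NY"},
    {"date": "2012-03-02", "text": "Acme support was quick", "location": "USA, GA"},
]
trend = daily_trend(posts, {"acme"}, {"globex"})
own_by_location = by_location(posts, {"acme"})
```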
Classification and clustering are performed on data processed by Mahout/NLTK.
Organizations first run the training data through Mahout/NLTK to build
trained models. After that, it is possible to run the
tweets and feeds from other social media platforms through the trained models,
and have the tweets and comments classified. This provides a clear picture of the
sentiments prevailing in the marketplace for the products of organizations as
well as their competitors.
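The train-then-classify flow can be illustrated with a tiny Naïve Bayes sentiment classifier written in plain Python. This is a sketch of the general technique with add-one (Laplace) smoothing, not Mahout's or NLTK's implementation, and the four training comments are invented.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial Naive Bayes over whitespace-separated words."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.doc_counts = Counter()              # label -> number of documents
        self.vocab = set()

    def train(self, text, label):
        words = text.lower().split()
        self.doc_counts[label] += 1
        self.word_counts[label].update(words)
        self.vocab.update(words)

    def classify(self, text):
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, float("-inf")
        for label in self.doc_counts:
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in words:
                count = self.word_counts[label][word] + 1  # add-one smoothing
                score += math.log(count / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical labeled training comments.
model = NaiveBayes()
model.train("love this phone great battery", "positive")
model.train("great service fast delivery", "positive")
model.train("terrible battery slow phone", "negative")
model.train("slow support terrible service", "negative")

first = model.classify("great phone")
second = model.classify("terrible slow support")
```

A production model would need far more training data and better tokenization, but the two phases (training on labeled feeds, then classifying new tweets and comments) are the same.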
Companies can come up with recommendations by running the data through
Mahout. These recommendations can then be factored into future product
design and rollouts.
Step 3: This step is about using customer data to recommend new and related
products. Once companies have data from their existing systems as well as
social sources, they can prepare the mock customer data for Social ID mapping
and run item-based or user-based recommendations on this data using Mahout.
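An item-based recommendation of the kind mentioned above can be sketched with cosine similarity between item rating vectors. This is one common formulation, shown on a single machine; it is not Mahout's code, and the users, products and ratings are invented.

```python
import math
from collections import defaultdict

# Hypothetical ratings: user -> {item: rating}.
ratings = {
    "u1": {"phone": 5, "case": 4},
    "u2": {"phone": 4, "case": 5, "charger": 4},
    "u3": {"phone": 5, "charger": 5},
    "u4": {"tablet": 4, "case": 2},
}

def item_vectors(ratings):
    """Invert the ratings into item -> {user: rating}."""
    vectors = defaultdict(dict)
    for user, items in ratings.items():
        for item, rating in items.items():
            vectors[item][user] = rating
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse rating vectors."""
    shared = set(a) & set(b)
    dot = sum(a[u] * b[u] for u in shared)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def recommend(user, ratings, n=2):
    """Score unseen items by their similarity to items the user rated."""
    vectors = item_vectors(ratings)
    owned = ratings[user]
    scores = {}
    for candidate in vectors:
        if candidate in owned:
            continue
        scores[candidate] = sum(
            cosine(vectors[candidate], vectors[item]) * rating
            for item, rating in owned.items()
        )
    return sorted(scores, key=scores.get, reverse=True)[:n]

suggestions = recommend("u1", ratings)
```

For user `u1`, the charger ranks ahead of the tablet because it is rated by the same users who rated the phone and case that `u1` already owns.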
At this stage, it is possible to produce Analytical Reports on data generated by
Mahout. This can be accomplished by generating reports using a traditional
Reporting product or framework. The nicely sliced and diced reporting data can
be dumped into a MySQL database or some other SQL database, with the help
of Sqoop. This SQL database can be used to meet the regular downstream
reporting requirements of organizations. This will enable them to use their
existing investments in reporting tools as well as provide the drill down reports
for use by the management and Sales and Marketing departments.
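Once the summarized data sits in a SQL database, existing reporting tools can query it in the usual way. The sketch below uses Python's built-in `sqlite3` module as a stand-in for MySQL, and a direct insert as a stand-in for the Sqoop export. The table name, columns and figures are hypothetical.

```python
import sqlite3

# Hypothetical summarized rows: (report_date, product, sentiment, mentions).
report_rows = [
    ("2012-03-01", "Acme phone", "positive", 120),
    ("2012-03-01", "Acme phone", "negative", 30),
    ("2012-03-01", "Acme tablet", "positive", 45),
]

def load_report(conn, rows):
    """Create the reporting table if needed and load the summarized rows."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS sentiment_report (
               report_date TEXT, product TEXT, sentiment TEXT, mentions INTEGER
           )"""
    )
    conn.executemany("INSERT INTO sentiment_report VALUES (?, ?, ?, ?)", rows)
    conn.commit()

def mentions_by_product(conn):
    """A simple drill-down query: total mentions per product."""
    cursor = conn.execute(
        "SELECT product, SUM(mentions) FROM sentiment_report "
        "GROUP BY product ORDER BY product"
    )
    return cursor.fetchall()

conn = sqlite3.connect(":memory:")  # in-memory database for the example
load_report(conn, report_rows)
summary = mentions_by_product(conn)
```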
Alongside social media, this Big Data Media Analytics platform can be used to
address other large data analytics requirements. The platform can give
companies a head start in putting together the pieces of their Big Data strategy
and provide them with an asymmetric advantage over competition.
The Impetus solution
Impetus has used this approach and these technologies to build a platform for
Social Media Analytics. Impetus, an established thought leader in the Big Data
space, has conceptualized, architected and built this platform based on the
experience and expertise that it has gained through its client engagements.
The iLaDaP high level architecture
The Large Data Analytics Platform (iLaDaP) developed by Impetus is built using a
Service Oriented Architecture (SOA), and incorporates all the key characteristics
of an ideal Big Data Analytics Platform. iLaDaP is designed to derive
intelligence and operate on huge datasets collected from numerous data
sources in multiple data formats.
It is powered by Hadoop, and therefore, can linearly scale up to thousands of
nodes using commodity hardware. This spells a significant cost advantage in the
long run. iLaDaP also comes with a set of pre-canned and customized reports.
Businesses that need to track down and take advantage of opportunities as they
happen can use the Impetus platform to react to events. The iLaDaP is also
capable of collecting data from a range of disparate sources. This unstructured
data can be transformed and utilized for strategic business decisions.
Furthermore, organizations can deploy the solution on-premise, as well as in a
Cloud supported setup. iLaDaP can be seamlessly integrated with the current
platforms of companies, without making any major changes.