Using Big Data technologies to
enable social media analytics
WHITE PAPER
Abstract
In this white paper, Impetus discusses the need to build a social analytics
platform based on Big Data technologies for better business insight. The paper
also explains why social media analytics is important today and how 3-D data
sources (that is, internal, external and social data) can be used to build
a data warehouse based on Big Data technologies.
Impetus also shares its recommended solution and shows how Big Data
technologies can be used to optimize costs and handle exponential increases in
data over time.
Impetus Technologies Inc.
www.impetus.com
Table of Contents
Introduction
The benefits of Social Analytics
Data sources that facilitate Social Media Analytics
Technical tenets of Social Media Analytics
Using Big Data technologies to enable Social Media Analytics
Building a Big Data warehouse
A step-by-step approach to creating the Big Data EDW
The Impetus solution
The iLaDaP high level architecture
Summary
Introduction
Social Media Analytics is a discipline that helps organizations measure, assess
and explain the performance of their social media initiatives.
There are four stages in analyzing social media data:
Step 1: collecting the data. This facilitates the compiling of reports and statistics
that are to be shared with the management or the internal and external
stakeholders.
Step 2: measuring the data. This helps in Sentiment Analysis and gauging which
products are well received in the marketplace.
Step 3: analysis. Here, data is presented in a visual and interactive manner to
the management, as well as the sales and marketing teams to provide better
insights.
Step 4: innovation. Based on the insights and analysis, there is a move towards
innovation, where organizations determine the new products and ideas they are
going to pursue, as a response to customer requirements. Innovation also helps
unearth the cross sell or up sell opportunities that were not visible before.
Social Analytics opens up a host of new opportunities and perspectives.
Category-wise analysis of customer data, for instance, enables demographic
profiling of customers and helps determine their usage patterns. Similarly, with
Feature analysis, it is possible to figure out which forums, platforms or sources
of data are more active as compared to others.
Product Growth Analysis, which focuses on the data generated for a specific
product, helps understand the response of users to that product. There is also a
Recommendation Engine, which helps zero in on what is missing or lacking in a
product range.
Finally, Social Analytics enables Third Party Analysis, which is purely focused on
what the public social media platforms, such as Twitter, Facebook, MySpace,
etc. have to say about the product.
The benefits of Social Analytics
Social Analytics is an outcome-based approach and one which creates visible
Return on Investment (RoI).
• It helps organizations retain customers by addressing their concerns
upfront, rather than being slaves to processes. The results of the
analytics help organizations retain brand preference in a fickle
consumer world.
• It improves customer service and brings down the cost of operations.
• It enables organizations to add new customers, by understanding and
addressing their requirements.
• Social Analytics helps companies keep an eye on their competition. With
easy access to social media data, it is simple to track and counter the
moves of competitors.
• It helps companies remain proactive. The turnaround time for gathering
customer feedback is reduced drastically. Moreover, the reactions of
customers and their subsequent actions can be predicted more
accurately, enabling organizations to take appropriate measures.
Social Media Analytics effectively converges on-site, social media and third party
data to extract useful information. Considering these factors, and the fact that it
enables enterprises to leverage the colossal data that is continuously generated
through social media interactions, Social Media Analytics should be made an
integral part of the marketing and research strategies of enterprises.
Data sources that facilitate Social Media
Analytics
Data sources include internal data, such as the purchase history of customers,
their transactions, and profiles in the enterprise database. They also encompass
website traffic analysis, covering internal CSR logs, customer queries,
automated agent discussions, complaints and resolutions, and employee
insights.
Data sources can also be the social activities and profile updates of customers
on public social media platforms such as Twitter, Facebook, Myspace, LinkedIn,
etc.
External data sources can additionally be used, and customers analyzed by
factoring in industry sources of information and market research reports.
Technical tenets of Social Media Analytics
Here’s a look at what Social Media Analytics entails and enables:
Clustering: Clustering is about capturing and analyzing various comments,
demands, and questions that customers share with like-minded friends and
groups, over social media platforms. It helps identify the appropriate response
and behavioral anomalies.
Classification: Having captured data on the activities of customers and their
comments, it is possible to perform natural language processing on it to evolve
patterns. These patterns can then be categorized and understood for
appropriate responses. Organizations can use Classification to address the
concerns of customers and approach them with products and offerings that
really meet their needs.
Sequential classification: This enables organizations to identify the subsequent
steps and actions that customers might take, based on their recent experiences.
Entity Extraction: Organizations can identify the concerns and issues that
dissatisfied customers are struggling with through Entity Extraction. They can
then take appropriate measures to ease the situation and retain customers on
the verge of switching to other suppliers or vendors. Event Extraction enables
companies to unearth the sequence of events leading up to customer
defections, or why people moved on to other providers.
Communication Graphs: Once organizations have all the data sliced and
diced, they can draw Communication Graphs. These graphs can help analyze
and identify the top influencers, and active members in various groups. They
can also help companies gain a better understanding of where the messages
originate, and how they travel through the network. Knowing this, organizations
can target the top influencers and most active members in the network,
projecting a positive image of the brand or product in the community.
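As a rough illustration of the Communication Graph idea, the sketch below ranks users by how many distinct people address them (in-degree) and by how many messages they send. The interaction pairs and user names are hypothetical sample data, and in-degree is only one simple stand-in for an influence measure; it is not taken from the Impetus platform.

```python
from collections import Counter, defaultdict

# Hypothetical sample data: (sender, user being mentioned or replied to).
INTERACTIONS = [
    ("alice", "brand_fan"),
    ("bob", "brand_fan"),
    ("carol", "brand_fan"),
    ("brand_fan", "dave"),
    ("alice", "dave"),
    ("erin", "alice"),
]


def build_graph(interactions):
    """Build a directed adjacency map: sender -> set of receivers."""
    graph = defaultdict(set)
    for sender, receiver in interactions:
        graph[sender].add(receiver)
    return graph


def top_influencers(interactions, n=2):
    """Rank users by in-degree, i.e. how many distinct users address them."""
    in_degree = Counter()
    for receivers in build_graph(interactions).values():
        for receiver in receivers:
            in_degree[receiver] += 1
    # Sort by count (descending), then by name so the order is deterministic.
    return sorted(in_degree.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


def most_active(interactions, n=2):
    """Rank users by the number of messages they sent."""
    sent = Counter(sender for sender, _ in interactions)
    return sorted(sent.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
```

With the sample data, `top_influencers(INTERACTIONS)` places `brand_fan` first, since three different users address that account, while `most_active(INTERACTIONS, 1)` returns `alice`.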
Using Big Data technologies to enable Social
Media Analytics
One of the biggest challenges that organizations face with their social media
data is its sheer size.
Existing Enterprise Data Warehousing (EDW) environments, designed decades
ago, simply lack the ability to capture and process social media data within a
reasonable time. Moreover, these traditional EDWs have limited capabilities
when it comes to analyzing the behavioral data of users. Traditional solutions
cannot help companies manage the complex, unstructured data generated
by social media interactions, nor can they handle multimedia data.
Using Big Data technologies is their best bet in this scenario. Big Data
technologies can help organizations handle large volumes of complex,
unstructured data from social sources, of the order of terabytes and petabytes,
gain insights into customers and trends, store images and videos, and save
hundreds of thousands of dollars per terabyte per year.
Take the instance of a Big Data Social Analytics Platform that has to deal with
information from various data sources, such as social media sites and Web
2.0-enabled websites. The Platform can also pull historical bulk data held in
existing systems using appropriate connectors.
The connectors enable data from all kinds of sources to be converted and
loaded into a Hadoop-based data warehouse. After collecting this data, Apache
Mahout, a scalable machine learning and data mining solution, can be used to
categorize the data and store it in accordance with the categories for later use.
It is also possible to run Map-Reduce jobs that use the Natural Language Toolkit
(NLTK) to perform natural language processing of the comments and feedback
from the social data sources.
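The Map-Reduce pattern described above can be sketched in a few lines of plain Python. This is only an in-memory illustration of the map, shuffle and reduce phases applied to comment text: it does not use Hadoop or NLTK, and the tokenizer, the stopword list and the function names are assumptions made for the example.

```python
import re
from collections import defaultdict
from itertools import chain

# Assumed, deliberately tiny stopword list for the example.
STOPWORDS = {"the", "a", "is", "and", "this", "i", "my", "for"}


def map_phase(comment):
    """Mapper: emit a (token, 1) pair for every non-stopword token."""
    tokens = re.findall(r"[a-z']+", comment.lower())
    return [(token, 1) for token in tokens if token not in STOPWORDS]


def shuffle(pairs):
    """Group the mapper output by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: sum the counts emitted for one token."""
    return key, sum(values)


def run_job(comments):
    """Run map, shuffle and reduce over a list of comments."""
    mapped = chain.from_iterable(map_phase(c) for c in comments)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())
```

On a real cluster the mapper and reducer would run on separate nodes over partitions of the data; the term counts produced this way are a typical input for later classification or trend reports.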
The suitably cleaned and categorized data can then be used to draw graphs and
analyze market sentiment about a product. The data can also be used for MIS and
for regulatory reports that need to be produced on a regular basis, with Sqoop
exporting it to the relational databases that the reporting tools read.
Since the Big Data Social Analytics Platform is powered by Hadoop, it can
linearly scale up to thousands of nodes using commodity hardware. This spells a
significant cost advantage for organizations in the long run.
Since it is important for businesses to track down, and take advantage of
opportunities quickly, this platform can enable them to react to the events as
they happen.
Building a Big Data warehouse
In order to build a Big Data warehouse that extracts data from the sources
discussed earlier, and draws pertinent insights from it, organizations must begin
by grabbing social media data from various public social media platforms. The
historical master data and transactional data about customers can be taken
from existing systems. Sqoop can come in handy for pulling this data out of the
RDBMS systems that are already in place.
Text        User       Location  Source
Gift card   TweetUser  USA, NY   Twitter
Free offer  FaceUser   USA, GA   Facebook
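The sample records above share four fields: text, user, location and source. A minimal sketch of normalizing feeds from different platforms into that common shape is given below. The raw field names (`screen_name`, `place`, `message`, `from`) are hypothetical and are not the actual payload formats of any platform's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SocialRecord:
    """One normalized row of the warehouse: Text, User, Location, Source."""
    text: str
    user: str
    location: str
    source: str


def from_tweet(raw):
    """Map a tweet-like dict (hypothetical field names) to a SocialRecord."""
    return SocialRecord(
        text=raw["text"],
        user=raw["screen_name"],
        location=raw.get("place", "unknown"),
        source="Twitter",
    )


def from_facebook_post(raw):
    """Map a Facebook-like dict (hypothetical field names) to a SocialRecord."""
    return SocialRecord(
        text=raw["message"],
        user=raw["from"],
        location=raw.get("location", "unknown"),
        source="Facebook",
    )
```

Keeping one record type per row means that the downstream clustering, classification and reporting steps do not need to know which platform a comment came from.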
For natural language processing, the Natural Language Toolkit (NLTK) is a good
open source option. Data preparation and mashups can be accomplished by running
Map-Reduce jobs over the collected data and cleaning it up.
Apache Mahout's k-means algorithm can be used for clustering, while its Naive
Bayes algorithm can be used for classification and sentiment analysis of the
comments and tweets from social media data sources, and for identifying patterns.
The item-based similarity algorithm of Mahout can be used for collaborative
filtering and recommendations. When the data is ready for analytical reporting
and deep mining, Hive or Pig can be used.
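To make the clustering step more concrete, here is a small, single-machine sketch of the k-means procedure in plain Python. It is not Mahout's implementation: the initial centroids are simply the first k points (an assumption chosen to keep the example deterministic), and the two numeric features per user are hypothetical.

```python
import math


def kmeans(points, k, iterations=10):
    """Cluster points into k groups; returns (centroids, assignments)."""
    # Assumption for the sketch: seed the centroids with the first k points.
    centroids = [tuple(p) for p in points[:k]]
    assignments = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                mean = tuple(sum(dim) / len(members) for dim in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, assignments


# Hypothetical features per user: (posts per week, product mentions per week).
USERS = [(1.0, 1.0), (9.0, 9.0), (1.0, 2.0), (8.0, 9.0), (2.0, 1.0), (9.0, 8.0)]
```

Calling `kmeans(USERS, 2)` separates the occasional posters from the heavy posters. A distributed implementation such as Mahout's follows the same assign-and-update loop, but spreads each pass over the cluster.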
A step-by-step approach to creating the Big
Data EDW
Step 1: The first step is to create and run training data through Mahout to help
it understand how to classify social data feeds. Next, the feeds have to be
collected from public social media platforms. This can be accomplished by
performing keyword based searches and streaming in the result sets on a
continuous basis. It is possible now to search on the basis of a brand name,
product make and model, category, industry terminology, product segment,
special offers and marketing buzzwords, using the various APIs offered by social
media platforms. This classified data can then be loaded into an HBase-based
data warehouse on a continuous basis.
The data from existing systems can also be imported into the HBase-based Big
Data warehouse. Online content can be crawled and stored in the HBase
database. Connectors are available for classification of online pages. Lucene
and Solr are very suitable for this purpose.
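The keyword-based collection described in Step 1 can be pictured as a filter over an incoming stream of posts. The sketch below is an assumption-laden stand-in: it works on an in-memory list instead of a platform's streaming API, and the keywords and post fields are hypothetical.

```python
def matches_keywords(text, keywords):
    """Return the keywords that occur in the text (case-insensitive)."""
    lowered = text.lower()
    return [kw for kw in keywords if kw.lower() in lowered]


def filter_feed(feed, keywords):
    """Yield each post that matches a keyword, annotated with its matches."""
    for post in feed:
        hits = matches_keywords(post["text"], keywords)
        if hits:
            yield {**post, "matched": hits}


# Hypothetical brand name and marketing term to track.
KEYWORDS = ["AcmePhone", "free offer"]

# Hypothetical sample of incoming posts.
FEED = [
    {"text": "Loving my new AcmePhone X", "source": "Twitter"},
    {"text": "Nice weather today", "source": "Twitter"},
    {"text": "Free offer on acmephone accessories", "source": "Facebook"},
]
```

Because `filter_feed` is a generator, it can consume a continuous stream and hand each matching post to the classification and storage steps as it arrives.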
Step 2: At this stage, quantitative analytics can be performed on the collected
data. It is possible to draw comparisons between ‘Total tweets’ versus ‘Our
product specific tweets.’ This is accomplished by using Mahout algorithms over
a Hadoop cluster. Organizations can also publish a daily trend watch. This may
contain the ‘total number of comments about the products of their
competitors,’ versus the ‘total number of comments about their own products.’
With customers increasingly using devices for connecting to social media
platforms, it is now possible to perform location-based trend analysis.
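The comparisons mentioned above, such as total tweets versus product-specific tweets and location-based trends, reduce to simple counts once the data is collected. The following sketch shows the arithmetic on a small in-memory sample; the post fields and product terms are hypothetical, and on a Hadoop cluster the same counts would be computed as distributed jobs.

```python
from collections import Counter


def mentions_product(text, product_terms):
    """True if any product term occurs in the text (case-insensitive)."""
    lowered = text.lower()
    return any(term.lower() in lowered for term in product_terms)


def share_of_voice(posts, product_terms):
    """Return (product-specific posts, total posts, ratio of the two)."""
    total = len(posts)
    ours = sum(1 for post in posts if mentions_product(post["text"], product_terms))
    ratio = ours / total if total else 0.0
    return ours, total, ratio


def mentions_by_location(posts, product_terms):
    """Count product-specific posts per location for trend analysis."""
    counts = Counter()
    for post in posts:
        if mentions_product(post["text"], product_terms):
            counts[post["location"]] += 1
    return counts
```

Running the same functions with a competitor's product terms gives the figures needed for a daily trend watch.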
Classification and clustering are performed on data processed with Mahout and
NLTK. Organizations can run the training data through Mahout/NLTK to build
trained models. After that, it is possible to run the
tweets and feeds from other social media platforms through the trained models, and
have the tweets and comments classified. This provides a clear picture of the
sentiments prevailing in the marketplace for the products of organizations as
well as their competitors.
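The train-then-classify flow can be illustrated with a small multinomial Naive Bayes classifier written in plain Python. This is a sketch of the general technique with add-one (Laplace) smoothing, not Mahout's or NLTK's implementation, and the training sentences and labels are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing."""

    def __init__(self):
        self.class_counts = Counter()            # documents seen per label
        self.word_counts = defaultdict(Counter)  # token counts per label
        self.vocab = set()

    def train(self, examples):
        """Learn counts from (text, label) pairs."""
        for text, label in examples:
            self.class_counts[label] += 1
            for token in tokenize(text):
                self.word_counts[label][token] += 1
                self.vocab.add(token)

    def classify(self, text):
        """Return the label with the highest log-probability for the text."""
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.class_counts):
            score = math.log(self.class_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for token in tokenize(text):
                count = self.word_counts[label][token] + 1  # add-one smoothing
                score += math.log(count / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical labeled training data.
TRAINING = [
    ("love this phone great camera", "positive"),
    ("great battery and great screen", "positive"),
    ("terrible battery hate it", "negative"),
    ("screen broke terrible support", "negative"),
]
```

After `model = NaiveBayes()` and `model.train(TRAINING)`, new tweets or comments can be passed to `model.classify(...)`. A production model would need far more training data than this toy set.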
Companies can come up with recommendations by running the data through
Mahout. These recommendations can then be factored into future product
design and rollouts.
Step 3: This step is about using customer data to recommend new and related
products. Once companies have data from their existing systems as well as
social sources, they can prepare the mock customer data for Social ID mapping
and run Item or User based recommendations on this data using Mahout.
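As an illustration of item-based recommendation, the sketch below scores the items a customer does not yet own by their cosine similarity to the items the customer has rated. It is a simplified single-machine version of the idea, not Mahout's recommender, and the users, products and ratings are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical ratings: user -> {item: rating}.
RATINGS = {
    "u1": {"phone": 5, "case": 4, "charger": 1},
    "u2": {"phone": 4, "case": 5},
    "u3": {"phone": 5, "case": 4, "headset": 5},
    "u4": {"charger": 5, "headset": 2},
}


def item_vectors(ratings):
    """Invert the ratings into item -> {user: rating} vectors."""
    vectors = defaultdict(dict)
    for user, items in ratings.items():
        for item, rating in items.items():
            vectors[item][user] = rating
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse rating vectors."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[u] * b[u] for u in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def recommend(user, ratings, n=1):
    """Recommend the n unowned items most similar to what the user rated."""
    vectors = item_vectors(ratings)
    owned = ratings[user]
    scores = {}
    for candidate in vectors:
        if candidate in owned:
            continue
        # Weight each similarity by the user's rating of the owned item.
        scores[candidate] = sum(
            cosine(vectors[candidate], vectors[item]) * rating
            for item, rating in owned.items()
        )
    return sorted(scores, key=lambda item: (-scores[item], item))[:n]
```

For customer `u2`, who rated the phone and the case, `recommend("u2", RATINGS)` suggests the headset, because users who rated those items highly also rated the headset highly.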
At this stage, it is possible to produce Analytical Reports on data generated by
Mahout. This can be accomplished by generating reports using a traditional
Reporting product or framework. The nicely sliced and diced reporting data can
be dumped into a MySQL database or some other SQL database, with the help
of Sqoop. This SQL database can be used to meet the regular downstream
reporting requirements of organizations. This will enable them to use their
existing investments in reporting tools as well as provide the drill down reports
for use by the management and Sales and Marketing departments.
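The hand-off to a SQL reporting database can be sketched as follows. The example uses Python's built-in `sqlite3` module as a stand-in for MySQL, and inserts rows directly instead of exporting them with Sqoop; the table name, columns and figures are hypothetical.

```python
import sqlite3


def export_report(rows, db_path=":memory:"):
    """Load (product, sentiment, mentions) rows into a SQL reporting table."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sentiment_summary "
        "(product TEXT, sentiment TEXT, mentions INTEGER)"
    )
    conn.executemany("INSERT INTO sentiment_summary VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def drill_down(conn, product):
    """Return the sentiment breakdown for one product."""
    cursor = conn.execute(
        "SELECT sentiment, mentions FROM sentiment_summary "
        "WHERE product = ? ORDER BY sentiment",
        (product,),
    )
    return cursor.fetchall()


# Hypothetical aggregated output of the analytics jobs.
REPORT_ROWS = [
    ("AcmePhone", "positive", 120),
    ("AcmePhone", "negative", 30),
    ("RivalPhone", "positive", 80),
]
```

Once the aggregated rows are in a relational table, existing reporting tools can query them with ordinary SQL, which is what allows organizations to reuse their current reporting investments.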
Alongside social media, this Big Data Media Analytics platform can be used to
address other large data analytics requirements. The platform can give
companies a head start in putting together the pieces of their Big Data strategy
and provide them with an asymmetric advantage over competition.
The Impetus solution
Impetus has used this approach and these technologies to build a platform for
Social Media Analytics. Impetus, an established thought leader in the Big Data
space, has conceptualized, architected and built this platform based on the
experience and expertise that it has gained through its client engagements.
The iLaDaP high level architecture
The Impetus Large Data Analytics Platform (iLaDaP) is built using a
Service Oriented Architecture (SOA), and incorporates all the key characteristics
of an ideal Big Data Analytics Platform. The iLaDaP is designed to derive
intelligence from, and operate on, huge datasets collected from numerous data
sources in multiple data formats.
It is powered by Hadoop, and therefore, can linearly scale up to thousands of
nodes using commodity hardware. This spells a significant cost advantage in the
long run. iLaDaP also comes with a set of pre-canned and customized reports.
Businesses that need to track down and take advantage of opportunities as they
happen can use the Impetus platform to react to events. The iLaDaP is also
capable of collecting data from a range of disparate sources. This unstructured
data can be transformed and utilized for strategic business decisions.
Furthermore, organizations can deploy the solution on-premise, as well as in a
Cloud supported setup. iLaDaP can be seamlessly integrated with the current
platforms of companies, without making any major changes.
Summary
Traditional Enterprise Data Warehouses do not have the ability to keep up with
rapidly increasing social media data. The need of the hour is to effectively
strategize and build a Big Data Analytics Platform to manage, store and derive
insights from this digital data.
Any single vendor technology may not be sufficient to undertake this task, and it
is recommended that organizations go for Open Source options to build a Social
Media Analytics Platform using Big Data technologies. The fact is that the
success of a Big Data platform depends entirely on the tools that are used.
Organizations, therefore, need to use discretion and select the most appropriate
tools from the available options. Companies can also re-use existing EDW
investments for their Big Data Analytics Platform.
About Impetus
Impetus Technologies is a leading provider of Big Data solutions for the
Fortune 500®. We help customers effectively manage the “3-Vs” of Big Data
and create new business insights across their enterprises.
Website: www.bigdata.impetus.com | Email: bigdata@impetus.com
© 2013 Impetus Technologies, Inc. All rights reserved. Product and company
names mentioned herein may be trademarks of their respective companies.
May 2013

Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 

Kürzlich hochgeladen (20)

Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 

Using Big Data Technologies for Social Media Analytics - Impetus White Paper

Abstract

In this white paper, Impetus discusses the need to build a social analytics platform based on Big Data technologies for better business insight. The paper also explains why social media analytics is important in today’s world and how 3-D data sources (that is, internal, external and social data) can be used to build a data warehouse based on Big Data technologies. Impetus also shares its recommended solution and shows how Big Data technologies can be used to optimize costs and handle exponential increases in data over time.

Impetus Technologies Inc.
www.impetus.com

Table of Contents

Introduction
The benefits of Social Analytics
Data sources that facilitate Social Media Analytics
Technical tenets of Social Media Analytics
Using Big Data technologies to enable Social Media Analytics
Building a Big Data warehouse
A step-by-step approach to creating the Big Data EDW
The Impetus solution
  The iLaDaP high level architecture
Summary

Introduction

Social Media Analytics is a discipline that helps organizations measure, assess and explain the performance of their social media initiatives. Analyzing social media data involves four stages:

Step 1: Collecting the data. This facilitates the compiling of reports and statistics that are to be shared with management or with internal and external stakeholders.

Step 2: Measuring the data. This helps in Sentiment Analysis and in gauging which products are well received in the marketplace.

Step 3: Analysis. Here, data is presented in a visual and interactive manner to management, as well as to the sales and marketing teams, to provide better insights.

Step 4: Innovation. Based on the insights and analysis, organizations determine the new products and ideas they are going to pursue in response to customer requirements. Innovation also helps unearth cross-sell or up-sell opportunities that were not visible before.

Social Analytics opens up a host of new opportunities and perspectives. Category-wise analysis of customer data, for instance, enables demographic profiling of customers and helps determine their usage patterns. Similarly, with Feature Analysis, it is possible to figure out which forums, platforms or sources of data are more active than others. Product Growth Analysis, which focuses on the data generated for a specific product, helps in understanding how users respond to that product. There is also a Recommendation Engine, which helps zero in on what is missing or lacking in a product range.

Finally, Social Analytics enables Third Party Analysis, which focuses purely on what is being said about the product on public social media platforms such as Twitter, Facebook and MySpace.

The benefits of Social Analytics

Social Analytics is an outcome-based approach, and one that creates visible Return on Investment (RoI).

• It helps organizations retain customers by addressing their concerns upfront, rather than being slaves to processes. The results of the analytics help organizations retain brand preference in a fickle consumer world.
• It improves customer service and brings down the cost of operations.
• It enables organizations to add new customers by understanding and addressing their requirements.
• It helps companies keep an eye on their competition. With easy access to social media data, it is simple to track and counter the moves of competitors.
• It helps companies remain proactive. The turnaround time for gathering customer feedback is reduced drastically. Moreover, the reactions of customers and their subsequent actions can be predicted more accurately, enabling organizations to take appropriate measures.

Social Media Analytics effectively brings together on-site, social media and third party data to extract useful information. Considering these factors, and the fact that it enables enterprises to leverage the colossal volume of data that is continuously generated through social media interactions, Social Media Analytics should be made an integral part of the marketing and research strategies of enterprises.

Data sources that facilitate Social Media Analytics

Data sources include internal data, such as the purchase history of customers, their transactions, and their profiles in the enterprise database. Internal data also encompasses website traffic analysis, internal CSR logs, customer queries, automated agent discussions, complaints and resolutions, and employee insights.

Data sources can also be the social activities and profile updates of customers on public social media platforms such as Twitter, Facebook, MySpace and LinkedIn.

External data sources can additionally be used, and customers analyzed by factoring in industry sources of information and market research reports.

Technical tenets of Social Media Analytics

Here’s a look at what Social Media Analytics entails and enables:

Clustering: Clustering is about capturing and analyzing the various comments, demands, and questions that customers share with like-minded friends and groups over social media platforms. It helps identify the appropriate response as well as behavioral anomalies.

Classification: Once data on the activities and comments of customers has been captured, natural language processing can be performed on it to derive patterns. These patterns can then be categorized and understood so that appropriate responses can be made. Organizations can use Classification to address the concerns of customers and approach them with products and offerings that really meet their needs.

Sequential classification: This enables organizations to identify the subsequent steps and actions that customers might take, based on their recent experiences.

Entity Extraction: Through Entity Extraction, organizations can identify the concerns and issues that dissatisfied customers are struggling with. They can then take appropriate measures to ease the situation and retain customers who are on the verge of switching to other suppliers or vendors.

Event Extraction: This enables companies to unearth the sequence of events leading up to customer defections, or why people moved on to other providers.

Communication Graphs: Once organizations have all the data nicely sliced and diced, they can draw Communication Graphs. These graphs can help analyze and identify the top influencers and the active members in various groups. They can also help companies gain a better understanding of where messages originate and how they travel through the network. Knowing this, organizations can target the top influencers and most active members in the network, projecting a positive image of the brand or product in the community.
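
The Communication Graphs idea can be illustrated with a small sketch. It is only an assumption about how such a graph might be represented: the (source, target) pair format, the function names, and the use of out-degree as an influence measure are all hypothetical and are not taken from the white paper.

```python
from collections import defaultdict, deque


def build_graph(interactions):
    """Build adjacency sets from (source_user, target_user) pairs.

    Assumed meaning of a pair: target_user relayed or answered a message
    from source_user, so the edge points the way the message travelled.
    """
    graph = defaultdict(set)
    for source, target in interactions:
        if source != target:                 # ignore self-replies
            graph[source].add(target)
            graph.setdefault(target, set())  # keep pure receivers as nodes
    return graph


def top_influencers(graph, n=3):
    """Rank users by out-degree: how many distinct users picked up their messages."""
    ranked = sorted(graph, key=lambda user: (-len(graph[user]), user))
    return [(user, len(graph[user])) for user in ranked[:n]]


def reach(graph, origin):
    """Breadth-first search: every user a message from `origin` can travel to."""
    seen = {origin}
    queue = deque([origin])
    while queue:
        user = queue.popleft()
        for neighbour in graph.get(user, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    seen.discard(origin)
    return seen
```

Out-degree is only the simplest possible influence measure; a production system would more likely use a centrality measure computed over the full graph.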

Using Big Data technologies to enable Social Media Analytics

One of the biggest challenges that organizations face with their social media data is its sheer size. Existing Enterprise Data Warehousing (EDW) environments, designed decades ago, simply lack the ability to capture and process social media data within a reasonable time. Moreover, these traditional EDWs have limited capabilities when it comes to analyzing the behavioral data of users. Traditional solutions cannot help companies manage the complex and unstructured data generated by social media interactions, nor can they handle multimedia data.

Using Big Data technologies is their best bet in this scenario. Big Data technologies can help organizations handle large volumes of complex, unstructured data from social sources, of the order of terabytes and petabytes, gain insights into customers and trends, store images and videos, and save hundreds of thousands of dollars per terabyte per year.

Take the instance of a Big Data Social Analytics Platform that has to deal with information from various data sources, such as social media sites and Web 2.0-enabled websites. The Platform can also pull historical bulk data lying around in existing systems using appropriate connectors. The connectors enable the conversion of data from all kinds of data sources into a Hadoop-based data warehouse.

After this data is collected, Apache Mahout, a scalable machine learning and data mining solution, can be used to categorize the data and store it according to those categories for later use. It is also possible to run Map-Reduce jobs that use the Natural Language Toolkit (NLTK) to perform natural language processing on the comments and feedback from the social data sources.

The suitably massaged and categorized data can then be used to draw graphs and analyze market sentiment about a product. With Sqoop, the data can also be exported for MIS and for the regulatory reports that need to be produced on a regular basis.

Since the Big Data Social Analytics Platform is powered by Hadoop, it can scale linearly to thousands of nodes using commodity hardware. This spells a significant cost advantage for organizations in the long run. Since it is important for businesses to track down and take advantage of opportunities quickly, this platform can enable them to react to events as they happen.
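
The Map-Reduce pass over comments described in this section can be sketched in plain Python. This is a single-process illustration of the map, shuffle and reduce phases, not a Hadoop job, and it substitutes two tiny hand-written word lists for NLTK. The lexicon, the (product, text) record format and all function names are assumptions made for the example.

```python
import re
from collections import defaultdict

# Hypothetical sentiment lexicon; a real job would rely on an NLP toolkit.
POSITIVE = {"love", "great", "good", "fast"}
NEGATIVE = {"hate", "slow", "bad", "broken"}


def map_comment(record):
    """Map step: emit ((product, sentiment_label), 1) for one comment."""
    product, text = record
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    yield (product, label), 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce step: total the counts for one (product, label) key."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle and reduce in sequence over an iterable of records."""
    mapped = (pair for record in records for pair in map_comment(record))
    return dict(reduce_counts(key, values) for key, values in shuffle(mapped).items())
```

On a cluster, the framework performs the shuffle itself and runs many mappers and reducers in parallel; only the map and reduce functions would carry over from this sketch.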

Building a Big Data warehouse

In order to build a Big Data warehouse that extracts data from the sources discussed earlier and draws pertinent insights from it, organizations must begin by grabbing social media data from various public social media platforms. The historical master data and transactional data about customers can be taken from existing systems. Sqoop can come in handy for pulling this data out of the relational database systems that are already in place.

Text        User       Location  Source
Gift card   TweetUser  USA, NY   Twitter
Free offer  FaceUser   USA, GA   Facebook
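
Records of the shape shown in the sample table could be modeled as follows. The class name, the field names and the decision to split Location into country and region are assumptions made for illustration; the white paper does not define a schema.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class SocialRecord:
    """One collected post, mirroring the Text/User/Location/Source columns."""
    text: str
    user: str
    country: str
    region: str
    source: str


def parse_row(text, user, location, source):
    """Build a record, splitting a 'COUNTRY, REGION' location string.

    The region is left empty when the location has no comma.
    """
    country, _, region = (part.strip() for part in location.partition(","))
    return SocialRecord(text.strip(), user.strip(), country, region, source.strip())


def count_by(records, field):
    """Count records per distinct value of the given field name."""
    return Counter(getattr(record, field) for record in records)
```

Keeping the source platform on every record makes it straightforward to compare activity across platforms later, for example with `count_by(records, "source")`.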

For natural language processing, NLTK is a good Open Source option. Data preparation and mashups can be accomplished by running Map-Reduce jobs over the collected data and massaging it. Apache Mahout’s k-means algorithm can be used for clustering, while its Naïve Bayes algorithm can be used for classification and sentiment analysis, working on the comments and tweets from social media data sources and identifying patterns. The item-based similarity algorithm of Mahout can be used for collaborative filtering and recommendations. When the data is ready for analytical reporting and deep mining, Hive or Pig can be used.

A step-by-step approach to creating the Big Data EDW

Step 1: The first step is to create training data and run it through Mahout to help it understand how to classify social data feeds. Next, the feeds have to be collected from public social media platforms. This can be accomplished by performing keyword-based searches and streaming in the result sets on a continuous basis. Using the various APIs offered by social media platforms, it is possible to search on the basis of a brand name, product make and model, category, industry terminology, product segment, special offers and marketing buzzwords.

This classified data can then be loaded continuously into an HBase-based data warehouse. The data from existing systems can also be imported into the HBase-based Big Data warehouse. Online content can be crawled and loaded into the HBase database. Connectors are available for the classification of online pages. Lucene and Solr are well suited to this purpose.

Step 2: At this stage, quantitative analytics can be performed on the collected data. It is possible to draw comparisons between ‘Total tweets’ and ‘Our product specific tweets.’ This is accomplished by using Mahout algorithms over a Hadoop cluster. Organizations can also publish a daily trend watch. This may contain the ‘total number of comments about the products of their competitors’ versus the ‘total number of comments about their own products.’ With customers increasingly using devices to connect to social media platforms, it is now possible to perform location-based trend analysis.

Classification and clustering are performed on data processed by Mahout and NLTK. Organizations can run the training data through Mahout/NLTK to build trained models. After that, it is possible to run the tweets and feeds from other social media platforms through the trained models and have the tweets and comments classified. This provides a clear picture of the sentiments prevailing in the marketplace for the products of organizations as well as those of their competitors. Companies can come up with recommendations by running the data through Mahout. These recommendations can then be factored into future product design and rollouts.

Step 3: This step is about using customer data to recommend new and related products. Once companies have data from their existing systems as well as from social sources, they can prepare the mock customer data for Social ID mapping and run item-based or user-based recommendations on this data using Mahout.

At this stage, it is possible to produce Analytical Reports on the data generated by Mahout. This can be accomplished by generating reports using a traditional reporting product or framework. The nicely sliced and diced reporting data can be exported into a MySQL database, or some other SQL database, with the help of Sqoop. This SQL database can be used to meet the regular downstream reporting requirements of organizations. This will enable them to reuse their existing investments in reporting tools, as well as provide drill-down reports for use by management and by the Sales and Marketing departments.

Alongside social media, this Big Data analytics platform can be used to address other large data analytics requirements. The platform can give companies a head start in putting together the pieces of their Big Data strategy and provide them with an asymmetric advantage over the competition.

The Impetus solution

Impetus has used this approach and these technologies to build a platform for Social Media Analytics. Impetus, an established thought leader in the Big Data space, has conceptualized, architected and built this platform based on the experience and expertise that it has gained through its client engagements.
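
To make the item-based recommendation idea from Step 3 concrete, here is a minimal sketch in plain Python. It is not Mahout and not the Impetus implementation: the purchase data format, the function names, and the choice of cosine similarity over sets of buyers are assumptions made for illustration.

```python
from collections import defaultdict
from math import sqrt


def item_vectors(purchases):
    """Invert {user: {items}} into {item: {users who have it}}."""
    vectors = defaultdict(set)
    for user, items in purchases.items():
        for item in items:
            vectors[item].add(user)
    return vectors


def cosine(users_a, users_b):
    """Cosine similarity between two items represented as sets of users."""
    if not users_a or not users_b:
        return 0.0
    return len(users_a & users_b) / sqrt(len(users_a) * len(users_b))


def recommend(purchases, user, n=2):
    """Suggest up to n items the user lacks, scored by similarity to owned items."""
    vectors = item_vectors(purchases)
    owned = purchases.get(user, set())
    scores = defaultdict(float)
    for candidate, candidate_users in vectors.items():
        if candidate in owned:
            continue
        for item in owned:
            scores[candidate] += cosine(candidate_users, vectors[item])
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, score in ranked[:n] if score > 0]
```

This version compares every candidate with every owned item, which is fine for a toy data set; the appeal of running the same idea on a Hadoop cluster is that the pairwise similarity computation can be distributed.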

The iLaDaP high level architecture

The Large Data Analytics Platform developed by Impetus is built using a Service Oriented Architecture (SOA), and incorporates all the key characteristics of an ideal Big Data Analytics Platform. The iLaDaP is designed to derive intelligence from, and operate on, huge datasets collected from numerous data sources in multiple data formats.

It is powered by Hadoop and can therefore scale linearly to thousands of nodes using commodity hardware. This spells a significant cost advantage in the long run. iLaDaP also comes with a set of pre-canned and customized reports. Businesses that need to track down and take advantage of opportunities as they happen can use the Impetus platform to react to events.

The iLaDaP is also capable of collecting data from a range of disparate sources. This unstructured data can be transformed and used for strategic business decisions. Furthermore, organizations can deploy the solution on-premise as well as in a Cloud-supported setup. iLaDaP can be seamlessly integrated with the current platforms of companies without any major changes.

Summary

Traditional Enterprise Data Warehouses do not have the ability to keep up with rapidly increasing social media data. The need of the hour is to effectively strategize and build a Big Data Analytics Platform to manage, store and derive insights from this digital data. A single vendor’s technology may not be sufficient for this task, and it is recommended that organizations choose Open Source options to build a Social Media Analytics Platform using Big Data technologies.

The fact is that the success of a Big Data platform depends entirely on the tools that are used. Organizations therefore need to use discretion and select the most appropriate tools from the available options. Companies can also reuse existing EDW investments for their Big Data Analytics Platform.

About Impetus

Impetus Technologies is a leading provider of Big Data solutions for the Fortune 500®. We help customers effectively manage the “3-Vs” of Big Data and create new business insights across their enterprises.

Website: www.bigdata.impetus.com | Email: bigdata@impetus.com

© 2013 Impetus Technologies, Inc. All rights reserved. Product and company names mentioned herein may be trademarks of their respective companies. May 2013