2. About Brian Culver
SharePoint Solutions Architect for Expert Point Solutions
Based in Houston, TX
Author
ProveIT! Analytics
SharePoint 2010 Unleashed
Various White Papers
Speaker and Blogger
3. Session Agenda
What is Big Data?
Understanding Sentiment Analysis
Connecting Big Data and Business Intelligence
Create an Azure HDInsight Cluster
Load data into Blob Storage
Validate data via HDInsight
Hadoop and C#
Visualizing Results via Power View
Closing comments
4. What is Big Data?
Big Data is about personalization and knowledge: understanding our customers and their relationship to the world.
• 27% of customers have seen personalization online
• 86% of those say personalization influenced what they purchased to some extent
• 31% want a more personalized experience
• 59% of customers who have experienced personalization believe it has a noticeable influence on purchasing
• 58% prefer product recommendations based on previous purchases over other forms of personalization
6. What is Big Data?
Big Data is data that is “too” complex, large, and/or fast.
Big Data offers a new set of approaches for analyzing data sets that were previously inaccessible because they posed challenges across one or more of the “3 V’s”:
Volume (too large): terabytes (and more) of credit card transactions, web usage data, system logs, etc.
Variety (too complex): unstructured data such as social media, customer reviews, call center records, etc.
Velocity (too fast): sensor data, live web traffic, mobile phone usage, GPS data, etc.
8. What is Big Data?
Common Big Data Customer Scenarios
• Web app optimization
• Smart meter monitoring
• Equipment monitoring
• Advertising analysis
• Life sciences research
• Fraud detection
• Healthcare outcomes
• Weather forecasting
• Natural resource exploration
• Social network analysis
• Churn analysis
• Traffic flow optimization
• IT infrastructure optimization
• Legal discovery
Gain competitive advantage by moving first and fast in your industry.
9. What is Big Data?
How are customers using HDInsight?
Azure HDInsight + Machine Learning
How Pier 1 uses HDInsight to improve and customize customer experience:
http://www.youtube.com/watch?v=fN8Cixcc5yg
10. What is Big Data?
Ambari - Cluster provisioning, management, and monitoring.
Avro (Microsoft .NET Library for Avro) - Data serialization for the
Microsoft .NET environment.
Hive - Structured Query Language (SQL)-like querying.
Mahout - Machine learning.
MapReduce and YARN - Distributed processing and resource
management.
Oozie - Workflow management.
Pig - Simpler scripting for MapReduce transformations.
Sqoop - Data import and export.
ZooKeeper - Coordination of processes in distributed systems.
11. What is Big Data?
MapReduce: Map + Reduce = Extract, Load + Transform
[Diagram: four Raw Data inputs, each processed by its own Mapper; the Mappers' intermediate Data flows into a single Reducer, which produces the Output]
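The Mapper-to-Reducer flow in the diagram can be sketched as a single-process Python word count. This is only an illustration of the idea, not Hadoop's Java API: `map_phase`, `shuffle`, `reduce_phase` and `map_reduce` are hypothetical names, and on a real cluster the framework runs many mappers in parallel and performs the grouping (shuffle and sort) itself.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(record: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one raw input record."""
    for token in record.lower().split():
        word = token.strip(".,!?")
        if word:
            yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key (the framework's job between phases)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: combine all values for one key into a single output record."""
    return key, sum(values)


def map_reduce(records: Iterable[str]) -> Dict[str, int]:
    # One map_phase call per raw record, like one Mapper per input split.
    intermediate = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(map_reduce(["The pool was closed.", "The service was excellent!"]))
```

Keeping the mapper and reducer as pure functions of their inputs is what lets a framework distribute them: any mapper can run on any split, and any reducer only needs the values for its own keys.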
13. Understanding Sentiment Analysis
For example, Hotel Feedback (free-form text):
“I had a fantastic time on holiday at your resort. The service was excellent and awesome. My family really enjoyed themselves. We look forward to next year. One thing though, the pool was closed which sucked.”
14. Understanding Sentiment Analysis
Take a list of positive and negative words:
Positive: Good, Great, Fantastic, Excellent, Friendly, Awesome, Enjoyed
Negative: Bad, Worse, Rubbish, Sucked, Awful, Terrible, Bogus
15. Understanding Sentiment Analysis
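Then match each word of the Hotel Feedback against the lists:
“I had a fantastic time on holiday at your resort. The service was excellent and awesome. My family really enjoyed themselves. We look forward to next year. One thing though, the pool was closed which sucked.”
Positive matches: fantastic, excellent, awesome, enjoyed
Negative matches: sucked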
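The word-list approach on the last two slides can be sketched in a few lines of Python. The scoring rule here (positive hits minus negative hits) is an assumption for illustration; the slides only show matching words against the lists. A lexicon this simple also ignores negation such as "not good".

```python
import re
from typing import Dict

# Word lists from the slide, lower-cased for case-insensitive matching.
POSITIVE = {"good", "great", "fantastic", "excellent", "friendly", "awesome", "enjoyed"}
NEGATIVE = {"bad", "worse", "rubbish", "sucked", "awful", "terrible", "bogus"}


def sentiment(text: str) -> Dict[str, int]:
    """Count lexicon hits; 'score' (positive - negative) is an assumed rule."""
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(1 for w in words if w in POSITIVE)
    neg = sum(1 for w in words if w in NEGATIVE)
    return {"positive": pos, "negative": neg, "score": pos - neg}


feedback = (
    "I had a fantastic time on holiday at your resort. The service was "
    "excellent and awesome. My family really enjoyed themselves. "
    "We look forward to next year. One thing though, the pool was "
    "closed which sucked."
)
print(sentiment(feedback))  # {'positive': 4, 'negative': 1, 'score': 3}
```

A positive score suggests the feedback leans positive overall, while the negative hit still flags the closed pool as something worth following up.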
19. Connecting Big Data & Business Intelligence
The following demo covers:
• Create an Azure HDInsight Cluster
  • Create Storage
  • Create HDInsight Cluster
• Load data into Blob Storage
• Validate data via HDInsight
• Hadoop and C#
• Visualizing Results via Excel (Power Query, Power View, etc.)
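The demo's "Hadoop and C#" step is not reproduced here. One common way to run non-Java code on Hadoop is Hadoop Streaming, where the mapper and reducer are ordinary programs that read lines from stdin and write tab-separated key/value lines to stdout, and the reducer receives its input sorted by key; whether the demo used Streaming is an assumption. The sketch below shows that contract in Python as a stand-in for C#, counting positive and negative words across feedback records. The `map`/`reduce` command-line modes and the lexicon are hypothetical choices for this example.

```python
import sys
from itertools import groupby
from typing import Iterable, Iterator

# Hypothetical lexicon for this sketch, taken from the slide's word lists.
POSITIVE = {"good", "great", "fantastic", "excellent", "friendly", "awesome", "enjoyed"}
NEGATIVE = {"bad", "worse", "rubbish", "sucked", "awful", "terrible", "bogus"}


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit one 'label<TAB>1' line per lexicon word found in each input line."""
    for line in lines:
        for token in line.lower().split():
            word = token.strip(".,!?;:\"'")
            if word in POSITIVE:
                yield "positive\t1"
            elif word in NEGATIVE:
                yield "negative\t1"


def reducer(sorted_lines: Iterable[str]) -> Iterator[str]:
    """Sum the counts for each key; assumes input is already sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(value) for _, value in group)}"


if __name__ == "__main__":
    mode = sys.argv[1] if len(sys.argv) > 1 else ""
    if mode == "map":
        stage = mapper(sys.stdin)
    elif mode == "reduce":
        stage = reducer(sys.stdin)
    else:
        # Local simulation of `mapper | sort | reducer` on sample records.
        sample = ["The service was excellent",
                  "The pool was closed which sucked",
                  "Great staff, great food"]
        stage = reducer(sorted(mapper(sample)))
    for out in stage:
        print(out)
```

On a cluster, the same script would be passed as the streaming job's `-mapper` and `-reducer` commands (with the `map` and `reduce` arguments); the location of the streaming jar varies by cluster, so check the HDInsight documentation for the exact invocation.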
21. Closing Comments
Big Data is about understanding what your customers are saying and
thinking.
Almost any data can be processed and understood, but the analysis takes time.
Any device that creates data can produce valuable information.
Try different things and let the patterns emerge.
23. Constructive Feedback Is Appreciated
• “Great information, but would like to have learned more about [Insert Topic]”
• “Brian – Your presentation was …”
• “Good Demos!”
• “Thanks!”