#TOSMAC
Toronto SMAC Meetup – Welcome!
An Intro to Text Analytics on Big Data with a use case
Toronto SMAC Team
Lucas Silva, Felipe Mosquetta, Marcos de Mello
© 2014 IBM Corporation
Twitter's numbers
As you know:
- 500 million Tweets are sent per day.
- Twitter supports 35+ languages.
- 255 million monthly active users.
Huge amount of data!
Overview
Section1 Section2 Section3 Section4 Section5
Let’s get started!
Input data
Section2
Demo
Next section
Extractor: used to extract structured information from unstructured and semi-structured data.
AQL: Annotation Query Language. A rule language with a familiar SQL-like syntax.
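To make the idea of an extractor concrete, here is a minimal Python sketch that turns free text into structured records. It is not AQL and not the tooling shown in the demo; the function name `extract_mentions`, the record fields, and the sample tweet are all assumptions made for illustration.

```python
import re
from typing import Dict, List


def extract_mentions(text: str) -> List[Dict[str, object]]:
    """Return one structured record per @mention or #hashtag found in text."""
    records = []
    for match in re.finditer(r"[@#]\w+", text):
        token = match.group()
        records.append({
            "type": "mention" if token.startswith("@") else "hashtag",
            "value": token[1:],        # token without the leading @ or #
            "begin": match.start(),    # character offsets into the input
            "end": match.end(),
        })
    return records


# Hypothetical sample input (not taken from the talk's data set).
tweet = "Great talk by @ibm at #TOSMAC on text analytics"
rows = extract_mentions(tweet)
for row in rows:
    print(row)
```

Each record keeps the character offsets of the match, so a later step can relate extracted values back to the original text.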
Profiler: used for troubleshooting performance problems.
Main concepts
Types of extraction specifications:
- Dictionaries
- Regular expressions
- Part of speech
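As a rough illustration of the dictionary and part-of-speech specifications listed above, the Python sketch below matches tokens against a small word list and tags tokens from a toy lexicon. The names `UNIT_DICTIONARY` and `TOY_POS_LEXICON`, their contents, and the sample sentence are assumptions; a real part-of-speech tagger is far richer than a lookup table.

```python
import re
from typing import List, Tuple

# Hypothetical dictionary of unit words (an assumption for illustration).
UNIT_DICTIONARY = {"million", "thousand", "billion"}

# Toy part-of-speech lexicon; unknown words are tagged "UNKNOWN".
TOY_POS_LEXICON = {
    "revenue": "NOUN",
    "grew": "VERB",
    "to": "ADP",
    "million": "NUM",
}


def tokenize(text: str) -> List[str]:
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[A-Za-z]+", text.lower())


def dictionary_matches(text: str) -> List[str]:
    """Return the tokens that appear in the dictionary."""
    return [tok for tok in tokenize(text) if tok in UNIT_DICTIONARY]


def tag_parts_of_speech(text: str) -> List[Tuple[str, str]]:
    """Pair each token with its tag from the toy lexicon."""
    return [(tok, TOY_POS_LEXICON.get(tok, "UNKNOWN")) for tok in tokenize(text)]


sample = "Revenue grew to 7.5 million"
units = dictionary_matches(sample)
tags = tag_parts_of_speech(sample)
print(units)
print(tags)
```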
Types of extraction specifications:
- Dictionaries
- Regular expressions
- Part of speech
Numbers: 7.5, 4, 13
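A regular expression that would pick up numbers like the ones on this slide can be sketched in Python as follows. The pattern and the sample sentence are assumptions for illustration; the pattern only handles plain integers and decimals, not signs, thousands separators, or exponents.

```python
import re
from typing import List

# Integer part, optionally followed by a decimal point and more digits.
NUMBER_PATTERN = re.compile(r"\d+(?:\.\d+)?")


def extract_numbers(text: str) -> List[str]:
    """Return every integer or decimal number found in text, as strings."""
    return NUMBER_PATTERN.findall(text)


# Hypothetical sample sentence containing the slide's three numbers.
sample = "Scores were 7.5, 4 and 13 today"
numbers = extract_numbers(sample)
print(numbers)
```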
Demo
AQL Guidelines
Basic feature AQL statements
- Develop the core building blocks of the extractor.
Candidate generation AQL statements
- Combine basic feature AQL statements.
$7.5 million
$4 thousand
$ 7.5 million
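Candidate generation for amounts like these can be sketched in Python by combining two basic features, a number pattern and a unit word list, with a currency symbol. This is an analogy in plain Python, not AQL; the names `NUMBER`, `UNITS`, `AMOUNT_PATTERN`, and the sample sentence are assumptions.

```python
import re
from typing import List

# Basic features: a number pattern and a small dictionary of unit words.
NUMBER = r"\d+(?:\.\d+)?"
UNITS = ("million", "thousand")

# Candidate rule: "$", an optional space, a number, one whitespace, a unit.
AMOUNT_PATTERN = re.compile(
    r"\$\s?" + NUMBER + r"\s(?:" + "|".join(UNITS) + r")"
)


def generate_candidates(text: str) -> List[str]:
    """Return every substring that looks like a dollar amount with a unit."""
    return [m.group() for m in AMOUNT_PATTERN.finditer(text)]


# Hypothetical sample sentence built from the slide's three examples.
sample = "They paid $7.5 million, then $4 thousand, then $ 7.5 million."
candidates = generate_candidates(sample)
print(candidates)
```

The optional whitespace after the dollar sign is what lets the same rule cover both "$7.5 million" and "$ 7.5 million".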
Filter and consolidate AQL statements
- Refine results
- Remove invalid annotations
- Resolve overlap between annotations.
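The filter and consolidate step can be sketched in plain Python as below. This is not AQL and does not claim to reproduce its consolidation policies: the `Annotation` type, the "must contain a digit" validity rule, and the "keep the longest span" policy are all assumptions chosen for illustration.

```python
from typing import List, NamedTuple


class Annotation(NamedTuple):
    begin: int   # inclusive start offset
    end: int     # exclusive end offset
    text: str


def remove_invalid(annotations: List[Annotation]) -> List[Annotation]:
    """Drop annotations that contain no digit (assumed validity rule)."""
    return [a for a in annotations if any(ch.isdigit() for ch in a.text)]


def consolidate(annotations: List[Annotation]) -> List[Annotation]:
    """Among overlapping annotations, keep only the longest one."""
    kept: List[Annotation] = []
    by_length = sorted(annotations, key=lambda a: (-(a.end - a.begin), a.begin))
    for cand in by_length:
        # Keep the candidate only if it overlaps nothing already kept.
        if all(cand.end <= k.begin or cand.begin >= k.end for k in kept):
            kept.append(cand)
    return sorted(kept, key=lambda a: a.begin)


# Hypothetical input text and hand-built candidate annotations.
text = "Paid $7.5 million and $ million"
raw = [
    Annotation(5, 17, text[5:17]),    # "$7.5 million"
    Annotation(6, 9, text[6:9]),      # "7.5", contained in the span above
    Annotation(22, 31, text[22:31]),  # "$ million", no number: invalid
]
final = consolidate(remove_invalid(raw))
print(final)
```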
Demo
Conclusion
Check point
What we have done
Section1 Section2 Section3
What are we going to do?
Section4 Section5
Demo
Also using R
1.75 0.32
Demo
So what?
Companies
Exporting to you
Thank you!
Let's network!