SlideShare ist ein Scribd-Unternehmen logo
1 von 18
Introduction to
                        Text
                    Visualization




                   Dr. Anton Heijs
                        CEO
    Treparel
 Delftechpark 26
  2628 XH Delft       July 2012
The Netherlands
www.treparel.com
KMX enables information and knowledge professionals
to gain faster, reliable, more precise insights in large
complex unstructured data sets allowing them to make
better informed decisions.




                   Treparel is a leading technology solution provider in
                         Big Data Text Analytics & Visualization

Treparel KMX – All rights reserved 2012
Topics covered in this presentation


         • Who is Treparel?
         • Why visualize data?
         • How do we visualize data?
         • Multiple coupled views and interaction.




Treparel KMX – All rights reserved 2012   www.treparel.com   3
Nexus of Forces: Social, Cloud, Mobile, Information
         IT Market shift driving Big Data challenges
                                                                                 Copyright: Gartner, 2011




                 80% of data is Unstructured (Documents, Text, Images, Graphs)



Treparel KMX – All rights reserved 2012     www.treparel.com                                 4
About Treparel

         • Delft, The Netherlands, 2006.
         • Treparel is an innovative technology solution provider in Big Data
           Analytics, Text Mining and Visualization.
         • KMX is an integrated data analysis toolset which provide faster,
           reliable intelligent insights in large complex unstructured data sets to
           allow companies to make better informed decisions.
         • Clients: Philips, Bayer, Abbott, European Patent Office, European
           Commission
         • Part of Research Centers and University ecosystem; TU Delft,
           Universities of Paris and Sao Paulo
         • More info: www.treparel.com




Treparel KMX – All rights reserved 2012   www.treparel.com                        5
Positioning of Treparel’s KMX technology

Text Acquisition & Preparation             Analysis and processing       Output and display
‘Seek’                                     ‘Model’                       ‘Adapt’


External sources                                                         Reporting &
                                       Text preprocessing
Patents                                                                  Presentation
Legal
                                                                         Media and publishing
Research                               Indexing                          databases
Media / Publishers
                                                                         Content management
Other sources                          Clustering                        systems
Documents
Websites                                                                 Line-of-business
                                       Classification                    applications
Blogs
Newsfeeds                                                                Research applications
Email                                  Semantic Analysis
Application notes                                                        Search engines
Search results
Social networks                                              Visualization


               Information extraction (entities, facts, relationships, concepts, patents)
                                Management, Development and Configuration
 Treparel KMX – All rights reserved 2012                                     Copyright: Gartner, J. Popkin 2010
Why visualize data?

         • Gives bird’s eye view to comprehensive and
           large datasets
         • Discover previously unknown patterns of
           the dataset
         • Reveals inherent problems of the data (noise, outliers)
         • Enables both the examination of the large scale features
           of the dataset as well as the local features (frame of
           reference)
         • Allows the user to form hypotheses on developed
           insights and make better informed decisions




Treparel KMX – All rights reserved 2012   www.treparel.com            7
Why visualize data?

                                   Analyse without visualization




         •    Initial selection of training documents is ‘blind’
         •    No insight on how the data is separated
         •    Improving the Classifiertm a lengthy process
         •    No clear insights in the performance of a Classifiertm

Treparel KMX – All rights reserved 2012    www.treparel.com            8
Why visualize data?

                                      Analyse with visualization




         • Representive selection of training documents data by information
           professional
         • A clean training set is used (no noise/outliers) for building a Classifiertm
         • Visual feedback if the Classifiertm results in a separated section within
           the frame of reference of the whole dataset.
         • Insight into the performance of a Classifiertm
Treparel KMX – All rights reserved 2012     www.treparel.com                        9
How do we visualize data?

         • Create a representation we can use (vector space model)




         • Correct for (normalization):
                – Terms that appear often in most documents (non-discriminative
                  terms)
                – Length of documents (remove bias towards longer documents)


Treparel KMX – All rights reserved 2012   www.treparel.com                    10
3 types of visualization

         1. Visualize frequency distribution

         2. Visualize classification scores

         3. Visualize document similarity




Treparel KMX – All rights reserved 2012   www.treparel.com   11
1. Visualize frequency distribution

                Requires carefull interpretation




Treparel KMX – All rights reserved 2012   www.treparel.com   12
2. Visualize classification scores

                Parallel Coordinate Plot




Treparel KMX – All rights reserved 2012   www.treparel.com   13
3. Visualize document similarity

                Also visualizes the Classification scores




Treparel KMX – All rights reserved 2012   www.treparel.com   14
How do we visualize data?

         What is Least Squares Projection (LSP)?
         • LSP attempts to retain the neighborhood relations of
           objects in a high dimensional feature space when the
           data gets projected onto a two dimensional surface.
                                   Advantage                      Disadvantage

                Objects that are close to each other As a result of the projection to two
                (similar) in high dimensional feature dimensions (for human
                space are also close to each other in interpretation) information is lost.
                the LSP visualization.
                                                      This can result in clusters that for
                This results in the formation of
                                                      human interpretation are not
                clusters on the two dimensional
                                                      condensed in an identifiable cluster
                surface.
                                                      (e.g. scattered over the
                                                      visualization).
Treparel KMX – All rights reserved 2012        www.treparel.com                          15
3. Visualize document classification scores

                After applying LSP




Treparel KMX – All rights reserved 2012   www.treparel.com   16
Multiple coupled views and interaction




Treparel KMX – All rights reserved 2012   www.treparel.com   17
Treparel is a leading technology solution provider
       in Big Data Text Analytics & Visualization


                                              Treparel
                                           Delftechpark 26
                                            2628 XH Delft
                                          The Netherlands
                                          www.treparel.com


Treparel KMX – All rights reserved 2012      www.treparel.com   18

Weitere ähnliche Inhalte

Was ist angesagt?

Browsing-oriented Semantic Faceted Search
Browsing-oriented Semantic Faceted SearchBrowsing-oriented Semantic Faceted Search
Browsing-oriented Semantic Faceted SearchWagner Andreas
 
Servicios Terminológicos
Servicios TerminológicosServicios Terminológicos
Servicios TerminológicosPablo Pazos
 
Machine Learning Deep Learning AI and Data Science
Machine Learning Deep Learning AI and Data Science Machine Learning Deep Learning AI and Data Science
Machine Learning Deep Learning AI and Data Science Venkata Reddy Konasani
 
Towards a rebirth of data science (by Data Fellas)
Towards a rebirth of data science (by Data Fellas)Towards a rebirth of data science (by Data Fellas)
Towards a rebirth of data science (by Data Fellas)Andy Petrella
 
Knowledge discovery thru data mining
Knowledge discovery thru data miningKnowledge discovery thru data mining
Knowledge discovery thru data miningDevakumar Jain
 
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...IRJET Journal
 

Was ist angesagt? (7)

Browsing-oriented Semantic Faceted Search
Browsing-oriented Semantic Faceted SearchBrowsing-oriented Semantic Faceted Search
Browsing-oriented Semantic Faceted Search
 
Servicios Terminológicos
Servicios TerminológicosServicios Terminológicos
Servicios Terminológicos
 
Machine Learning - Intro
Machine Learning - IntroMachine Learning - Intro
Machine Learning - Intro
 
Machine Learning Deep Learning AI and Data Science
Machine Learning Deep Learning AI and Data Science Machine Learning Deep Learning AI and Data Science
Machine Learning Deep Learning AI and Data Science
 
Towards a rebirth of data science (by Data Fellas)
Towards a rebirth of data science (by Data Fellas)Towards a rebirth of data science (by Data Fellas)
Towards a rebirth of data science (by Data Fellas)
 
Knowledge discovery thru data mining
Knowledge discovery thru data miningKnowledge discovery thru data mining
Knowledge discovery thru data mining
 
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...
 

Andere mochten auch

Data Visualization Tips & Tricks
Data Visualization Tips & TricksData Visualization Tips & Tricks
Data Visualization Tips & TricksKoenVerbeeck
 
Finding trends in large scale document sets
Finding trends in large scale document setsFinding trends in large scale document sets
Finding trends in large scale document setsTreparel
 
Treparel lt innovate summit june 27, 2013
Treparel lt innovate summit june 27, 2013Treparel lt innovate summit june 27, 2013
Treparel lt innovate summit june 27, 2013Treparel
 
Text Analytics in the EU Fusepool project (at II-SDV 2013 conference)
Text Analytics in the EU Fusepool project (at II-SDV 2013 conference)Text Analytics in the EU Fusepool project (at II-SDV 2013 conference)
Text Analytics in the EU Fusepool project (at II-SDV 2013 conference)Treparel
 
2014: Treparel Big Data Text Analytics & Visualization
2014: Treparel Big Data Text Analytics & Visualization2014: Treparel Big Data Text Analytics & Visualization
2014: Treparel Big Data Text Analytics & VisualizationTreparel
 
Using R to Visualize Spatial Data: R as GIS - Guy Lansley
Using R to Visualize Spatial Data: R as GIS - Guy LansleyUsing R to Visualize Spatial Data: R as GIS - Guy Lansley
Using R to Visualize Spatial Data: R as GIS - Guy LansleyGuy Lansley
 
PLOTCON NYC: Text is data! Analysis and Visualization Methods
PLOTCON NYC: Text is data! Analysis and Visualization MethodsPLOTCON NYC: Text is data! Analysis and Visualization Methods
PLOTCON NYC: Text is data! Analysis and Visualization MethodsPlotly
 
Spatial analysis and modeling
Spatial analysis and modelingSpatial analysis and modeling
Spatial analysis and modelingTolasa_F
 
ppt spatial data
ppt spatial datappt spatial data
ppt spatial dataRahul Kumar
 

Andere mochten auch (9)

Data Visualization Tips & Tricks
Data Visualization Tips & TricksData Visualization Tips & Tricks
Data Visualization Tips & Tricks
 
Finding trends in large scale document sets
Finding trends in large scale document setsFinding trends in large scale document sets
Finding trends in large scale document sets
 
Treparel lt innovate summit june 27, 2013
Treparel lt innovate summit june 27, 2013Treparel lt innovate summit june 27, 2013
Treparel lt innovate summit june 27, 2013
 
Text Analytics in the EU Fusepool project (at II-SDV 2013 conference)
Text Analytics in the EU Fusepool project (at II-SDV 2013 conference)Text Analytics in the EU Fusepool project (at II-SDV 2013 conference)
Text Analytics in the EU Fusepool project (at II-SDV 2013 conference)
 
2014: Treparel Big Data Text Analytics & Visualization
2014: Treparel Big Data Text Analytics & Visualization2014: Treparel Big Data Text Analytics & Visualization
2014: Treparel Big Data Text Analytics & Visualization
 
Using R to Visualize Spatial Data: R as GIS - Guy Lansley
Using R to Visualize Spatial Data: R as GIS - Guy LansleyUsing R to Visualize Spatial Data: R as GIS - Guy Lansley
Using R to Visualize Spatial Data: R as GIS - Guy Lansley
 
PLOTCON NYC: Text is data! Analysis and Visualization Methods
PLOTCON NYC: Text is data! Analysis and Visualization MethodsPLOTCON NYC: Text is data! Analysis and Visualization Methods
PLOTCON NYC: Text is data! Analysis and Visualization Methods
 
Spatial analysis and modeling
Spatial analysis and modelingSpatial analysis and modeling
Spatial analysis and modeling
 
ppt spatial data
ppt spatial datappt spatial data
ppt spatial data
 

Ähnlich wie Text and Data Visualization Introduction 2012

Jeroen Kleinhoven (Treparel), Turn Big Content into Business Insights - Data ...
Jeroen Kleinhoven (Treparel), Turn Big Content into Business Insights - Data ...Jeroen Kleinhoven (Treparel), Turn Big Content into Business Insights - Data ...
Jeroen Kleinhoven (Treparel), Turn Big Content into Business Insights - Data ...Cre-Aid
 
II-SDV 2013 Large scale Application of Text Mining and Visualization in the E...
II-SDV 2013 Large scale Application of Text Mining and Visualization in the E...II-SDV 2013 Large scale Application of Text Mining and Visualization in the E...
II-SDV 2013 Large scale Application of Text Mining and Visualization in the E...Dr. Haxel Consult
 
Azure Databricks for Data Scientists
Azure Databricks for Data ScientistsAzure Databricks for Data Scientists
Azure Databricks for Data ScientistsRichard Garris
 
Practical Artificial Intelligence: Deep Learning Beyond Cats and Cars
Practical Artificial Intelligence: Deep Learning Beyond Cats and CarsPractical Artificial Intelligence: Deep Learning Beyond Cats and Cars
Practical Artificial Intelligence: Deep Learning Beyond Cats and CarsAlexey Rybakov
 
II-SDV 2014 Insights into a Next Generation IP and R&D Dashboard (Jeroen Klei...
II-SDV 2014 Insights into a Next Generation IP and R&D Dashboard (Jeroen Klei...II-SDV 2014 Insights into a Next Generation IP and R&D Dashboard (Jeroen Klei...
II-SDV 2014 Insights into a Next Generation IP and R&D Dashboard (Jeroen Klei...Dr. Haxel Consult
 
[db tech showcase Tokyo 2018] #dbts2018 #B38 『Big Data and the Multi-model Da...
[db tech showcase Tokyo 2018] #dbts2018 #B38 『Big Data and the Multi-model Da...[db tech showcase Tokyo 2018] #dbts2018 #B38 『Big Data and the Multi-model Da...
[db tech showcase Tokyo 2018] #dbts2018 #B38 『Big Data and the Multi-model Da...Insight Technology, Inc.
 
Big Data Beyond Hadoop*: Research Directions for the Future
Big Data Beyond Hadoop*: Research Directions for the FutureBig Data Beyond Hadoop*: Research Directions for the Future
Big Data Beyond Hadoop*: Research Directions for the FutureOdinot Stanislas
 
Building Data Ecosystems for Accelerated Discovery
Building Data Ecosystems for Accelerated DiscoveryBuilding Data Ecosystems for Accelerated Discovery
Building Data Ecosystems for Accelerated Discoveryadamkraut
 
The Impact of Cloud Computing on Predictive Analytics 7-29-09 v5
The Impact of Cloud Computing on Predictive Analytics 7-29-09 v5The Impact of Cloud Computing on Predictive Analytics 7-29-09 v5
The Impact of Cloud Computing on Predictive Analytics 7-29-09 v5Robert Grossman
 
Greenplum Database Overview
Greenplum Database Overview Greenplum Database Overview
Greenplum Database Overview EMC
 
"Deep Learning Beyond Cats and Cars: Developing a Real-life DNN-based Embedde...
"Deep Learning Beyond Cats and Cars: Developing a Real-life DNN-based Embedde..."Deep Learning Beyond Cats and Cars: Developing a Real-life DNN-based Embedde...
"Deep Learning Beyond Cats and Cars: Developing a Real-life DNN-based Embedde...Edge AI and Vision Alliance
 
Michael Lang Sr. Presentation
Michael Lang Sr. PresentationMichael Lang Sr. Presentation
Michael Lang Sr. PresentationMediabistro
 
Teradata Big Data London Seminar
Teradata Big Data London SeminarTeradata Big Data London Seminar
Teradata Big Data London SeminarHortonworks
 
How a Semantic Layer Makes Data Mesh Work at Scale
How a Semantic Layer Makes  Data Mesh Work at ScaleHow a Semantic Layer Makes  Data Mesh Work at Scale
How a Semantic Layer Makes Data Mesh Work at ScaleDATAVERSITY
 
From the Big Data keynote at InCSIghts 2012
From the Big Data keynote at InCSIghts 2012From the Big Data keynote at InCSIghts 2012
From the Big Data keynote at InCSIghts 2012Anand Deshpande
 
Kahn.theodore
Kahn.theodoreKahn.theodore
Kahn.theodoreNASAPMC
 
How to Build Modern Data Architectures Both On Premises and in the Cloud
How to Build Modern Data Architectures Both On Premises and in the CloudHow to Build Modern Data Architectures Both On Premises and in the Cloud
How to Build Modern Data Architectures Both On Premises and in the CloudVMware Tanzu
 
MLOps - The Assembly Line of ML
MLOps - The Assembly Line of MLMLOps - The Assembly Line of ML
MLOps - The Assembly Line of MLJordan Birdsell
 
Best practices for building and deploying predictive models over big data pre...
Best practices for building and deploying predictive models over big data pre...Best practices for building and deploying predictive models over big data pre...
Best practices for building and deploying predictive models over big data pre...Kun Le
 

Ähnlich wie Text and Data Visualization Introduction 2012 (20)

Jeroen Kleinhoven (Treparel), Turn Big Content into Business Insights - Data ...
Jeroen Kleinhoven (Treparel), Turn Big Content into Business Insights - Data ...Jeroen Kleinhoven (Treparel), Turn Big Content into Business Insights - Data ...
Jeroen Kleinhoven (Treparel), Turn Big Content into Business Insights - Data ...
 
II-SDV 2013 Large scale Application of Text Mining and Visualization in the E...
II-SDV 2013 Large scale Application of Text Mining and Visualization in the E...II-SDV 2013 Large scale Application of Text Mining and Visualization in the E...
II-SDV 2013 Large scale Application of Text Mining and Visualization in the E...
 
Azure Databricks for Data Scientists
Azure Databricks for Data ScientistsAzure Databricks for Data Scientists
Azure Databricks for Data Scientists
 
Practical Artificial Intelligence: Deep Learning Beyond Cats and Cars
Practical Artificial Intelligence: Deep Learning Beyond Cats and CarsPractical Artificial Intelligence: Deep Learning Beyond Cats and Cars
Practical Artificial Intelligence: Deep Learning Beyond Cats and Cars
 
II-SDV 2014 Insights into a Next Generation IP and R&D Dashboard (Jeroen Klei...
II-SDV 2014 Insights into a Next Generation IP and R&D Dashboard (Jeroen Klei...II-SDV 2014 Insights into a Next Generation IP and R&D Dashboard (Jeroen Klei...
II-SDV 2014 Insights into a Next Generation IP and R&D Dashboard (Jeroen Klei...
 
[db tech showcase Tokyo 2018] #dbts2018 #B38 『Big Data and the Multi-model Da...
[db tech showcase Tokyo 2018] #dbts2018 #B38 『Big Data and the Multi-model Da...[db tech showcase Tokyo 2018] #dbts2018 #B38 『Big Data and the Multi-model Da...
[db tech showcase Tokyo 2018] #dbts2018 #B38 『Big Data and the Multi-model Da...
 
Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Big Data Beyond Hadoop*: Research Directions for the Future
Big Data Beyond Hadoop*: Research Directions for the FutureBig Data Beyond Hadoop*: Research Directions for the Future
Big Data Beyond Hadoop*: Research Directions for the Future
 
Building Data Ecosystems for Accelerated Discovery
Building Data Ecosystems for Accelerated DiscoveryBuilding Data Ecosystems for Accelerated Discovery
Building Data Ecosystems for Accelerated Discovery
 
The Impact of Cloud Computing on Predictive Analytics 7-29-09 v5
The Impact of Cloud Computing on Predictive Analytics 7-29-09 v5The Impact of Cloud Computing on Predictive Analytics 7-29-09 v5
The Impact of Cloud Computing on Predictive Analytics 7-29-09 v5
 
Greenplum Database Overview
Greenplum Database Overview Greenplum Database Overview
Greenplum Database Overview
 
"Deep Learning Beyond Cats and Cars: Developing a Real-life DNN-based Embedde...
"Deep Learning Beyond Cats and Cars: Developing a Real-life DNN-based Embedde..."Deep Learning Beyond Cats and Cars: Developing a Real-life DNN-based Embedde...
"Deep Learning Beyond Cats and Cars: Developing a Real-life DNN-based Embedde...
 
Michael Lang Sr. Presentation
Michael Lang Sr. PresentationMichael Lang Sr. Presentation
Michael Lang Sr. Presentation
 
Teradata Big Data London Seminar
Teradata Big Data London SeminarTeradata Big Data London Seminar
Teradata Big Data London Seminar
 
How a Semantic Layer Makes Data Mesh Work at Scale
How a Semantic Layer Makes  Data Mesh Work at ScaleHow a Semantic Layer Makes  Data Mesh Work at Scale
How a Semantic Layer Makes Data Mesh Work at Scale
 
From the Big Data keynote at InCSIghts 2012
From the Big Data keynote at InCSIghts 2012From the Big Data keynote at InCSIghts 2012
From the Big Data keynote at InCSIghts 2012
 
Kahn.theodore
Kahn.theodoreKahn.theodore
Kahn.theodore
 
How to Build Modern Data Architectures Both On Premises and in the Cloud
How to Build Modern Data Architectures Both On Premises and in the CloudHow to Build Modern Data Architectures Both On Premises and in the Cloud
How to Build Modern Data Architectures Both On Premises and in the Cloud
 
MLOps - The Assembly Line of ML
MLOps - The Assembly Line of MLMLOps - The Assembly Line of ML
MLOps - The Assembly Line of ML
 
Best practices for building and deploying predictive models over big data pre...
Best practices for building and deploying predictive models over big data pre...Best practices for building and deploying predictive models over big data pre...
Best practices for building and deploying predictive models over big data pre...
 

Kürzlich hochgeladen

08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 

Kürzlich hochgeladen (20)

08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 

Text and Data Visualization Introduction 2012

  • 1. Introduction to Text Visualization Dr. Anton Heijs CEO Treparel Delftechpark 26 2628 XH Delft July 2012 The Netherlands www.treparel.com
  • 2. KMX enables information and knowledge professionals to gain faster, reliable, more precise insights in large complex unstructured data sets allowing them to make better informed decisions. Treparel is a leading technology solution provider in Big Data Text Analytics & Visualization Treparel KMX – All rights reserved 2012
  • 3. Topics covered in this presentation • Who is Treparel? • Why visualize data? • How do we visualize data? • Multiple coupled views and interaction. Treparel KMX – All rights reserved 2012 www.treparel.com 3
  • 4. Nexus of Forces: Social, Cloud, Mobile, Information IT Market shift driving Big Data challenges Copyright: Gartner, 2011 80% of data is Unstructured (Documents, Text, Images, Graphs) Treparel KMX – All rights reserved 2012 www.treparel.com 4
  • 5. About Treparel • Delft, The Netherlands, 2006. • Treparel is an innovative technology solution provider in Big Data Analytics, Text Mining and Visualization. • KMX is an integrated data analysis toolset which provide faster, reliable intelligent insights in large complex unstructured data sets to allow companies to make better informed decisions. • Clients: Philips, Bayer, Abbott, European Patent Office, European Commission • Part of Research Centers and University ecosystem; TU Delft, Universities of Paris and Sao Paulo • More info: www.treparel.com Treparel KMX – All rights reserved 2012 www.treparel.com 5
  • 6. Positioning of Treparel’s KMX technology Text Acquisition & Preparation Analysis and processing Output and display ‘Seek’ ‘Model’ ‘Adapt’ External sources Reporting & Text preprocessing Patents Presentation Legal Media and publishing Research Indexing databases Media / Publishers Content management Other sources Clustering systems Documents Websites Line-of-business Classification applications Blogs Newsfeeds Research applications Email Semantic Analysis Application notes Search engines Search results Social networks Visualization Information extraction (entities, facts, relationships, concepts, patents) Management, Development and Configuration Treparel KMX – All rights reserved 2012 Copyright: Gartner, J. Popkin 2010
  • 7. Why visualize data? • Gives bird’s eye view to comprehensive and large datasets • Discover previously unknown patterns of the dataset • Reveals inherent problems of the data (noise, outliers) • Enables both the examination of the large scale features of the dataset as well as the local features (frame of reference) • Allows the user to form hypotheses on developed insights and make better informed decisions Treparel KMX – All rights reserved 2012 www.treparel.com 7
  • 8. Why visualize data? Analyse without visualization • Initial selection of training documents is ‘blind’ • No insight on how the data is separated • Improving the Classifiertm a lengthy process • No clear insights in the performance of a Classifiertm Treparel KMX – All rights reserved 2012 www.treparel.com 8
  • 9. Why visualize data? Analyse with visualization • Representive selection of training documents data by information professional • A clean training set is used (no noise/outliers) for building a Classifiertm • Visual feedback if the Classifiertm results in a separated section within the frame of reference of the whole dataset. • Insight into the performance of a Classifiertm Treparel KMX – All rights reserved 2012 www.treparel.com 9
  • 10. How do we visualize data? • Create a representation we can use (vector space model) • Correct for (normalization): – Terms that appear often in most documents (non-discriminative terms) – Length of documents (remove bias towards longer documents) Treparel KMX – All rights reserved 2012 www.treparel.com 10
  • 11. 3 types of visualization 1. Visualize frequency distribution 2. Visualize classification scores 3. Visualize document similarity Treparel KMX – All rights reserved 2012 www.treparel.com 11
  • 12. 1. Visualize frequency distribution Requires carefull interpretation Treparel KMX – All rights reserved 2012 www.treparel.com 12
  • 13. 2. Visualize classification scores Parallel Coordinate Plot Treparel KMX – All rights reserved 2012 www.treparel.com 13
  • 14. 3. Visualize document similarity Also visualizes the Classification scores Treparel KMX – All rights reserved 2012 www.treparel.com 14
  • 15. How do we visualize data? What is Least Squares Projection (LSP)? • LSP attempts to retain the neighborhood relations of objects in a high dimensional feature space when the data gets projected onto a two dimensional surface. Advantage Disadvantage Objects that are close to each other As a result of the projection to two (similar) in high dimensional feature dimensions (for human space are also close to each other in interpretation) information is lost. the LSP visualization. This can result in clusters that for This results in the formation of human interpretation are not clusters on the two dimensional condensed in an identifiable cluster surface. (e.g. scattered over the visualization). Treparel KMX – All rights reserved 2012 www.treparel.com 15
  • 16. 3. Visualize document classification scores After applying LSP Treparel KMX – All rights reserved 2012 www.treparel.com 16
  • 17. Multiple coupled views and interaction Treparel KMX – All rights reserved 2012 www.treparel.com 17
  • 18. Treparel is a leading technology solution provider in Big Data Text Analytics & Visualization Treparel Delftechpark 26 2628 XH Delft The Netherlands www.treparel.com Treparel KMX – All rights reserved 2012 www.treparel.com 18