SlideShare ist ein Scribd-Unternehmen logo
1 von 21
Downloaden Sie, um offline zu lesen
Text-Mining:
Big Data Analytics voor ongestructureerde data
Prof dr ir Jan C. Scholtes
https://textmining.nu
Prof dr ir Jan C. Scholtes
3
Exploratory Search
4
Text Mining
Text Mining: The next step in
Search Technology
Finding without knowing exactly what
you’re looking for, or finding what
apparently isn’t there (or who do not
want to be found …).
5
5
•Social network analysis
•Community Detection
•Different types of
visualization for
temporal, geographical,
semantic or relational
mappings.
•Anomaly Detection
•Decision Tree
•Bayes Classifiers
•Rochio
•k-NN
•Support Vector Machines
•Clustering
•CNN
•LSTM
•Entity extraction
•Fact, Event & Concept
extraction
•Negations, co-reference
resolution
•Grammars
•Statistical methods: Hidden
Markov Models, Maximum
Entropy Models, Conditional
Random Fields, …
•Data normalization
(Ontology matching)
•Inverted file index
•Relevance ranking
•Relevance feedback
•Faceted search
•Incomplete matching
•Index compression
•Precision & Recall
Search
Information
Extraction
Link Analysis
& Data
Visualization
Machine
Learning
6
Language_Name English
CITY New Brunswick, WASHINGTON
COMPANY J&J, Johnson & Johnson
COUNTRY Greece, Poland, Romania, United Kingdom
CURRENCY .02 USD, 21400000 USD, 48600000 USD, 59.47 USD, 70000000 USD
DATE 04-08
DAY Fri, Friday
NOUN_GROUP
biotech drugs, bribery case, denying guilt, final growth frontier, foreign countries, giving gifts, holding corporations,
intense revenue pressure, meaningful credit, medical device kickbacks, medical devices, multiple businesses, next several
days, non-U.S. markets, only way, orthopedic hips, other countries, over-the-counter medicines, paid kickbacks, past
year, paying kickbacks, same time, several new positions, similar violations, travel gifts
ORGANIZATION Department of Justice, Justice Department, SEC, Securities and Exchange Commission, University of Michigan
PEOPLES Iraqi
PERSON Erik Gordon, Mythili Raman, William Weldon
PLACE_REGION Europe
PRODUCT Benadryl, Tylenol
PROP_MISC Band-Aids, Food Program, Foreign Corrupt Practices Act, United Nations Oil
STATE N.J.
TIME 1:32 pm ET
TIME_PERIOD 13 years, five years, six months, three years
YEAR 2007
Problem
"We went to the government to report improper payments and have taken full responsibility for these actions," said
William Weldon, Chairman and CEO of J&J., Last month federal health regulators took legal control of the plant where
millions of bottles of defective medication were produced., The charges against J&J were brought under the Foreign
Corrupt Practices Act, which bars publicly traded companies from bribing officials in other countries to get or retain
business., The company will pay $21.4 million in criminal penalties for improper payments and return $48.6 million in
illegal profits, according to the government., The SEC says J&J agents used fake contracts and sham companies to deliver
the bribes.
Sentiment
giving meaningful credit to companies that self-report, We are committed to holding corporations accountable for bribing
foreign officials, what is honest
Request make sure it complies with anti-bribery laws across its businesses
7
WHAT happened?
8
WHO
8
9
WHAT-WHEN: Topic Rivers
10
WHY & WHO: Emotion Detection
11
Anomaly Detection
Σ(Φ)
12
Text Mining the Lord of the Rings
• Automatic
identification of
key players
(custodians)
• Automatic
identification of
locations.
• Automatic
identification of
travel patterns of
key players.
• Visualize in time.
Memory Consistency
24/7
Speed &
Scalability
Search
M&A and
Restructuring
Data
Collection
Analytics
eDiscovery,
Regulatory
Requests,
Investigations,
Fact-Finding
Missions
Reporting
Archiving
Knowledge
Management
Production
Big Data Analytics and the Law
ZyLAB used as e-
Discovery & e-Disclosure
standard for all United
Nations-backed War Crime
Tribunals and ongoing UN
courts
16SLIDE / 16
• FOIA (WOB)
• Audits &
Internal Investigations
• Litigation
• Arbitration
• Answering Regulatory
Requests
• Subject Access
Requests
• Right to be Forgotten
eDiscovery
17
3x more relevant
documents than
Boolean search
No complex queries, just
review documents
2x total number of
relevant documents
is all that need to be
reviewed
Estimate
accurately percentage of all
relevant documents found at
end
Teach the computer what to look for …
18
CCPA
SLIDE / 19
GDPR & AVG: Aflakken, anonimiseren, …
SLIDE / 20
Hoe werkt dat?
Search Pattern Recognition Text-Mining
Thank you!
Time for Q&A
Prof dr ir Jan C. Scholtes
https://www.linkedin.com/in/jscholtes/
https://textmining.nu

Weitere ähnliche Inhalte

Ähnlich wie Text mining scholtes - big data congress utrecht 2019

1. Data Science overview - part1.pptx
1. Data Science overview - part1.pptx1. Data Science overview - part1.pptx
1. Data Science overview - part1.pptxRahulTr22
 
datamining management slyabbus and ppt.pptx
datamining management slyabbus and ppt.pptxdatamining management slyabbus and ppt.pptx
datamining management slyabbus and ppt.pptxshyam1985
 
Employees, Business Partners and Bad Guys: What Web Data Reveals About Person...
Employees, Business Partners and Bad Guys: What Web Data Reveals About Person...Employees, Business Partners and Bad Guys: What Web Data Reveals About Person...
Employees, Business Partners and Bad Guys: What Web Data Reveals About Person...Connotate
 
Cognitive Legal Science V5
Cognitive Legal Science  V5Cognitive Legal Science  V5
Cognitive Legal Science V5Howard Moskowitz
 
Glantus Presentation: Ethical Data Science - BoI Analytics Connect 2018
Glantus Presentation: Ethical Data Science - BoI Analytics Connect 2018Glantus Presentation: Ethical Data Science - BoI Analytics Connect 2018
Glantus Presentation: Ethical Data Science - BoI Analytics Connect 2018Joe Keating
 
Data miningppt378
Data miningppt378Data miningppt378
Data miningppt378nitttin
 
Big Data Ethics Cjbe july 2021
Big Data Ethics Cjbe july 2021Big Data Ethics Cjbe july 2021
Big Data Ethics Cjbe july 2021andygustafson
 
Text Analytics: From Colored Pens and Crumbly Papers to Custom Machine Classi...
Text Analytics: From Colored Pens and Crumbly Papers to Custom Machine Classi...Text Analytics: From Colored Pens and Crumbly Papers to Custom Machine Classi...
Text Analytics: From Colored Pens and Crumbly Papers to Custom Machine Classi...Stuart Shulman
 
Artificial Intelligence for Discovery
Artificial Intelligence for DiscoveryArtificial Intelligence for Discovery
Artificial Intelligence for DiscoveryDayOne
 
Glantus Presentation Slides - Ethical Data Science - BoI Analytics Connect 2018
Glantus Presentation Slides - Ethical Data Science - BoI Analytics Connect 2018Glantus Presentation Slides - Ethical Data Science - BoI Analytics Connect 2018
Glantus Presentation Slides - Ethical Data Science - BoI Analytics Connect 2018Joe Keating
 
Big Data in Healthcare and Medical Devices
Big Data in Healthcare and Medical DevicesBig Data in Healthcare and Medical Devices
Big Data in Healthcare and Medical DevicesPremNarayanan6
 

Ähnlich wie Text mining scholtes - big data congress utrecht 2019 (20)

Ona 2012
Ona 2012Ona 2012
Ona 2012
 
benfords Law
benfords Lawbenfords Law
benfords Law
 
1. Data Science overview - part1.pptx
1. Data Science overview - part1.pptx1. Data Science overview - part1.pptx
1. Data Science overview - part1.pptx
 
datamining.ppt
datamining.pptdatamining.ppt
datamining.ppt
 
datamining.ppt
datamining.pptdatamining.ppt
datamining.ppt
 
datamining.ppt
datamining.pptdatamining.ppt
datamining.ppt
 
datamining management slyabbus and ppt.pptx
datamining management slyabbus and ppt.pptxdatamining management slyabbus and ppt.pptx
datamining management slyabbus and ppt.pptx
 
Employees, Business Partners and Bad Guys: What Web Data Reveals About Person...
Employees, Business Partners and Bad Guys: What Web Data Reveals About Person...Employees, Business Partners and Bad Guys: What Web Data Reveals About Person...
Employees, Business Partners and Bad Guys: What Web Data Reveals About Person...
 
Cognitive Legal Science V5
Cognitive Legal Science  V5Cognitive Legal Science  V5
Cognitive Legal Science V5
 
Glantus Presentation: Ethical Data Science - BoI Analytics Connect 2018
Glantus Presentation: Ethical Data Science - BoI Analytics Connect 2018Glantus Presentation: Ethical Data Science - BoI Analytics Connect 2018
Glantus Presentation: Ethical Data Science - BoI Analytics Connect 2018
 
Data mining
Data miningData mining
Data mining
 
Data mining
Data miningData mining
Data mining
 
Data miningppt378
Data miningppt378Data miningppt378
Data miningppt378
 
Big Data Ethics Cjbe july 2021
Big Data Ethics Cjbe july 2021Big Data Ethics Cjbe july 2021
Big Data Ethics Cjbe july 2021
 
Text Analytics: From Colored Pens and Crumbly Papers to Custom Machine Classi...
Text Analytics: From Colored Pens and Crumbly Papers to Custom Machine Classi...Text Analytics: From Colored Pens and Crumbly Papers to Custom Machine Classi...
Text Analytics: From Colored Pens and Crumbly Papers to Custom Machine Classi...
 
Artificial Intelligence for Discovery
Artificial Intelligence for DiscoveryArtificial Intelligence for Discovery
Artificial Intelligence for Discovery
 
Glantus Presentation Slides - Ethical Data Science - BoI Analytics Connect 2018
Glantus Presentation Slides - Ethical Data Science - BoI Analytics Connect 2018Glantus Presentation Slides - Ethical Data Science - BoI Analytics Connect 2018
Glantus Presentation Slides - Ethical Data Science - BoI Analytics Connect 2018
 
Big Data in Healthcare and Medical Devices
Big Data in Healthcare and Medical DevicesBig Data in Healthcare and Medical Devices
Big Data in Healthcare and Medical Devices
 
mineria de datos
mineria de datosmineria de datos
mineria de datos
 
mineria datos
mineria datosmineria datos
mineria datos
 

Mehr von jcscholtes

Legal tech Alliance Workshop 20191029
Legal tech Alliance Workshop 20191029Legal tech Alliance Workshop 20191029
Legal tech Alliance Workshop 20191029jcscholtes
 
LegalTech Alliance eDiscovery keynote Scholtes
LegalTech Alliance eDiscovery keynote ScholtesLegalTech Alliance eDiscovery keynote Scholtes
LegalTech Alliance eDiscovery keynote Scholtesjcscholtes
 
Target-Based Sentiment Anaysis as a Sequence-Tagging Task
Target-Based Sentiment Anaysis as a Sequence-Tagging TaskTarget-Based Sentiment Anaysis as a Sequence-Tagging Task
Target-Based Sentiment Anaysis as a Sequence-Tagging Taskjcscholtes
 
Ai and applications in the legal domain studium generale maastricht 20191101
Ai and applications in the legal domain studium generale maastricht 20191101Ai and applications in the legal domain studium generale maastricht 20191101
Ai and applications in the legal domain studium generale maastricht 20191101jcscholtes
 
Augmented intelligence and the impact on your world in 2030
Augmented intelligence and the impact on your world in 2030Augmented intelligence and the impact on your world in 2030
Augmented intelligence and the impact on your world in 2030jcscholtes
 
How can text-mining leverage developments in Deep Learning? Presentation at ...
How can text-mining leverage developments in Deep Learning?  Presentation at ...How can text-mining leverage developments in Deep Learning?  Presentation at ...
How can text-mining leverage developments in Deep Learning? Presentation at ...jcscholtes
 
Hogeschool Den Haag Legal Analytics
Hogeschool Den Haag Legal AnalyticsHogeschool Den Haag Legal Analytics
Hogeschool Den Haag Legal Analyticsjcscholtes
 
HvA Legaltech Lab Opening
HvA Legaltech Lab OpeningHvA Legaltech Lab Opening
HvA Legaltech Lab Openingjcscholtes
 
Big Data en Data Science en de Rechtspraak
Big Data en Data Science en de RechtspraakBig Data en Data Science en de Rechtspraak
Big Data en Data Science en de Rechtspraakjcscholtes
 
How can Artificial Intelligence help me on the Battlefield?
How can Artificial Intelligence help me on the Battlefield?How can Artificial Intelligence help me on the Battlefield?
How can Artificial Intelligence help me on the Battlefield?jcscholtes
 
Big data analytics for legal fact finding
Big data analytics for legal fact findingBig data analytics for legal fact finding
Big data analytics for legal fact findingjcscholtes
 
How new ai based analytics ignite a productivity revolution in e discovery-final
How new ai based analytics ignite a productivity revolution in e discovery-finalHow new ai based analytics ignite a productivity revolution in e discovery-final
How new ai based analytics ignite a productivity revolution in e discovery-finaljcscholtes
 
Efficiently Handling Subject Access Requests
Efficiently Handling Subject Access RequestsEfficiently Handling Subject Access Requests
Efficiently Handling Subject Access Requestsjcscholtes
 
Waarom LegalTech de toekomst heeft
Waarom LegalTech de toekomst heeftWaarom LegalTech de toekomst heeft
Waarom LegalTech de toekomst heeftjcscholtes
 

Mehr von jcscholtes (14)

Legal tech Alliance Workshop 20191029
Legal tech Alliance Workshop 20191029Legal tech Alliance Workshop 20191029
Legal tech Alliance Workshop 20191029
 
LegalTech Alliance eDiscovery keynote Scholtes
LegalTech Alliance eDiscovery keynote ScholtesLegalTech Alliance eDiscovery keynote Scholtes
LegalTech Alliance eDiscovery keynote Scholtes
 
Target-Based Sentiment Anaysis as a Sequence-Tagging Task
Target-Based Sentiment Anaysis as a Sequence-Tagging TaskTarget-Based Sentiment Anaysis as a Sequence-Tagging Task
Target-Based Sentiment Anaysis as a Sequence-Tagging Task
 
Ai and applications in the legal domain studium generale maastricht 20191101
Ai and applications in the legal domain studium generale maastricht 20191101Ai and applications in the legal domain studium generale maastricht 20191101
Ai and applications in the legal domain studium generale maastricht 20191101
 
Augmented intelligence and the impact on your world in 2030
Augmented intelligence and the impact on your world in 2030Augmented intelligence and the impact on your world in 2030
Augmented intelligence and the impact on your world in 2030
 
How can text-mining leverage developments in Deep Learning? Presentation at ...
How can text-mining leverage developments in Deep Learning?  Presentation at ...How can text-mining leverage developments in Deep Learning?  Presentation at ...
How can text-mining leverage developments in Deep Learning? Presentation at ...
 
Hogeschool Den Haag Legal Analytics
Hogeschool Den Haag Legal AnalyticsHogeschool Den Haag Legal Analytics
Hogeschool Den Haag Legal Analytics
 
HvA Legaltech Lab Opening
HvA Legaltech Lab OpeningHvA Legaltech Lab Opening
HvA Legaltech Lab Opening
 
Big Data en Data Science en de Rechtspraak
Big Data en Data Science en de RechtspraakBig Data en Data Science en de Rechtspraak
Big Data en Data Science en de Rechtspraak
 
How can Artificial Intelligence help me on the Battlefield?
How can Artificial Intelligence help me on the Battlefield?How can Artificial Intelligence help me on the Battlefield?
How can Artificial Intelligence help me on the Battlefield?
 
Big data analytics for legal fact finding
Big data analytics for legal fact findingBig data analytics for legal fact finding
Big data analytics for legal fact finding
 
How new ai based analytics ignite a productivity revolution in e discovery-final
How new ai based analytics ignite a productivity revolution in e discovery-finalHow new ai based analytics ignite a productivity revolution in e discovery-final
How new ai based analytics ignite a productivity revolution in e discovery-final
 
Efficiently Handling Subject Access Requests
Efficiently Handling Subject Access RequestsEfficiently Handling Subject Access Requests
Efficiently Handling Subject Access Requests
 
Waarom LegalTech de toekomst heeft
Waarom LegalTech de toekomst heeftWaarom LegalTech de toekomst heeft
Waarom LegalTech de toekomst heeft
 

Kürzlich hochgeladen

Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 

Kürzlich hochgeladen (20)

Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 

Text mining scholtes - big data congress utrecht 2019

  • 1. Text-Mining: Big Data Analytics voor ongestructureerde data Prof dr ir Jan C. Scholtes https://textmining.nu
  • 2. Prof dr ir Jan C. Scholtes
  • 4. 4 Text Mining Text Mining: The next step in Search Technology Finding without knowing exactly what you’re looking for, or finding what apparently isn’t there (or who do not want to be found …).
  • 5. 5 5 •Social network analysis •Community Detection •Different types of visualization for temporal, geographical, semantic or relational mappings. •Anomaly Detection •Decision Tree •Bayes Classifiers •Rochio •k-NN •Support Vector Machines •Clustering •CNN •LSTM •Entity extraction •Fact, Event & Concept extraction •Negations, co-reference resolution •Grammars •Statistical methods: Hidden Markov Models, Maximum Entropy Models, Conditional Random Fields, … •Data normalization (Ontology matching) •Inverted file index •Relevance ranking •Relevance feedback •Faceted search •Incomplete matching •Index compression •Precision & Recall Search Information Extraction Link Analysis & Data Visualization Machine Learning
  • 6. 6 Language_Name English CITY New Brunswick, WASHINGTON COMPANY J&J, Johnson & Johnson COUNTRY Greece, Poland, Romania, United Kingdom CURRENCY .02 USD, 21400000 USD, 48600000 USD, 59.47 USD, 70000000 USD DATE 04-08 DAY Fri, Friday NOUN_GROUP biotech drugs, bribery case, denying guilt, final growth frontier, foreign countries, giving gifts, holding corporations, intense revenue pressure, meaningful credit, medical device kickbacks, medical devices, multiple businesses, next several days, non-U.S. markets, only way, orthopedic hips, other countries, over-the-counter medicines, paid kickbacks, past year, paying kickbacks, same time, several new positions, similar violations, travel gifts ORGANIZATION Department of Justice, Justice Department, SEC, Securities and Exchange Commission, University of Michigan PEOPLES Iraqi PERSON Erik Gordon, Mythili Raman, William Weldon PLACE_REGION Europe PRODUCT Benadryl, Tylenol PROP_MISC Band-Aids, Food Program, Foreign Corrupt Practices Act, United Nations Oil STATE N.J. TIME 1:32 pm ET TIME_PERIOD 13 years, five years, six months, three years YEAR 2007 Problem "We went to the government to report improper payments and have taken full responsibility for these actions," said William Weldon, Chairman and CEO of J&J., Last month federal health regulators took legal control of the plant where millions of bottles of defective medication were produced., The charges against J&J were brought under the Foreign Corrupt Practices Act, which bars publicly traded companies from bribing officials in other countries to get or retain business., The company will pay $21.4 million in criminal penalties for improper payments and return $48.6 million in illegal profits, according to the government., The SEC says J&J agents used fake contracts and sham companies to deliver the bribes. Sentiment giving meaningful credit to companies that self-report, We are committed to holding corporations accountable for bribing foreign officials, what is honest Request make sure it complies with anti-bribery laws across its businesses
  • 10. 10 WHY & WHO: Emotion Detection
  • 12. 12 Text Mining the Lord of the Rings • Automatic identification of key players (custodians) • Automatic identification of locations. • Automatic identification of travel patterns of key players. • Visualize in time.
  • 13.
  • 14. Memory Consistency 24/7 Speed & Scalability Search M&A and Restructuring Data Collection Analytics eDiscovery, Regulatory Requests, Investigations, Fact-Finding Missions Reporting Archiving Knowledge Management Production Big Data Analytics and the Law
  • 15. ZyLAB used as e- Discovery & e-Disclosure standard for all United Nations-backed War Crime Tribunals and ongoing UN courts
  • 16. 16SLIDE / 16 • FOIA (WOB) • Audits & Internal Investigations • Litigation • Arbitration • Answering Regulatory Requests • Subject Access Requests • Right to be Forgotten eDiscovery
  • 17. 17 3x more relevant documents than Boolean search No complex queries, just review documents 2x total number of relevant documents is all that need to be reviewed Estimate accurately percentage of all relevant documents found at end Teach the computer what to look for …
  • 19. SLIDE / 19 GDPR & AVG: Aflakken, anonimiseren, …
  • 20. SLIDE / 20 Hoe werkt dat? Search Pattern Recognition Text-Mining
  • 21. Thank you! Time for Q&A Prof dr ir Jan C. Scholtes https://www.linkedin.com/in/jscholtes/ https://textmining.nu