Vrije Universiteit
MSc Information Sciences




     Maritime Safety Events Extraction
            from News Articles

                  Anastasios Martidis
            anastasios.martidis@student.vu.nl
                        July 31, 2012
   Supervisors:
   Willem R. van Hage, Dr
   Davide Ceolin, MSc
                                                1
Outline

 Introduction
 Information Spectrum
 Problem Statement
 Significance of Research
 Research Questions
 Hypotheses
 Materials and Methods
 Training Sets
 System Overview
 Test sets
 Evaluation
 Results
 Conclusions
                                           2
Introduction


 “We are drowning in information, and
  starved for knowledge.”
  John Naisbitt




                                        3
Information Spectrum


Structured Data: Automatic Identification System (AIS)
theoceandreamer.files.wordpress.com/2011/03/img_21861.jpg

Free Text: News Articles
http://www.tideway.nl/images/NorthWestEveningMail-PortSettoRockasTurbinesGetBoostfromaRollingstone-Walney2010-kleinbestan.jpg
                                                                                                                                     4
Problem Statement

News Articles:
 Descriptive and informative, but…
 Vast in number, growing and updated daily
 Free text, difficult to process automatically
 Generic Natural Language Processing tools:
 Popular and useful, but…
 Have limitations in recognizing specific
  types of maritime safety events and ship
  names
                                                  5
Significance of the Research

Applications
 Risk assessments
 Improvement of vessel safety standards
 Port facility security assessments
 Recognition of problematic areas (Piracy)
 Identification of shipping companies, ships, ship
  constructors with history in maritime safety events
 Maritime education and training

Potential Stakeholders
 Ship owners, operators and managers
 Insurance Companies
 Coast Guard
 International Maritime Organization (IMO)
 International Maritime Security (IMS)
 Private Security Companies (PSCs)


                                                          6
Research Questions

1.   Can we automatically process a news article in order to
     determine if it concerns a maritime safety event?

2.   Can we automatically extract a description of a maritime
     safety event? The objective of the description is to
     automatically recognize the type of maritime safety event,
     ships involved, location, date and time.

3.   Can we recognize relations and significance of the
     extracted information from the text?
       -Can we recognize the dominant event? The dominant
       event is the event that is primarily described
       in the news article.
       -Can we identify relations between extracted locations
       and specific event types described in the text?

                                                                   7
Hypotheses

1.   We can define sets of keywords that, if present in
     certain combinations in the text under processing, indicate
     that it concerns a maritime safety event.

2.   We can extract a description of the event described in
     the news article using rule-based text classification,
     sets of keywords, datasets of ship names, regular
     expression matching and Named Entity Recognition.

3.   We can evaluate the extracted information from the text by:
      -identifying the dominant event by measuring the
      frequency of keyword indicators for each event type
      -recognizing relations between locations and event types
      by examining the positions of locations and event type
      indicators in the text

                                                                   8
Materials & Methods

 Rule-Based Text Classification
 Information Extraction
 OpenCalais
 NLTK
 AIS
 DBpedia




                                   9
Training Set

 200 news articles (retrieved from CBS news)
 100 related to maritime safety (53937 tokens)
 100 of general domains (47053 tokens)
 Word Frequency
[Figures: word frequency for Maritime Safety Related and General Domains articles]
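The word-frequency comparison between the two training sets can be sketched as follows. This is only an illustration: the slides say the significant words were selected manually, so the automatic ranking below is an assumed aid, and the two toy corpora, the tokenizer and the function names are all hypothetical.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase a text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def relative_frequencies(articles):
    """Return each word's share of all tokens across a list of articles."""
    counts = Counter(tok for art in articles for tok in tokenize(art))
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}

def distinctive_words(domain_articles, general_articles, top_n=10):
    """Rank words by how much more frequent they are in the domain corpus."""
    dom = relative_frequencies(domain_articles)
    gen = relative_frequencies(general_articles)
    scored = {w: f - gen.get(w, 0.0) for w, f in dom.items()}
    return sorted(scored, key=scored.get, reverse=True)[:top_n]

# Toy corpora standing in for the two 100-article training sets (hypothetical).
maritime = ["The tanker sank after the collision.", "Pirates seized the tanker."]
general = ["The election results were announced.", "The market rose sharply."]
top_words = distinctive_words(maritime, general, top_n=3)
```

Subtracting the general-domain frequency pushes common words such as "the" down the ranking, leaving domain words as candidates for manual review.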




                                                  10
Training Set Outcomes

 Manual selection of significant words
 Categorization into sets of keywords by their
  meaning
 Use of keywords for text classification
 Mapping of keywords into maritime safety
  event types
 Use of keywords as event type indicators



                                                11
Text Classification

 Document    D
 Lists of keywords:
 L1, most frequent keywords
 L2, safety related keywords
 L3, vessel type keywords
 L4, maritime related keywords
 L5, naval hierarchy keywords
 L6, part of ship keywords
 L7, water based locations keywords
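A minimal sketch of rule-based classification over keyword lists is shown here. The slides do not give the list contents or the exact combinations used, so the three sample lists, their keywords and the rule (a safety keyword together with a vessel or water-location keyword) are assumptions made for illustration.

```python
import re

# Hypothetical keyword lists; the real L1-L7 contents are not given on the slides.
KEYWORD_LISTS = {
    "L2_safety": {"sank", "collision", "fire", "pirates", "rescued", "aground"},
    "L3_vessel_type": {"tanker", "ferry", "trawler", "freighter", "vessel", "ship"},
    "L7_water_location": {"sea", "gulf", "strait", "harbour", "port", "coast"},
}

def matched_lists(document):
    """Return the names of the keyword lists with at least one hit in the document."""
    tokens = set(re.findall(r"[a-z]+", document.lower()))
    return {name for name, words in KEYWORD_LISTS.items() if tokens & words}

def is_maritime_safety(document):
    """Assumed rule: a safety keyword plus a vessel or water-location keyword."""
    hits = matched_lists(document)
    return "L2_safety" in hits and bool(hits & {"L3_vessel_type", "L7_water_location"})
```

Requiring a combination of lists rather than a single hit is what keeps an article about a warehouse fire or a ferry timetable out of the maritime safety class in this sketch.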



                                       12
Event Type Recognition

 Document D
 Event Types (ET):
  Piracy
  Sinking
  Oil spill
  Fire/Explosion
  Grounding
  Capsizing
  Drifting
  Leakage
  Evacuation
  Collision
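Event type recognition with keyword indicators might look like the sketch below. The indicator keywords are hypothetical and cover only four of the ten event types. Reporting every type with at least one hit is an assumed threshold.

```python
import re

# Hypothetical indicator keywords per event type (a subset of the ten types).
EVENT_INDICATORS = {
    "Piracy": {"pirate", "pirates", "hijacked", "ransom"},
    "Sinking": {"sank", "sunk", "sinking"},
    "Oil spill": {"spill", "slick"},
    "Collision": {"collided", "collision"},
}

def count_indicators(document):
    """Count indicator keyword occurrences per event type."""
    tokens = re.findall(r"[a-z]+", document.lower())
    return {
        event: sum(1 for tok in tokens if tok in words)
        for event, words in EVENT_INDICATORS.items()
    }

def recognize_event_types(document):
    """Return every event type with at least one indicator in the document."""
    return [event for event, n in count_indicators(document).items() if n > 0]
```

For example, recognize_event_types("The tanker collided with a ferry and later sank.") reports both Sinking and Collision, since one article can describe several event types.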




                                    13
Ship Names Extraction

 Dataset of ship names retrieved from AIS
  messages and DBpedia
 Comparison of the dataset entries to the
  text
 Compromises
  Location names
  Part of names
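The dataset-to-text comparison can be sketched as a simple gazetteer lookup. The four gazetteer entries and the sample article are hypothetical, and whole-word, case-insensitive matching is an assumption about how the comparison was done. The example also shows one plausible reading of the "location names" compromise: a ship name that is also a place name matches even when the text means the place.

```python
import re

# Hypothetical gazetteer entries; the real dataset came from AIS messages and DBpedia.
SHIP_NAMES = ["Costa Concordia", "Rena", "Amsterdam", "Maersk Alabama"]

def extract_ship_names(text, ship_names=SHIP_NAMES):
    """Return gazetteer entries that occur in the text as whole words."""
    found = []
    for name in ship_names:
        pattern = r"\b" + re.escape(name) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(name)
    return found

article = "The Costa Concordia ran aground; survivors were later flown to Amsterdam."
matches = extract_ship_names(article)
```

Here matches contains both "Costa Concordia" and "Amsterdam". The second is a false positive, because the article refers to the city and not to a ship of that name.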




                                             14
Locations Extraction

 Use of OpenCalais for NER tasks
 Interested in locations only
 Four types of locations recognized by
  Calais:
     Continent
     Country
     City
     Province or State




                                          15
Date and Time Extraction

 Chunked sentences
 Pattern matching using regular
  expressions
  Numeric representation of date (e.g., 1322012, 22-07-12)
  Months (e.g., January or Jan.)
  Days (e.g., Monday or Mon.)
  Day periods (e.g., morning, afternoon)
  Time (e.g., 11:00am or 11.00 a.m.)

 Results presented in a specific order for each
  sentence
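The pattern categories above can be sketched with Python regular expressions. The exact expressions used by the system are not shown on the slides, so every pattern below is an assumption. The undelimited numeric form (1322012) is not covered, and "evening" and "night" are added to the day periods for illustration. Results come back in the fixed order of the pattern list, as a stand-in for the "specific order" mentioned above.

```python
import re

# Case-sensitive alternations, so "may" (verb) or "sunk" are not mistaken for dates.
MONTH = (r"\b(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|June?|July?|"
         r"Aug(?:ust)?|Sep(?:t(?:ember)?)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)\b\.?")
DAY = (r"\b(?:Mon(?:day)?|Tue(?:sday)?|Wed(?:nesday)?|Thu(?:rsday)?|"
       r"Fri(?:day)?|Sat(?:urday)?|Sun(?:day)?)\b\.?")

# (label, compiled pattern) pairs; list order defines the output order.
PATTERNS = [
    ("numeric_date", re.compile(r"\b\d{1,2}[-/.]\d{1,2}[-/.]\d{2,4}\b")),
    ("month", re.compile(MONTH)),
    ("day", re.compile(DAY)),
    ("day_period", re.compile(r"\b(?:morning|afternoon|evening|night)\b", re.IGNORECASE)),
    ("time", re.compile(r"\b\d{1,2}[:.]\d{2}\s?(?:a\.m\.|p\.m\.|am\b|pm\b)", re.IGNORECASE)),
]

def extract_datetime(sentence):
    """Return (label, matched text) pairs in the fixed order of PATTERNS."""
    results = []
    for label, regex in PATTERNS:
        results.extend((label, m.group(0)) for m in regex.finditer(sentence))
    return results
```

Applied to "Rescue began at 11:00am on Jan. 5", the function returns the month match before the time match, regardless of where each appears in the sentence.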
                                                                16
Dominant Event Recognition

 For each list of event type indicator
  keywords
 Sum the keyword occurrences in the text
 Event type with the highest sum is
  predicted as the dominant event
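The highest-sum rule above can be sketched in a few lines. The indicator keywords are hypothetical, and returning None when no indicator occurs is an assumption the slides do not address. Ties are broken by list order here, which is also an assumption.

```python
import re
from collections import Counter

# Hypothetical indicator keywords per event type (a subset of the ten types).
EVENT_INDICATORS = {
    "Piracy": {"pirate", "pirates", "hijacked", "ransom"},
    "Sinking": {"sank", "sunk", "sinking"},
    "Collision": {"collided", "collision"},
}

def dominant_event(document):
    """Return the event type whose indicator keywords occur most often, or None."""
    tokens = re.findall(r"[a-z]+", document.lower())
    sums = Counter()
    for event, words in EVENT_INDICATORS.items():
        sums[event] = sum(1 for tok in tokens if tok in words)
    event, total = sums.most_common(1)[0]
    return event if total > 0 else None
```

For an article that mentions pirates, a hijacking and a ransom but only one minor collision, the Piracy sum is highest and Piracy is predicted as the dominant event.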




                                           17
Location to Event Relations

 Chunked sentences
 For every sentence containing an
  extracted location, if a keyword indicator
  of an event type also occurs in the same
  sentence,
 then the location is predicted to be
  related to the event type
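The same-sentence rule can be sketched as follows. A naive regex splitter stands in for whatever sentence chunker the system used, the indicator keywords are hypothetical, and the locations are passed in as if they had already been extracted in an earlier step.

```python
import re

# Hypothetical indicator keywords per event type.
EVENT_INDICATORS = {
    "Piracy": {"pirate", "pirates", "hijacked", "ransom"},
    "Grounding": {"aground", "grounded"},
}

def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def relate_locations_to_events(text, locations):
    """Pair each location with event types whose indicators share its sentence."""
    relations = set()
    for sentence in split_sentences(text):
        lowered = sentence.lower()
        tokens = set(re.findall(r"[a-z]+", lowered))
        for location in locations:
            if location.lower() not in lowered:
                continue
            for event, words in EVENT_INDICATORS.items():
                if tokens & words:
                    relations.add((location, event))
    return relations
```

A location that appears only in a sentence without any event indicator produces no relation, which is one way recall can drop when an article names the place and the event in different sentences.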



                                               18
Test Set

 200 news articles (BBC, Reuters)
 100 maritime safety related
 100 of general domains (50 of them
  selected as an attempt to mislead the
  system)
 Each news article manually labeled and
  automatically processed by the system
 Comparison of the results to the labeled
  news article
                                             19
Labeled News Article




                       20
Results of the System




                        21
Evaluation




             22
Results: Text Classification



Precision: 100%
Recall: 100%
F-measure: 100%
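The slides do not define the F-measure. Assuming it is the balanced F1 score (the harmonic mean of precision and recall), it can be computed as sketched below. The function name is hypothetical, and the sample inputs are the precision and recall figures from the ship name and location extraction results slides.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F1); returns 0.0 if both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Precision and recall in percent, taken from the results slides.
ship_f = round(f_measure(18.5, 45.3), 1)  # ship name extraction
loc_f = round(f_measure(88.5, 74.7), 1)   # location extraction
```

Because the harmonic mean is pulled toward the lower of the two values, a low precision keeps the F-measure low even when recall is considerably higher.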




                                23
Results: Event Type Recognition



Precision: 88%
Recall: 97%
F-measure: 92.2%




                                   24
Results: Ship Name Extraction



Precision: 18.5%
Recall: 45.3%
F-measure: 26.3%




                                 25
Results: Location Extraction



Precision: 88.5%
Recall: 74.7%
F-measure: 81%




                                26
Results: Date and Time Extraction



Precision: 95.3%
Recall: 89.4%
F-measure: 92.3%




                                     27
Results: Dominant Event Recognition



Precision: 92%
Recall: 92%
F-measure: 92%




                                       28
Results: Location to Event Relations



Precision: 81%
Recall: 67.8%
F-measure: 73.8%




                                        29
Conclusions

 The system accomplished the extraction
  of maritime safety events from news
  articles
 Overall performance of the system was
  satisfactory
 The system can be improved and refined
 Ship name extraction requires a different
  approach

                                              30
