Supporting the efficient discovery and use of Web APIs is increasingly important as their use and popularity grows. Yet, a simple task like finding potentially interesting APIs and their related documentation turns out to be hard and time-consuming, even when using the best resources currently available on the Web. We describe our research towards an automated Web API documentation crawler and search engine. We have devised and exploited crowdsourcing techniques to generate a curated dataset of Web API documentation. Thanks to this dataset, we have built an engine able to automatically detect documentation pages. Our preliminary experiments show an accuracy of 80% and a precision increase of 15 points over the keyword-based heuristic we used as a baseline.
Harnessing the Crowds for Automating the Identification of Web APIs
1. Harnessing the Crowds for Automating the Identification of Web APIs
Carlos Pedrinaci, Chenghua Lin, Dong Liu, John Domingue
KMi, The Open University
2. Web APIs are the new Web services
• Publicly offering valuable data and functionality
• Widely used and reused
• Although their use is hardly automated
3. Web APIs and RESTful Services
• Services based on a simple(r) stack of technologies than WS-*
• Roughly URL + HTTP + XML/JSON
• Easy way to provide a programmatic interface to existing Web sites
• Seldom adopt REST principles
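The "URL + HTTP + XML/JSON" point above can be sketched in a few lines. This is a minimal illustration, not an API from the talk: the endpoint, parameters, and response payload below are invented for the example.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint and parameters -- invented for illustration only.
BASE = "https://api.example.com/v1/geocode"
params = {"q": "Milton Keynes", "format": "json"}

# A Web API call is typically just an HTTP GET against a parameterised URL...
request_url = f"{BASE}?{urlencode(params)}"
print(request_url)

# ...and the response is a JSON (or XML) document the client parses.
sample_response = '{"place": "Milton Keynes", "lat": 52.04, "lng": -0.76}'
data = json.loads(sample_response)
print(data["place"], data["lat"])
```

This simplicity is exactly why there is no standard machine-readable interface description to discover: the "contract" usually lives only in a human-readable documentation page.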
12. Issues for Discovering Web APIs
• There is no simple way to effectively and uniquely identify Web APIs
• No standardised document describing the interface
• URLs are hardly usable for this end
14. Hypothesis
• Every Web API provides one or several public documentation pages
• These pages provide the most relevant information for developers
‣ Web API location can therefore be approached as a documentation discovery problem
15. Web API Identification
• Given a Web page, determine whether it documents an API or not
• Sometimes a hard problem even for humans
16. Collecting Documentation Pages
• Harnessing the crowds for detecting documentation pages
17. Generating a Curated Dataset
• Often the links are obsolete or point to general pages
18. Dataset Generated
• We used API Validator to process 1,872 APIs from ProgrammableWeb
• 43% of the URLs we started with (data from 2010)
• 624 marked as a documentation page
• 929 marked as not a documentation page
• 318 skipped (server down or unclear)
19. Web API Identification Engine
• Web API identification as a binary classification problem
• Extract core features from Web pages
• Use machine learning algorithms to provide an identification engine
20. Preliminary Experiment
• Initially used only Web page words as features
• Trained two classifiers: Naive Bayes (NB) and SVM
• Used a simple keyword-based heuristic as a baseline for comparison (the occurrence of 3 or more keywords)
• api, input, output, GET, PUT, etc.
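The baseline above can be sketched as follows. Note the full keyword list is not given in the slides, so the set below extends the listed examples (api, input, output, GET, PUT) with assumed HTTP-related cue words.

```python
# Assumed keyword set: the slides list only "api, input, output, GET, PUT, etc."
KEYWORDS = {"api", "input", "output", "get", "put", "post", "delete",
            "request", "response", "endpoint"}

def keyword_baseline(page_text, threshold=3):
    """Label a page as API documentation if it contains `threshold`
    or more distinct cue words (the heuristic used as baseline)."""
    words = set(page_text.lower().split())
    return len(words & KEYWORDS) >= threshold

print(keyword_baseline("Send a GET request to the /users endpoint of our API"))
print(keyword_baseline("Welcome to our homepage, read about our team"))
```

Such a heuristic is cheap but brittle: a blog post that merely discusses APIs matches, while tersely written reference pages can miss the threshold, which is consistent with the baseline's lower precision in the next slide.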
21. Evaluation Results

Model     Precision  Recall  F1    Accuracy
Keyword   60.3       75.7    67.0  70.2
NB        71.0       79.2    74.8  78.6
SVM       75.4       70.8    73.1  79.0
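As a reminder of how the four columns in the table relate, the metrics can be computed from a confusion matrix. The counts below are invented for illustration; they are not the experiment's actual confusion matrix.

```python
def metrics(tp, fp, fn, tn):
    """Precision, recall, F1 and accuracy from binary confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f1, accuracy

# Made-up counts, for illustration only
p, r, f1, acc = metrics(tp=75, fp=25, fn=30, tn=70)
print(f"P={p:.3f} R={r:.3f} F1={f1:.3f} Acc={acc:.3f}")
```

Precision matters most for a crawler that auto-indexes documentation pages, since false positives pollute the search engine's index; this is why the 15-point precision gain over the baseline is the headline result.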
22. Evaluation Results
• Although preliminary, the approach already provides promising results
• Both NB and SVM provide good accuracy (about 80%)
• Best precision (75.4%) is achieved by the SVM, 15 points better than the baseline
23. Conclusions and Future Work
• Discovering Web APIs is becoming increasingly important and existing support is not optimal
• Web API identification is a first step that can well be approached as a documentation identification problem
• Crowd input (ProgrammableWeb and API Validator) has been essential
24. Conclusions and Future Work
• Further features are being included to improve the results
• Title, URL, presence of camelCase words
• Current tests have reached an accuracy of 82% using SGD
25. Conclusions and Future Work
• A larger training set is necessary
• Need more validated pages (help!)
• http://iserve-dev.kmi.open.ac.uk/validator/
• A larger experiment will be carried out over a normal Web crawl
A Web API - the offered functionality and data
(example of an invocation and the XML obtained)
Google?
ProgrammableWeb

Issues:
- requires manual registration
- gets out of date (example)
- discovery remains at a very coarse grain (manual categorisation, or some keywords)
- no notion of the operations and resources provided, etc.