AmCAT3
Using Django for a scientific document analysis
 website: Tastypie, unit tests, R, open platforms
 and open questions



 Wouter van Atteveldt (VU Amsterdam)
AmCAT

    What is AmCAT?


    Design considerations


    Open data and the publication cycle


    Tables, Tastypie, and R


    Unit tests
What is AmCAT?

    Document management and analysis


    Aimed at social sciences and humanities
    
        Input: scraping, uploading
    
        Management: projects, selections
    
        Analyses: keyword analysis, linguistic processing
        (lemmatizing, etc.), manual annotation


    Open source, open standards, open access
Design Choices

    Default Django: web site backed by a database

    AmCAT: database with a web front end

    Data should be accessible from outside

    ORM should be usable without web site code

    DB should be the final authority for authentication/authorisation
Design choices

    Separate 'apps' for business logic and presentation

    Custom authentication middleware and user
    management
    
        save() and update() with using=
    
        database-specific code for creating users
    
        We don't actually like this too much...


    All data and methods (should be) exposed
    through web service API
Open data and Publication Cycle


    [Architecture diagram]

    AmCAT: relational DB, accessed through the ORM (Django)

    Exposed through:
    
        Navigator (web site)
    
        REST API (web service)
    
        SPARQL endpoint

    External scripts (Python, R, ...) connect from outside
Open access publication cycle



    [Diagram: three stages of the publication cycle]

    Source: DANS/AmCAT3, (linked) data

    Analysis: R, MATLAB, ...

    Publication: e.g. Sweave + LaTeX, PDF + hyperlinks, structured data?

    Arrow labels: Web service; 'data link' from site; Links back to
Tastypie + Datatables

    REST API based on Django models

    jQuery DataTables filled through an AJAX call


    The good news:
    
        It works
    
        Unified point of entry for tables in website and
        scripts

    The bad news:
    
        Tastypie code horribly redundant
    
        (Unless we're doing it wrong!)
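
The 'unified point of entry' can be illustrated from the script side. Tastypie list endpoints return JSON with a `meta` block (which includes a `next` URL) and an `objects` list, and the minimal sketch below walks those pages. The URL `/api/v1/article/`, the field names and the `fetch` callable are assumptions made for illustration, not AmCAT's actual API. In real use `fetch` would issue an HTTP GET and decode the JSON body.

```python
from typing import Callable, Dict, Iterator, Optional


def iter_objects(fetch: Callable[[str], Dict], url: str) -> Iterator[Dict]:
    """Yield every row from a Tastypie-style paginated list endpoint."""
    next_url: Optional[str] = url
    while next_url:
        page = fetch(next_url)
        yield from page["objects"]
        # Tastypie sets meta.next to null on the last page.
        next_url = page["meta"].get("next")


# Canned pages standing in for HTTP responses (hypothetical data).
FAKE_PAGES = {
    "/api/v1/article/?limit=2": {
        "meta": {"limit": 2, "offset": 0, "total_count": 3, "previous": None,
                 "next": "/api/v1/article/?limit=2&offset=2"},
        "objects": [{"id": 1, "headline": "A"}, {"id": 2, "headline": "B"}],
    },
    "/api/v1/article/?limit=2&offset=2": {
        "meta": {"limit": 2, "offset": 2, "total_count": 3,
                 "previous": "/api/v1/article/?limit=2", "next": None},
        "objects": [{"id": 3, "headline": "C"}],
    },
}

rows = list(iter_objects(FAKE_PAGES.__getitem__, "/api/v1/article/?limit=2"))
print([row["id"] for row in rows])
```

The same endpoint could feed the DataTables AJAX call in the browser, so the table in the web site and the table in an analysis script would read from one source.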
Unit tests

    Web pages tough to test well

    Move as much code as possible from
    presentation to business layer
    
        Trivial views need less testing
    
        Regular Python modules easy to test



    Our choices:
    
        Put all unit tests in the 'target' module
    
        Put more complicated integration tests in tests/
        package
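
The split between a thin view and a testable business function might look like the sketch below, with the unit tests placed in the same ('target') module as the code they exercise. The function names and the keyword-counting task are hypothetical, and the 'view' is a plain function standing in for a Django view.

```python
import unittest


def keyword_counts(text, keywords):
    """Business-layer function: count case-insensitive keyword occurrences."""
    words = text.lower().split()
    return {kw: words.count(kw.lower()) for kw in keywords}


def keyword_view(request_params):
    """Trivial 'view': only unpacks parameters and delegates."""
    keywords = request_params["keywords"].split(",")
    return keyword_counts(request_params["text"], keywords)


class TestKeywordCounts(unittest.TestCase):
    """Unit tests living in the same module as the code under test."""

    def test_counts(self):
        counts = keyword_counts("Tax cuts and tax hikes", ["tax", "cuts"])
        self.assertEqual(counts, {"tax": 2, "cuts": 1})

    def test_missing_keyword(self):
        self.assertEqual(keyword_counts("no match here", ["tax"]), {"tax": 0})


# Run the tests explicitly so the sketch works when pasted into any script.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestKeywordCounts)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
```

Because `keyword_view` contains no logic of its own, the tests on `keyword_counts` cover nearly everything that can go wrong.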
Bonus slide: Plugins

    Django (model)forms as interface description
    for plugins

    Plugins callable from the web site, as a web service,
    and from the CLI

    Single point of entry for actions

    (relation with REST data modification?)
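
The idea of one interface description serving several front ends can be sketched without Django. In the example below a plain `fields` dict stands in for a Django (model)form, which in the real design would supply the field names and validation. The plugin, its fields and all function names are hypothetical.

```python
import argparse


class LemmatizePlugin:
    """Hypothetical plugin; `fields` stands in for a Django form definition."""

    fields = {"project": int, "language": str}

    def run(self, project, language):
        return f"lemmatizing project {project} ({language})"


def call_plugin(plugin_cls, raw):
    """Single entry point: validate raw string input, then run the plugin."""
    missing = [name for name in plugin_cls.fields if name not in raw]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    cleaned = {name: cast(raw[name]) for name, cast in plugin_cls.fields.items()}
    return plugin_cls().run(**cleaned)


def call_from_cli(plugin_cls, argv):
    """CLI front end: derive the argument parser from the same field spec."""
    parser = argparse.ArgumentParser(prog=plugin_cls.__name__)
    for name in plugin_cls.fields:
        parser.add_argument(f"--{name}", required=True)
    return call_plugin(plugin_cls, vars(parser.parse_args(argv)))


def call_from_web(plugin_cls, query_params):
    """Web front end: request parameters go through the same entry point."""
    return call_plugin(plugin_cls, dict(query_params))


print(call_from_cli(LemmatizePlugin, ["--project", "7", "--language", "nl"]))
print(call_from_web(LemmatizePlugin, {"project": "7", "language": "nl"}))
```

Both calls pass through `call_plugin`, so validation and execution are defined once regardless of how the plugin is invoked.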


Editor's notes

  1. By open standards and open access I mean that the data are accessible to 'all clients', not only to AmCAT or Python scripts, by means of open standards such as SQL, RDF, HTTP, XML, JSON, etc. That way a researcher can write their own script in R, Perl, etc. that communicates with AmCAT.
  2. Where a codebook could also be called a taxonomy / ontology / etc...