SlideShare ist ein Scribd-Unternehmen logo
1 von 28
Requirement Analysis THE STAT PROJECT Milestone 1 Report
To design a framework, how many variations we need to protect? How many functionalities we need to provide for supporting all these variations? QUESTIONS
Variation for importing dataset (File Sources)
Variations for importing dataset (File formats)
Variations for importing dataset (Schemas) Even if we only consider dataset in XML, each dataset may have its own schema.
Reuters dataset example
Simplified approach ,[object Object],[object Object],[object Object],[object Object],Observation: for the sake of comparison, researchers usually deal with a few famous dataset (e.g., Reuters, RCV-1)
Able to  persist and read back  memory objects
Able to  visualize  memory objects
STAT (brief) Domain Model Note : We ignore texts on connectors for brevity. Some connections are not drawn because of space limitation
STAT framework sample code (conceptual)
 
Domain Concept:  RawCorpus A collection of  RawDocument , supporting collection operations: - Add new  RawDocument   element - Remove existing  RawDocument   element - Accessing elements in the collection - …
Domain Concept:  RawCorpus abstract class  RawCorpus  { List< RawDocument > rawDocuments; RawDocument getDocument(int index); void setDocument(int index, T doc); void removeDocument(int index); }
Domain Concept:  RawDocument An object with one or more string fields, serving as a non-processed, in-memory representation of a document unit - Like Java beans with getter and setter - All fields must be string type, even for numbers
Domain Concept:  RawDocument class  MyRawDocument  extends  RawDocument  { String title; String author; String body; String date; String numOfClicks; String topicType; … } abstract class  RawDocument  { public RawDocument() {} }
Domain Concept:  Processor An object that processes  RawCorpus  and produces  Corpus .  - Linguistic:  Tokenizer, Stemmer, StopRemover, PosTagger, … - Machine learning: Feature-specific, document-specific
Domain Concept:  Corpus An object representing a collection of  Document   for use by machine learning side of framework. This object provides a notion of splits which is commonly used (e.g., train, test)
Domain Concept:  Trainer A representation of a machine learning algorithm, which can learn from a  Corpus  and produce a  Model .
Domain Concept:  Model An object of what machine learning algorithm (i.e.,  Trainer ) creates to store parameters that are &quot;learned&quot; from the data (i.e.,  Corpus )
Domain Concept:  Classifier An object that maps  Documents  to target values (label, number, probability). It takes a  Corpus  and a  Model  as inputs, and produces a  Prediction  associated with the  Corpus  according to the  Model .
Domain Concept:  Prediction A collection of target values (label, number, probability) that associate with a  Corpus , i.e., a collection of  Document .
Domain Concept:  Evaluator An object used for comparing the  Prediction  against its associated  Corpus  and generating  Evaluation
Domain Concept:  Evaluation A representation of evaluation result given by a  Evaluator , in a summarized manner.
THE STAT PROJECT Thanks
STAT (brief) Domain Model Note : We ignore texts on connectors for brevity. Some connections are not drawn because of space limitation  Corpus Reader Processor RawCorpus Trainer Model Classifier Prediction Evaluator Evaluation Writer Vocabulary
STAT Domain Model Note : We ignore texts above lines for brevity  Corpus Reader Processor RawCorpus Trainer Model Classifier Prediction Evaluator Evaluation Writer
STAT Domain Model Note : We ignore texts above lines for brevity  Corpus Reader Processor RawCorpus Trainer Model Classifier Prediction Evaluator Evaluation Document RawDocument

Weitere ähnliche Inhalte

Was ist angesagt?

Presentation of OpenNLP
Presentation of OpenNLPPresentation of OpenNLP
Presentation of OpenNLP
Robert Viseur
 

Was ist angesagt? (19)

ALA Interoperability
ALA InteroperabilityALA Interoperability
ALA Interoperability
 
Versioned Triple Pattern Fragments
Versioned Triple Pattern FragmentsVersioned Triple Pattern Fragments
Versioned Triple Pattern Fragments
 
Java stereams
Java stereamsJava stereams
Java stereams
 
9 Inputs & Outputs
9 Inputs & Outputs9 Inputs & Outputs
9 Inputs & Outputs
 
Data structure Unit-I Part A
Data structure Unit-I Part AData structure Unit-I Part A
Data structure Unit-I Part A
 
Javaiostream
JavaiostreamJavaiostream
Javaiostream
 
The TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake Bolewski
The TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake BolewskiThe TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake Bolewski
The TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake Bolewski
 
Input output streams
Input output streamsInput output streams
Input output streams
 
C programming disk file reading and writing
C programming disk file reading and writingC programming disk file reading and writing
C programming disk file reading and writing
 
Javaiostream
JavaiostreamJavaiostream
Javaiostream
 
MPTStore: A Fast, Scalable, and Stable Resource Index
MPTStore: A Fast, Scalable, and Stable Resource IndexMPTStore: A Fast, Scalable, and Stable Resource Index
MPTStore: A Fast, Scalable, and Stable Resource Index
 
input/ output in java
input/ output  in javainput/ output  in java
input/ output in java
 
EKAW - Linked Data Publishing
EKAW - Linked Data PublishingEKAW - Linked Data Publishing
EKAW - Linked Data Publishing
 
RapidMiner: Word Vector Tool And Rapid Miner
RapidMiner:  Word Vector Tool And Rapid MinerRapidMiner:  Word Vector Tool And Rapid Miner
RapidMiner: Word Vector Tool And Rapid Miner
 
Presentation of OpenNLP
Presentation of OpenNLPPresentation of OpenNLP
Presentation of OpenNLP
 
File Handling in Java Oop presentation
File Handling in Java Oop presentationFile Handling in Java Oop presentation
File Handling in Java Oop presentation
 
Java Input Output (java.io.*)
Java Input Output (java.io.*)Java Input Output (java.io.*)
Java Input Output (java.io.*)
 
A Standard Data Format for Computational Chemistry: CSX
A Standard Data Format for Computational Chemistry: CSXA Standard Data Format for Computational Chemistry: CSX
A Standard Data Format for Computational Chemistry: CSX
 
Java IO Package and Streams
Java IO Package and StreamsJava IO Package and Streams
Java IO Package and Streams
 

Andere mochten auch

Effective usecases
Effective usecasesEffective usecases
Effective usecases
am_iim
 
Software (requirement) analysis using uml
Software (requirement) analysis using umlSoftware (requirement) analysis using uml
Software (requirement) analysis using uml
Dhiraj Shetty
 
Example requirements specification
Example requirements specificationExample requirements specification
Example requirements specification
indrisrozas
 
Sample Business Requirement Document
Sample Business Requirement DocumentSample Business Requirement Document
Sample Business Requirement Document
Isabel Elaine Leong
 

Andere mochten auch (8)

Effective usecases
Effective usecasesEffective usecases
Effective usecases
 
Requirement analysis with use case
Requirement analysis with use caseRequirement analysis with use case
Requirement analysis with use case
 
Determining Requirements In System Analysis And Dsign
Determining Requirements In System Analysis And DsignDetermining Requirements In System Analysis And Dsign
Determining Requirements In System Analysis And Dsign
 
Requirements engineering with UML [Software Modeling] [Computer Science] [Vri...
Requirements engineering with UML [Software Modeling] [Computer Science] [Vri...Requirements engineering with UML [Software Modeling] [Computer Science] [Vri...
Requirements engineering with UML [Software Modeling] [Computer Science] [Vri...
 
Software (requirement) analysis using uml
Software (requirement) analysis using umlSoftware (requirement) analysis using uml
Software (requirement) analysis using uml
 
Software Requirement Specification
Software Requirement SpecificationSoftware Requirement Specification
Software Requirement Specification
 
Example requirements specification
Example requirements specificationExample requirements specification
Example requirements specification
 
Sample Business Requirement Document
Sample Business Requirement DocumentSample Business Requirement Document
Sample Business Requirement Document
 

Ähnlich wie STAT Requirement Analysis

DataFinder concepts and example: General (20100503)
DataFinder concepts and example: General (20100503)DataFinder concepts and example: General (20100503)
DataFinder concepts and example: General (20100503)
Data Finder
 
1 Project 2 Introduction - the SeaPort Project seri.docx
1  Project 2 Introduction - the SeaPort Project seri.docx1  Project 2 Introduction - the SeaPort Project seri.docx
1 Project 2 Introduction - the SeaPort Project seri.docx
honey725342
 

Ähnlich wie STAT Requirement Analysis (20)

postgres loader
postgres loaderpostgres loader
postgres loader
 
ORDBMS.pptx
ORDBMS.pptxORDBMS.pptx
ORDBMS.pptx
 
Language-agnostic data analysis workflows and reproducible research
Language-agnostic data analysis workflows and reproducible researchLanguage-agnostic data analysis workflows and reproducible research
Language-agnostic data analysis workflows and reproducible research
 
Language Server Protocol - Why the Hype?
Language Server Protocol - Why the Hype?Language Server Protocol - Why the Hype?
Language Server Protocol - Why the Hype?
 
BERT QnA System for Airplane Flight Manual
BERT QnA System for Airplane Flight ManualBERT QnA System for Airplane Flight Manual
BERT QnA System for Airplane Flight Manual
 
Composable Parallel Processing in Apache Spark and Weld
Composable Parallel Processing in Apache Spark and WeldComposable Parallel Processing in Apache Spark and Weld
Composable Parallel Processing in Apache Spark and Weld
 
SparkR: The Past, the Present and the Future-(Shivaram Venkataraman and Rui S...
SparkR: The Past, the Present and the Future-(Shivaram Venkataraman and Rui S...SparkR: The Past, the Present and the Future-(Shivaram Venkataraman and Rui S...
SparkR: The Past, the Present and the Future-(Shivaram Venkataraman and Rui S...
 
DataFinder concepts and example: General (20100503)
DataFinder concepts and example: General (20100503)DataFinder concepts and example: General (20100503)
DataFinder concepts and example: General (20100503)
 
iOS Application Development
iOS Application DevelopmentiOS Application Development
iOS Application Development
 
Standardizing on a single N-dimensional array API for Python
Standardizing on a single N-dimensional array API for PythonStandardizing on a single N-dimensional array API for Python
Standardizing on a single N-dimensional array API for Python
 
Hatkit Project - Datafiddler
Hatkit Project - DatafiddlerHatkit Project - Datafiddler
Hatkit Project - Datafiddler
 
Java basics
Java basicsJava basics
Java basics
 
ODF Mashups
ODF MashupsODF Mashups
ODF Mashups
 
1 Project 2 Introduction - the SeaPort Project seri.docx
1  Project 2 Introduction - the SeaPort Project seri.docx1  Project 2 Introduction - the SeaPort Project seri.docx
1 Project 2 Introduction - the SeaPort Project seri.docx
 
Spark meetup TCHUG
Spark meetup TCHUGSpark meetup TCHUG
Spark meetup TCHUG
 
Quantopix analytics system (qas)
Quantopix analytics system (qas)Quantopix analytics system (qas)
Quantopix analytics system (qas)
 
DataFinder: A Python Application for Scientific Data Management
DataFinder: A Python Application for Scientific Data ManagementDataFinder: A Python Application for Scientific Data Management
DataFinder: A Python Application for Scientific Data Management
 
About "Apache Cassandra"
About "Apache Cassandra"About "Apache Cassandra"
About "Apache Cassandra"
 
Organizing the Data Chaos of Scientists
Organizing the Data Chaos of ScientistsOrganizing the Data Chaos of Scientists
Organizing the Data Chaos of Scientists
 
Source-to-source transformations: Supporting tools and infrastructure
Source-to-source transformations: Supporting tools and infrastructureSource-to-source transformations: Supporting tools and infrastructure
Source-to-source transformations: Supporting tools and infrastructure
 

Mehr von stat

Stat Design3 18 09
Stat Design3 18 09Stat Design3 18 09
Stat Design3 18 09
stat
 
Stat Tech Reportv1
Stat Tech Reportv1Stat Tech Reportv1
Stat Tech Reportv1
stat
 
Requirementv4
Requirementv4Requirementv4
Requirementv4
stat
 
Stat2 25 09
Stat2 25 09Stat2 25 09
Stat2 25 09
stat
 
Requirment
RequirmentRequirment
Requirment
stat
 
Requirements - Part 1
Requirements - Part 1Requirements - Part 1
Requirements - Part 1
stat
 

Mehr von stat (6)

Stat Design3 18 09
Stat Design3 18 09Stat Design3 18 09
Stat Design3 18 09
 
Stat Tech Reportv1
Stat Tech Reportv1Stat Tech Reportv1
Stat Tech Reportv1
 
Requirementv4
Requirementv4Requirementv4
Requirementv4
 
Stat2 25 09
Stat2 25 09Stat2 25 09
Stat2 25 09
 
Requirment
RequirmentRequirment
Requirment
 
Requirements - Part 1
Requirements - Part 1Requirements - Part 1
Requirements - Part 1
 

Kürzlich hochgeladen

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 

Kürzlich hochgeladen (20)

Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 

STAT Requirement Analysis

  • 1. Requirement Analysis THE STAT PROJECT Milestone 1 Report
  • 2. To design a framework, how many variations we need to protect? How many functionalities we need to provide for supporting all these variations? QUESTIONS
  • 3. Variation for importing dataset (File Sources)
  • 4. Variations for importing dataset (File formats)
  • 5. Variations for importing dataset (Schemas) Even if we only consider dataset in XML, each dataset may have its own schema.
  • 7.
  • 8. Able to persist and read back memory objects
  • 9. Able to visualize memory objects
  • 10. STAT (brief) Domain Model Note : We ignore texts on connectors for brevity. Some connections are not drawn because of space limitation
  • 11. STAT framework sample code (conceptual)
  • 12.  
  • 13. Domain Concept: RawCorpus A collection of RawDocument , supporting collection operations: - Add new RawDocument element - Remove existing RawDocument element - Accessing elements in the collection - …
  • 14. Domain Concept: RawCorpus abstract class RawCorpus { List< RawDocument > rawDocuments; RawDocument getDocument(int index); void setDocument(int index, T doc); void removeDocument(int index); }
  • 15. Domain Concept: RawDocument An object with one or more string fields, serving as a non-processed, in-memory representation of a document unit - Like Java beans with getter and setter - All fields must be string type, even for numbers
  • 16. Domain Concept: RawDocument class MyRawDocument extends RawDocument { String title; String author; String body; String date; String numOfClicks; String topicType; … } abstract class RawDocument { public RawDocument() {} }
  • 17. Domain Concept: Processor An object that processes RawCorpus and produces Corpus . - Linguistic: Tokenizer, Stemmer, StopRemover, PosTagger, … - Machine learning: Feature-specific, document-specific
  • 18. Domain Concept: Corpus An object representing a collection of Document for use by machine learning side of framework. This object provides a notion of splits which is commonly used (e.g., train, test)
  • 19. Domain Concept: Trainer A representation of a machine learning algorithm, which can learn from a Corpus and produce a Model .
  • 20. Domain Concept: Model An object of what machine learning algorithm (i.e., Trainer ) creates to store parameters that are &quot;learned&quot; from the data (i.e., Corpus )
  • 21. Domain Concept: Classifier An object that maps Documents to target values (label, number, probability). It takes a Corpus and a Model as inputs, and produces a Prediction associated with the Corpus according to the Model .
  • 22. Domain Concept: Prediction A collection of target values (label, number, probability) that associate with a Corpus , i.e., a collection of Document .
  • 23. Domain Concept: Evaluator An object used for comparing the Prediction against its associated Corpus and generating Evaluation
  • 24. Domain Concept: Evaluation A representation of evaluation result given by a Evaluator , in a summarized manner.
  • 26. STAT (brief) Domain Model Note : We ignore texts on connectors for brevity. Some connections are not drawn because of space limitation Corpus Reader Processor RawCorpus Trainer Model Classifier Prediction Evaluator Evaluation Writer Vocabulary
  • 27. STAT Domain Model Note : We ignore texts above lines for brevity Corpus Reader Processor RawCorpus Trainer Model Classifier Prediction Evaluator Evaluation Writer
  • 28. STAT Domain Model Note : We ignore texts above lines for brevity Corpus Reader Processor RawCorpus Trainer Model Classifier Prediction Evaluator Evaluation Document RawDocument