SlideShare ist ein Scribd-Unternehmen logo
1 von 82
Data-Intensive Researchwith DISPEL Oscar Corcho Ontology Engineering Group Universidad Politécnica de Madrid (in collaboration with all the ADMIRE project consortium) Special thanks to Malcolm Atkinson, from whom most of the slides have been reused 29/05/11 Data-Intensive Research with DISPEL 1
There are manynames of manypeoplewhohavecontributedtotheseslides I am almostjust a simple story-tellerorwork done byothers… Difficulttoprovideallnames Especiallywhenyoufinishcompilingslidesthedaybefore.  Thisslidewill be completedforthe online versionwithallnames Forsimplicity, thankstothe ADMIRE consortiummembers Recognitionslide… 29/05/11 Data-Intensive Research with DISPEL 2
Overview 29/05/11 Data-Intensive Research with DISPEL 3 Motivational examples DISPEL: a language for data-driven research Architecture DISPEL components Processing Elements Types Functions DISPEL processing/evaluation The role of the DISPEL gateway The role of the DISPEL registry DISPEL resources
Overview 29/05/11 Data-Intensive Research with DISPEL 4 Motivational examples DISPEL: a language for data-driven research Architecture DISPEL components Processing Elements Types Functions DISPEL processing/evaluation The role of the DISPEL gateway The role of the DISPEL registry DISPEL resources
Astronomy: detection of quasars Seismology: ambientnoise data processing CRM: customerchurn and cross-selling Genetics: understanding mouse embryos MotivationalExamples 29/05/11 Data-Intensive Research with DISPEL 5
Astronomy: detection of quasars Seismology: ambientnoise data processing CRM: customerchurn and cross-selling Genetics: understanding mouse embryos MotivationalExamples 29/05/11 Data-Intensive Research with DISPEL 6
Quasars are highly energetic cores of galaxies, where matter is falling into black holes, releasing prodigious quantities of energy in the process. Star-like in appearance (quasi-stellar radio sources)  Distinguishing quasars from stars requires information from the distribution of their light across the electromagnetic spectrum. Most star-like objects are stars not quasars. Quasars 29/05/11 Data-Intensive Research with DISPEL 7
Traditional method: Spectroscopic Expensive Slow Single object ,[object Object]
Research question: Does using 9 photometric bands improve the classification?Detection of quasars 29/05/11 Data-Intensive Research with DISPEL 8 Alternative method: Photometric Cheaper Quicker All objects in area Combine multiple bands to approximate spectroscopic study
Sloan Digital Sky Survey (SDSS) 450m astronomical features 5 optical wavelength bands (u, g, r, i, and z) 120,000 spectroscopically confirmed quasars UKIRT Infrared Deep Sky Survey (UKIDSS) 60m astronomical features 4 infrared wavelengths (Y, J, H, and K) Link table with distances between objects in SDSS and UKIDSS The data 29/05/11 Data-Intensive Research with DISPEL 9
Data IntegrationWorkflow 29/05/11 Data-Intensive Research with DISPEL 10
Data IntegrationWorkflow 29/05/11 Data-Intensive Research with DISPEL 11
Data IntegrationWorkflow 29/05/11 Data-Intensive Research with DISPEL 12
Data IntegrationWorkflow 29/05/11 Data-Intensive Research with DISPEL 13
Data IntegrationWorkflow 29/05/11 Data-Intensive Research with DISPEL 14
Data IntegrationWorkflow 29/05/11 Data-Intensive Research with DISPEL 15
Data IntegrationWorkflow 29/05/11 Data-Intensive Research with DISPEL 16
Astronomy: detection of quasars Seismology: ambientnoise data processing CRM: customerchurn and cross-selling Genetics: understanding mouse embryos MotivationalExamples 29/05/11 Data-Intensive Research with DISPEL 17
AmbientNoise Data Processing 29/05/11 Data-Intensive Research with DISPEL 18 Time segmentation Filtering & normalization Cross-correlation & stacking . . .
High level workflow 29/05/11 Data-Intensive Research with DISPEL 19
Cross-correlations 29/05/11 Data-Intensive Research with DISPEL 20
Astronomy: detection of quasars Seismology: ambientnoise data processing CRM: customerchurn and cross-selling Genetics: understanding mouse embryos MotivationalExamples 29/05/11 Data-Intensive Research with DISPEL 21
Business goal Recognizecustomersthatareprobable to quitcompanyservices Findout whichconditionsinfluence churning Knowledge discoveryphases Model training Designedand executed by data analyst Long-lasting and complicated Model exploitation Executedby domainexperts (callingagents) Quickand simple CRM CustomerChurnPrediction 29/05/11 Data-Intensive Research with DISPEL 22
Architecture 29/05/11 Data-Intensive Research with DISPEL 23 Model training (Workbench) Model exploitation (Portal) ResultsDeliveryToPortal Training data extraction Data transformation Classification Model training Test datasetextraction Model evaluation DeliveryToRepository ObtainingFromRepository
Business goal To find out hintsaboutadditional products or services to be provided to potentialcustomers Market analysis Knowledge discovery Get frequentitemsets/association/sequentialrules from historical data set CRM Cross-Selling 29/05/11 Data-Intensive Research with DISPEL 24 Roaming = TRUE & GSM_Prepaid = TRUE => Voice_mail = TRUE
Architecture 29/05/11 Data-Intensive Research with DISPEL 25 CRM Database Data access Data Preparation AssociationRule Mining Rulesdelivery
Architecture 29/05/11 Data-Intensive Research with DISPEL 26 CRM Database Data access Data Preparation AssociationRule Mining Rulesdelivery
Architecture 29/05/11 Data-Intensive Research with DISPEL 27 CRM Database Data access Data Preparation AssociationRule Mining Rulesdelivery
Architecture 29/05/11 Data-Intensive Research with DISPEL 28 CRM Database Data access Data Preparation AssociationRule Mining Rulesdelivery GSM_Prepaid = TRUE & Age = YOUNG => Roaming = TRUE
Architecture 29/05/11 Data-Intensive Research with DISPEL 29 CRM Database Data access Data Preparation AssociationRule Mining Rulesdelivery GSM_Prepaid = TRUE & Age = YOUNG => Roaming = TRUE ... <_40:AssociationModel minimumConfidence="0.3" minimumSupport="0.01" numberOfItems="6" numberOfItemsets="12" numberOfRules="25" numberOfTransactions="4525">   <_40:Item id="0" value=„GSM_Prepaid=TRUE"/>  <_40:Item id="1" value=„Age=YOUNG"/>  <_40:Item id="2" value=„Roaming=TRUE”/> …
Astronomy: detection of quasars Seismology: ambientnoise data processing CRM: customerchurn and cross-selling Genetics: understanding mouse embryos MotivationalExamples 29/05/11 Data-Intensive Research with DISPEL 30
Understand the gene function and interactions of genes in a mouse embryo Generate a collection of images by employing RNA in-situ hybridisation process Identify anatomical components expressing as a gene by annotating the images Understanding Mouse Embryos 29/05/11 Data-Intensive Research with DISPEL 31
Annotated Mouse Embryo 29/05/11 Data-Intensive Research with DISPEL 32
18,000 genes’ collection for mouse embryo established by RNA in-situ hybridisation 1,500 anatomical terms ontology used for annotations 4 Terabytes of images 80% manually annotated 20% remaining (over 85,000 images) TheNumbers 29/05/11 Data-Intensive Research with DISPEL 33
EURExpress-II Workflow 29/05/11 Data-Intensive Research with DISPEL 34
What do theyallhave in common? MotivationalExamples 29/05/11 Data-Intensive Research with DISPEL 35
TheKnowledgeDiscoveryProcess 29/05/11 Data-Intensive Research with DISPEL 36
Commoncharacteristics Needfor a range of data mining and integrationfunctionalities Large-scale data Most of traditional/widely-availabletools are notenough Needtomanagestreaming-basedmodels Sometimeshighcomputationaldemand Domainexpertsbecome data mining and integrationexperts, and evendistributedcomputingexperts Suchspecialised human resources are difficulttofind MotivationalExamples 29/05/11 Data-Intensive Research with DISPEL 37
What do weneedthen?  An all-in-one framework that combines… Data integration, processing and mining Extensible with domain specific requirements e.g., ADQL queries in Astronomy Support for reusable building blocks  e.g., n-fold validation Support for distributed and parallel execution of workflows Native support for a streaming data model e.g., ordered merge joins Automated optimisation Motivational examples 29/05/11 Data-Intensive Research with DISPEL 38
What do weneedthen (cont.)?  A frameworkthatseparatesconcerns of DomainExperts Theyunderstandtheproblems of theirdomain, and thedatasetsto be used Data-intensiveAnalysts Theyunderstandtheknowledgediscoveryprocess and knowthealgorithms and techniquesto be used (e.g., association rules, clustering, etc.) Data-intensiveEngineers Theyunderstandthefoundations of distributedcomputing, and theirplatforms and technologies (e.g., Grid Computing, Clouds, etc.) Motivational examples 29/05/11 Data-Intensive Research with DISPEL 39
Overview 29/05/11 Data-Intensive Research with DISPEL 40 Motivational examples DISPEL: a language for data-driven research Architecture DISPEL components Processing Elements Types Functions DISPEL processing/evaluation The role of the DISPEL gateway The role of the DISPEL registry DISPEL resources
The DISPEL Hourglass 29/05/11 Data-Intensive Research with DISPEL 41
The DISPEL Hourglass 29/05/11 Data-Intensive Research with DISPEL 42 Data-exploitation develops diversity and supports a wide range of user behaviours, tools and services
The DISPEL Hourglass 29/05/11 Data-Intensive Research with DISPEL 43 Data-service providers compete for a consolidated load offering generic or specialised services and business models
The DISPEL Hourglass 29/05/11 Data-Intensive Research with DISPEL 44 The main focus of this talk
Domain Experts using tools tuned to their way of working that behind the familiar façade generate DISPEL Data-Intensive Analysts writing and describing specific PEs composing & packaging PEs using predefined patterns  Data-Intensive Engineers Providing distributed data-intensive platforms Delivering distributed data-intensive services Defining patterns for data-intensive computations Optimising the enactments and platform performance DISPEL enableslooselycoupling 29/05/11 Data-Intensive Research with DISPEL 45 Domain experts do not read and write DISPEL. They discuss parameters and graphs with Data-Intensive Analysts. They work by controlling enactments via their familiar tools: portals, spreadsheets, R, Matlab, ...  But there are Domain experts who are also Data-Intensive Analysts! Particularly in research and academic contexts. Data-Intensive Analysts read and write DISPEL. They are experts in data mining, text mining, image processing, time series analysis, statistics, etc. They may discuss parameters and graphs with Domain experts. They work in a familiar development environment such as Eclipse. They discuss DISPEL patterns, sentences and performance with Data-IntensiveEngineers. They expect robust enactment and effective optimisation.  Data-Intensive Engineers read and write DISPEL, and build data-intensive platforms. They rarely meet Domain Experts.  They are committed to improving all stages of DISPEL processing. They talk with Data-Intensive Analysts to help them do their work and to better understand requirements and workloads.
Separation of concerns 29/05/11 Data-Intensive Research with DISPEL 46
DISPEL: anexample… 29/05/11 Data-Intensive Research with DISPEL 47 use admire.dataAccess.relational.DAS1;	// Get definition of DAS1 use admire.transforms.statistical.Stats;	// Get definition of Stats  use admire.dataAccess.relational.DAS2;	// Get definition of DAS2  String q1 = “SELECT * FROM db.table1”;  // Define literals String q2 = “SELECT * FROM db.table2”;  String update = “INSERT ? INTO db2.columnStatistics”;  DAS1 das1 = new DAS1; // Create PEs Stats stats = new Stats;  DAS2 das2 = new DAS2;  |- q1; q2 -| => das1.query;  // Create graph das1.result => stats.in;  stats.out=> das2.data;  |- update; update -| => das2.SQL;
Data-IntensiveAnalysts. ModelBuilding 29/05/11 Data-Intensive Research with DISPEL 48
Data-IntensiveAnalysts. ModelDeployment 29/05/11 Data-Intensive Research with DISPEL 49
Data-IntensiveAnalysts. Lower-level of Details 29/05/11 Data-Intensive Research with DISPEL 50
Overview 29/05/11 Data-Intensive Research with DISPEL 51 Motivational examples DISPEL: a language for data-driven research Architecture DISPEL components Processing Elements Types Functions DISPEL processing/evaluation The role of the DISPEL gateway The role of the DISPEL registry DISPEL resources
User-defined functions encapsulating a data transforming algorithm PE descriptions A unique name A short definition of what they do A precise description of their input and output streams a structure of Connections A precise description of their iterative behaviour A precise description of their termination and error reporting The (S&D)type propagation rules from inputs to outputs A precise description of their properties that may permit or limit optimisation Their known subtype hierarchy ProcessingElements 29/05/11 Data-Intensive Research with DISPEL 52
ProcessingElements 29/05/11 Data-Intensive Research with DISPEL 53
PEs are instantiated before they are used in an enactment newPE_expression There may be many instances of a given PE Think PE is a class PEI is an instance of that class Assertions may refine the properties of a PE instance newSQLquerywith data as :[<Integer i, j; Real r; String s>] ProcessingElementInstances 29/05/11 Data-Intensive Research with DISPEL 54 Stating the structural type of this particular instance’s result; the programmer knows the query and schema it will be used with.
Connections carry a stream of data from a PE instance to a PE instance 1 source => multiple receivers Typically a PE processes one element of the stream at a time These elements are as small as makes computational sense a tuple a row of a matrix The system is responsible for buffering and optimising their flow pass by reference when possible serialised and compressed for long-haul data movement only buffer to disk when requested or buffer spill unavoidable Connections 29/05/11 Data-Intensive Research with DISPEL 55
Two types describe the values passed structural type (Stype) the format / representation of the elemental value domain type (Dtype) the `meaning’ of the elemental value Connections may have finite or continuous streams Stream end, EoS, indicates no more data available A PE transmits EoS when it has no more data to send A connection may transmit a “no more” message from receiver to source Receiver discard throws away data it sends a “no more” message immediately Stream literals have the form     |- expression -| Connections 29/05/11 Data-Intensive Research with DISPEL 56
The default termination behaviour occurs when either all the inputs are exhausted or all the receivers of outputs have indicated they do not want more data When all of a PE’s inputs have received EoS a PE completes the use of its current data then sends an EoS on all of its outputs then stops When all of a PE’s outputs have received a “no more” a PE sends a “no more” on all of its inputs then stops Termination should propagate across a distributed enactment there may be a stop operation & external event as well PE Termination 29/05/11 Data-Intensive Research with DISPEL 57 This is the default, a PE may stop when a particular stream delivers EoS This is the default, a PE may stop when a particular stream receives “no more”
packageeu.admire{      TypeConverterPEis PE(            <Connection:Any::Thing input>  =>          <Connection:Any::Thing output>  );       TypeCombiner isPE(           <Connection[ ] inputs> =>           <Connection output>  ); TypeErrorStreamisConnection:                      < error:String::”lang:ErrorMessage”; culprit >;      TypeProgrammableCombineris PE(          <Connection[] inputs; Connection:String::”lang:JavascriptCode” controlExpression> =>  <Connection output;       ErrorStreamerrors >  );       registerConverterPE, Combiner, ErrorStream, ProgrammableCombiner;} Languagetypes 29/05/11 Data-Intensive Research with DISPEL 58
PEs as subtypes of PEs 29/05/11 Data-Intensive Research with DISPEL 59 Indicate order of inputs is not significant   package eu.admire{  use eu.admire.Combiner; use eu.admire.ProgrammableCombiner; Type SymmetricCombiner is Combiner with inputs permutable; Type SymmetricProgrammableCombiner is ProgrammableCombiner with inputs permutable; register SymmetricCombiner, SymmetricGenericCombiner;} Make these new PE types subtypes of the previously declared types.
Overview 29/05/11 Data-Intensive Research with DISPEL 60 Motivational examples DISPEL: a language for data-driven research Architecture DISPEL components Processing Elements Types Functions DISPEL processing/evaluation The role of the DISPEL gateway The role of the DISPEL registry DISPEL resources
Stypes and Dtypes 29/05/11 Data-Intensive Research with DISPEL 61 packagecoral.reef.ecology{ StypeImage is Pixel[ ][ ]; Stype Camera is <Integer cNumber; String cType; Real x, y, z, theta, phi, alpha,                     aperture, magnification, frequency; Real[ ] settings; rest >; Stype Illuminator is <Integer iNumber; String iType; Real x, y, z, theta, phi,   beamWidth, intensity, iFrequency, duration; Real[ ] iSettings; rest >; Stype Frame is <Time t; Image[ ] images; Camera[ ] cameras;                                Illuminator[ ] lighting>; register Image, Camera, Illuminator, Frame;   }
Stypes and Dtypes 29/05/11 Data-Intensive Research with DISPEL 62 packagecoral.reef.ecology{ namespacecre“Coral_Reef_Ecology.observingStation.terms:”; use coral.reef.ecology.Frame; Stype Object is <Real x, y, z, radius>; StypeObjectMapis <Frame:: cre:Primary_PIV_data frame;                               Object[ ]:: cre:Putative_Individuals objects>; TypeObjectRecogniseris PE (       <Connection: Frame:: cre:PrimaryPIVdataframes> =>       <Connection: ObjectMap:: cre:First_ReconstructionputativeIndividuals>    ); register Object, ObjectMap, ObjectRecogniser;}
Stypes and Dtypes 29/05/11 Data-Intensive Research with DISPEL 63 packagecoral.reef.ecology{ namespacecre“Coral_Reef_Ecology.observingStation.terms:”; use coral.reef.ecology.ObjectMap; Stype Individual is <Object:: cre:Individual_SubjectconfirmeIndividual;                                     Integer:: cre:Unique_Arbitrary_TagidNumber;                                     String:: cre:Species_Or_Inert taxa; Boolean swimmer;                                    Real:: cre:Mass_Estimate1_Kilograms mass>; StypeIndividualMapis <Frame:: cre:Primary_PIV_data frame;                                           Individual[ ]:: cre:Confirmed_Individuals individuals>; TypeIndividualRecogniseris PE (       <Connection: ObjectMap:: cre: First_ReconstructionputativeIndividuals       > =>       <Connection: IndividualMap:: cre:Second_ReconstructiontaggedIndividuals       >    ); register Individual, IndividualMap, IndividualRecogniser;}
Stypes and Dtypes 29/05/11 Data-Intensive Research with DISPEL 64 packagecoral.reef.ecology{ namespacecre“Coral_Reef_Ecology.observingStation.terms:”; use coral.reef.ecology.IndividualMap; StypeMovingIndividualis <Individual:: cre:Tagged_SubjectconfirmeIndividual;                                                 Real[ ]:: cre:Meters_Per_Second[ ] velocity;                                                Real:: cre:JouleskineticEnergy>; StypeMovementMapis <Frame:: cre:Primary_PIV_data frame; MovingIndividual[ ]:: cre:Moving_Individuals individuals>; TypeMovementRecogniseris PE (       <Connection: IndividualMap:: cre:Second_ReconstructiontaggedIndividuals       > =>       <Connection: MovementMap:: cre:Third_ReconstructionmovingIndividuals       >    ); register MovingIndividual, MovementMap, MovementRecogniser;}
Overview 29/05/11 Data-Intensive Research with DISPEL 65 Motivational examples DISPEL: a language for data-driven research Architecture DISPEL components Processing Elements Types Functions DISPEL processing/evaluation The role of the DISPEL gateway The role of the DISPEL registry DISPEL resources
Coupling portlet to DISPEL function 29/05/11 Data-Intensive Research with DISPEL 66 Step 1: user inputs solicited values 1, A Step 2: portal system validates values Step 3: portal system constructs DISPEL sentence with these values as parameters of a provided function Step 4: portal system sends DISPEL sentence to gateway Steps 5 to n: gateway and systems behind it validate and enact the sentence Step n+1: gateway sends summary/partial results to progress viewer Step n+m: gateway sends final (summary) results to result viewer
Thefunctionthatiscalled 29/05/11 Data-Intensive Research with DISPEL 67 package eu.admire.seismology{  use eu.admire.seismology.proj1portlet4invokeCorrelations; use eu.admire.Time;   Time startTime = <year=1996, day = 53, seconds = 0>;   Time endTime = <year=1996, day = 54, seconds = 0>;   Real minFreq = 0.01; //frequencies considered in Hertz   Real maxFreq = 1.0;    Real maxOffset = 10*60*60; proj1portlet4invokeCorrelations(                               startTime, endTime, minFreq, maxFreq, maxOffset );} Values solicited from user and inserted into a minimal template; some of these could equally well be `constants’ embedded as hidden defaults in the template
Overview 29/05/11 Data-Intensive Research with DISPEL 68 Motivational examples DISPEL: a language for data-driven research Architecture DISPEL components Processing Elements Types Functions DISPEL processing/evaluation The role of the DISPEL gateway The role of the DISPEL registry DISPEL resources
DISPEL evaluation 29/05/11 Data-Intensive Research with DISPEL 69 A DISPEL sentence is prepared… Sent to a gateway… which may inspect it and the sender’s credentials and then accept it and initiate enactment Enacted in four phases DISPEL language processing  to produce a graph and/or register definitions  Optimisation Deployment  across hosting platforms Execution and control including termination and tidying
Architecture (e.g., ACRM) 29/05/11 Data-Intensive Research with DISPEL 70 Submission of the DISPEL document (model exploitation) Data mining expert Telco domain expert ACRM Portal GW2 Workbench GW1 GW3 Gateways Repository Submission of the DISPEL document (model trainingprocess) Delivery of trainingresults/ Obtainingtrainingresults Obtainingtrainingresults by domainexpert
Overview 29/05/11 Data-Intensive Research with DISPEL 71 Motivational examples DISPEL: a language for data-driven research Architecture DISPEL components Processing Elements Types Functions DISPEL processing/evaluation The role of the DISPEL gateway The role of the DISPEL registry DISPEL resources
“Making the hourglass bottleneck narrower” ,[object Object],By means of describing patterns Functions and composite PEs …withrichsemantics Coreontologyforthesedescriptions Domain-specificontologies can be incorporated …plus human-focuseddescriptions (sincedomainexpertsmustunderstandthem) Dublincoreproperties, social discussion, etc. The DISPEL Registry 29/05/11 Data-Intensive Research with DISPEL 72
Semantics in DISPEL processing 29/05/11 Data-Intensive Research with DISPEL 73
Registry Contents Initial library of 81 domain-dependent and independent Processing Elements DISPEL PE library initialisation file Many PEs inherited from OGSA-DAI activities Incorporating PEs from ADMIRE use cases Open distributed registration of new domain-specific (or generic) PEs to start soon Sorting out the social networking part
Advanced Functionalities Find candidate PEs and functions Find “similar” PEs Queries for sibling PEs in the ontology hierarchy Find “compatible” and “functionally-equivalent” PEs and functions if explicitly-defined in the ontology Infer “compatible” PEs and functions
Reusing the myExperiment frontend
Example of a PE list
Example of a structural type
Overview 29/05/11 Data-Intensive Research with DISPEL 79 Motivational examples DISPEL: a language for data-driven research Architecture DISPEL components Processing Elements Types Functions DISPEL processing/evaluation The role of the DISPEL gateway The role of the DISPEL registry DISPEL resources
DISPEL resources 29/05/11 Data-Intensive Research with DISPEL 80
DISPEL resources 29/05/11 Data-Intensive Research with DISPEL 81

Weitere ähnliche Inhalte

Was ist angesagt?

Open Science Data Cloud (IEEE Cloud 2011)
Open Science Data Cloud (IEEE Cloud 2011)Open Science Data Cloud (IEEE Cloud 2011)
Open Science Data Cloud (IEEE Cloud 2011)Robert Grossman
 
ArrayUDF: User-Defined Scientific Data Analysis on Arrays
ArrayUDF: User-Defined Scientific Data Analysis on ArraysArrayUDF: User-Defined Scientific Data Analysis on Arrays
ArrayUDF: User-Defined Scientific Data Analysis on ArraysGoon83
 
Bioclouds CAMDA (Robert Grossman) 09-v9p
Bioclouds CAMDA (Robert Grossman) 09-v9pBioclouds CAMDA (Robert Grossman) 09-v9p
Bioclouds CAMDA (Robert Grossman) 09-v9pRobert Grossman
 
A Recommender Story: Improving Backend Data Quality While Reducing Costs
A Recommender Story: Improving Backend Data Quality While Reducing CostsA Recommender Story: Improving Backend Data Quality While Reducing Costs
A Recommender Story: Improving Backend Data Quality While Reducing CostsDatabricks
 
Astronomical Data Processing on the LSST Scale with Apache Spark
Astronomical Data Processing on the LSST Scale with Apache SparkAstronomical Data Processing on the LSST Scale with Apache Spark
Astronomical Data Processing on the LSST Scale with Apache SparkDatabricks
 
An Overview of Bionimbus (March 2010)
An Overview of Bionimbus (March 2010)An Overview of Bionimbus (March 2010)
An Overview of Bionimbus (March 2010)Robert Grossman
 
OCC Overview OMG Clouds Meeting 07-13-09 v3
OCC Overview OMG Clouds Meeting 07-13-09 v3OCC Overview OMG Clouds Meeting 07-13-09 v3
OCC Overview OMG Clouds Meeting 07-13-09 v3Robert Grossman
 
My Other Computer is a Data Center: The Sector Perspective on Big Data
My Other Computer is a Data Center: The Sector Perspective on Big DataMy Other Computer is a Data Center: The Sector Perspective on Big Data
My Other Computer is a Data Center: The Sector Perspective on Big DataRobert Grossman
 
Bionimbus Cambridge Workshop (3-28-11, v7)
Bionimbus Cambridge Workshop (3-28-11, v7)Bionimbus Cambridge Workshop (3-28-11, v7)
Bionimbus Cambridge Workshop (3-28-11, v7)Robert Grossman
 
Project Matsu: Elastic Clouds for Disaster Relief
Project Matsu: Elastic Clouds for Disaster ReliefProject Matsu: Elastic Clouds for Disaster Relief
Project Matsu: Elastic Clouds for Disaster ReliefRobert Grossman
 
Lessons Learned from a Year's Worth of Benchmarking Large Data Clouds (Robert...
Lessons Learned from a Year's Worth of Benchmarking Large Data Clouds (Robert...Lessons Learned from a Year's Worth of Benchmarking Large Data Clouds (Robert...
Lessons Learned from a Year's Worth of Benchmarking Large Data Clouds (Robert...Robert Grossman
 
Health & Status Monitoring (2010-v8)
Health & Status Monitoring (2010-v8)Health & Status Monitoring (2010-v8)
Health & Status Monitoring (2010-v8)Robert Grossman
 
Bionimbus - An Overview (2010-v6)
Bionimbus - An Overview (2010-v6)Bionimbus - An Overview (2010-v6)
Bionimbus - An Overview (2010-v6)Robert Grossman
 
Open Science Data Cloud (June 21, 2010)
Open Science Data Cloud (June 21, 2010)Open Science Data Cloud (June 21, 2010)
Open Science Data Cloud (June 21, 2010)Robert Grossman
 
Visualizing and Clustering Life Science Applications in Parallel 
Visualizing and Clustering Life Science Applications in Parallel Visualizing and Clustering Life Science Applications in Parallel 
Visualizing and Clustering Life Science Applications in Parallel Geoffrey Fox
 
When The New Science Is In The Outliers
When The New Science Is In The OutliersWhen The New Science Is In The Outliers
When The New Science Is In The Outliersaimsnist
 

Was ist angesagt? (20)

Open Science Data Cloud (IEEE Cloud 2011)
Open Science Data Cloud (IEEE Cloud 2011)Open Science Data Cloud (IEEE Cloud 2011)
Open Science Data Cloud (IEEE Cloud 2011)
 
ArrayUDF: User-Defined Scientific Data Analysis on Arrays
ArrayUDF: User-Defined Scientific Data Analysis on ArraysArrayUDF: User-Defined Scientific Data Analysis on Arrays
ArrayUDF: User-Defined Scientific Data Analysis on Arrays
 
Bioclouds CAMDA (Robert Grossman) 09-v9p
Bioclouds CAMDA (Robert Grossman) 09-v9pBioclouds CAMDA (Robert Grossman) 09-v9p
Bioclouds CAMDA (Robert Grossman) 09-v9p
 
A Recommender Story: Improving Backend Data Quality While Reducing Costs
A Recommender Story: Improving Backend Data Quality While Reducing CostsA Recommender Story: Improving Backend Data Quality While Reducing Costs
A Recommender Story: Improving Backend Data Quality While Reducing Costs
 
Astronomical Data Processing on the LSST Scale with Apache Spark
Astronomical Data Processing on the LSST Scale with Apache SparkAstronomical Data Processing on the LSST Scale with Apache Spark
Astronomical Data Processing on the LSST Scale with Apache Spark
 
An Overview of Bionimbus (March 2010)
An Overview of Bionimbus (March 2010)An Overview of Bionimbus (March 2010)
An Overview of Bionimbus (March 2010)
 
OCC Overview OMG Clouds Meeting 07-13-09 v3
OCC Overview OMG Clouds Meeting 07-13-09 v3OCC Overview OMG Clouds Meeting 07-13-09 v3
OCC Overview OMG Clouds Meeting 07-13-09 v3
 
My Other Computer is a Data Center: The Sector Perspective on Big Data
My Other Computer is a Data Center: The Sector Perspective on Big DataMy Other Computer is a Data Center: The Sector Perspective on Big Data
My Other Computer is a Data Center: The Sector Perspective on Big Data
 
Bionimbus Cambridge Workshop (3-28-11, v7)
Bionimbus Cambridge Workshop (3-28-11, v7)Bionimbus Cambridge Workshop (3-28-11, v7)
Bionimbus Cambridge Workshop (3-28-11, v7)
 
Project Matsu: Elastic Clouds for Disaster Relief
Project Matsu: Elastic Clouds for Disaster ReliefProject Matsu: Elastic Clouds for Disaster Relief
Project Matsu: Elastic Clouds for Disaster Relief
 
Lessons Learned from a Year's Worth of Benchmarking Large Data Clouds (Robert...
Lessons Learned from a Year's Worth of Benchmarking Large Data Clouds (Robert...Lessons Learned from a Year's Worth of Benchmarking Large Data Clouds (Robert...
Lessons Learned from a Year's Worth of Benchmarking Large Data Clouds (Robert...
 
SPD and KEA: HDF5 based file formats for Earth Observation
SPD and KEA: HDF5 based file formats for Earth ObservationSPD and KEA: HDF5 based file formats for Earth Observation
SPD and KEA: HDF5 based file formats for Earth Observation
 
Health & Status Monitoring (2010-v8)
Health & Status Monitoring (2010-v8)Health & Status Monitoring (2010-v8)
Health & Status Monitoring (2010-v8)
 
Bionimbus - An Overview (2010-v6)
Bionimbus - An Overview (2010-v6)Bionimbus - An Overview (2010-v6)
Bionimbus - An Overview (2010-v6)
 
Open Science Data Cloud (June 21, 2010)
Open Science Data Cloud (June 21, 2010)Open Science Data Cloud (June 21, 2010)
Open Science Data Cloud (June 21, 2010)
 
Welcome to HDF Workshop V
Welcome to HDF Workshop VWelcome to HDF Workshop V
Welcome to HDF Workshop V
 
SuperGraph visualization
SuperGraph visualizationSuperGraph visualization
SuperGraph visualization
 
DIET_BLAST
DIET_BLASTDIET_BLAST
DIET_BLAST
 
Visualizing and Clustering Life Science Applications in Parallel 
Visualizing and Clustering Life Science Applications in Parallel Visualizing and Clustering Life Science Applications in Parallel 
Visualizing and Clustering Life Science Applications in Parallel 
 
When The New Science Is In The Outliers
When The New Science Is In The OutliersWhen The New Science Is In The Outliers
When The New Science Is In The Outliers
 

Ähnlich wie Data Intensive Research with DISPEL

Cloud Testbeds for Standards Development and Innovation
Cloud Testbeds for Standards Development and InnovationCloud Testbeds for Standards Development and Innovation
Cloud Testbeds for Standards Development and InnovationAlan Sill
 
NIST Big Data Public Working Group NBD-PWG
NIST Big Data Public Working Group NBD-PWGNIST Big Data Public Working Group NBD-PWG
NIST Big Data Public Working Group NBD-PWGGeoffrey Fox
 
Pablo Gomez - Solving Large-scale Challenges with ESA Datalabs
Pablo Gomez - Solving Large-scale Challenges with ESA DatalabsPablo Gomez - Solving Large-scale Challenges with ESA Datalabs
Pablo Gomez - Solving Large-scale Challenges with ESA DatalabsAdvanced-Concepts-Team
 
Introduction to Big data
Introduction to Big dataIntroduction to Big data
Introduction to Big datacthanopoulos
 
To architect or engineer? Lessons from DataPool on building RDM repositories
To architect or engineer? Lessons from DataPool on building RDM repositoriesTo architect or engineer? Lessons from DataPool on building RDM repositories
To architect or engineer? Lessons from DataPool on building RDM repositoriesjiscdatapool
 
Data Mining mod1 ppt.pdf bca sixth semester notes
Data Mining mod1 ppt.pdf bca sixth semester notesData Mining mod1 ppt.pdf bca sixth semester notes
Data Mining mod1 ppt.pdf bca sixth semester notesasnaparveen414
 
Integrating scientific laboratories into the cloud
Integrating scientific laboratories into the cloudIntegrating scientific laboratories into the cloud
Integrating scientific laboratories into the cloudData Finder
 
SciDB : Open Source Data Management System for Data-Intensive Scientific Anal...
SciDB : Open Source Data Management System for Data-Intensive Scientific Anal...SciDB : Open Source Data Management System for Data-Intensive Scientific Anal...
SciDB : Open Source Data Management System for Data-Intensive Scientific Anal...San Diego Supercomputer Center
 
Scaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data ScienceScaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data ScienceeRic Choo
 
What is a Data Commons and Why Should You Care?
What is a Data Commons and Why Should You Care? What is a Data Commons and Why Should You Care?
What is a Data Commons and Why Should You Care? Robert Grossman
 
Foss4G 2009 Scenz Grid
Foss4G 2009 Scenz GridFoss4G 2009 Scenz Grid
Foss4G 2009 Scenz Gridnoho
 
Research Topics on Data Mining
Research Topics on Data MiningResearch Topics on Data Mining
Research Topics on Data MiningPhdtopiccom
 
OpenNebulaConf 2013 - Keynote: Opening the Path to Technical Excellence by Jo...
OpenNebulaConf 2013 - Keynote: Opening the Path to Technical Excellence by Jo...OpenNebulaConf 2013 - Keynote: Opening the Path to Technical Excellence by Jo...
OpenNebulaConf 2013 - Keynote: Opening the Path to Technical Excellence by Jo...OpenNebula Project
 
Opening the Path to Technical Excellence
Opening the Path to Technical ExcellenceOpening the Path to Technical Excellence
Opening the Path to Technical ExcellenceNETWAYS
 
Open Source Tools for Materials Informatics
Open Source Tools for Materials InformaticsOpen Source Tools for Materials Informatics
Open Source Tools for Materials InformaticsAnubhav Jain
 
Spark at NASA/JPL-(Chris Mattmann, NASA/JPL)
Spark at NASA/JPL-(Chris Mattmann, NASA/JPL)Spark at NASA/JPL-(Chris Mattmann, NASA/JPL)
Spark at NASA/JPL-(Chris Mattmann, NASA/JPL)Spark Summit
 
2010 Future of Advanced Computing
2010 Future of Advanced Computing2010 Future of Advanced Computing
2010 Future of Advanced ComputingBob Marcus
 
MataNui - Building a Grid Data Infrastructure that "doesn't suck!"
MataNui - Building a Grid Data Infrastructure that "doesn't suck!"MataNui - Building a Grid Data Infrastructure that "doesn't suck!"
MataNui - Building a Grid Data Infrastructure that "doesn't suck!"Guy K. Kloss
 

Ähnlich wie Data Intensive Research with DISPEL (20)

Cloud Testbeds for Standards Development and Innovation
Cloud Testbeds for Standards Development and InnovationCloud Testbeds for Standards Development and Innovation
Cloud Testbeds for Standards Development and Innovation
 
NIST Big Data Public Working Group NBD-PWG
NIST Big Data Public Working Group NBD-PWGNIST Big Data Public Working Group NBD-PWG
NIST Big Data Public Working Group NBD-PWG
 
Cs501 dm intro
Cs501 dm introCs501 dm intro
Cs501 dm intro
 
Pablo Gomez - Solving Large-scale Challenges with ESA Datalabs
Pablo Gomez - Solving Large-scale Challenges with ESA DatalabsPablo Gomez - Solving Large-scale Challenges with ESA Datalabs
Pablo Gomez - Solving Large-scale Challenges with ESA Datalabs
 
Introduction to Big data
Introduction to Big dataIntroduction to Big data
Introduction to Big data
 
To architect or engineer? Lessons from DataPool on building RDM repositories
To architect or engineer? Lessons from DataPool on building RDM repositoriesTo architect or engineer? Lessons from DataPool on building RDM repositories
To architect or engineer? Lessons from DataPool on building RDM repositories
 
Data Mining mod1 ppt.pdf bca sixth semester notes
Data Mining mod1 ppt.pdf bca sixth semester notesData Mining mod1 ppt.pdf bca sixth semester notes
Data Mining mod1 ppt.pdf bca sixth semester notes
 
Integrating scientific laboratories into the cloud
Integrating scientific laboratories into the cloudIntegrating scientific laboratories into the cloud
Integrating scientific laboratories into the cloud
 
SciDB : Open Source Data Management System for Data-Intensive Scientific Anal...
SciDB : Open Source Data Management System for Data-Intensive Scientific Anal...SciDB : Open Source Data Management System for Data-Intensive Scientific Anal...
SciDB : Open Source Data Management System for Data-Intensive Scientific Anal...
 
Scaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data ScienceScaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data Science
 
What is a Data Commons and Why Should You Care?
What is a Data Commons and Why Should You Care? What is a Data Commons and Why Should You Care?
What is a Data Commons and Why Should You Care?
 
Foss4G 2009 Scenz Grid
Foss4G 2009 Scenz GridFoss4G 2009 Scenz Grid
Foss4G 2009 Scenz Grid
 
Research Topics on Data Mining
Research Topics on Data MiningResearch Topics on Data Mining
Research Topics on Data Mining
 
OpenNebulaConf 2013 - Keynote: Opening the Path to Technical Excellence by Jo...
OpenNebulaConf 2013 - Keynote: Opening the Path to Technical Excellence by Jo...OpenNebulaConf 2013 - Keynote: Opening the Path to Technical Excellence by Jo...
OpenNebulaConf 2013 - Keynote: Opening the Path to Technical Excellence by Jo...
 
Opening the Path to Technical Excellence
Opening the Path to Technical ExcellenceOpening the Path to Technical Excellence
Opening the Path to Technical Excellence
 
Open Source Tools for Materials Informatics
Open Source Tools for Materials InformaticsOpen Source Tools for Materials Informatics
Open Source Tools for Materials Informatics
 
Spark at NASA/JPL-(Chris Mattmann, NASA/JPL)
Spark at NASA/JPL-(Chris Mattmann, NASA/JPL)Spark at NASA/JPL-(Chris Mattmann, NASA/JPL)
Spark at NASA/JPL-(Chris Mattmann, NASA/JPL)
 
2010 Future of Advanced Computing
2010 Future of Advanced Computing2010 Future of Advanced Computing
2010 Future of Advanced Computing
 
MataNui - Building a Grid Data Infrastructure that "doesn't suck!"
MataNui - Building a Grid Data Infrastructure that "doesn't suck!"MataNui - Building a Grid Data Infrastructure that "doesn't suck!"
MataNui - Building a Grid Data Infrastructure that "doesn't suck!"
 
03 data mining : data warehouse
03 data mining : data warehouse03 data mining : data warehouse
03 data mining : data warehouse
 

Mehr von Oscar Corcho

Organisational Interoperability in Practice at Universidad Politécnica de Madrid
Organisational Interoperability in Practice at Universidad Politécnica de MadridOrganisational Interoperability in Practice at Universidad Politécnica de Madrid
Organisational Interoperability in Practice at Universidad Politécnica de MadridOscar Corcho
 
Introducción a los Datos Abiertos - Open Data Day 2020
Introducción a los Datos Abiertos - Open Data Day 2020Introducción a los Datos Abiertos - Open Data Day 2020
Introducción a los Datos Abiertos - Open Data Day 2020Oscar Corcho
 
Open Data (and Software, and other Research Artefacts) - A proper management
Open Data (and Software, and other Research Artefacts) -A proper managementOpen Data (and Software, and other Research Artefacts) -A proper management
Open Data (and Software, and other Research Artefacts) - A proper management Oscar Corcho
 
Adiós a los ficheros, hola a los grafos de conocimientos estadísticos
Adiós a los ficheros, hola a los grafos de conocimientos estadísticosAdiós a los ficheros, hola a los grafos de conocimientos estadísticos
Adiós a los ficheros, hola a los grafos de conocimientos estadísticosOscar Corcho
 
Ontology Engineering at Scale for Open City Data Sharing
Ontology Engineering at Scale for Open City Data SharingOntology Engineering at Scale for Open City Data Sharing
Ontology Engineering at Scale for Open City Data SharingOscar Corcho
 
Situación de las iniciativas de Open Data internacionales (y algunas recomen...
Situación de las iniciativas de Open Data internacionales (y algunas recomen...Situación de las iniciativas de Open Data internacionales (y algunas recomen...
Situación de las iniciativas de Open Data internacionales (y algunas recomen...Oscar Corcho
 
STARS4ALL - Contaminación Lumínica
STARS4ALL - Contaminación LumínicaSTARS4ALL - Contaminación Lumínica
STARS4ALL - Contaminación LumínicaOscar Corcho
 
Towards Reproducible Science: a few building blocks from my personal experience
Towards Reproducible Science: a few building blocks from my personal experienceTowards Reproducible Science: a few building blocks from my personal experience
Towards Reproducible Science: a few building blocks from my personal experienceOscar Corcho
 
Publishing Linked Statistical Data: Aragón, a case study
Publishing Linked Statistical Data: Aragón, a case studyPublishing Linked Statistical Data: Aragón, a case study
Publishing Linked Statistical Data: Aragón, a case studyOscar Corcho
 
An initial analysis of topic-based similarity among scientific documents base...
An initial analysis of topic-based similarity among scientific documents base...An initial analysis of topic-based similarity among scientific documents base...
An initial analysis of topic-based similarity among scientific documents base...Oscar Corcho
 
Linked Statistical Data 101
Linked Statistical Data 101Linked Statistical Data 101
Linked Statistical Data 101Oscar Corcho
 
Aplicando los principios de Linked Data en AEMET
Aplicando los principios de Linked Data en AEMETAplicando los principios de Linked Data en AEMET
Aplicando los principios de Linked Data en AEMET Oscar Corcho
 
Ojo Al Data 100 - Call for sharing session at IODC 2016
Ojo Al Data 100 - Call for sharing session at IODC 2016Ojo Al Data 100 - Call for sharing session at IODC 2016
Ojo Al Data 100 - Call for sharing session at IODC 2016Oscar Corcho
 
Educando sobre datos abiertos: desde el colegio a la universidad
Educando sobre datos abiertos: desde el colegio a la universidadEducando sobre datos abiertos: desde el colegio a la universidad
Educando sobre datos abiertos: desde el colegio a la universidadOscar Corcho
 
STARS4ALL general presentation at ALAN2016
STARS4ALL general presentation at ALAN2016STARS4ALL general presentation at ALAN2016
STARS4ALL general presentation at ALAN2016Oscar Corcho
 
Generación de datos estadísticos enlazados del Instituto Aragonés de Estadística
Generación de datos estadísticos enlazados del Instituto Aragonés de EstadísticaGeneración de datos estadísticos enlazados del Instituto Aragonés de Estadística
Generación de datos estadísticos enlazados del Instituto Aragonés de EstadísticaOscar Corcho
 
Presentación de la red de excelencia de Open Data y Smart Cities
Presentación de la red de excelencia de Open Data y Smart CitiesPresentación de la red de excelencia de Open Data y Smart Cities
Presentación de la red de excelencia de Open Data y Smart CitiesOscar Corcho
 
Why do they call it Linked Data when they want to say...?
Why do they call it Linked Data when they want to say...?Why do they call it Linked Data when they want to say...?
Why do they call it Linked Data when they want to say...?Oscar Corcho
 
Linked Statistical Data: does it actually pay off?
Linked Statistical Data: does it actually pay off?Linked Statistical Data: does it actually pay off?
Linked Statistical Data: does it actually pay off?Oscar Corcho
 
Slow-cooked data and APIs in the world of Big Data: the view from a city per...
Slow-cooked data and APIs in the world of Big Data: the view from a city per...Slow-cooked data and APIs in the world of Big Data: the view from a city per...
Slow-cooked data and APIs in the world of Big Data: the view from a city per...Oscar Corcho
 

Mehr von Oscar Corcho (20)

Organisational Interoperability in Practice at Universidad Politécnica de Madrid
Organisational Interoperability in Practice at Universidad Politécnica de MadridOrganisational Interoperability in Practice at Universidad Politécnica de Madrid
Organisational Interoperability in Practice at Universidad Politécnica de Madrid
 
Introducción a los Datos Abiertos - Open Data Day 2020
Introducción a los Datos Abiertos - Open Data Day 2020Introducción a los Datos Abiertos - Open Data Day 2020
Introducción a los Datos Abiertos - Open Data Day 2020
 
Open Data (and Software, and other Research Artefacts) - A proper management
Open Data (and Software, and other Research Artefacts) -A proper managementOpen Data (and Software, and other Research Artefacts) -A proper management
Open Data (and Software, and other Research Artefacts) - A proper management
 
Adiós a los ficheros, hola a los grafos de conocimientos estadísticos
Adiós a los ficheros, hola a los grafos de conocimientos estadísticosAdiós a los ficheros, hola a los grafos de conocimientos estadísticos
Adiós a los ficheros, hola a los grafos de conocimientos estadísticos
 
Ontology Engineering at Scale for Open City Data Sharing
Ontology Engineering at Scale for Open City Data SharingOntology Engineering at Scale for Open City Data Sharing
Ontology Engineering at Scale for Open City Data Sharing
 
Situación de las iniciativas de Open Data internacionales (y algunas recomen...
Situación de las iniciativas de Open Data internacionales (y algunas recomen...Situación de las iniciativas de Open Data internacionales (y algunas recomen...
Situación de las iniciativas de Open Data internacionales (y algunas recomen...
 
STARS4ALL - Contaminación Lumínica
STARS4ALL - Contaminación LumínicaSTARS4ALL - Contaminación Lumínica
STARS4ALL - Contaminación Lumínica
 
Towards Reproducible Science: a few building blocks from my personal experience
Towards Reproducible Science: a few building blocks from my personal experienceTowards Reproducible Science: a few building blocks from my personal experience
Towards Reproducible Science: a few building blocks from my personal experience
 
Publishing Linked Statistical Data: Aragón, a case study
Publishing Linked Statistical Data: Aragón, a case studyPublishing Linked Statistical Data: Aragón, a case study
Publishing Linked Statistical Data: Aragón, a case study
 
An initial analysis of topic-based similarity among scientific documents base...
An initial analysis of topic-based similarity among scientific documents base...An initial analysis of topic-based similarity among scientific documents base...
An initial analysis of topic-based similarity among scientific documents base...
 
Linked Statistical Data 101
Linked Statistical Data 101Linked Statistical Data 101
Linked Statistical Data 101
 
Aplicando los principios de Linked Data en AEMET
Aplicando los principios de Linked Data en AEMETAplicando los principios de Linked Data en AEMET
Aplicando los principios de Linked Data en AEMET
 
Ojo Al Data 100 - Call for sharing session at IODC 2016
Ojo Al Data 100 - Call for sharing session at IODC 2016Ojo Al Data 100 - Call for sharing session at IODC 2016
Ojo Al Data 100 - Call for sharing session at IODC 2016
 
Educando sobre datos abiertos: desde el colegio a la universidad
Educando sobre datos abiertos: desde el colegio a la universidadEducando sobre datos abiertos: desde el colegio a la universidad
Educando sobre datos abiertos: desde el colegio a la universidad
 
STARS4ALL general presentation at ALAN2016
STARS4ALL general presentation at ALAN2016STARS4ALL general presentation at ALAN2016
STARS4ALL general presentation at ALAN2016
 
Generación de datos estadísticos enlazados del Instituto Aragonés de Estadística
Generación de datos estadísticos enlazados del Instituto Aragonés de EstadísticaGeneración de datos estadísticos enlazados del Instituto Aragonés de Estadística
Generación de datos estadísticos enlazados del Instituto Aragonés de Estadística
 
Presentación de la red de excelencia de Open Data y Smart Cities
Presentación de la red de excelencia de Open Data y Smart CitiesPresentación de la red de excelencia de Open Data y Smart Cities
Presentación de la red de excelencia de Open Data y Smart Cities
 
Why do they call it Linked Data when they want to say...?
Why do they call it Linked Data when they want to say...?Why do they call it Linked Data when they want to say...?
Why do they call it Linked Data when they want to say...?
 
Linked Statistical Data: does it actually pay off?
Linked Statistical Data: does it actually pay off?Linked Statistical Data: does it actually pay off?
Linked Statistical Data: does it actually pay off?
 
Slow-cooked data and APIs in the world of Big Data: the view from a city per...
Slow-cooked data and APIs in the world of Big Data: the view from a city per...Slow-cooked data and APIs in the world of Big Data: the view from a city per...
Slow-cooked data and APIs in the world of Big Data: the view from a city per...
 

Kürzlich hochgeladen

08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfhans926745
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 

Kürzlich hochgeladen (20)

08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 

Data Intensive Research with DISPEL

  • 1. Data-Intensive Researchwith DISPEL Oscar Corcho Ontology Engineering Group Universidad Politécnica de Madrid (in collaboration with all the ADMIRE project consortium) Special thanks to Malcolm Atkinson, from whom most of the slides have been reused 29/05/11 Data-Intensive Research with DISPEL 1
  • 2. There are manynames of manypeoplewhohavecontributedtotheseslides I am almostjust a simple story-tellerorwork done byothers… Difficulttoprovideallnames Especiallywhenyoufinishcompilingslidesthedaybefore. Thisslidewill be completedforthe online versionwithallnames Forsimplicity, thankstothe ADMIRE consortiummembers Recognitionslide… 29/05/11 Data-Intensive Research with DISPEL 2
  • 3. Overview 29/05/11 Data-Intensive Research with DISPEL 3 Motivational examples DISPEL: a language for data-driven research Architecture DISPEL components Processing Elements Types Functions DISPEL processing/evaluation The role of the DISPEL gateway The role of the DISPEL registry DISPEL resources
  • 4. Overview 29/05/11 Data-Intensive Research with DISPEL 4 Motivational examples DISPEL: a language for data-driven research Architecture DISPEL components Processing Elements Types Functions DISPEL processing/evaluation The role of the DISPEL gateway The role of the DISPEL registry DISPEL resources
  • 5. Astronomy: detection of quasars Seismology: ambientnoise data processing CRM: customerchurn and cross-selling Genetics: understanding mouse embryos MotivationalExamples 29/05/11 Data-Intensive Research with DISPEL 5
  • 6. Astronomy: detection of quasars Seismology: ambientnoise data processing CRM: customerchurn and cross-selling Genetics: understanding mouse embryos MotivationalExamples 29/05/11 Data-Intensive Research with DISPEL 6
  • 7. Quasars are highly energetic cores of galaxies, where matter is falling into black holes, releasing prodigious quantities of energy in the process. Star-like in appearance (quasi-stellar radio sources) Distinguishing quasars from stars requires information from the distribution of their light across the electromagnetic spectrum. Most star-like objects are stars not quasars. Quasars 29/05/11 Data-Intensive Research with DISPEL 7
  • 8.
  • 9. Research question: Does using 9 photometric bands improve the classification?Detection of quasars 29/05/11 Data-Intensive Research with DISPEL 8 Alternative method: Photometric Cheaper Quicker All objects in area Combine multiple bands to approximate spectroscopic study
  • 10. Sloan Digital Sky Survey (SDSS) 450m astronomical features 5 optical wavelength bands (u, g, r, i, and z) 120,000 spectroscopically confirmed quasars UKIRT Infrared Deep Sky Survey (UKIDSS) 60m astronomical features 4 infrared wavelengths (Y, J, H, and K) Link table with distances between objects in SDSS and UKIDSS The data 29/05/11 Data-Intensive Research with DISPEL 9
  • 11. Data IntegrationWorkflow 29/05/11 Data-Intensive Research with DISPEL 10
  • 12. Data IntegrationWorkflow 29/05/11 Data-Intensive Research with DISPEL 11
  • 13. Data IntegrationWorkflow 29/05/11 Data-Intensive Research with DISPEL 12
  • 14. Data IntegrationWorkflow 29/05/11 Data-Intensive Research with DISPEL 13
  • 15. Data IntegrationWorkflow 29/05/11 Data-Intensive Research with DISPEL 14
  • 16. Data IntegrationWorkflow 29/05/11 Data-Intensive Research with DISPEL 15
  • 17. Data IntegrationWorkflow 29/05/11 Data-Intensive Research with DISPEL 16
  • 18. Astronomy: detection of quasars Seismology: ambientnoise data processing CRM: customerchurn and cross-selling Genetics: understanding mouse embryos MotivationalExamples 29/05/11 Data-Intensive Research with DISPEL 17
  • 19. AmbientNoise Data Processing 29/05/11 Data-Intensive Research with DISPEL 18 Time segmentation Filtering & normalization Cross-correlation & stacking . . .
  • 20. High level workflow 29/05/11 Data-Intensive Research with DISPEL 19
  • 22. Astronomy: detection of quasars Seismology: ambientnoise data processing CRM: customerchurn and cross-selling Genetics: understanding mouse embryos MotivationalExamples 29/05/11 Data-Intensive Research with DISPEL 21
  • 23. Business goal Recognizecustomersthatareprobable to quitcompanyservices Findout whichconditionsinfluence churning Knowledge discoveryphases Model training Designedand executed by data analyst Long-lasting and complicated Model exploitation Executedby domainexperts (callingagents) Quickand simple CRM CustomerChurnPrediction 29/05/11 Data-Intensive Research with DISPEL 22
  • 24. Architecture 29/05/11 Data-Intensive Research with DISPEL 23 Model training (Workbench) Model exploitation (Portal) ResultsDeliveryToPortal Training data extraction Data transformation Classification Model training Test datasetextraction Model evaluation DeliveryToRepository ObtainingFromRepository
  • 25. Business goal To find out hintsaboutadditional products or services to be provided to potentialcustomers Market analysis Knowledge discovery Get frequentitemsets/association/sequentialrules from historical data set CRM Cross-Selling 29/05/11 Data-Intensive Research with DISPEL 24 Roaming = TRUE & GSM_Prepaid = TRUE => Voice_mail = TRUE
  • 26. Architecture 29/05/11 Data-Intensive Research with DISPEL 25 CRM Database Data access Data Preparation AssociationRule Mining Rulesdelivery
  • 27. Architecture 29/05/11 Data-Intensive Research with DISPEL 26 CRM Database Data access Data Preparation AssociationRule Mining Rulesdelivery
  • 28. Architecture 29/05/11 Data-Intensive Research with DISPEL 27 CRM Database Data access Data Preparation AssociationRule Mining Rulesdelivery
  • 29. Architecture 29/05/11 Data-Intensive Research with DISPEL 28 CRM Database Data access Data Preparation AssociationRule Mining Rulesdelivery GSM_Prepaid = TRUE & Age = YOUNG => Roaming = TRUE
  • 30. Architecture 29/05/11 Data-Intensive Research with DISPEL 29 CRM Database Data access Data Preparation AssociationRule Mining Rulesdelivery GSM_Prepaid = TRUE & Age = YOUNG => Roaming = TRUE ... <_40:AssociationModel minimumConfidence="0.3" minimumSupport="0.01" numberOfItems="6" numberOfItemsets="12" numberOfRules="25" numberOfTransactions="4525"> <_40:Item id="0" value=„GSM_Prepaid=TRUE"/> <_40:Item id="1" value=„Age=YOUNG"/> <_40:Item id="2" value=„Roaming=TRUE”/> …
  • 31. Astronomy: detection of quasars Seismology: ambientnoise data processing CRM: customerchurn and cross-selling Genetics: understanding mouse embryos MotivationalExamples 29/05/11 Data-Intensive Research with DISPEL 30
  • 32. Understand the gene function and interactions of genes in a mouse embryo Generate a collection of images by employing RNA in-situ hybridisation process Identify anatomical components expressing as a gene by annotating the images Understanding Mouse Embryos 29/05/11 Data-Intensive Research with DISPEL 31
  • 33. Annotated Mouse Embryo 29/05/11 Data-Intensive Research with DISPEL 32
  • 34. 18,000 genes’ collection for mouse embryo established by RNA in-situ hybridisation 1,500 anatomical terms ontology used for annotations 4 Terabytes of images 80% manually annotated 20% remaining (over 85,000 images) TheNumbers 29/05/11 Data-Intensive Research with DISPEL 33
  • 35. EURExpress-II Workflow 29/05/11 Data-Intensive Research with DISPEL 34
  • 36. What do theyallhave in common? MotivationalExamples 29/05/11 Data-Intensive Research with DISPEL 35
  • 38. Commoncharacteristics Needfor a range of data mining and integrationfunctionalities Large-scale data Most of traditional/widely-availabletools are notenough Needtomanagestreaming-basedmodels Sometimeshighcomputationaldemand Domainexpertsbecome data mining and integrationexperts, and evendistributedcomputingexperts Suchspecialised human resources are difficulttofind MotivationalExamples 29/05/11 Data-Intensive Research with DISPEL 37
  • 39. What do weneedthen? An all-in-one framework that combines… Data integration, processing and mining Extensible with domain specific requirements e.g., ADQL queries in Astronomy Support for reusable building blocks e.g., n-fold validation Support for distributed and parallel execution of workflows Native support for a streaming data model e.g., ordered merge joins Automated optimisation Motivational examples 29/05/11 Data-Intensive Research with DISPEL 38
  • 40. What do weneedthen (cont.)? A frameworkthatseparatesconcerns of DomainExperts Theyunderstandtheproblems of theirdomain, and thedatasetsto be used Data-intensiveAnalysts Theyunderstandtheknowledgediscoveryprocess and knowthealgorithms and techniquesto be used (e.g., association rules, clustering, etc.) Data-intensiveEngineers Theyunderstandthefoundations of distributedcomputing, and theirplatforms and technologies (e.g., Grid Computing, Clouds, etc.) Motivational examples 29/05/11 Data-Intensive Research with DISPEL 39
  • 41. Overview 29/05/11 Data-Intensive Research with DISPEL 40 Motivational examples DISPEL: a language for data-driven research Architecture DISPEL components Processing Elements Types Functions DISPEL processing/evaluation The role of the DISPEL gateway The role of the DISPEL registry DISPEL resources
  • 42. The DISPEL Hourglass 29/05/11 Data-Intensive Research with DISPEL 41
  • 43. The DISPEL Hourglass 29/05/11 Data-Intensive Research with DISPEL 42 Data-exploitation develops diversity and supports a wide range of user behaviours, tools and services
  • 44. The DISPEL Hourglass 29/05/11 Data-Intensive Research with DISPEL 43 Data-service providers compete for a consolidated load offering generic or specialised services and business models
  • 45. The DISPEL Hourglass 29/05/11 Data-Intensive Research with DISPEL 44 The main focus of this talk
  • 46. Domain Experts using tools tuned to their way of working that behind the familiar façade generate DISPEL Data-Intensive Analysts writing and describing specific PEs composing & packaging PEs using predefined patterns Data-Intensive Engineers Providing distributed data-intensive platforms Delivering distributed data-intensive services Defining patterns for data-intensive computations Optimising the enactments and platform performance DISPEL enableslooselycoupling 29/05/11 Data-Intensive Research with DISPEL 45 Domain experts do not read and write DISPEL. They discuss parameters and graphs with Data-Intensive Analysts. They work by controlling enactments via their familiar tools: portals, spreadsheets, R, Matlab, ... But there are Domain experts who are also Data-Intensive Analysts! Particularly in research and academic contexts. Data-Intensive Analysts read and write DISPEL. They are experts in data mining, text mining, image processing, time series analysis, statistics, etc. They may discuss parameters and graphs with Domain experts. They work in a familiar development environment such as Eclipse. They discuss DISPEL patterns, sentences and performance with Data-IntensiveEngineers. They expect robust enactment and effective optimisation. Data-Intensive Engineers read and write DISPEL, and build data-intensive platforms. They rarely meet Domain Experts. They are committed to improving all stages of DISPEL processing. They talk with Data-Intensive Analysts to help them do their work and to better understand requirements and workloads.
  • 47. Separation of concerns 29/05/11 Data-Intensive Research with DISPEL 46
  • 48. DISPEL: anexample… 29/05/11 Data-Intensive Research with DISPEL 47 use admire.dataAccess.relational.DAS1; // Get definition of DAS1 use admire.transforms.statistical.Stats; // Get definition of Stats use admire.dataAccess.relational.DAS2; // Get definition of DAS2 String q1 = “SELECT * FROM db.table1”; // Define literals String q2 = “SELECT * FROM db.table2”; String update = “INSERT ? INTO db2.columnStatistics”; DAS1 das1 = new DAS1; // Create PEs Stats stats = new Stats; DAS2 das2 = new DAS2; |- q1; q2 -| => das1.query; // Create graph das1.result => stats.in; stats.out=> das2.data; |- update; update -| => das2.SQL;
  • 49. Data-IntensiveAnalysts. ModelBuilding 29/05/11 Data-Intensive Research with DISPEL 48
  • 50. Data-IntensiveAnalysts. ModelDeployment 29/05/11 Data-Intensive Research with DISPEL 49
  • 51. Data-IntensiveAnalysts. Lower-level of Details 29/05/11 Data-Intensive Research with DISPEL 50
  • 52. Overview 29/05/11 Data-Intensive Research with DISPEL 51 Motivational examples DISPEL: a language for data-driven research Architecture DISPEL components Processing Elements Types Functions DISPEL processing/evaluation The role of the DISPEL gateway The role of the DISPEL registry DISPEL resources
  • 53. User-defined functions encapsulating a data transforming algorithm PE descriptions A unique name A short definition of what they do A precise description of their input and output streams a structure of Connections A precise description of their iterative behaviour A precise description of their termination and error reporting The (S&D)type propagation rules from inputs to outputs A precise description of their properties that may permit or limit optimisation Their known subtype hierarchy ProcessingElements 29/05/11 Data-Intensive Research with DISPEL 52
  • 55. PEs are instantiated before they are used in an enactment newPE_expression There may be many instances of a given PE Think PE is a class PEI is an instance of that class Assertions may refine the properties of a PE instance newSQLquerywith data as :[<Integer i, j; Real r; String s>] ProcessingElementInstances 29/05/11 Data-Intensive Research with DISPEL 54 Stating the structural type of this particular instance’s result; the programmer knows the query and schema it will be used with.
  • 56. Connections carry a stream of data from a PE instance to a PE instance 1 source => multiple receivers Typically a PE processes one element of the stream at a time These elements are as small as makes computational sense a tuple a row of a matrix The system is responsible for buffering and optimising their flow pass by reference when possible serialised and compressed for long-haul data movement only buffer to disk when requested or buffer spill unavoidable Connections 29/05/11 Data-Intensive Research with DISPEL 55
  • 57. Two types describe the values passed structural type (Stype) the format / representation of the elemental value domain type (Dtype) the `meaning’ of the elemental value Connections may have finite or continuous streams Stream end, EoS, indicates no more data available A PE transmits EoS when it has no more data to send A connection may transmit a “no more” message from receiver to source Receiver discard throws away data it sends a “no more” message immediately Stream literals have the form |- expression -| Connections 29/05/11 Data-Intensive Research with DISPEL 56
  • 58. The default termination behaviour occurs when either all the inputs are exhausted or all the receivers of outputs have indicated they do not want more data When all of a PE’s inputs have received EoS a PE completes the use of its current data then sends an EoS on all of its outputs then stops When all of a PE’s outputs have received a “no more” a PE sends a “no more” on all of its inputs then stops Termination should propagate across a distributed enactment there may be a stop operation & external event as well PE Termination 29/05/11 Data-Intensive Research with DISPEL 57 This is the default, a PE may stop when a particular stream delivers EoS This is the default, a PE may stop when a particular stream receives “no more”
  • 59. packageeu.admire{ TypeConverterPEis PE( <Connection:Any::Thing input> => <Connection:Any::Thing output> ); TypeCombiner isPE( <Connection[ ] inputs> => <Connection output> ); TypeErrorStreamisConnection: < error:String::”lang:ErrorMessage”; culprit >; TypeProgrammableCombineris PE( <Connection[] inputs; Connection:String::”lang:JavascriptCode” controlExpression> => <Connection output; ErrorStreamerrors > ); registerConverterPE, Combiner, ErrorStream, ProgrammableCombiner;} Languagetypes 29/05/11 Data-Intensive Research with DISPEL 58
  • 60. PEs as subtypes of PEs 29/05/11 Data-Intensive Research with DISPEL 59 Indicate order of inputs is not significant package eu.admire{ use eu.admire.Combiner; use eu.admire.ProgrammableCombiner; Type SymmetricCombiner is Combiner with inputs permutable; Type SymmetricProgrammableCombiner is ProgrammableCombiner with inputs permutable; register SymmetricCombiner, SymmetricGenericCombiner;} Make these new PE types subtypes of the previously declared types.
  • 61. Overview 29/05/11 Data-Intensive Research with DISPEL 60 Motivational examples DISPEL: a language for data-driven research Architecture DISPEL components Processing Elements Types Functions DISPEL processing/evaluation The role of the DISPEL gateway The role of the DISPEL registry DISPEL resources
  • 62. Stypes and Dtypes 29/05/11 Data-Intensive Research with DISPEL 61 packagecoral.reef.ecology{ StypeImage is Pixel[ ][ ]; Stype Camera is <Integer cNumber; String cType; Real x, y, z, theta, phi, alpha, aperture, magnification, frequency; Real[ ] settings; rest >; Stype Illuminator is <Integer iNumber; String iType; Real x, y, z, theta, phi, beamWidth, intensity, iFrequency, duration; Real[ ] iSettings; rest >; Stype Frame is <Time t; Image[ ] images; Camera[ ] cameras; Illuminator[ ] lighting>; register Image, Camera, Illuminator, Frame; }
  • 63. Stypes and Dtypes 29/05/11 Data-Intensive Research with DISPEL 62 packagecoral.reef.ecology{ namespacecre“Coral_Reef_Ecology.observingStation.terms:”; use coral.reef.ecology.Frame; Stype Object is <Real x, y, z, radius>; StypeObjectMapis <Frame:: cre:Primary_PIV_data frame; Object[ ]:: cre:Putative_Individuals objects>; TypeObjectRecogniseris PE ( <Connection: Frame:: cre:PrimaryPIVdataframes> => <Connection: ObjectMap:: cre:First_ReconstructionputativeIndividuals> ); register Object, ObjectMap, ObjectRecogniser;}
  • 64. Stypes and Dtypes 29/05/11 Data-Intensive Research with DISPEL 63 packagecoral.reef.ecology{ namespacecre“Coral_Reef_Ecology.observingStation.terms:”; use coral.reef.ecology.ObjectMap; Stype Individual is <Object:: cre:Individual_SubjectconfirmeIndividual; Integer:: cre:Unique_Arbitrary_TagidNumber; String:: cre:Species_Or_Inert taxa; Boolean swimmer; Real:: cre:Mass_Estimate1_Kilograms mass>; StypeIndividualMapis <Frame:: cre:Primary_PIV_data frame; Individual[ ]:: cre:Confirmed_Individuals individuals>; TypeIndividualRecogniseris PE ( <Connection: ObjectMap:: cre: First_ReconstructionputativeIndividuals > => <Connection: IndividualMap:: cre:Second_ReconstructiontaggedIndividuals > ); register Individual, IndividualMap, IndividualRecogniser;}
  • 65. Stypes and Dtypes 29/05/11 Data-Intensive Research with DISPEL 64 packagecoral.reef.ecology{ namespacecre“Coral_Reef_Ecology.observingStation.terms:”; use coral.reef.ecology.IndividualMap; StypeMovingIndividualis <Individual:: cre:Tagged_SubjectconfirmeIndividual; Real[ ]:: cre:Meters_Per_Second[ ] velocity; Real:: cre:JouleskineticEnergy>; StypeMovementMapis <Frame:: cre:Primary_PIV_data frame; MovingIndividual[ ]:: cre:Moving_Individuals individuals>; TypeMovementRecogniseris PE ( <Connection: IndividualMap:: cre:Second_ReconstructiontaggedIndividuals > => <Connection: MovementMap:: cre:Third_ReconstructionmovingIndividuals > ); register MovingIndividual, MovementMap, MovementRecogniser;}
  • 66. Overview 29/05/11 Data-Intensive Research with DISPEL 65 Motivational examples DISPEL: a language for data-driven research Architecture DISPEL components Processing Elements Types Functions DISPEL processing/evaluation The role of the DISPEL gateway The role of the DISPEL registry DISPEL resources
  • 67. Coupling portlet to DISPEL function 29/05/11 Data-Intensive Research with DISPEL 66 Step 1: user inputs solicited values 1, A Step 2: portal system validates values Step 3: portal system constructs DISPEL sentence with these values as parameters of a provided function Step 4: portal system sends DISPEL sentence to gateway Steps 5 to n: gateway and systems behind it validate and enact the sentence Step n+1: gateway sends summary/partial results to progress viewer Step n+m: gateway sends final (summary) results to result viewer
  • 68. Thefunctionthatiscalled 29/05/11 Data-Intensive Research with DISPEL 67 package eu.admire.seismology{ use eu.admire.seismology.proj1portlet4invokeCorrelations; use eu.admire.Time; Time startTime = <year=1996, day = 53, seconds = 0>; Time endTime = <year=1996, day = 54, seconds = 0>; Real minFreq = 0.01; //frequencies considered in Hertz Real maxFreq = 1.0; Real maxOffset = 10*60*60; proj1portlet4invokeCorrelations( startTime, endTime, minFreq, maxFreq, maxOffset );} Values solicited from user and inserted into a minimal template; some of these could equally well be `constants’ embedded as hidden defaults in the template
  • 69. Overview 29/05/11 Data-Intensive Research with DISPEL 68 Motivational examples DISPEL: a language for data-driven research Architecture DISPEL components Processing Elements Types Functions DISPEL processing/evaluation The role of the DISPEL gateway The role of the DISPEL registry DISPEL resources
  • 70. DISPEL evaluation 29/05/11 Data-Intensive Research with DISPEL 69 A DISPEL sentence is prepared… Sent to a gateway… which may inspect it and the sender’s credentials and then accept it and initiate enactment Enacted in four phases DISPEL language processing to produce a graph and/or register definitions Optimisation Deployment across hosting platforms Execution and control including termination and tidying
  • 71. Architecture (e.g., ACRM) 29/05/11 Data-Intensive Research with DISPEL 70 Submission of the DISPEL document (model exploitation) Data mining expert Telco domain expert ACRM Portal GW2 Workbench GW1 GW3 Gateways Repository Submission of the DISPEL document (model trainingprocess) Delivery of trainingresults/ Obtainingtrainingresults Obtainingtrainingresults by domainexpert
  • 72. Overview 29/05/11 Data-Intensive Research with DISPEL 71 Motivational examples DISPEL: a language for data-driven research Architecture DISPEL components Processing Elements Types Functions DISPEL processing/evaluation The role of the DISPEL gateway The role of the DISPEL registry DISPEL resources
  • 73.
  • 74. Semantics in DISPEL processing 29/05/11 Data-Intensive Research with DISPEL 73
  • 75. Registry Contents Initial library of 81 domain-dependent and independent Processing Elements DISPEL PE library initialisation file Many PEs inherited from OGSA-DAI activities Incorporating PEs from ADMIRE use cases Open distributed registration of new domain-specific (or generic) PEs to start soon Sorting out the social networking part
  • 76. Advanced Functionalities Find candidate PEs and functions Find “similar” PEs Queries for sibling PEs in the ontology hierarchy Find “compatible” and “functionally-equivalent” PEs and functions if explicitly-defined in the ontology Infer “compatible” PEs and functions
  • 78. Example of a PE list
  • 79. Example of a structural type
  • 80. Overview 29/05/11 Data-Intensive Research with DISPEL 79 Motivational examples DISPEL: a language for data-driven research Architecture DISPEL components Processing Elements Types Functions DISPEL processing/evaluation The role of the DISPEL gateway The role of the DISPEL registry DISPEL resources
  • 81. DISPEL resources 29/05/11 Data-Intensive Research with DISPEL 80
  • 82. DISPEL resources 29/05/11 Data-Intensive Research with DISPEL 81
  • 83. Data-Intensive Researchwith DISPEL Oscar Corcho Ontology Engineering Group Universidad Politécnica de Madrid (in collaboration with all the ADMIRE project consortium) Special thanks to Malcolm Atkinson, from whom most of the slides have been reused 29/05/11 Data-Intensive Research with DISPEL 82