3. Big Data – 4Vs
[Diagram: Big Data and its 4 Vs: Volume, Velocity, Variety, Veracity]
• The 4 Vs must be considered at three levels: data acquisition, data processing and data display (“fresh” results)
• A solution is needed that accommodates all three levels
• This is an important concern for most Societal Challenges (SCs)
• Common feeling: “better integration of a wider variety of data leads to better statistics”
• SC1 and SC5 need the most help in this direction, but it remains an important aspect for all SCs
• Decisions depend on statistical results, which are only as good as the quality of the data used
5. Functional requirements
Each Societal Challenge has its own specific requirements, and we did not see a way to cluster them.
Next, we looked for functional requirements that are common to all Societal Challenges.
These functional blocks will be reflected in components, which are:
o base platform components that are prepared in WP4;
o pilot-specific components that are already available;
o pilot-specific components that will be developed for the pilot.
6. Functional Blocks – Data acquisition
• Data ingestion
o Real-time traffic data ingestion (SC4)
o Data ingestion via FTP, or otherwise from an intermediate (local) processing level (SC3)
o Monitor multiple text services and store their content together with provenance and any other metadata (SC7)
o Download satellite images (SC7)
• Data integration
o Integrate multiple metadata services, including Strabon for geodata (SC5)
o Datasets are aligned and linked at data ingestion time (SC1, SC2)
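As an illustration of the SC7 ingestion requirement (content stored together with provenance and other metadata), here is a minimal sketch. The record schema, the field names and the `ingest` helper are assumptions made for this example, not part of the platform.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
import hashlib


@dataclass(frozen=True)
class IngestedRecord:
    """One ingested item plus provenance captured at ingestion time (hypothetical schema)."""
    content: str
    source: str        # where the item came from, e.g. the name of a text service
    retrieved_at: str  # ISO 8601 timestamp of retrieval
    checksum: str      # SHA-256 of the content, usable for later integrity checks
    metadata: dict = field(default_factory=dict)


def ingest(content, source, metadata=None, now=None):
    """Wrap raw content in a record that carries its provenance."""
    now = now or datetime.now(timezone.utc)
    return IngestedRecord(
        content=content,
        source=source,
        retrieved_at=now.isoformat(),
        checksum=hashlib.sha256(content.encode("utf-8")).hexdigest(),
        metadata=dict(metadata or {}),
    )
```

Capturing the source, timestamp and checksum at ingestion time means later blocks (curation, storage) can rely on them without going back to the original service.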
7. Functional Blocks – Data analysis
By data type:
• Multimedia data
o Pattern recognition algorithms (SC2, SC7)
o Detection of changes in areas of interest using satellite images (SC7)
o Automatic video coding data (SC4)
• Web data
o Event detection over text (SC7)
• Sensor data
o Signal processing (SC3)
• Stored data
o Analysis of historical data (SC7)
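To make "detection of changes in areas of interest" concrete, here is a deliberately simple sketch based on per-pixel differencing of two small grayscale grids. The differencing technique, the `changed_cells` function and the threshold are assumptions for illustration only; the slides do not say which algorithm the SC7 pilot uses.

```python
def changed_cells(before, after, threshold):
    """Return (row, col) positions whose absolute intensity change exceeds threshold.

    `before` and `after` are equally sized lists of rows of numeric intensities.
    """
    same_shape = len(before) == len(after) and all(
        len(b) == len(a) for b, a in zip(before, after)
    )
    if not same_shape:
        raise ValueError("images must have the same shape")
    return [
        (r, c)
        for r, (row_b, row_a) in enumerate(zip(before, after))
        for c, (b, a) in enumerate(zip(row_b, row_a))
        if abs(a - b) > threshold
    ]
```

For example, comparing `[[10, 10], [10, 10]]` with `[[10, 50], [10, 12]]` at a threshold of 20 reports only the cell at row 0, column 1.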
8. Functional Blocks – Data analysis 2
• Modelling
o Complex agricultural and environmental modelling (SC2)
o Model parameterization (SC3)
• Prediction and forecasting
o Demand forecasting (SC4)
o Traffic flow prediction (SC4)
o Support the incremental, data-oriented carrying out of climate-related experiments (SC5)
• Assessment
o Power production and operation assessment (SC3)
• Planning
o Power production prognostics (SC3)
o Maintenance planning (SC3)
• Annotation tools
o Text annotation (SC2)
• Statistical analysis
o Integration with IBM SPSS Statistics (SC6)
o Integration with MATLAB (SC6)
9. Functional Blocks – Data curation
• Variance
o Measure the quality of the information (SC4, SC7)
o Track provenance during the chain of executions (SC1)
• Usage of Open PHACTS platform functionalities
o Aligning and linking various datasets (SC1)
• Data update
o Periodical updates of the datasets (SC1)
• Cleaning
o Spike removal, integrity check, faulty data labelling (SC3)
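The SC3 cleaning steps (spike removal, integrity check, faulty data labelling) could look roughly like the sketch below. It assumes a rolling-median rule for spikes and uses `None` for missing samples; the function names, window size and deviation limit are hypothetical choices for this example.

```python
from statistics import median


def label_samples(values, window=5, max_dev=10.0):
    """Label each sample as 'ok', 'missing' or 'spike'.

    A sample counts as a spike when it deviates from the median of the
    non-missing samples in its surrounding window by more than max_dev.
    """
    half = window // 2
    labels = []
    for i, v in enumerate(values):
        if v is None:  # integrity check: the sample is absent
            labels.append("missing")
            continue
        neighbours = [x for x in values[max(0, i - half): i + half + 1] if x is not None]
        labels.append("spike" if abs(v - median(neighbours)) > max_dev else "ok")
    return labels


def remove_spikes(values, labels):
    """Blank out samples labelled as spikes and keep everything else unchanged."""
    return [None if label == "spike" else v for v, label in zip(values, labels)]
```

Labelling first and removing second keeps the faulty-data labels available for later quality assessment instead of silently dropping samples.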
10. Functional Blocks – Data storage
• Store a set of geocoded areas (SC7)
• Storing intermediate data products and associated data lineage (SC5)
• Storing basic data provenance (SC5, SC2)
• RDF data storage (SC1)
11. Functional Blocks – Data usage
• Condition monitoring system assessment
o Logging functionality of the whole process for debugging (SC1, SC7)
o Monitoring of the operational status of units, considering the cluster operation in total (SC3)
o Incorporating third-party systems (condition monitoring systems, experimental research modules) into the monitoring process and performing correlated assessment (SC3, SC7)
• Other
o View results on a map (SC5, SC7)
o Query a set of geocoded areas, map-based GUI (SC7, SC2)
• Publishing
o Aggregated data must be publicly available at least in RDF/XML, JSON, CSV formats [through API] (SC1, SC2)
• Alert system
o Receives as input the areas with detected changes and presents them to the user as an alert (SC7)
• Visualization
o Present data in a dashboard (SC3)
o A GUI that exposes search over data (SC2)
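The publishing requirement (aggregated data available in several formats through an API) can be sketched as a small format dispatcher. This example covers JSON and CSV only; RDF/XML is left out because it would normally rely on a dedicated RDF library. The `publish` function and the row layout are assumptions for illustration.

```python
import csv
import io
import json


def to_json(rows):
    """Serialize a list of dict rows as a JSON array."""
    return json.dumps(rows, sort_keys=True)


def to_csv(rows):
    """Serialize a list of dict rows as CSV, with columns in sorted key order."""
    if not rows:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=sorted(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


SERIALIZERS = {"json": to_json, "csv": to_csv}


def publish(rows, fmt):
    """Return the aggregated rows in the requested format."""
    if fmt not in SERIALIZERS:
        raise ValueError(f"unsupported format: {fmt}")
    return SERIALIZERS[fmt](rows)
```

Keeping one serializer per format behind a single entry point means a further format such as RDF/XML can be added without changing callers.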