http://cascaderesearch.org
DATA-DRIVEN SENSEMAKING IN AN EVOLVING, NOISY WORLD
K. Selçuk Candan
Professor of Computer Science and Engineering
Director, Center for Assured and Scalable Data Engineering (CASCADE)
Supported by
• NSF; “Data Management for Real-Time Data Driven Epidemic Spread Simulations”
• NSF; “RAPID - Understanding the Evolution Patterns of the Ebola Outbreak in West-Africa and Supporting Real-Time Decision
Making and Hypothesis Testing through Large Scale Simulations”
• NSF; “E-SDMS: Energy Simulation Data Management System Software”
• JCI; “I2AV: Integrate, Index, Analyze, and Visualize Energy Data for Data-driven Simulations and Optimizations”
• NSF; “An Infrastructure to Support Complex Financial Patterns (CFP) based Real-Time Services Delivery and Visual Analytics”
• NSF I/UCRC planning grant (NSF-IIP1464579) for “Center for Assured and Scalable Data Engineering”
“Sense”making…what does it mean?
• Etymology:
• 1st sense: from Latin “sentire” (“to perceive”)
• any of the faculties, as sight, hearing, smell, taste, or touch, by which
humans and animals perceive stimuli originating from outside or inside the
body
• 2nd sense: “to attain awareness or understanding of…”
• “awareness” implies vigilance in observing or alertness in drawing
inferences from what one experiences
• “understanding” is the power to make experience intelligible by applying
concepts and categories
…did you notice something?
• …there is a gap between the first meaning (feel, measurement) and
the second (awareness, understanding)
• …and that gap (or the data infrastructure needed to bridge that gap)
is what my research is about
[Figure labels: sensors, knowledge bases; sensing, sensemaking, application; awareness, understanding, control]
We are living in a dynamic world…
Application domains: energy, business/enterprise, health-care, entertainment, education, rehabilitation, elderly-care, production, life-sciences, sports, security, defense, transportation, supply-chain, retail, arts, advertisement, child-care, pet-care, personal-data management, robotics, smart-rooms, smart-offices, training, space exploration, sciences
http://cascaderesearch.org
Epidemics….
• The SARS (Severe Acute Respiratory Syndrome) epidemic is estimated to have started in
China in November 2002 and had spread to 29 countries by August 2003
• A pandemic similar to the swine flu in 2009 is estimated to cost the global economy $360 billion
in a mild scenario and up to $4 trillion in an ultra scenario, within the first
year of the outbreak
• The World Health Organization declared the Ebola epidemic in West Africa a Public
Health Emergency of International Concern on August 8th, 2014, with exponential
dynamics characterizing the initial growth in numbers of new cases in some areas
K. Selcuk Candan @ ASU
• NSF III#1318788 “Data Management for Real-Time Data Driven Epidemic Spread Simulations”
Not much room for error
Both action and inaction can have high costs in terms
of their economic impacts and human lives affected
Bad news…
• Challenge #1: Epidemic data involves
• 100s of inter-dependent parameters,
• spanning multiple layers and geo-spatial frames,
• affected by complex dynamic processes operating at different resolutions.
• Challenge #2: Given the
• unpredictability of an epidemic and
• unpredictability of the actions of various independent agencies,
decision makers need to generate many thousands of simulations,
each with different parameters corresponding to plausible scenarios.
• Challenge #3: Models and simulations need to be continuously revised
based on real-world data as the epidemic and intervention mechanisms
evolve.
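Challenge #2 can be illustrated with a small parameter sweep. The following is a minimal sketch, assuming a textbook discrete-time SIR model rather than any simulator used in the project; the function names, the parameter grid, the population size, and the time horizon are all hypothetical.

```python
import itertools


def simulate_sir(beta, gamma, population=10_000, initial_infected=10, days=120):
    """Daily-step SIR model (an assumed stand-in); returns daily infected counts."""
    s = float(population - initial_infected)
    i = float(initial_infected)
    infected = [i]
    for _ in range(days):
        new_infections = min(s, beta * s * i / population)
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        infected.append(i)
    return infected


def run_ensemble(betas, gammas, **kwargs):
    """Run one simulation per (beta, gamma) scenario and summarize each run."""
    results = []
    for beta, gamma in itertools.product(betas, gammas):
        infected = simulate_sir(beta, gamma, **kwargs)
        peak = max(infected)
        results.append({
            "beta": beta,
            "gamma": gamma,
            "peak_infected": peak,
            "peak_day": infected.index(peak),
        })
    return results


# Hypothetical scenario grid: 3 transmission rates x 2 recovery rates.
betas = [0.2, 0.3, 0.4]
gammas = [0.1, 0.2]
ensemble = run_ensemble(betas, gammas)
worst = max(ensemble, key=lambda run: run["peak_infected"])
print(len(ensemble), worst["beta"], worst["gamma"], worst["peak_day"])
```

Even this toy grid shows why the number of runs grows quickly: every additional parameter multiplies the number of scenarios, and each run produces a full time series that has to be stored and compared.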
Building energy sector…
• The building sector was responsible for nearly half of CO2 emissions in the US in 2009.
• According to the US Energy Information Administration, buildings consume more
energy than any other sector, with 48.7% of the overall energy consumption, and
building energy consumption is projected to grow faster than that of the industry and
transportation sectors.
• NSF SI^2#1339835 “E-SDMS: Energy Simulation Data Management System Software”
• JCI Grant “I2AV: Integrate, Index, Analyze, and Visualize Energy Data for Data-driven Simulations and Optimizations ”
U.S. Energy Information Administration. 2008.
International Energy Statistics
Good news….
• By 2030, 82% of the US building stock is expected to be relying on smart and cleaner
energy technologies
• Building energy management systems (BEMSs) process large volumes of data, including
• continuously collected heating, ventilation, and air conditioning (HVAC) sensor and actuation data of
residential and commercial buildings of all types and sizes
• other sensory data, such as occupancy, humidity, lighting levels, air speed and quality,
• architectural, mechanical, and building automation system configuration data,
• local weather and GIS data that provide contextual information, as well as
• price, consumption, and cost data from electricity (such as smart grid) and gas utilities
http://econtrol.me/Smart%20Building.html
http://customloungeuk.com
Because of the
• size and complexity of the data and
• the varying spatial and temporal scales at which the key
processes operate,
experts lack the means to understand and predict relevant
processes.
Sensemaking in a dynamic world…
[Figure: the application domains listed above, with four stages: Sense & Integrate, Simulate & Predict, Validate & Interpret, Act & Adapt]
(a) Sense & Integrate:
take as input, and integrate, data and models of
the application space and continuously sensed
real-time observational data,
(b) Simulate & Predict:
support data-driven simulation and predictive
analysis over integrated data sets and models,
(c) Validate & Interpret:
enable validation of observations, models, and
simulation/prediction results, and provide intuitive data and
result representations to support trustworthy and
accurate decision making, and
(d) Act & Adapt:
provide continuous adaptation of models and
predictions based on the validated predictions and
observations.
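The four stages can be pictured as a loop that runs once per new observation. The following is a toy sketch only, assuming a one-parameter growth model; the function name, the update rule, the tolerance, and the sample data are hypothetical and stand in for the much richer models and simulations discussed in this talk.

```python
def sensemaking_loop(observations, growth_rate=1.0, learning_rate=0.5, tolerance=0.2):
    """Run the four stages once per observation (observations must be non-zero).

    Returns the per-step log and the adapted growth rate.
    """
    log = []
    state = observations[0]                       # (a) sense: initial observation
    for observed in observations[1:]:
        predicted = state * growth_rate           # (b) simulate & predict
        relative_error = abs(predicted - observed) / observed
        trusted = relative_error <= tolerance     # (c) validate the prediction
        implied_rate = observed / state
        # (d) adapt: move the model parameter toward what the data implies
        growth_rate += learning_rate * (implied_rate - growth_rate)
        log.append({"predicted": predicted, "observed": observed, "trusted": trusted})
        state = observed                          # (a) integrate the new observation
    return log, growth_rate


# Hypothetical case counts that double at every step.
case_counts = [10, 20, 40, 80, 160]
log, adapted_rate = sensemaking_loop(case_counts)
for entry in log:
    print(entry)
print("adapted growth rate:", adapted_rate)
```

The model starts out wrong (it assumes no growth), its early predictions fail validation, and the adaptation step pulls the parameter toward the observed behavior until predictions fall within the tolerance.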
Data challenges in a dynamic world
[Figure: the application domains listed above, annotated with three groups of data challenges]
• 3Vs: (V)olume, (V)elocity, (V)ariety
• HMLE: (H)igh-dimensional, (M)ulti-modal, Inter-(L)inked, (E)volving
• ISQ: (I)mprecision, (S)parsity, (Q)uality/Noise
ASU Center for Assured and Scalable Data Engineering
CASCADE-IUCRC
Industry/University Collaborative Research
Center (I/UCRC) * NSF I/UCRC planning grant (NSF-IIP1464579)
“Big Data” Industry Roundtable at ASU
• Co-organized with IBM
• On-site or off-site participation
• Aerojet
• Avnet
• Boeing
• Facebook
• Google
• IBM TJ Watson (Exascale System Software)
• IBM Smart Analytics
• IO Data Centers
• Johnson Controls
• LinkedIn
• Lockheed Martin
• Mayo Clinic
• NEC Labs
• Oracle
• Salt River Project
• SAP
2nd Event…
Key knowledge gaps..
• Six most critical knowledge competency groups (in terms of the value
gap – i.e., the difference between current and desired states of the
knowledge area)
1. temporal and spatial analyses,
2. summarization, cleaning, visualization, anomaly detection,
3. real-time processing for streaming data,
• media analytics
4. representations and fusion for unstructured/structured data, semantic Web,
• make unstructured data queryable, prioritize and rank data, correlate and identify the
gaps in the data
5. graph-based models, social networks,
• entity analytics, (social and other) network analytics
6. performance and scalability, distributed architectures.
"Hunting for the Value Gaps in Data Management, Services, and Analytics”
ACM SIGMOD blog; http://wp.sigmod.org/
CASCADE Mission
• Mission: to support the innovation of data architectures
and tools that can match the scale of the data and support
timely and assured decision making to generate value.
[Figure: the four stages (Sense & Integrate, Simulate & Predict, Validate & Interpret, Act & Adapt) together with Data Management, Data Analysis, and Data Assurance]
[Figure: chart with three layers: FUNDAMENTAL KNOWLEDGE, ENABLING TECHNOLOGIES, and SYSTEMS]
Topics: modeling, organization, storage/indexing, replication, fusion/integration, ingest, compression, visualization, partitioning, hiding, security, encryption, repudiation, provenance, authentication, trust models, access control, finger printing, tamper detection, summarization/aggregation, sampling, cleaning, normalization, annotation, dimensionality reduction, media analysis, machine learning
Technology Elements:
• Real-time Data Processing and Analysis
• Parallel and Distributed Data Processing and Analysis
• High-dimensional and Multi-modal Data Processing and Analysis
• Trusted and Privacy-preserving Data Processing and Analysis
Other labels: Fundamental Insights, Partners & Stakeholders, System Requirements, Requirements, Product and Outcomes
TECHNOLOGY BARRIERS:
• availability,
• timeliness,
• cost,
• consistency,
• trust,
• privacy,
• security,
• compliance, and
• accessibility
FUNDAMENTAL BARRIERS:
• heterogeneous data and models,
• transient, mobile, and distributed data,
• multi-scale, multi-resolution data,
• data with different quality, precision,
privacy, security, and trust levels,
• varying data volume and characteristics, and
• high dimensional, complex data
CASCADE team
Name                 Title                Area(s) of specialization as they relate to the proposed concentration
K. Selcuk Candan     Professor            Scalable data management and media analysis
Hasan Davulcu        Assoc. Professor     Databases and data extraction
Gail-Joon Ahn        Professor            Security and privacy in distributed data systems
Huan Liu             Professor            Data mining and analysis
Ross Maciejewski     Assistant Professor  Data visualization
Baoxin Li            Professor            Statistical machine learning, media analysis
Rao Kambhampati      Professor            Data integration, data cleaning
Chitta Baral         Professor            Knowledge representation, NLP
Dijiang Huang        Associate Professor  Data clouds
Hanghang Tong        Assistant Professor  Graph structured data
Mohamed Sarwat       Assistant Professor  Data management systems
Jingrui He           Assistant Professor  Data analysis and sparse learning
Paulo Shakarian      Assistant Professor  Data and network analysis
Rong Pan             Assoc. Professor     Data analytics
Jing Li              Assoc. Professor     Data analytics
Ron Askin            Professor            Data-driven decision models
Teresa Wu            Professor            Decision support, health informatics
Ming Zhao            Associate Professor  Scalable data processing
Adam Doupe           Assistant Professor  Data security
Paolo Papotti        Assistant Professor  Data integration and management
So what about my team’s work?
Common approaches to learning
• There are several technical approaches.
• factorization, matrix/tensor decomposition
• probabilistic (Bayesian/graphical model) learning
• deep structured learning and neural networks.
Tensor analysis…
• Tensor decomposition [CP,Tucker] can be used for
• understanding spectral characteristics of the data and
• clustering the data based on inter-dependencies.
CP decomposition: R clusters and cluster memberships
[Figure: CP decomposition of a tensor, labeled with three factor matrices and a core tensor]
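To make the CP decomposition concrete, here is a minimal sketch, assuming a dense 3-mode NumPy array and plain alternating least squares (ALS). It is a generic textbook formulation, not the scalable methods from the publications listed at the end; the helper names, the synthetic data, and the way memberships are read off the factors are illustrative assumptions.

```python
import numpy as np


def unfold(tensor, mode):
    """Matricize a tensor: rows index `mode`, columns index the remaining modes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def khatri_rao(a, b):
    """Column-wise Kronecker product of two matrices with the same column count."""
    rank = a.shape[1]
    return (a[:, None, :] * b[None, :, :]).reshape(-1, rank)


def cp_als(tensor, rank, n_iter=200, seed=0):
    """Rank-R CP decomposition of a 3-mode tensor by alternating least squares."""
    rng = np.random.default_rng(seed)
    factors = [rng.standard_normal((dim, rank)) for dim in tensor.shape]
    for _ in range(n_iter):
        for mode in range(3):
            first, second = [factors[m] for m in range(3) if m != mode]
            # Least-squares update of one factor matrix with the other two fixed.
            gram = (first.T @ first) * (second.T @ second)
            factors[mode] = (
                unfold(tensor, mode) @ khatri_rao(first, second) @ np.linalg.pinv(gram)
            )
    return factors


def reconstruct(factors):
    """Rebuild the tensor as a sum of R rank-one components."""
    a, b, c = factors
    return np.einsum("ir,jr,kr->ijk", a, b, c)


# Synthetic rank-2 tensor (hypothetical data) and its rank-2 decomposition.
rng = np.random.default_rng(1)
data = reconstruct([rng.standard_normal((dim, 2)) for dim in (4, 5, 6)])
factors = cp_als(data, rank=2)
relative_error = np.linalg.norm(data - reconstruct(factors)) / np.linalg.norm(data)
# One simple reading of "cluster membership": the strongest component per row.
memberships = np.abs(factors[0]).argmax(axis=1)
print(relative_error, memberships)
```

Each of the R columns of a factor matrix corresponds to one component (cluster), and the entries in that column indicate how strongly each element of the mode participates in it.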
Tensor representation of data
• Most media and sensor data are
• multi-dimensional and
• multi-relational
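[Figure: e.g., the row (a, b, 2) of a table with columns A, B, C, represented either as the value 2 at tensor position (a, b) or as the value 1 at tensor position (a, b, 2)]
• Temporally evolving data…
• Alternative #1: incrementally growing tensor
• Alternative #2: sequence of tensor snapshots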
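The two ways of encoding a table row and the two temporal alternatives can be sketched in a few lines. This is a minimal illustration, assuming dense NumPy arrays and a tiny hypothetical relation; the variable names and sample rows are made up, and the reading of the slide's figure is an interpretation.

```python
import numpy as np

# Hypothetical relation R(A, B, C); the first row mirrors the slide's example.
rows = [("a", "b", 2), ("a", "c", 1), ("d", "b", 3)]


def build_index(values):
    """Map each distinct value to a position along one tensor mode."""
    return {value: pos for pos, value in enumerate(sorted(set(values)))}


a_index = build_index(row[0] for row in rows)
b_index = build_index(row[1] for row in rows)
c_index = build_index(row[2] for row in rows)

# Representation 1: C becomes the cell value of a 2-mode tensor indexed by (A, B).
valued = np.zeros((len(a_index), len(b_index)))
for a, b, c in rows:
    valued[a_index[a], b_index[b]] = c

# Representation 2: every attribute is a mode; a cell holds 1 if the row exists.
binary = np.zeros((len(a_index), len(b_index), len(c_index)))
for a, b, c in rows:
    binary[a_index[a], b_index[b], c_index[c]] = 1

# Temporal data, alternative #2: keep one tensor snapshot per time step.
snapshots = [valued, 2 * valued]
# Alternative #1: a single tensor that grows along an extra time mode.
growing = np.stack(snapshots, axis=-1)
print(valued.shape, binary.shape, growing.shape)
```

The growing tensor keeps all history in one object, which suits a single decomposition over time, while the snapshot sequence keeps each time step separate, which suits incremental or windowed analysis.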
Tensor analysis…
Tucker decomposition: r1×r2×r3 clusters and cluster memberships
[Figure: Tucker decomposition of a tensor, labeled with three factor matrices and a core tensor]
Problems:
• these are very computationally expensive operations,
• they are also memory intensive,
• they do not go hand-in-hand with other data
manipulation operations (selection, join, union)
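A Tucker decomposition can be sketched with a truncated higher-order SVD (HOSVD). This is a minimal sketch, assuming a dense NumPy tensor that fits in memory; HOSVD is one standard way to obtain a Tucker decomposition and is swapped in here for illustration, and the helper names and random data are hypothetical.

```python
import numpy as np


def unfold(tensor, mode):
    """Matricize a tensor: rows index `mode`, columns index the remaining modes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_product(tensor, matrix, mode):
    """Multiply `matrix` into `tensor` along `mode` (the mode-n product)."""
    moved = np.moveaxis(tensor, mode, -1)            # bring the mode to the end
    return np.moveaxis(moved @ matrix.T, -1, mode)   # contract, then move it back


def hosvd(tensor, ranks):
    """Truncated HOSVD: one orthonormal factor matrix per mode plus a core tensor."""
    factors = []
    for mode, rank in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])
    core = tensor
    for mode, u in enumerate(factors):
        core = mode_product(core, u.T, mode)         # project onto each subspace
    return core, factors


def tucker_to_tensor(core, factors):
    """Expand the core tensor back to the original space."""
    out = core
    for mode, u in enumerate(factors):
        out = mode_product(out, u, mode)
    return out


# Hypothetical dense data: a 4 x 5 x 6 tensor compressed to a 2 x 3 x 3 core.
data = np.random.default_rng(0).standard_normal((4, 5, 6))
core, factors = hosvd(data, ranks=(2, 3, 3))
approximation = tucker_to_tensor(core, factors)
print(core.shape, np.linalg.norm(data - approximation) / np.linalg.norm(data))
```

The sketch also hints at the cost problem noted above: every mode requires an SVD of a full unfolding of the tensor, so both time and memory grow quickly with the tensor size.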
Common data characteristics…
• The key characteristics of real-world data sets
include the following:
• multi-variate
• multi-modal
• temporal,
• spatial,
• hierarchical,
• graphical
• multi-layer
• multi-resolution
• inter-dependent
• observations of interest depend on and impact each
other
…and the metadata…
• Different modes of the tensor can have different types of
metadata:
• time
• hierarchy (e.g., Tempe and PHX within AZ, SF within CA, AZ and CA within US)
• distance matrix
• graph
Differently-Modal Tensors (DMT)
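As one example of how mode metadata can be used, a hierarchy on a mode lets the same tensor be viewed at several resolutions. The following is a minimal sketch, assuming the small location hierarchy from the slide and made-up counts; the function name and the choice of summation as the aggregate are illustrative assumptions.

```python
import numpy as np

# Hierarchy metadata for the location mode (child -> parent), as on the slide.
parent = {"Tempe": "AZ", "PHX": "AZ", "SF": "CA", "AZ": "US", "CA": "US"}


def coarsen(tensor, labels, parent, mode):
    """Sum the slices along `mode` that share a parent in the hierarchy."""
    coarse_labels = sorted({parent[label] for label in labels})
    # Aggregation matrix: one row per parent, one column per child.
    aggregate = np.zeros((len(coarse_labels), len(labels)))
    for col, label in enumerate(labels):
        aggregate[coarse_labels.index(parent[label]), col] = 1.0
    moved = np.moveaxis(tensor, mode, -1)
    return np.moveaxis(moved @ aggregate.T, -1, mode), coarse_labels


# Hypothetical city x time observations.
cities = ["Tempe", "PHX", "SF"]
counts = np.array([[1, 2],
                   [3, 4],
                   [5, 6]])

by_state, states = coarsen(counts, cities, parent, mode=0)
by_country, countries = coarsen(by_state, states, parent, mode=0)
print(states, by_state.tolist())
print(countries, by_country.tolist())
```

A coarse view is much smaller than the original tensor, so an analysis can start at a low resolution and refine only the regions that turn out to matter.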
…alternatively… Networks of Time Series (NoTS)
Research challenges…
Questions:
• how to best account for the different modalities of the
data?
• can we leverage metadata to support multi-resolution
and incremental tensor analysis operations?
• can we implement a memory hierarchy supported
tensor analysis?
• can we co-optimize tensor analysis and other data
manipulation operations?
What about other approaches?
• There are several technical approaches.
• factorization, matrix/tensor decomposition
• probabilistic (Bayesian/graphical model) learning
• deep structured learning and neural networks.
…many of the algorithms are based on iterative processes, such as
alternating least squares (ALS) or stochastic gradient descent (SGD), which
refine an approximate solution until a convergence condition is reached.
Question: Can we develop metadata-supported and multi-scale
techniques that can leverage the volume/cost trade-offs provided by
storage hierarchies to provide high accuracy at minimum cost?
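The iterative pattern described above can be sketched with SGD on a small matrix factorization. This is a minimal illustration, assuming a dense matrix with a few missing cells and a simple "loss stopped changing" convergence test; the function name, learning rate, tolerance, and sample matrix are hypothetical choices, not settings from the cited work.

```python
import numpy as np


def sgd_factorize(matrix, rank, lr=0.01, tol=1e-6, max_epochs=5000, seed=0):
    """Fit matrix ~ P @ Q.T on the observed (non-NaN) cells with SGD."""
    rng = np.random.default_rng(seed)
    n_rows, n_cols = matrix.shape
    p = 0.1 * rng.standard_normal((n_rows, rank))
    q = 0.1 * rng.standard_normal((n_cols, rank))
    cells = [(i, j) for i in range(n_rows) for j in range(n_cols)
             if not np.isnan(matrix[i, j])]
    previous = np.inf
    for epoch in range(1, max_epochs + 1):
        for idx in rng.permutation(len(cells)):      # visit cells in random order
            i, j = cells[idx]
            err = matrix[i, j] - p[i] @ q[j]
            p_old = p[i].copy()
            p[i] += lr * err * q[j]                  # gradient step on row factor
            q[j] += lr * err * p_old                 # gradient step on column factor
        loss = sum((matrix[i, j] - p[i] @ q[j]) ** 2 for i, j in cells)
        if abs(previous - loss) < tol:               # convergence condition
            break
        previous = loss
    return p, q, loss, epoch


# Hypothetical data with two unobserved cells.
ratings = np.array([[5.0, 3.0, np.nan],
                    [4.0, np.nan, 1.0],
                    [1.0, 1.0, 5.0]])
p, q, loss, epochs = sgd_factorize(ratings, rank=2)
print(f"stopped after {epochs} epochs with squared error {loss:.6f}")
```

Every epoch touches all observed cells, so the cost of each iteration scales with the data volume; this is the cost that the question above proposes to trade against accuracy using metadata and storage hierarchies.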
Conclusions…
Making sense of a dynamically evolving world is a really,
really challenging task…
candan@asu.edu
cascaderesearch.org
cascade.asu.edu
Relevant Publications
• Xinsheng Li, Shengyu Huang, K. Selcuk Candan, Maria Luisa Sapino. 2PCP: Two-Phase CP
Decomposition for Billion-Scale Dense Tensors. IEEE Int. Conference on Data Engineering (ICDE)
2016.
• Jung Hyun Kim, K. Selcuk Candan, Maria Luisa Sapino, PageRank Revisited: On the Relationship
between Node Degrees and Node Significances in Different Applications, International Workshop on
Querying Graph Structured Data (GraphQ'16), in conjunction with EDBT 2016.
• Mijung Kim, K. Selcuk Candan: Decomposition-by-normalization (DBN): leveraging approximate
functional dependencies for efficient CP and Tucker decompositions. Data Min. Knowl. Discov, 30(1):
1-46 (2016)
• Shengyu Huang, Xinsheng Li, K. Selcuk Candan, Maria Luisa Sapino: Reducing seed noise in
personalized PageRank. Social Netw. Analys. Mining. 6(1): 6:1-6:25 (2016)
• Mithila Nagendra, K. Selcuk Candan: Efficient Processing of Skyline-Join Queries over Multiple
Data Sources. ACM Trans. Database Syst. 40(2): 10 (2015)
• Jung Hyun Kim, K. Selcuk Candan, Maria Luisa Sapino: Locality-sensitive and Re-use Promoting
Personalized PageRank Computations. Knowledge and Information Systems, pp 1-39, First online:
18 June 2015.
• Parth Nagarkar, K. Selcuk Candan, Aneesha Bhat: Compressed Spatial Hierarchical Bitmap (cSHB)
Indexes for Efficiently Processing Spatial Range Query Workloads. PVLDB 8(12): 1382-1393 (2015)
• Xilun Chen, K. Selcuk Candan, Maria Luisa Sapino, Paulo Shakarian: KSGM: Keynode-driven
Scalable Graph Matching. CIKM 2015: 1101-1110
• Xilun Chen and K. Selcuk Candan. LWI-SVD: Low-rank, Windowed, Incremental Singular Value
Decompositions on Time-Evolving Data Sets. KDD'14, NY, USA. 2014.
• Xilun Chen and K. Selcuk Candan. GI-NMF: Group Incremental Non-Negative Matrix Factorization
on Data Streams. ACM International Conference on Information and Knowledge
Management (CIKM'14). Shanghai, China. 2014.
• Mijung Kim and K. Selcuk Candan. Efficient Static and Dynamic In-Database Tensor
Decompositions on Chunk-Based Array Stores. ACM International Conference on
Information and Knowledge Management (CIKM'14). Shanghai, China. 2014.
• Xinsheng Li, Shengyu Huang, K. Selcuk Candan, Maria Luisa Sapino. Focusing Decomposition
Accuracy by Personalizing Tensor Decomposition (PTD). ACM International Conference on
Information and Knowledge Management (CIKM'14). Shanghai, China. 2014.
• Mijung Kim and K. Selcuk Candan. Pushing-Down Tensor Decompositions over Unions to Promote
Reuse of Materialized Decompositions. The European Conference on Machine Learning and
Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD'14). Nancy, France.
2014.
• Shengyu Huang, Xinsheng Li, K. Selcuk Candan, Maria Luisa Sapino. “Can you really trust that
seed?”: Reducing the Impact of Seed Noise in Personalized PageRank. International Conference
on Advances in Social Network Analysis and Mining (ASONAM). Beijing, China. 2014
• Parth Nagarkar and K. Selcuk Candan. HCS: Hierarchical Cut Selection for Efficiently Processing
Queries on Data Columns using Hierarchical Bitmap Indices. EDBT'14: pp. 271-282, 2014.
• Xiaolan Wang, K. Selcuk Candan, and Maria Luisa Sapino. Leveraging Metadata for Identifying
Local, Robust Multi-variate Temporal (RMT) Features. accepted to ICDE 2014
• Claudio Schifanella, K. Selcuk Candan, and Maria Luisa Sapino. Multiresolution Tensor
Decompositions with Mode Hierarchies. ACM Transactions on Knowledge Discovery from
Data (TKDD), 8(2), June 2014.
• Jung W. Kim, K. Selcuk Candan, and M. L. Sapino. LR-PPR: Locality-Sensitive, Re-use Promoting,
Approximate Personalized PageRank Computation. CIKM'13, 2013.
• Mithila Nagendra and K. Selcuk Candan. Layered Processing of Skyline-Window-Join (SWJ)
Queries using Iteration-Fabric. ICDE'13, pp. 985-996, 2013.
• Mithila Nagendra and K. Selcuk Candan. SkySuite: A Framework of Skyline Join Operators for
Static and Stream Environments. VLDB'13, 2013.
• Jung Hyun Kim, Xilun Chen, K. Selcuk Candan, and Maria Luisa Sapino. Hive Open Research
Network Platform, at EDBT'13, pp. 985-996, 2013.
• Mijung Kim, K. Selçuk Candan: SBV-Cut: Vertex-cut based graph partitioning using structural
balance vertices. Data Knowl. Eng. 72: 285-303 (2012)
• Claudio Schifanella, Maria Luisa Sapino, K. Selçuk Candan: On context-aware co-clustering with
metadata support. J. Intell. Inf. Syst. 38(1): 209-239 (2012)
• K. Selçuk Candan, Rosaria Rossini, Maria Luisa Sapino, Xiaolan Wang: sDTW: Computing DTW
Distances using Locally Relevant Constraints based on Salient Feature Alignments. PVLDB 5(11):
1519-1530 (2012)
• Mijung Kim, K. Selçuk Candan: Decomposition-by-normalization (DBN): leveraging approximate
functional dependencies for efficient Tensor Decomposition. CIKM 2012: 355-364
• Jung Hyun Kim, K. Selçuk Candan, Maria Luisa Sapino: Impact Neighborhood Indexing (INI) in
diffusion graphs. CIKM 2012: 2184-2188
• K. Selçuk Candan, Rosaria Rossini, Maria Luisa Sapino, Xiaolan Wang: STFMap: query- and
feature-driven visualization of large time series data sets. CIKM 2012: 2743-2745
• Mithila Nagendra, K. Selçuk Candan: Skyline-sensitive joins with LR-pruning. EDBT 2012: 252-263
• Songling Liu, Juan P. Cedeño, K. Selçuk Candan, Maria Luisa Sapino, Shengyu Huang, Xinsheng
Li: R2DB: A System for Querying and Visualizing Weighted RDF Graphs. ICDE 2012: 1313-1316.

Weitere ähnliche Inhalte

Was ist angesagt?

Data Processing and Semantics for Advanced Internet of Things (IoT) Applicati...
Data Processing and Semantics for Advanced Internet of Things (IoT) Applicati...Data Processing and Semantics for Advanced Internet of Things (IoT) Applicati...
Data Processing and Semantics for Advanced Internet of Things (IoT) Applicati...
Artificial Intelligence Institute at UofSC
 
Kno.e.sis Approach to Impactful Research & Training for Exceptional Careers
Kno.e.sis Approach to Impactful Research & Training for Exceptional CareersKno.e.sis Approach to Impactful Research & Training for Exceptional Careers
Kno.e.sis Approach to Impactful Research & Training for Exceptional Careers
Amit Sheth
 
Smart Data - How you and I will exploit Big Data for personalized digital hea...
Smart Data - How you and I will exploit Big Data for personalized digital hea...Smart Data - How you and I will exploit Big Data for personalized digital hea...
Smart Data - How you and I will exploit Big Data for personalized digital hea...
Amit Sheth
 
Augmented Personalized Health: using AI techniques on semantically integrated...
Augmented Personalized Health: using AI techniques on semantically integrated...Augmented Personalized Health: using AI techniques on semantically integrated...
Augmented Personalized Health: using AI techniques on semantically integrated...
Amit Sheth
 
Ontology-enabled Healthcare Applications exploiting Physical-Cyber-Social Big...
Ontology-enabled Healthcare Applications exploiting Physical-Cyber-Social Big...Ontology-enabled Healthcare Applications exploiting Physical-Cyber-Social Big...
Ontology-enabled Healthcare Applications exploiting Physical-Cyber-Social Big...
Amit Sheth
 
Knowledge-driven Personalized Contextual mHealth Service for Asthma Managemen...
Knowledge-driven Personalized Contextual mHealth Service for Asthma Managemen...Knowledge-driven Personalized Contextual mHealth Service for Asthma Managemen...
Knowledge-driven Personalized Contextual mHealth Service for Asthma Managemen...
Artificial Intelligence Institute at UofSC
 

Was ist angesagt? (19)

Data Processing and Semantics for Advanced Internet of Things (IoT) Applicati...
Data Processing and Semantics for Advanced Internet of Things (IoT) Applicati...Data Processing and Semantics for Advanced Internet of Things (IoT) Applicati...
Data Processing and Semantics for Advanced Internet of Things (IoT) Applicati...
 
Kno.e.sis Approach to Impactful Research & Training for Exceptional Careers
Kno.e.sis Approach to Impactful Research & Training for Exceptional CareersKno.e.sis Approach to Impactful Research & Training for Exceptional Careers
Kno.e.sis Approach to Impactful Research & Training for Exceptional Careers
 
COGNITIVE COMPUTING
COGNITIVE COMPUTINGCOGNITIVE COMPUTING
COGNITIVE COMPUTING
 
Cognitive Computing at University Osnabrück
Cognitive Computing at University OsnabrückCognitive Computing at University Osnabrück
Cognitive Computing at University Osnabrück
 
Smart Data - How you and I will exploit Big Data for personalized digital hea...
Smart Data - How you and I will exploit Big Data for personalized digital hea...Smart Data - How you and I will exploit Big Data for personalized digital hea...
Smart Data - How you and I will exploit Big Data for personalized digital hea...
 
Computing for Human Experience [v4]: Keynote @ OnTheMove Federated Conferences
Computing for Human Experience [v4]: Keynote @ OnTheMove Federated ConferencesComputing for Human Experience [v4]: Keynote @ OnTheMove Federated Conferences
Computing for Human Experience [v4]: Keynote @ OnTheMove Federated Conferences
 
Inauguration Function - Ohio Center of Excellence in Knowledge-Enabled Comput...
Inauguration Function - Ohio Center of Excellence in Knowledge-Enabled Comput...Inauguration Function - Ohio Center of Excellence in Knowledge-Enabled Comput...
Inauguration Function - Ohio Center of Excellence in Knowledge-Enabled Comput...
 
Cognitive computing
Cognitive computingCognitive computing
Cognitive computing
 
Cognitive Computing
Cognitive ComputingCognitive Computing
Cognitive Computing
 
Cognitive Computing and the future of Artificial Intelligence
Cognitive Computing and the future of Artificial IntelligenceCognitive Computing and the future of Artificial Intelligence
Cognitive Computing and the future of Artificial Intelligence
 
Cognitive computing
Cognitive computing Cognitive computing
Cognitive computing
 
What's up at Kno.e.sis?
What's up at Kno.e.sis? What's up at Kno.e.sis?
What's up at Kno.e.sis?
 
Smart IoT for Connected Manufacturing
Smart IoT for Connected ManufacturingSmart IoT for Connected Manufacturing
Smart IoT for Connected Manufacturing
 
Semantics for Bioinformatics: What, Why and How of Search, Integration and An...
Semantics for Bioinformatics: What, Why and How of Search, Integration and An...Semantics for Bioinformatics: What, Why and How of Search, Integration and An...
Semantics for Bioinformatics: What, Why and How of Search, Integration and An...
 
How can we har­ness the Human Brain Project to max­i­mize its future health a...
How can we har­ness the Human Brain Project to max­i­mize its future health a...How can we har­ness the Human Brain Project to max­i­mize its future health a...
How can we har­ness the Human Brain Project to max­i­mize its future health a...
 
Augmented Personalized Health: using AI techniques on semantically integrated...
Augmented Personalized Health: using AI techniques on semantically integrated...Augmented Personalized Health: using AI techniques on semantically integrated...
Augmented Personalized Health: using AI techniques on semantically integrated...
 
Ontology-enabled Healthcare Applications exploiting Physical-Cyber-Social Big...
Ontology-enabled Healthcare Applications exploiting Physical-Cyber-Social Big...Ontology-enabled Healthcare Applications exploiting Physical-Cyber-Social Big...
Ontology-enabled Healthcare Applications exploiting Physical-Cyber-Social Big...
 
Knowledge-driven Personalized Contextual mHealth Service for Asthma Managemen...
Knowledge-driven Personalized Contextual mHealth Service for Asthma Managemen...Knowledge-driven Personalized Contextual mHealth Service for Asthma Managemen...
Knowledge-driven Personalized Contextual mHealth Service for Asthma Managemen...
 
Deep learning and Healthcare
Deep learning and HealthcareDeep learning and Healthcare
Deep learning and Healthcare
 

Ähnlich wie Cognitive systems16

wireless sensor network
wireless sensor networkwireless sensor network
wireless sensor network
parry prabhu
 
Zpryme Report on Modeling and Simulation
Zpryme Report on Modeling and SimulationZpryme Report on Modeling and Simulation
Zpryme Report on Modeling and Simulation
Paula Smith
 
Transforming Big Data into Smart Data for Smart Energy: Deriving Value via ha...
Transforming Big Data into Smart Data for Smart Energy: Deriving Value via ha...Transforming Big Data into Smart Data for Smart Energy: Deriving Value via ha...
Transforming Big Data into Smart Data for Smart Energy: Deriving Value via ha...
Amit Sheth
 
resume v 5.0
resume v 5.0resume v 5.0
resume v 5.0
Ye Xu
 

Cognitive systems16

  • 1. http://cascaderesearch.org
      DATA-DRIVEN SENSEMAKING IN AN EVOLVING, NOISY WORLD
      K. Selçuk Candan, Professor of Computer Science and Engineering; Director, Center for Assured and Scalable Data Engineering (CASCADE)
      Supported by:
      • NSF; “Data Management for Real-Time Data Driven Epidemic Spread Simulations”
      • NSF; “RAPID - Understanding the Evolution Patterns of the Ebola Outbreak in West-Africa and Supporting Real-Time Decision Making and Hypothesis Testing through Large Scale Simulations”
      • NSF; “E-SDMS: Energy Simulation Data Management System Software”
      • JCI; “I2AV: Integrate, Index, Analyze, and Visualize Energy Data for Data-driven Simulations and Optimizations”
      • NSF; “An Infrastructure to Support Complex Financial Patterns (CFP) based Real-Time Services Delivery and Visual Analytics”
      • NSF I/UCRC planning grant (NSF-IIP1464579) for “Center for Assured and Scalable Data Engineering”
  • 2. “Sense”making…what does it mean?
      Etymology:
      • 1st sense: from Latin “sentire”, or “to perceive”: any of the faculties, as sight, hearing, smell, taste, or touch, by which humans and animals perceive stimuli originating from outside or inside the body
      • 2nd sense: “to attain awareness or understanding of…”
          • “awareness” implies vigilance in observing or alertness in drawing inferences from what one experiences
          • “understanding” is the power to make experience intelligible by applying concepts and categories
  • 3. ..did you notice something?
      • …there is a gap between the first meaning (feel, measurement) and the second (awareness, understanding)
      • ..and that gap (or the data infrastructure needed to bridge that gap) is what my research is about
      [Figure labels: sensors; knowledge bases; sensing; sensemaking; application; awareness, understanding, control]
  • 4. We are living in a dynamic world…
      Application domains: energy, business/enterprise, health-care, entertainment, education, rehabilitation, elderly-care, production, life-sciences, sports, security, defense, transportation, supply-chain, retail, arts, advertisement, child-care, pet-care, personal-data management, robotics, smart-rooms, smart-offices, training, space exploration, sciences
  • 5. Epidemics….
      • The SARS (Severe Acute Respiratory Syndrome) epidemic is estimated to have started in China in November 2002 and had spread to 29 countries by August 2003
      • A pandemic similar to the swine flu in 2009 is estimated to cost the global economy $360 billion in a mild scenario and up to $4 trillion in an ultra scenario, within the first year of the outbreak
      • The World Health Organization declared the Ebola epidemic in West Africa a Public Health Emergency of International Concern on August 8th, 2014, with exponential dynamics characterizing the initial growth in numbers of new cases in some areas
      K. Selcuk Candan @ ASU • NSF III#1318788 “Data Management for Real-Time Data Driven Epidemic Spread Simulations”
  • 6. Epidemics….
      Not much room for error: both action and inaction can have high costs in terms of their economic impacts and human lives affected
  • 7. Bad news…
      • Challenge #1: Epidemic data involves 100s of inter-dependent parameters, spanning multiple layers and geo-spatial frames, affected by complex dynamic processes operating at different resolutions.
      • Challenge #2: Given the unpredictability of an epidemic and the unpredictability of the actions of various independent agencies, decision makers need to generate many thousands of simulations, each with different parameters corresponding to plausible scenarios.
      • Challenge #3: Models and simulations need to be continuously revised based on real-world data as the epidemic and intervention mechanisms evolve.
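Challenge #2 can be made concrete with a small sketch. The code below is not from the talk: it assumes a textbook discrete-time SIR model, and the function names (`simulate_sir`, `run_ensemble`), the population size, and the sampled parameter ranges are all hypothetical, chosen only to show what "many simulations, each with different parameters" might look like in practice.

```python
import random

def simulate_sir(beta, gamma, population=10_000, infected0=10, days=120):
    """Discrete-time (daily step) SIR model; returns the peak number of infected."""
    s, i, r = population - infected0, float(infected0), 0.0
    peak = i
    for _ in range(days):
        new_infections = beta * s * i / population
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        peak = max(peak, i)
    return peak

def run_ensemble(n_scenarios, seed=0):
    """Sample plausible (beta, gamma) pairs and simulate one scenario per pair."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_scenarios):
        beta = rng.uniform(0.15, 0.6)   # hypothetical transmission-rate range
        gamma = rng.uniform(0.05, 0.2)  # hypothetical recovery-rate range
        results.append({"beta": beta, "gamma": gamma,
                        "peak": simulate_sir(beta, gamma)})
    return results

ensemble = run_ensemble(1000)
worst = max(ensemble, key=lambda run: run["peak"])
```

Even this toy ensemble produces a table of parameter settings and outcomes that has to be stored, indexed, and compared, which is the data-management burden the slide points at.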
  • 8. Building energy sector…
      • The building sector was responsible for nearly half of CO2 emissions in the US in 2009.
      • According to the US Energy Information Administration, buildings consume more energy than any other sector, with 48.7% of the overall energy consumption, and building energy consumption is projected to grow faster than that of the industry and transportation sectors.
      Source: U.S. Energy Information Administration. 2008. International Energy Statistics
      K. Selcuk Candan @ ASU • NSF SI^2#1339835 “E-SDMS: Energy Simulation Data Management System Software” • JCI Grant “I2AV: Integrate, Index, Analyze, and Visualize Energy Data for Data-driven Simulations and Optimizations”
  • 9. Good news….
      • By 2030, 82% of the US building stock is expected to be relying on smart and cleaner energy technologies
      • Building energy management systems (BEMSs) process large volumes of data, including
          • continuously collected heating, ventilation, and air conditioning (HVAC) sensor and actuation data of residential and commercial buildings of all types and sizes,
          • other sensory data, such as occupancy, humidity, lighting levels, air speed and quality,
          • architectural, mechanical, and building automation system configuration data,
          • local weather and GIS data that provide contextual information, as well as
          • price, consumption, and cost data from electricity (such as smart grid) and gas utilities
      • Because of the size and complexity of the data and the varying spatial and temporal scales at which the key processes operate, experts lack the means to understand and predict relevant processes.
      Image sources: http://econtrol.me/Smart%20Building.html, http://customloungeuk.com
  • 10. Sensemaking in a dynamic world…
      (Application domains as listed on slide 4.)
      (a) Sense & Integrate: take as inputs, and integrate, data and models of the application space together with continuously sensed real-time observational data,
      (b) Simulate & Predict: support data-driven simulation and predictive analysis over integrated data sets and models,
      (c) Validate & Interpret: enable validation of observations, models, and simulation/prediction results, and intuitive data and result representation, to provide trustworthy and accurate decision making, and
      (d) Act & Adapt: provide continuous adaptation of models and predictions based on the validated predictions and observations.
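The four-stage cycle (a) to (d) can be illustrated with a deliberately tiny sketch. Everything here is an assumption made for illustration: the "model" is a single exponential growth rate, the "sensed data" is a list of case counts, and the names `sensemaking_loop`, `learning_rate`, and `tolerance` are hypothetical rather than part of any system described in the talk.

```python
import math

def sensemaking_loop(observations, rate=0.1, learning_rate=0.5, tolerance=0.05):
    """Toy Sense/Simulate/Validate/Adapt cycle over a stream of case counts."""
    log = []
    previous = observations[0]
    for observed in observations[1:]:            # (a) sense & integrate
        predicted = previous * math.exp(rate)    # (b) simulate & predict
        error = math.log(observed / predicted)   # (c) validate: log-scale residual
        trusted = abs(error) <= tolerance
        rate += learning_rate * error            # (d) act & adapt the model
        log.append((predicted, observed, trusted))
        previous = observed
    return rate, log

# Synthetic stream growing at a true rate of 0.3 per step.
stream = [100.0 * math.exp(0.3 * t) for t in range(21)]
fitted_rate, history = sensemaking_loop(stream)
```

On this synthetic stream the adapted rate approaches the true rate and the later predictions fall within tolerance; real deployments replace each line of the loop with the heavy machinery the rest of the talk discusses.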
  • 11. Data challenges in a dynamic world
      (Application domains as listed on slide 4.)
      • 3Vs: (V)olume, (V)elocity, (V)ariety
      • HMLE: (H)igh-dimensional, (M)ulti-modal, Inter-(L)inked, (E)volving
      • ISQ: (I)mprecision, (S)parsity, (Q)uality/Noise
  • 13. ASU Center for Assured and Scalable Data Engineering (CASCADE-IUCRC)
      Industry/University Collaborative Research Center (I/UCRC)
      * NSF I/UCRC planning grant (NSF-IIP1464579)
  • 14. “Big Data” Industry Roundtable at ASU
      • Co-organized with IBM
      • On-site or off-site participation: Aerojet, Avnet, Boeing, Facebook, Google, IBM TJ Watson (Exascale System Software), IBM Smart Analytics, IO Data Centers, Johnson Controls, LinkedIn, Lockheed Martin, Mayo Clinic, NEC Labs, Oracle, Salt River Project, SAP
  • 16. Key knowledge gaps..
      Six most critical knowledge competency groups (in terms of the value gap, i.e., the difference between current and desired states of the knowledge area):
      1. temporal and spatial analyses,
      2. summarization, cleaning, visualization, anomaly detection,
      3. real-time processing for streaming data,
          • media analytics
      4. representations and fusion for unstructured/structured data, semantic Web,
          • make unstructured data queriable, prioritize and rank data, correlate and identify the gaps in the data
      5. graph-based models, social networks,
          • entity analytics, (social and other) network analytics
      6. performance and scalability, distributed architectures.
      Source: “Hunting for the Value Gaps in Data Management, Services, and Analytics”, ACM SIGMOD blog; http://wp.sigmod.org/
  • 17. CASCADE Mission
      • Mission: to support the innovation of data architectures and tools that can match the scale of the data and support timely and assured decision making to generate value.
      [Figure labels: Sense & Integrate; Simulate & Predict; Validate & Interpret; Act & Adapt; Data Management; Data Analysis; Data Assurance]
  • 18. [Figure: CASCADE research roadmap]
      Layers: FUNDAMENTAL KNOWLEDGE; ENABLING TECHNOLOGIES; SYSTEMS
      Topics: modeling, organization, storage/indexing, replication, fusion/integration, ingest, compression, visualization, partitioning, hiding, security, encryption, repudiation, provenance, authentication, trust models, access control, finger printing, tamper detection, summarization/aggregation, sampling, cleaning, normalization, annotation, dimensionality reduction, media analysis, machine learning
      Technology elements:
      • Real-time Data Processing and Analysis
      • Parallel and Distributed Data Processing and Analysis
      • High-dimensional and Multi-modal Data Processing and Analysis
      • Trusted and Privacy-preserving Data Processing and Analysis
      TECHNOLOGY BARRIERS: availability, timeliness, cost, consistency, trust, privacy, security, compliance, and accessibility
      FUNDAMENTAL BARRIERS:
      • heterogeneous data and models,
      • transient, mobile, and distributed data,
      • multi-scale, multi-resolution data,
      • data with different quality, precision, privacy, security, and trust levels,
      • varying data volume and characteristics, and
      • high dimensional, complex data
      Other labels: Fundamental Insights; Partners & Stakeholders; System Requirements; Requirements; Product and Outcomes
  • 20. CASCADE team (Name, Title: Area(s) of specialization as they relate to the proposed concentration)
      • K. Selcuk Candan, Professor: Scalable data management and media analysis
      • Hasan Davulcu, Assoc. Professor: Databases and data extraction
      • Gail-Joon Ahn, Professor: Security and privacy in distributed data systems
      • Huan Liu, Professor: Data mining and analysis
      • Ross Maciejewski, Assistant Professor: Data visualization
      • Baoxin Li, Professor: Statistical machine learning, media analysis
      • Rao Kambhampati, Professor: Data integration, data cleaning
      • Chitta Baral, Professor: Knowledge representation, NLP
      • Dijiang Huang, Associate Professor: Data clouds
      • Hanghang Tong, Assistant Professor: Graph structured data
      • Mohamed Sarwat, Assistant Professor: Data management systems
      • Jingrui He, Assistant Professor: Data analysis and sparse learning
      • Paulo Shakarian, Assistant Professor: Data and network analysis
      • Rong Pan, Assoc. Professor: Data analytics
      • Jing Li, Assoc. Professor: Data analytics
      • Ron Askin, Professor: Data-driven decision models
      • Teresa Wu, Professor: Decision support, health informatics
      • Ming Zhao, Associate Professor: Scalable data processing
      • Adam Doupe, Assistant Professor: Data security
      • Paolo Papotti, Assistant Professor: Data integration and management
  • 23. Common approaches to learning
      There are several technical approaches:
      • factorization, matrix/tensor decomposition
      • probabilistic (Bayesian/graphical model) learning
      • deep structured learning and neural networks
  • 25. Tensor analysis…
      • Tensor decomposition [CP, Tucker] can be used for
          • understanding spectral characteristics of the data and
          • clustering the data based on inter-dependencies.
      • CP-decomposition: R clusters and cluster memberships
      [Figure labels: core tensor; three factor matrices]
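As a rough illustration of what a CP decomposition computes, the sketch below fits only the simplest case, a single component (R = 1), to a 3-mode tensor stored as nested lists. It is a minimal sketch and not the speaker's implementation: the name `rank1_cp` is hypothetical, each update is the least-squares solution for one factor with the other two held fixed, and the uniform initialization assumes the tensor is not orthogonal to it (otherwise the normalization would divide by zero).

```python
import math

def rank1_cp(tensor, iterations=50):
    """Fit X[i][j][k] ~ lam * a[i] * b[j] * c[k] by alternating least squares."""
    I, J, K = len(tensor), len(tensor[0]), len(tensor[0][0])
    a = [0.0] * I
    b = [1.0 / math.sqrt(J)] * J
    c = [1.0 / math.sqrt(K)] * K
    lam = 0.0

    def normalize(vector):
        norm = math.sqrt(sum(x * x for x in vector))
        return [x / norm for x in vector], norm

    for _ in range(iterations):
        # Update one factor at a time while the other two stay fixed.
        a, lam = normalize([sum(tensor[i][j][k] * b[j] * c[k]
                                for j in range(J) for k in range(K)) for i in range(I)])
        b, lam = normalize([sum(tensor[i][j][k] * a[i] * c[k]
                                for i in range(I) for k in range(K)) for j in range(J)])
        c, lam = normalize([sum(tensor[i][j][k] * a[i] * b[j]
                                for i in range(I) for j in range(J)) for k in range(K)])
    return lam, a, b, c

# Build an exactly rank-1 tensor from known factors and recover it.
a0, b0, c0 = [1.0, 2.0], [3.0, 1.0, 2.0], [1.0, 1.0]
X = [[[x * y * z for z in c0] for y in b0] for x in a0]
lam, a, b, c = rank1_cp(X)
```

In the cluster reading used on the slide, the entries of `a`, `b`, and `c` act as membership weights of each index in the single recovered component; a rank-R decomposition repeats this for R components fitted jointly.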
  • 26. Tensor representation of data
      • Most media and sensor data are multi-dimensional and multi-relational, e.g., a relation with attributes A, B, C whose tuples, such as (a, b, 2), become tensor cells
      • Temporally evolving data can be represented as
          • Alternative #1: an incrementally growing tensor (time is one of the modes), or
          • Alternative #2: a sequence of tensor snapshots over time
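The two alternatives for temporally evolving data can be sketched with sparse, dictionary-based tensors. This is an illustrative sketch under assumptions: the event format `(a, b, value, time)` and the function names `growing_tensor` and `snapshot_sequence` are hypothetical and not taken from the slides.

```python
from collections import defaultdict

def growing_tensor(events):
    """Alternative #1: one sparse tensor indexed by (a, b, time) that grows as events arrive."""
    tensor = defaultdict(float)
    for a, b, value, t in events:
        tensor[(a, b, t)] += value
    return dict(tensor)

def snapshot_sequence(events):
    """Alternative #2: one sparse (a, b) tensor per time step, ordered by time."""
    snapshots = defaultdict(lambda: defaultdict(float))
    for a, b, value, t in events:
        snapshots[t][(a, b)] += value
    return [(t, dict(snapshots[t])) for t in sorted(snapshots)]

events = [("a", "b", 2.0, 0), ("a", "c", 1.0, 0), ("a", "b", 2.0, 1)]
grown = growing_tensor(events)
snaps = snapshot_sequence(events)
```

The first form keeps time as an ordinary mode, so a single decomposition sees the whole history; the second keeps each time step separate, which suits windowed or incremental analysis.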
  • 28. Tensor analysis…
      • Tensor decomposition [CP, Tucker] can be used for
          • understanding spectral characteristics of the data and
          • clustering the data based on inter-dependencies.
      • Tucker-decomposition: r1 x r2 x r3 clusters and cluster memberships
      [Figure labels: core tensor; three factor matrices]
      Problems:
      • these are very computationally expensive operations,
      • they are also memory intensive,
      • they do not go hand-in-hand with other data manipulation operations (selection, join, union)
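For reference, the two decompositions named on these slides are usually written as follows for a 3-mode tensor. The notation is the standard textbook one and is an assumption here, since the slides show the models only as figures: \mathcal{X} is the I x J x K data tensor, A, B, C are the factor matrices, and \mathcal{G} is the Tucker core tensor.

```latex
% CP: a sum of R rank-one components (R clusters)
\mathcal{X} \approx \sum_{r=1}^{R} \lambda_r \, \mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r ,
\qquad
x_{ijk} \approx \sum_{r=1}^{R} \lambda_r \, a_{ir} \, b_{jr} \, c_{kr}

% Tucker: a core tensor of size r_1 \times r_2 \times r_3 multiplied by a factor matrix along each mode
\mathcal{X} \approx \mathcal{G} \times_1 A \times_2 B \times_3 C ,
\qquad
x_{ijk} \approx \sum_{p=1}^{r_1} \sum_{q=1}^{r_2} \sum_{s=1}^{r_3} g_{pqs} \, a_{ip} \, b_{jq} \, c_{ks}
```

CP is the special case of Tucker in which r_1 = r_2 = r_3 = R and the core is superdiagonal, which is why CP yields R clusters while Tucker yields r_1 x r_2 x r_3 cluster combinations.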
  • 29. Common data characteristics…
      The key characteristics of real-world data sets include the following:
      • multi-variate
      • multi-modal: temporal, spatial, hierarchical, graphical
      • multi-layer
      • multi-resolution
      • inter-dependent: observations of interest depend on and impact each other
  • 30. ..and the metadata……
      • Different modes of the tensor can have different types of metadata.. [Figure label: time]
  • 31. ..and the metadata……
      • Different modes of the tensor can have different types of metadata.. [Figure labels: time; hierarchy with nodes Tempe, PHX, AZ, SF, CA, US]
  • 32. ..and the metadata……
      • Different modes of the tensor can have different types of metadata.. [Figure labels: time; distance matrix; hierarchy]
  • 33. ..and the metadata……
      • Different modes of the tensor can have different types of metadata.. [Figure labels: time; distance matrix; hierarchy; graph]
      • Differently-Modal Tensors (DMT)
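One way mode metadata can be put to work is to use a hierarchy to view the same tensor at a coarser resolution. The sketch below is illustrative only: the child-to-parent mapping is an assumed reading of the node labels in the slide's hierarchy figure (Tempe and PHX under AZ, SF under CA, both states under US), and `roll_up` is a hypothetical helper, not an operation defined in the talk.

```python
from collections import defaultdict

# Assumed location hierarchy (child -> parent), based on the slide's node labels.
PARENT = {"Tempe": "AZ", "PHX": "AZ", "SF": "CA", "AZ": "US", "CA": "US"}

def roll_up(tensor, mode, parent):
    """Aggregate a sparse tensor one level up the hierarchy attached to the given mode."""
    coarse = defaultdict(float)
    for index, value in tensor.items():
        lifted = list(index)
        lifted[mode] = parent.get(index[mode], index[mode])
        coarse[tuple(lifted)] += value
    return dict(coarse)

# Sparse (location, time) tensor at city resolution.
city_level = {("Tempe", 0): 3.0, ("PHX", 0): 5.0, ("SF", 0): 4.0, ("PHX", 1): 1.0}
state_level = roll_up(city_level, mode=0, parent=PARENT)
country_level = roll_up(state_level, mode=0, parent=PARENT)
```

Coarser views like `state_level` are smaller and cheaper to analyze, which is the kind of multi-resolution trade-off the research questions that follow ask about.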
  • 35. Research challenges…
      Questions:
      • how to best account for the different modalities of the data?
      • can we leverage metadata to support multi-resolution and incremental tensor analysis operations?
      • can we implement a memory hierarchy supported tensor analysis?
      • can we co-optimize tensor analysis and other data manipulation operations?
  • 36. What about other approaches?
      There are several technical approaches:
      • factorization, matrix/tensor decomposition
      • probabilistic (Bayesian/graphical model) learning
      • deep structured learning and neural networks
      …many of the algorithms are based on iterative processes, such as alternating least squares (ALS) or stochastic gradient descent (SGD), which approximate the best solution until a convergence condition is reached.
      Question: Can we develop metadata-supported and multi-scale techniques that can leverage the volume/cost trade-offs provided by storage hierarchies to provide high accuracy at minimum cost?
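The "iterate until a convergence condition is reached" pattern can be sketched with SGD applied to a small matrix factorization. This is a generic textbook-style sketch, not a method from the talk: the function name `factorize_sgd`, the learning rate, the tolerance, and the toy data are all assumptions chosen for illustration.

```python
import random

def factorize_sgd(cells, n_rows, n_cols, rank=1, lr=0.05, tol=1e-6,
                  max_epochs=5000, seed=0):
    """Fit cells[(i, j)] ~ dot(P[i], Q[j]) with SGD; stop when the loss stops changing."""
    rng = random.Random(seed)
    P = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n_rows)]
    Q = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n_cols)]
    samples = list(cells.items())
    previous_loss = float("inf")
    loss = previous_loss
    for _ in range(max_epochs):
        rng.shuffle(samples)
        loss = 0.0
        for (i, j), value in samples:
            error = value - sum(p * q for p, q in zip(P[i], Q[j]))
            loss += error * error
            for f in range(rank):
                p, q = P[i][f], Q[j][f]
                P[i][f] += lr * error * q   # gradient step on the row factor
                Q[j][f] += lr * error * p   # gradient step on the column factor
        if abs(previous_loss - loss) < tol:  # convergence condition
            break
        previous_loss = loss
    return P, Q, loss

# Toy data: a fully observed, exactly rank-1 matrix u * v^T.
u, v = [1.0, 2.0, 3.0], [1.0, 0.5]
cells = {(i, j): u[i] * v[j] for i in range(3) for j in range(2)}
P, Q, final_loss = factorize_sgd(cells, n_rows=3, n_cols=2)
```

Every epoch touches all observed cells, so the cost of each pass grows with data volume; that per-iteration cost is what makes the storage-hierarchy question on this slide relevant.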
  • 37. Conclusions…
      Making sense of a dynamically evolving world is a really, really challenging task……
      [Figure: CASCADE research roadmap, repeated from slide 18]
  • 39. Relevant Publications
      • Xinsheng Li, Shengyu Huang, K. Selcuk Candan, Maria Luisa Sapino. 2PCP: Two-Phase CP Decomposition for Billion-Scale Dense Tensors. IEEE Int. Conference on Data Engineering (ICDE) 2016.
      • Jung Hyun Kim, K. Selcuk Candan, Maria Luisa Sapino. PageRank Revisited: On the Relationship between Node Degrees and Node Significances in Different Applications. International Workshop on Querying Graph Structured Data (GraphQ'16), in conjunction with EDBT 2016.
      • Mijung Kim, K. Selcuk Candan. Decomposition-by-normalization (DBN): leveraging approximate functional dependencies for efficient CP and Tucker decompositions. Data Min. Knowl. Discov. 30(1): 1-46 (2016)
      • Shengyu Huang, Xinsheng Li, K. Selcuk Candan, Maria Luisa Sapino. Reducing seed noise in personalized PageRank. Social Netw. Analys. Mining 6(1): 6:1-6:25 (2016)
      • Mithila Nagendra, K. Selcuk Candan. Efficient Processing of Skyline-Join Queries over Multiple Data Sources. ACM Trans. Database Syst. 40(2): 10 (2015)
      • Jung Hyun Kim, K. Selcuk Candan, Maria Luisa Sapino. Locality-sensitive and Re-use Promoting Personalized PageRank Computations. Knowledge and Information Systems, pp. 1-39, first online: 18 June 2015.
      • Parth Nagarkar, K. Selcuk Candan, Aneesha Bhat. Compressed Spatial Hierarchical Bitmap (cSHB) Indexes for Efficiently Processing Spatial Range Query Workloads. PVLDB 8(12): 1382-1393 (2015)
      • Xilun Chen, K. Selcuk Candan, Maria Luisa Sapino, Paulo Shakarian. KSGM: Keynode-driven Scalable Graph Matching. CIKM 2015: 1101-1110
  • 40. Relevant Publications • Xilun Chen and K. Selçuk Candan. LWI-SVD: Low-rank, Windowed, Incremental Singular Value Decompositions on Time-Evolving Data Sets. KDD'14, NY, USA. 2014. • Xilun Chen and K. Selçuk Candan. GI-NMF: Group Incremental Non-Negative Matrix Factorization on Data Streams. ACM International Conference on Information and Knowledge Management (CIKM'14). Shanghai, China. 2014. • Mijung Kim and K. Selçuk Candan. Efficient Static and Dynamic In-Database Tensor Decompositions on Chunk-Based Array Stores. ACM International Conference on Information and Knowledge Management (CIKM'14). Shanghai, China. 2014. • Xinsheng Li, Shengyu Huang, K. Selçuk Candan, Maria Luisa Sapino. Focusing Decomposition Accuracy by Personalizing Tensor Decomposition (PTD). ACM International Conference on Information and Knowledge Management (CIKM'14). Shanghai, China. 2014. • Mijung Kim and K. Selçuk Candan. Pushing-Down Tensor Decompositions over Unions to Promote Reuse of Materialized Decompositions. The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD'14). Nancy, France. 2014. • Shengyu Huang, Xinsheng Li, K. Selçuk Candan, Maria Luisa Sapino. “Can you really trust that seed?”: Reducing the Impact of Seed Noise in Personalized PageRank. International Conference on Advances in Social Network Analysis and Mining (ASONAM). Beijing, China. 2014. • Parth Nagarkar and K. Selçuk Candan. HCS: Hierarchical Cut Selection for Efficiently Processing Queries on Data Columns using Hierarchical Bitmap Indices. EDBT'14: pp. 271-282, 2014.
  • 41. Relevant Publications • Xiaolan Wang, K. Selçuk Candan, and Maria Luisa Sapino. Leveraging Metadata for Identifying Local, Robust Multi-variate Temporal (RMT) Features. Accepted to ICDE 2014. • Claudio Schifanella, K. Selçuk Candan, and Maria Luisa Sapino. Multiresolution Tensor Decompositions with Mode Hierarchies. ACM Transactions on Knowledge Discovery from Data (TKDD), 8(2), June 2014. • Jung W. Kim, K. Selçuk Candan, and M. L. Sapino. LR-PPR: Locality-Sensitive, Re-use Promoting, Approximate Personalized PageRank Computation. CIKM'13, 2013. • Mithila Nagendra and K. Selçuk Candan. Layered Processing of Skyline-Window-Join (SWJ) Queries using Iteration-Fabric. ICDE'13, pp. 985-996, 2013. • Mithila Nagendra and K. Selçuk Candan. SkySuite: A Framework of Skyline Join Operators for Static and Stream Environments. VLDB'13, 2013. • Jung Hyun Kim, Xilun Chen, K. Selçuk Candan, and Maria Luisa Sapino. Hive Open Research Network Platform. EDBT'13, pp. 985-996, 2013. • Mijung Kim, K. Selçuk Candan: SBV-Cut: Vertex-cut based graph partitioning using structural balance vertices. Data Knowl. Eng. 72: 285-303 (2012) • Claudio Schifanella, Maria Luisa Sapino, K. Selçuk Candan: On context-aware co-clustering with metadata support. J. Intell. Inf. Syst. 38(1): 209-239 (2012)
  • 42. Relevant Publications • K. Selçuk Candan, Rosaria Rossini, Maria Luisa Sapino, Xiaolan Wang: sDTW: Computing DTW Distances using Locally Relevant Constraints based on Salient Feature Alignments. PVLDB 5(11): 1519-1530 (2012) • Mijung Kim, K. Selçuk Candan: Decomposition-by-normalization (DBN): leveraging approximate functional dependencies for efficient Tensor Decomposition. CIKM 2012: 355-364 • Jung Hyun Kim, K. Selçuk Candan, Maria Luisa Sapino: Impact Neighborhood Indexing (INI) in diffusion graphs. CIKM 2012: 2184-2188 • K. Selçuk Candan, Rosaria Rossini, Maria Luisa Sapino, Xiaolan Wang: STFMap: query- and feature-driven visualization of large time series data sets. CIKM 2012: 2743-2745 • Mithila Nagendra, K. Selçuk Candan: Skyline-sensitive joins with LR-pruning. EDBT 2012: 252-263 • Songling Liu, Juan P. Cedeño, K. Selçuk Candan, Maria Luisa Sapino, Shengyu Huang, Xinsheng Li: R2DB: A System for Querying and Visualizing Weighted RDF Graphs. ICDE 2012: 1313-1316