Knowledge at the Speed of Need: HADOOP Integration and the Semantic Web
National Cyber Practice
Blue Canopy White Paper
Written by Nick Savage, Robert Bergstrom
May 2013
Contents
Executive Summary
Solving the “BIG DATA” Challenge – Volume, Velocity, and Variety
Identifying a High Volume Data Scenario
Identifying a High Velocity Data Scenario
Identifying a High Variety Data Scenario
The Distinct Roles of HADOOP Integration versus the Semantic Web
The Blue Fusion Architecture
Tools of the Trade
Conclusion
Executive Summary
In the world of government and business, a variety of industries need the capability to exchange trillions of dollars and petabytes of data for billions of consumers and workers, in fractions of a second, every day. The data they produce ranges from continuous security-tracking log files to patient or client records that must remain online, or be archived offline, to comply with standards, mandates, or laws. In healthcare, for example, patient records must typically be maintained for 7 to 10 years, whereas military patient records must be maintained beyond the life of the soldier and his or her beneficiaries. How do you look for trends and patterns in so much data when its future value cannot be predicted or measured? Before ingesting the data, for example, you must determine where to store it so that it can be queried effectively and read-versus-write contention is avoided.
Another challenge for the technologist is managing streaming data generated rapidly by networks, web searches, sensors, or phone conversations: large volumes of data that may be read only infrequently, with limited computing and storage capacity available. How do you mine this data? How do you integrate and correlate this dynamically changing data with your structured data (i.e., databases, marts, or warehouses) when no foreign keys are available?
Data velocity is another operational challenge faced by institutions where time is critical: searching for one defined state to save lives, avoid catastrophic events, or recognize major positive activities. Certain systems must perform complex event processing, quickly extracting information from various data sources and reducing many events to a single activity, state, or alert; the analyst or expert system must then predict certain events based on patterns in the data and infer an anomaly or complex event. In real-world operations (i.e., Situational Awareness scenarios), for example, a complex event requiring data and operational involvement from multiple federal agencies (e.g., Intel, CDC, Healthcare, FBI) and systems must be identified with velocity, or expediency, before a bioterrorist unleashes a pandemic smallpox threat. Blue Fusion is the Blue Canopy solution that adapts to new information, new ontologies, and new relationships based on the domain and presents the results to the user in a simple way. Organizations must also overcome disparate data challenges by managing stovepipe systems across the enterprise, in structured (e.g., database, XML) and unstructured (e.g., email, blobs, PowerPoint slides) formats, to extract the most knowledge under differing data delivery requirements. How do you maintain performance and fault tolerance while integrating and correlating multiple data sources?
This paper provides the architectural details of the Blue Canopy technical approach, Blue Fusion, which designs a loosely coupled architecture and develops an ontology-based application environment that leverages the open source distributed Hadoop framework and the flexibility of the Semantic Web to produce a BIG DATA integration solution that is fast, dynamic, cost-effective, and reliable.
Solving the “Big Data” Challenge - Volume,
Velocity, and Variety
You must first formulate a data processing profile and analyze the type of "Big Data" challenge the user and the organization face by establishing whether you are addressing data velocity requirements driven by critical conditions, data volume issues (e.g., fusing data infrequently into a large integrated data store), or a third category where the obstacle is consolidating a large variety of data sources, types, and delivery systems into a seamless but loosely coupled environment. A well-conceived loosely coupled architecture accommodates combinations of the three data profiles (i.e., hi-volume, hi-velocity, and hi-variety). In the first case, we will explore a big data, high volume scenario.
Identifying a High Volume Data Scenario
Figure 1 Banking Use Case - Hi Volume Data Scenario
(Hi-Volume Data) As depicted at a high level in Figure 1, in this banking use case the banking analyst runs analytics on very large amounts of data from possibly disparate data sources, identifying “gaps” and “overlaps” in a composite view of the subject that is not time sensitive. Forensics can be performed in this case to search for the pre-defined conditions and anomalies based on the ontology. To reiterate the objective: the emphasis is on addressing high volume, not data velocity, and the disparate data must be ingested, collected, normalized, and analyzed.
The Situation
The banking system runs thousands of transactions daily. An internal criminal has inserted an algorithm that deposits “half pennies” into a “shadow” account collecting funds from all banking accounts in the region. This minuscule series of thefts may escape the analyst's notice, yet the funds rapidly accrue into millions of dollars. Without the aid of a forward-thinking expert system, the analyst may take months to uncover the pattern, if it is uncovered at all.
The Approach
The Blue Fusion approach rolls up all of the banking information for the day, including all transactions and emails, into a nightly backup to be analyzed later against the ontology definitions and business rules established on the rules server and loaded into memory, which are created to look for correlations and patterns related to fraud and to generate an alert to the bank analyst. In this case, the system is directed to seek out a pattern of a large series of unusually small transactions that correlate with very large withdrawals, or with withdrawals committed just below the $10,000 threshold.
1. The Blue Fusion rules server determines where the information will be stored and/or queried once the data has been ingested into HDFS.
2. Once the data is map-reduced, a non-RDF triple is stored in the HBASE cache using the timestamp as the unique identifier (a minimal sketch follows this list). Why? HBASE does not currently recognize data graphs.
3. After each ingest of new data, all data is pushed to the RDF database for storage and possible display. The notification system looks for the pre-defined alerts or conditions defined in the ontologies.
4. The triples are pushed to the RDF database for immediate storage or SPARQL queries, and the result set from the query is displayed on the user's dashboard.
5. Based on the user's rights and settings, the analyst may see the account number, the number of associated transactions, and the total amount stolen, whereas the investigator may also have access to the culprit's name and other vital detailed information.
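The caching of non-RDF triples in the HBASE cache keyed by timestamp (step 2 above) might look roughly like the following sketch. It is written against the modern HBase client API; the table name, column family, and row/column layout are assumptions made for illustration and are not taken from the Blue Fusion implementation.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class TripleCacheWriter {

    // Cache one subject-predicate-object record in HBase, keyed by timestamp.
    // The table name "triple_cache" and column family "t" are illustrative only.
    public static void cacheTriple(Connection conn, long timestamp,
                                   String subject, String predicate,
                                   String object) throws Exception {
        try (Table table = conn.getTable(TableName.valueOf("triple_cache"))) {
            Put put = new Put(Bytes.toBytes(Long.toString(timestamp)));
            put.addColumn(Bytes.toBytes("t"), Bytes.toBytes("s"), Bytes.toBytes(subject));
            put.addColumn(Bytes.toBytes("t"), Bytes.toBytes("p"), Bytes.toBytes(predicate));
            put.addColumn(Bytes.toBytes("t"), Bytes.toBytes("o"), Bytes.toBytes(object));
            table.put(put);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf)) {
            cacheTriple(conn, System.currentTimeMillis(),
                        "account:12345", "hasTransaction", "txn:987654");
        }
    }
}
```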
Identifying a High Velocity Data Scenario
In the second case, we will explore a high velocity data scenario.
Figure 2 Bioterrorism Hi-Velocity Data Scenario
(Hi-Velocity) In this scenario the analyst must focus on mining near real-time and streaming data to resolve critical conditions. As depicted at a high level in Figure 2, in this bioterrorism use case the law enforcement analyst runs analytics on very large amounts of data from possibly disparate data sources, identifying “gaps” and “overlaps” in a composite view of the subject, but this time the information returned is time sensitive. Forensics can be performed in this case to search for the pre-defined conditions and anomalies based on the ontology. The emphasis is on addressing data velocity: the disparate data must be ingested, collected, map-reduced, and saved in triple stores so that a defined state can be identified by monitoring the network, atmospheric sensors, and weather indicators in addition to the structured data contained in the transactional database, the business intelligence system, and the reporting system.
The Situation
Bioterrorists infect members of an operational cell with smallpox. Their plan is to fly on planes to football games held in domed facilities where 100,000 sports fans are in attendance during the late summer, while the weather is still very hot. Authorities will have seven to seventeen days to contain the emergency. Several civilians become infected prior to the planned attack and visit area hospitals showing symptoms of high fever and rash. After several days the hospitals conduct blood tests, and the results return positive for smallpox. A powerful sensor that can detect airborne pathogens generates data indicating elevated levels of variola virus in a subway tunnel area through which the terrorists traveled from the airport.
The Operational Approach
The local hospitals continue to treat the patients based on a limited amount of information. The patients have a high fever, a symptom common to many diseases. Meanwhile, the CDC analyst monitors the systems, searching for any critical conditions related to pandemics. As the disease progresses severely, the rash and lesions appear on the patients. Blood tests are conducted on the severely ill patients, and the results return positive for variola, indicating smallpox. At this stage the race against the clock begins as the medical and law enforcement organizations are notified.
The Blue Fusion Approach
1. In the background, the Blue Fusion ontology rules created to look for correlations and patterns are processed by the Complex Event server (a greatly simplified sketch follows this list).
2. The rules server queries the data for reports of large sums of capital distributed to a suspicious non-profit, criminal activity reported on 911 calls, web activity, and symptoms recorded for other patients and hospitals.
3. Initially this information was stored in the HBASE cache.
4. Within days, the Blue Fusion system receives information from the pathogen bio-sensor system, retained in the triple stores in the RDF database, and simultaneously the CDC receives a single blood test returned positive for a patient.
5. The CDC and the health analyst alert the authorities. The expert system correlates the new data and retrieves the information to be processed in the RDF database to conduct additional queries for law enforcement personnel to track down the operational cell.
6. The community of interest mobilizes to immunize the population and capture the criminals.
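The correlation step performed by the Complex Event server (step 1 above) could be imagined, in greatly simplified form, along the following lines. The class, the indicator flags, and the two-indicator threshold are purely hypothetical stand-ins for the ontology-driven rules Blue Fusion actually evaluates.

```java
// Hypothetical illustration of a complex-event correlation check; the real
// rules are ontology-driven rather than hard-coded boolean flags.
public class PandemicCorrelationRule {

    public static class Observations {
        boolean positiveVariolaLabResult;   // hospital blood test positive for variola
        boolean elevatedPathogenSensor;     // bio-sensor reports elevated airborne variola
        boolean suspiciousFinancialFlow;    // large transfer to a suspicious non-profit
    }

    /** Raise an alert when enough independent indicators correlate. */
    public static boolean raiseAlert(Observations o) {
        int indicators = 0;
        if (o.positiveVariolaLabResult) indicators++;
        if (o.elevatedPathogenSensor) indicators++;
        if (o.suspiciousFinancialFlow) indicators++;
        return indicators >= 2; // the threshold is illustrative only
    }
}
```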
Identifying a High Variety Data Scenario
In the final case, we will explore the hybrid scenario, in which the data fusion requirements oblige the developer to manage a high variety of data sources, types, and delivery requirements.
Figure 3 Hi-Variety Data Scenario
(Hi-Variety) In this financial hacker scenario the analyst must focus on defining relationships and uncovering links in the data that identify, segment, and stratify subject populations based on a wide variety of variables, types, and structures, semantically integrating data to create a holistic 360-degree view of the subject across a defined continuum. As depicted at a high level in Figure 3, in this financial use case the financial analyst and law enforcement analyst work together to run analytics on very large amounts of data from possibly disparate data sources, identifying “gaps” and “overlaps” in a composite view of the subject in an environment where mobile applications initiate trades and video must be merged with transactional data, emails, and flat files. Forensics can be performed in this case to search for the pre-defined conditions and anomalies based on the ontology. However, the system is in constant flux, building on the original ontologies defined in the repository so that the system keeps adapting and learning from the new relationships. The emphasis is on flexibility and complex data integration: the disparate data must be ingested, collected, map-reduced, and saved in a very large number of triple stores to identify the relationships at the data graph level.
The Situation
An international hackers' organization has been hired to create havoc in the stock market by contaminating and limiting access to major banking, financial, and corporate sites, in addition to many high-profile social media sites; the criminal firm that hired them plans to exploit the stock market by placing "short sales" on the stock of the affected firms. As the global market plummets and panic ensues, the firm makes billions of dollars on the negative events, routing the gains to private accounts. One hacker, who resembles a recently deceased high-level executive, has assumed his identity along with all required ID cards and access codes; the individual liquidates millions and disappears into the night. All parties are unaware of the executive's whereabouts and demise, and the body of the missing executive is discovered the next day. In the meantime, the hackers have disabled blogs and the email systems and have penetrated the network, accessing personal credit card and banking information for millions of people. The identities are sold on the black market.
The Blue Fusion Approach
The Blue Fusion approach gathers BIG DATA from multiple sources, first ingesting the data exchanged between several large organizations. Based on the established data exchange agreements between the FBI and the affected financial institutions, the structured data is ingested into HDFS and the HBASE cache for integration with related emails and the security and network audit log files, for future data mining to be performed after the highest-priority events have been processed. The rules server map-reduces the higher-priority real-time data (i.e., network intrusion alerts, phone activity) and stores the triples in the RDF database based on the ontologies that define the relationships between stock price changes and the formulas that identify anomalies in selling and buying patterns on the market. The expert system identifies the companies and individuals that capitalize on the major losses. The location and description of related videos from the office of the missing executive indicate strange activity in his office after a Blue Fusion alert is generated showing that his logout time did not coincide with his departure from the office. The link to the video is accessed through a query on the data graph stored in the RDF database, identifying the imposter in the FBI Hacker Database. Initial information stored in the HBASE cache is mined, and the relevant triples are moved to the RDF database. The expert system correlates the new data and retrieves the information to be processed in the RDF database to conduct additional queries for law enforcement personnel. The communities of interest mobilize to trace the money trail, correlate the hackers' methods of operation against the Hacker Database, and locate their current whereabouts.
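The map-reduce pass over the higher-priority feeds described above could be sketched roughly as follows; the comma-separated record layout, the field positions, and the event-type values are assumptions made purely for illustration.

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

/**
 * Illustrative mapper that keeps only high-priority alert records
 * (e.g., network intrusion detections) for downstream triple generation.
 * Assumes comma-separated input lines: timestamp,source,eventType,detail
 */
public class HighPriorityAlertMapper extends Mapper<LongWritable, Text, Text, Text> {

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split(",");
        if (fields.length < 4) {
            return; // skip malformed records
        }
        String eventType = fields[2];
        // Only high-priority event types pass through to the reducer.
        if ("INTRUSION_ALERT".equals(eventType) || "PHONE_ACTIVITY".equals(eventType)) {
            // Key by source so the reducer can correlate events per origin.
            context.write(new Text(fields[1]), value);
        }
    }
}
```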
The Distinct Roles of HADOOP Integration versus the Semantic Web
Technically, what roles do the HADOOP framework and the Semantic Web provide? The HADOOP components fulfill several functions; their primary role is storage, distributed processing, and fault tolerance, ingesting the data and acting as a data cache through HBASE. HBASE stores data in a large, scalable, distributed database that runs on HADOOP. From a technological perspective, HBASE is designed for fast reads rather than fast writes, so it is better to run queries on "time sensitive" data in the Resource Description Framework (RDF) database. Why? Semantic Web graph data in RDF triple stores (i.e., subject-predicate-object triples) requires indices for query optimization that do not exist in the HBASE database structure; working around this requires significant Java coding and the use of JENA and REST servers to poll for data changes and maintain notifications to the user's dashboard. When time is not critical, however, HDFS and HBASE are best used when it has yet to be determined what data mining must occur for a given data source. Based on time sensitivity, a rules server must determine whether the information should (a routing sketch follows this list):
1. Remain in HDFS and be stored in an RDF database for immediate processing, or
2. Be batched at a later time and stored in HBASE for possible future mining.
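A minimal sketch of that routing decision is shown below; the boolean time-sensitivity flag and the destination names are placeholders for whatever the rules server actually derives from the ontologies and business rules.

```java
public final class IngestRouter {

    /** Where the rules server decides ingested data should land. */
    public enum Destination { RDF_DATABASE_IMMEDIATE, HBASE_CACHE_BATCH }

    /**
     * Route time-sensitive data to the RDF database for immediate processing and
     * everything else to the HBASE cache for later batch mining. The boolean flag
     * stands in for the ontology-driven decision described in the text.
     */
    public static Destination route(boolean timeSensitive) {
        return timeSensitive ? Destination.RDF_DATABASE_IMMEDIATE
                             : Destination.HBASE_CACHE_BATCH;
    }
}
```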
Once the data is stored in the RDF database as triples in graph form, the analyst or the expert system can use SPARQL to query the data. It is recommended to leverage an RDF database with the capacity to store on the order of one billion to 50 billion triples, to ensure that the various permutations of relationships are well defined for the domain while maintaining speed when writing to and querying the database.
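For example, once the triples are in the RDF database, an analyst or expert-system query could be issued programmatically through Apache Jena roughly as follows; the in-memory model, the namespace, the property names, and the just-under-$10,000 filter are illustrative assumptions rather than the Blue Fusion schema.

```java
import org.apache.jena.query.Query;
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.QueryFactory;
import org.apache.jena.query.QuerySolution;
import org.apache.jena.query.ResultSet;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;

public class SuspiciousTransactionQuery {

    public static void main(String[] args) {
        // In production the model would be backed by the RDF database;
        // an in-memory model keeps this sketch self-contained.
        Model model = ModelFactory.createDefaultModel();

        // The namespace and property names below are illustrative only.
        String sparql =
            "PREFIX ex: <http://example.org/banking#> " +
            "SELECT ?account ?amount WHERE { " +
            "  ?txn ex:account ?account ; " +
            "       ex:amount  ?amount . " +
            "  FILTER (?amount > 9000 && ?amount < 10000) " + // just under the $10,000 threshold
            "}";

        Query query = QueryFactory.create(sparql);
        try (QueryExecution qexec = QueryExecutionFactory.create(query, model)) {
            ResultSet results = qexec.execSelect();
            while (results.hasNext()) {
                QuerySolution row = results.next();
                System.out.println(row.get("account") + " -> " + row.get("amount"));
            }
        }
    }
}
```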
What do the Ontology approach and the Semantic Web provide?
First and foremost, the ontological approach gives designers and developers the flexibility and adaptability to dynamically define the rules and relationships among structured, semi-structured, and unstructured data. BlueFusion uses the XML structure of the incoming data and a reference ontology to automatically derive a conceptual graph representing the semantics of the data. The RDF schema contains the metadata (i.e., the description of the data). Most importantly, for each data source a conceptual graph is created according to the framework and saved in the RDF database, so that the triples can be processed and queried using SPARQL (the SPARQL Protocol and RDF Query Language). This data graph represents the semantic integration of the original data sources. Integration ontologies can be extended to repurpose or share data and to support different tasks across disparate applications, and eventually across organizational boundaries if the application is internet-based.
The automated agent evaluates incoming XML messages and compares the information to the integrated ontology, using the established reference ontology for lookup purposes. The reference ontology is implemented in OWL (the Web Ontology Language) and contains a model of XML document structure along with rules and axioms for interpreting a generic XML document. With regard to integration, each data instance is stored as an instance of a concept, and each concept has relations to other concepts that represent metadata such as timestamps and source data.
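A highly simplified sketch of that mapping step, using Apache Jena, might look like the following; the ontology namespace, concept, and property names are assumptions made for illustration and do not reflect the actual Blue Fusion reference ontology.

```java
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Property;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.vocabulary.RDF;

public class XmlInstanceMapper {

    // Namespace of a hypothetical reference ontology; illustrative only.
    private static final String NS = "http://example.org/ontology#";

    /**
     * Create an RDF instance of an ontology concept for a record extracted from
     * an incoming XML message, attaching source-system and timestamp metadata.
     */
    public static Model mapInstance(String instanceUri, String conceptName,
                                    String sourceSystem, long timestamp) {
        Model model = ModelFactory.createDefaultModel();
        Resource concept = model.createResource(NS + conceptName);
        Property source = model.createProperty(NS, "sourceSystem");
        Property observedAt = model.createProperty(NS, "observedAt");

        model.createResource(instanceUri)
             .addProperty(RDF.type, concept)     // data instance typed by its concept
             .addProperty(source, sourceSystem)  // provenance metadata
             .addLiteral(observedAt, timestamp); // timestamp metadata
        return model;
    }
}
```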
The Blue Fusion Architecture
Figure 4 The BlueFusion Technical Approach
The BlueFusion approach gives the analyst, administrator, developer, and, most importantly, the end user the flexibility to expand the robustness of the enterprise by providing the methodology and the tools required to leverage the strengths of HADOOP and the Semantic Web.
As depicted in the previous drawing, Figure 4, an administrator has the ability to build a configurable, loosely coupled architecture based on the use case and the available tools that are the "best fit" for the requirements.
Tools of the Trade
The administrator can perform the following tasks through the BlueFusion Admin Tool:
• Configure Storage Requirements
1. Determine which data sources will exist for the life of the enterprise and remain the system of record, and which will be purged at defined intervals:
A. The original XML documents created prior to ingestion into HDFS.
B. The cache, which contains the tuples in non-RDF format; this information cannot be queried with SPARQL, so custom Java coding is required.
C. The RDF Database Data Incubator, which contains the triple stores that may hold information for mining purposes where conditions have not yet been identified by the analyst or expert system but that is ready to be analyzed or queried quickly. Based on the configuration, the data incubator may be a replica of the HBASE cache, but in RDF format, where data graphs are recognized and SPARQL can be executed.
2. Based on the business rules and ontologies, configure which triples will be stored in the High Priority Database (Red – Alert Condition Database) for immediate processing, and which triples will be stored in the Medium Priority Database (Yellow – Discovery Condition Database) when only partial criteria have been identified for a defined event. For example, in the case of bioterrorism the early symptoms of some pandemic diseases are the same as those of many common diseases: the victim has a rash and a high fever. The analyst must have a way to keep monitoring that activity until the condition is correlated, or eliminated as an opportunity or threat. Once the condition has been defined and processed through the dashboard, the event will be promoted to the Alert Condition Database so it can be identified earlier if it recurs as a future event (a promotion sketch appears at the end of this section).
• Configure Access Control Levels, granting Create, Read, Update, and Delete (CRUD) rights to the:
1. User Dashboard features, to determine what is presented to the end user based on user level and use case.
2. Visual and command-line SPARQL applets, to ensure the analyst has access to query and perform analysis on the use case.
3. HADOOP cache, to determine the duration for storing the data and the ACL for the cache and related logs.
4. Semantic materialized views, to ensure the appropriate user-level information is reported based on the use case.
5. HADOOP performance tuning features, to manage the workload balance between the various data stores and distribution management.
6. Semantic Situational Awareness Data Profile performance variables, or a template based on the use case, to determine temporal and subject settings as they relate to data velocity, volume, and variety, and workload balance tuning to spawn more expert agents to mine data.
The BlueFusion approach is to establish three separate RDF database instances, using HBASE as a cache.
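As a rough illustration of how a correlated condition might move between two of those instances, the Discovery (Yellow) and Alert (Red) databases described above, consider the following sketch; the ConditionStore interface and its methods are hypothetical stand-ins for the actual RDF database instances.

```java
public class ConditionPromoter {

    /** Minimal stand-in for one of the RDF database instances. */
    public interface ConditionStore {
        void add(String conditionId);
        void remove(String conditionId);
    }

    private final ConditionStore discoveryDb; // Yellow - Discovery Condition Database
    private final ConditionStore alertDb;     // Red - Alert Condition Database

    public ConditionPromoter(ConditionStore discoveryDb, ConditionStore alertDb) {
        this.discoveryDb = discoveryDb;
        this.alertDb = alertDb;
    }

    /**
     * Once a partially matched condition has been fully correlated, promote it
     * so the same pattern is recognized immediately if it recurs.
     */
    public void promote(String conditionId) {
        discoveryDb.remove(conditionId);
        alertDb.add(conditionId);
    }
}
```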
This White Paper is for informational purposes only. BLUE CANOPY MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS WHITE PAPER. Other trademarks and trade names may be used in this document to refer to either the entities claiming the marks and names or their products. Blue Canopy disclaims proprietary interest in the marks and names of others.

© Copyright 2011 Blue Canopy Group, LLC. All rights reserved. Reproduction in any manner whatsoever without the express written permission of Blue Canopy Group, LLC is strictly forbidden. Information in this document is subject to change without notice.