Cassandra-Based Image
Processing: Two Case
Studies
KERRY KOITZSCH
‘IMAGE AS BIG DATA SYSTEMS LLC’ (IABDS.COM)
Who Am I?
 Image processing enthusiast
 Big data technology enthusiast
 Robotics enthusiast
 NoSQL database enthusiast
 Microscopy enthusiast
 Software engineer
 Author
Overview of this Presentation
 This quick overview of some of our ongoing projects describes why Cassandra is a key
part of our ongoing research, development, and client support activities.
 The presentation highlights two areas of research that involve Cassandra technologies
in the “images as big data” arena: an automated microscope slide application
prototype, and a stereo “smart sensor” mobile robot prototype.
 Both of the use cases described rely heavily on Cassandra and related “helper libraries”
to provide data storage capabilities for the two software prototypes.
 Throughout the presentation we discuss how the flexibility, high performance, and
ability to “play well with” other components make Cassandra an essential part of the
applications described here.
Some Context About Our Use Cases
 The automated microscopy case study is based upon our current POCs for a
distributed microscope slide analytics application. In it, we use standard optical
microscopes fitted with color digital cameras and an automatic slide-loading cartridge
system to process “decks” of glass or plastic microscope slides, usually of human cell
tissue of various kinds. The staining process depends on the nature of the tissue;
in the case of hematology slides (human blood) there is no staining at all. Different
magnifications, levels of slide quality, and metadata must be processed.
 The stereo vision robotics platform case study is based upon our current POCs for a
stereo vision mobile robotics application. The application uses Hadoop, Spark, Flink,
Solr, and a number of additional machine learning libraries to analyze stereo image
pairs, build two- and three-dimensional models of the environment, and use Bayesian
sensor fusion to enhance the 3D models with additional environmental information.
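The slides do not say which Bayesian fusion scheme is used. As a minimal sketch of the idea, assuming two independent Gaussian estimates of the same quantity (for example, a range from stereo and a range from another sensor), the fused estimate is the precision-weighted combination below. The class and method names are hypothetical.

```java
/** Sketch only: fuses two independent Gaussian estimates of one quantity. */
final class GaussianFusion {

    /** Returns {fusedMean, fusedVariance} for estimates (mean1, var1) and (mean2, var2). */
    static double[] fuse(double mean1, double var1, double mean2, double var2) {
        if (var1 <= 0 || var2 <= 0) {
            throw new IllegalArgumentException("variances must be positive");
        }
        // Each mean is weighted by the other estimate's variance,
        // so the more certain estimate dominates the result.
        double fusedMean = (mean1 * var2 + mean2 * var1) / (var1 + var2);
        // The fused variance is never larger than either input variance.
        double fusedVar = (var1 * var2) / (var1 + var2);
        return new double[] {fusedMean, fusedVar};
    }

    public static void main(String[] args) {
        double[] fused = fuse(2.0, 1.0, 4.0, 1.0);
        System.out.println("mean=" + fused[0] + " variance=" + fused[1]);
    }
}
```

With equal variances the fused mean is the midpoint of the two inputs and the variance is halved; with unequal variances the result moves toward the more certain sensor.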
Our Objectives and How Cassandra Helps Us Achieve Them
 In our use cases, we have to integrate mechanical control systems, complex ‘image as big data’
analytics, and a variety of data formats and metadata in a common representation and with
efficient implementation.
 Cassandra and associated “helper libraries” such as Spring Data Cassandra, Solandra, and Katta
allow us to maintain a flexible technology stack while using off-the-shelf proven technologies
which “play well” together.
 Cassandra has well-defined parameters for optimization, scalability, and availability.
 Cassandra enables us to meet our objective of maintaining flexibility while maintaining high
performance capabilities with a changeable server topology.
 Cassandra-based technologies are flexible enough to accommodate a wide range of different
application domains, including microscopy (medical imagery of all kinds) and mobile robotics
applications (stereo vision ‘image as big data’ technology).
Our little software secret! :D
 We are not always software engineers.
 Sometimes, we are software ARCHAEOLOGISTS!
 But what is software archaeology? And how does it help us build modern
high-performance distributed software systems?
We are re-inventing software archaeology!
Software Archaeology is Based Upon:
 The realization that the “first wave of AI technologies” (up to the mid-1990s)
introduced a host of valid ideas, concepts, strategies, and approaches which were not
completely assimilated into succeeding waves of AI technology.
 The realization that “looking back at our software roots” may be a good thing.
 Identifying “best of breed” strategies of the past and re-thinking them, using up-to-date
software stacks. Some examples of these technologies include pre-Hadoop
distributed processing systems and algorithms, de Kleer’s implementations of “problem
solvers” [1], and geometric toolkits such as VANTAGE and Geometer [2],[3].
 This re-thinking and re-implementing process has already been carried out successfully for
POCs, products, research extensions, and particularly for software patents [4].

Migrating ‘First Wave AI’ Concepts
 Quality image processing systems have been lost in the course of time (for
example General Electric’s Geometer [3], the VANTAGE modeling system [2], and
many others).
 While the implementations may need updating, many of the
concepts may be, and have been, reimplemented using modern
technologies (in particular, Cassandra-centric technologies).
 Migrating our inspirations to modern implementation
 *Inspiration* does not *equal* implementation: we may choose to re-implement in
Clojure rather than Lisp, for example, and we use Cassandra-centric technology
stacks for almost *all* our implementations

Why Our Case Studies are Cassandra-Centric
 Cassandra is a flexible, full-featured NoSQL database
 Integrates well with Hadoop, Spark, and Flink-based software components
 A lot of development effort has already gone into different technology stacks, for
example, the SMACK stack
 Integrates well with the battery of Spring Framework components we use
 Cassandra is scalable, available, and highly tunable
 “Plays well” with Lucene/Solr-based technologies (via Solandra and other “glue-ware”)
 It’s easy to integrate Cassandra into use cases and software prototypes, allowing a
“rapid prototyping/development” style
 It’s also easy to use Cassandra in a plugin/module-based architecture style
 Cassandra “plays well” with a variety of machine-learning libraries
 Cassandra is flexible enough for research, solid enough for products
POJOs to Cassandra via Spring Data
 Plain old Java objects (POJOs) may be mapped to Cassandra using Spring Data
Cassandra
 We would also like to map ontology objects to Cassandra
 We begin with fundamental feature objects of images: regions, contours, lines,
junctions, and points, the kind of object instances generated by, for example,
connected component algorithms, edge extraction operators, and the like
 For our domain examples, we also map the extracted image metadata and associated “fused”
data from other sensors (for example, microscope slide data sets might have other
sensor results associated with them, such as spectrometry and chromatographic data)
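To make the mapping idea concrete, here is a minimal sketch of what a feature POJO and a matching table might look like. The class name, field names, table name, and schema are all assumptions for illustration; the slides do not show the actual data model. With Spring Data Cassandra the class would typically carry mapping annotations such as `@Table` and `@PrimaryKey`; they are omitted so the sketch compiles with the JDK alone.

```java
import java.util.List;
import java.util.UUID;

/** Hypothetical POJO for one extracted image feature (not the authors' schema). */
final class ImageFeature {
    private final UUID id;
    private final String imageId;      // slide image or stereo frame the feature came from
    private final FeatureKind kind;
    private final List<Double> coords; // flattened x,y pairs

    ImageFeature(UUID id, String imageId, FeatureKind kind, List<Double> coords) {
        if (coords.size() % 2 != 0) {
            throw new IllegalArgumentException("coords must hold x,y pairs");
        }
        this.id = id;
        this.imageId = imageId;
        this.kind = kind;
        this.coords = List.copyOf(coords);
    }

    UUID getId() { return id; }
    String getImageId() { return imageId; }
    FeatureKind getKind() { return kind; }
    List<Double> getCoords() { return coords; }

    /** Number of (x, y) points encoded in coords. */
    int pointCount() { return coords.size() / 2; }

    /** CQL for a table that could hold these objects (assumed layout). */
    static String createTableCql(String keyspace) {
        return "CREATE TABLE IF NOT EXISTS " + keyspace + ".image_feature ("
                + "id uuid PRIMARY KEY, image_id text, kind text, coords list<double>)";
    }

    public static void main(String[] args) {
        ImageFeature f = new ImageFeature(UUID.randomUUID(), "slide-0001",
                FeatureKind.CONTOUR, List.of(0.0, 0.0, 1.0, 0.0, 1.0, 1.0));
        System.out.println(f.getKind() + " with " + f.pointCount() + " points");
        System.out.println(createTableCql("iabd"));
    }
}

/** Kinds of low-level image features named in the text above. */
enum FeatureKind { REGION, CONTOUR, LINE, JUNCTION, POINT }
```

Keeping the feature kind as a plain text column and the geometry as a flat list is one simple option; a production model would more likely split features into separate tables keyed by image.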
Images ARE Big Data
 Images have volume, variety, velocity, veracity
 While the image volume may be relatively low, the individual
image objects may be extremely complex
 There may be associated signals (such as LIDAR and geolocation data)
 Images may be multispectral and multiresolution in nature
 Images may have significant associated metadata (as with medical imagery)
 Images may play only one part in a larger “big data enterprise”
Considerations About the SMACK Stack
Case Study 1: An Automated Microscopy
System
 We want to be able to analyze large numbers of microscope slides automatically.
 The image analysis component may be complex and time consuming: we require a
distributed solution
 Images may require cleaning, rescaling, metadata extraction, and image format
conversion
 Analysis results are stored in a Cassandra database with associated search
mechanisms via Solandra and custom code.
 The physical apparatus is a standard medical optical microscope with automated
stage and lighting controls and an automated microscope slide cartridge system
holding 25 microscope slides per deck.
A View of the Physical Apparatus


 And How We See the Problem...
Samples of Input Data from a typical data set
Cassandra-Centric Infrastructure
 Cassandra’s flexibility allows us to adapt data models appropriately for new
experiments and ideas.
 We are also able to maintain consistent ontology development by progressively
evolving our table definitions and POJOs.
 We can develop our data models in separate Cassandra instances for purposes of
experimentation.
 Metadata can be easily expressed: this is particularly important for experimental
medical data, as citations, references to other data sets, parameters, and
associated sensor data may be present, as well as timestamps and audit trail /
handler data if information comes from a doctor’s office, insurance company, or
hospital.
Architecture Diagram, Case Study I
Canova is now DataVec
Ellipse Feature Extraction Example
Technology Stack, Case Study I
 Mechanical controls are implemented in Java and C++: these control the microscope
stage controls, lighting, positioning, cartridge control, and others
 Timestamps are generated by the microscope controls and are associated with analysis
records
 Slide images may be analyzed one or more times using different data analysis pipelines
 Slide images may be retained in the file system and subjected to feature extraction
and analysis
 We avoid saving images as blobs in Cassandra at this time
 We can “ramp up” our complexity gradually as we evolve our software. For example,
the next slide shows contour extraction from a microscope slide: the contour data may
be stored directly into Cassandra.
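The slides do not say which contour extraction operator is used. As a simple stand-in, the sketch below collects the boundary pixels of a binary mask: foreground pixels that touch the background or the image border. The resulting list of coordinates is small enough to store as a row in Cassandra while the image itself stays in the file system. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: boundary pixels of a binary mask as a stand-in for contour extraction. */
final class ContourSketch {

    /**
     * Returns {x, y} for every foreground pixel that lies on the image border
     * or has at least one background pixel among its 4 neighbours.
     */
    static List<int[]> boundaryPixels(boolean[][] mask) {
        List<int[]> out = new ArrayList<>();
        int h = mask.length;
        int w = (h == 0) ? 0 : mask[0].length;
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                if (!mask[y][x]) {
                    continue;
                }
                // The border checks come first, so the neighbour lookups
                // are only evaluated for pixels strictly inside the image.
                boolean onEdge = y == 0 || x == 0 || y == h - 1 || x == w - 1
                        || !mask[y - 1][x] || !mask[y + 1][x]
                        || !mask[y][x - 1] || !mask[y][x + 1];
                if (onEdge) {
                    out.add(new int[] {x, y});
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        boolean[][] mask = new boolean[5][5];
        for (int y = 1; y <= 3; y++) {
            for (int x = 1; x <= 3; x++) {
                mask[y][x] = true;
            }
        }
        System.out.println("boundary pixels: " + boundaryPixels(mask).size());
    }
}
```

For the 3x3 block in the usage example, every pixel except the centre is a boundary pixel.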
Contour Extraction Example
Conclusions, Case Study I
We learned in this case study that Cassandra’s flexible data modeling and
straightforward, seamless integration with other ‘image as big data’ components,
including machine learning libraries such as Mahout and deeplearning4j, enabled us
to build high-performance slide analysis prototypes. These prototypes were able to store complex
microscopic slide information (derived from feature extraction) and metadata
(associated with the image itself, such as citations, other sensor data, authorship and
auditing/handler data, and the like).
Case Study II: A Stereo Vision Robotics
Platform
Using a standard dual-image synchronized stereo camera on a mobile robotics platform, we
are able to obtain timestamped, geolocated/microgrid located image pairs for analysis. Image
pairs may be generated periodically to produce a set of sequential image pairs.
These image pairs are subjected to feature extraction, model building, and other analytic
techniques to produce scene models, which are associated with scene ontologies and scene
model templates to incorporate within a navigation program.
The navigation program and its data, along with internal sensors within the robot itself, allow
the mobile robot to successfully navigate scenes and perform simple tasks.
The navigation program is also able to perform qualitative navigation based on landmarks.
Goals of the Stereo Vision Platform
 Use a standard dual camera that generates stereo image pairs for experimentation
with a mobile robot
 We want to investigate two software problems: qualitative navigation and
integrated sensor fusion with robotic control
 We want to use inspiration from Poggio’s MIT Vision Machine in particular as a
starting point for our software
 However, we wish to use an integrated data pipeline design with “best of breed”
third-party components, including Cassandra, Hadoop/Spark, and Solr.
 We base much of our stereo three-dimensional modeling on Hartley and
Zisserman [5] and Faugeras [6]
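As a small illustration of the stereo geometry involved, the sketch below uses the standard relation for a rectified pinhole stereo pair, depth = focal length × baseline / disparity. It is not taken from the slides; the class name and the camera numbers in the usage example are hypothetical.

```java
/** Sketch only: depth from disparity for a rectified stereo pair. */
final class StereoDepth {

    /**
     * @param focalLengthPx  focal length in pixels
     * @param baselineMeters distance between the two camera centres in metres
     * @param disparityPx    horizontal shift of the point between left and right image, in pixels
     * @return depth in metres, or positive infinity when the disparity is not positive
     */
    static double depthMeters(double focalLengthPx, double baselineMeters, double disparityPx) {
        if (disparityPx <= 0) {
            // Zero disparity corresponds to a point at infinity.
            return Double.POSITIVE_INFINITY;
        }
        return focalLengthPx * baselineMeters / disparityPx;
    }

    public static void main(String[] args) {
        // Hypothetical camera: 800 px focal length, 12.5 cm baseline.
        System.out.println(depthMeters(800.0, 0.125, 50.0) + " m");
    }
}
```

Because depth is inversely proportional to disparity, depth resolution degrades quickly for distant points, which is one reason additional sensors are useful for fusion.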
Hardware of the Robotics Vision System
Architecture Diagram: Simple Stereo Vision
Images from Our Stereo Camera
Image Analysis Example I
Image Analysis Example II
Feature Extraction
 We can use many of the standard feature extraction techniques in a distributed
computing environment, including corner-finding, edges, junctions, t-junctions,
and the like.
 Connected component analysis, Canny edge extraction, deep learning, and genetic algorithms
are all appropriate feature analysis technologies.
 Importing and exporting feature extraction data to and from Cassandra is easy
and makes sustainable experimentation possible.
 We can migrate from ontology to POJO to schema-based design with ease as our
software can handle each of these representations.
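Connected component analysis, one of the techniques named above, can be sketched in a few lines. This is a generic single-machine breadth-first labeling of 4-connected regions in a binary mask, offered as an illustration and not as the distributed implementation used in the project. The class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Sketch only: 4-connected component labeling of a binary mask. */
final class ConnectedComponents {

    /** Returns a label image: 0 for background, 1..n for the n components found. */
    static int[][] label(boolean[][] mask) {
        int h = mask.length;
        int w = (h == 0) ? 0 : mask[0].length;
        int[][] labels = new int[h][w];
        int[] dx = {1, -1, 0, 0};
        int[] dy = {0, 0, 1, -1};
        Deque<int[]> queue = new ArrayDeque<>();
        int next = 0;
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                if (!mask[y][x] || labels[y][x] != 0) {
                    continue; // background, or already part of a labeled component
                }
                next++;
                labels[y][x] = next;
                queue.add(new int[] {x, y});
                // Breadth-first flood fill from the seed pixel.
                while (!queue.isEmpty()) {
                    int[] p = queue.poll();
                    for (int k = 0; k < 4; k++) {
                        int nx = p[0] + dx[k];
                        int ny = p[1] + dy[k];
                        if (nx < 0 || ny < 0 || nx >= w || ny >= h) {
                            continue;
                        }
                        if (mask[ny][nx] && labels[ny][nx] == 0) {
                            labels[ny][nx] = next;
                            queue.add(new int[] {nx, ny});
                        }
                    }
                }
            }
        }
        return labels;
    }

    /** Number of components, which equals the largest label in the label image. */
    static int count(int[][] labels) {
        int max = 0;
        for (int[] row : labels) {
            for (int v : row) {
                max = Math.max(max, v);
            }
        }
        return max;
    }

    public static void main(String[] args) {
        boolean[][] mask = {
            {true,  true,  false, false},
            {false, false, false, true},
            {true,  false, false, true},
        };
        System.out.println("components: " + count(label(mask)));
    }
}
```

Each labeled region can then be summarized (area, bounding box, centroid) and written out as a feature record.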
Technology Stack, Stereo Vision Platform
Data Flow in the Robotics System
 Data flow in the robotics system is primarily mediated by Apache Kafka
 It turns out that much of the extraneous ‘image information’ may be ‘thrown away’, at
least temporarily, for purposes of specific data flows
 Data flows may be mediated by Hadoop, Spark, or Flink-based components
 Some machine learning components used in the data pipeline processing are now
technology agnostic: for example, Mahout supports Spark
 All of the image processing components have been found to work extremely well
and seamlessly with Cassandra
Conclusions, Case Study II
 Cassandra was a key component in our robotics software system
 We were able to run a Cassandra “image database”, “control database”, and
“environmental database” concurrently on three different servers, and coordinate
the data effectively for sensor analysis, fusion, and control purposes.
 We found we were able to do “rapid prototyping” of robotics software using
Cassandra as our primary data store technology
Some Observations about the Technologies
 Although the two applications we discussed here are so different, we evolved both
projects in parallel in order to leverage their common elements, with a view
to developing an ‘image as big data’ toolkit with which it would be possible to develop a
wide range of domain applications, using the idea of ‘images as big data’ (and the secondary
technologies of sensor fusion) as unifying themes.
 Appropriate database technologies are key to the success of the POCs shown here, and Cassandra
proved to be the superior choice for a number of compelling reasons.
 The idea of distributed image processing is by no means new: a variety of
implementations on different hardware configurations, including Symbolics LISP machines,
the Connection Machine, and other parallel/multiprocessor hardware, were built throughout
the 1990s
 These older distributed concepts have been replaced by Hadoop, Spark, Flink, and multicore/GPU
based applications at the low level, and, to a certain extent, by semantic web/ontology/reasoning
engines in the high-level vision processing phase.
Conclusions and Future Work
 We intend to pursue our concept of innovation + software archaeology: best of
breed new implementations, with inspiration from quality software systems of
past and present
 Applying it to new domain areas and POCs
 Particularly sensor fusion, “smart sensors”, drone and mobile robotics applications
 We intend to pursue the research aspects of “image as big data”, and to
implement POCs, products, and patents accordingly
Thank You!
kkoitzsch@kildane.com
References and Citations
 [1] Forbus, Kenneth D., and de Kleer, Johan. Building Problem Solvers, Cambridge,
MA: MIT Press, 1993.
 [2] Balakumar, P. et al. VANTAGE: A Frame-Based Geometric Modeling System,
Programmer/Users Manual V1.0. Robotics Institute, Carnegie Mellon University,
1988.
 [3] Barry, Michele, Cyrluk, David, Kapur, Deepak, Mundy, Joseph, and Nguyen,
Van-Duc. A Multi-Level Geometric Reasoning System for Vision. In Geometric
Reasoning, Deepak Kapur and Joseph Mundy, eds. Cambridge, MA: MIT Press,
1989.
 [4] Patent, OCR Enabled Management of Accounts Payable and/or Accounts
Receivable Auditing Data, Patent Number 20110213685
 [5] Hartley, Richard, and Zisserman, Andrew. Multiple View Geometry in Computer
Vision. Cambridge University Press, 2000.
 [6] Faugeras, Olivier. Three Dimensional Computer Vision: A Geometric Viewpoint.
Cambridge MA: MIT Press, 1993.

Cassandra-Based Image Processing: Two Case Studies (Kerry Koitzsch, Kildane) | C* Summit 2016

+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
Health
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
masabamasaba
 

KĂŒrzlich hochgeladen (20)

WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
 
Abortion Pills In Pretoria ](+27832195400*)[ đŸ„ Women's Abortion Clinic In Pre...
Abortion Pills In Pretoria ](+27832195400*)[ đŸ„ Women's Abortion Clinic In Pre...Abortion Pills In Pretoria ](+27832195400*)[ đŸ„ Women's Abortion Clinic In Pre...
Abortion Pills In Pretoria ](+27832195400*)[ đŸ„ Women's Abortion Clinic In Pre...
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
 
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
 
%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand
 
Abortion Pill Prices Tembisa [(+27832195400*)] đŸ„ Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] đŸ„ Women's Abortion Clinic in T...Abortion Pill Prices Tembisa [(+27832195400*)] đŸ„ Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] đŸ„ Women's Abortion Clinic in T...
 
%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto
 
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
 
WSO2CON 2024 - How to Run a Security Program
WSO2CON 2024 - How to Run a Security ProgramWSO2CON 2024 - How to Run a Security Program
WSO2CON 2024 - How to Run a Security Program
 
WSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go PlatformlessWSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go Platformless
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
 
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
 
WSO2Con204 - Hard Rock Presentation - Keynote
WSO2Con204 - Hard Rock Presentation - KeynoteWSO2Con204 - Hard Rock Presentation - Keynote
WSO2Con204 - Hard Rock Presentation - Keynote
 
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
 
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
 
WSO2CON 2024 Slides - Open Source to SaaS
WSO2CON 2024 Slides - Open Source to SaaSWSO2CON 2024 Slides - Open Source to SaaS
WSO2CON 2024 Slides - Open Source to SaaS
 
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
 

Cassandra-Based Image Processing: Two Case Studies (Kerry Koitzsch, Kildane) | C* Summit 2016

  • 1. Cassandra-Based Image Processing: Two Case Studies KERRY KOITZSCH ‘IMAGE AS BIG DATA SYSTEMS LLC’ (IABDS.COM)
  • 2. Who Am I?
      • Image processing enthusiast
      • Big data technology enthusiast
      • Robotics enthusiast
      • NoSQL database enthusiast
      • Microscopy enthusiast
      • Software engineer
      • Author
  • 3. Overview of this Presentation
      • This quick overview of some of our ongoing projects describes why Cassandra is a key part of our ongoing research, development, and client support activities.
      • The presentation highlights two areas of research which involve Cassandra technologies in the “images as big data” arena: an automated microscope slide application prototype, and a stereo “smart sensor” mobile robot prototype.
      • Both of the use cases described rely heavily on Cassandra and related “helper libraries” to provide data storage capabilities for the two software prototypes.
      • Throughout the presentation we discuss how the flexibility, high performance, and ability to “play well with” other components make Cassandra an essential part of the applications described here.
  • 4. Some Context About Our Use Cases
      • The automated microscopy case study is based upon our current POCs for a distributed microscope slide analytics application. In it, we use standard optical microscopes fitted with color digital cameras and an automatic slide-loading cartridge system to process “decks” of glass or plastic microscope slides. The slides usually hold human cell tissue of various kinds; the staining process depends on the nature of the tissue, and hematology slides (human blood) have no staining at all. Different magnification levels, slide quality, and metadata must be processed.
      • The stereo vision robotics platform case study is based upon our current POCs for a stereo vision mobile robotics application. The application uses Hadoop, Spark, Flink, Solr, and a number of additional machine learning libraries to analyze stereo image pairs and build two- and three-dimensional models of the environment, and uses Bayesian sensor fusion to enhance the 3D models with additional environmental information.
  • 5. Our Objectives and How Cassandra Helps Us Achieve Them
      • In our use cases, we have to integrate mechanical control systems, complex ‘image as big data’ analytics, and a variety of data formats and metadata in a common representation and with an efficient implementation.
      • Cassandra and associated “helper libraries” such as Spring Data Cassandra, Solandra, and Katta allow us to maintain a flexible technology stack while using proven off-the-shelf technologies which “play well” together.
      • Cassandra has well-defined parameters for optimization, scalability, and availability.
      • Cassandra enables us to meet our objective of staying flexible while maintaining high performance with a changeable server topology.
      • Cassandra-based technologies are flexible enough to accommodate a wide range of application domains, including microscopy (medical imagery of all kinds) and mobile robotics applications (stereo vision ‘image as big data’ technology).
  • 6. Our little software secret! :D
      • We are not always software engineers.
      • Sometimes, we are software ARCHAEOLOGISTS!
      • But what is software archaeology? And how does it help us build modern high-performance distributed software systems?
  • 7. We are re-inventing software archaeology!
  • 8. Software Archaeology is Based Upon:
      • The realization that the “first wave of AI technologies” (up to the mid-1990s) introduced a host of valid ideas, concepts, strategies, and approaches which were not completely assimilated into succeeding waves of AI technology.
      • The realization that “looking back at our software roots” may be a good thing.
      • Identifying “best of breed” strategies of the past and re-thinking them, using up-to-date software stacks. Some examples of these technologies include pre-Hadoop distributed processing systems and algorithms, de Kleer’s implementations of “problem solvers” [1], and geometric toolkits such as VANTAGE and Geometer [2],[3].
      • This re-thinking and re-implementing process has already been carried out successfully for POCs, products, research extensions, and particularly for software patents [4].

  • 9. Migrating ‘First Wave AI’ Concepts
      • Quality image processing systems have been lost in the course of time (for example General Electric’s Geometer [3], the VANTAGE modeling system [2], and many others).
      • While the implementations may need updating, many of the concepts may be, and have been, reimplemented using modern technologies (in particular Cassandra-centric technologies).
      • Migrating our inspirations to modern implementation
      • *Inspiration* does not *equal* implementation: we may choose to re-implement in Clojure rather than Lisp, for example, and we use Cassandra-centric technology stacks for almost *all* our implementations.

  • 10. Why Our Case Studies are Cassandra-Centric
      • Cassandra is a flexible, full-featured NoSQL database
      • Integrates well with Hadoop, Spark, and Flink-based software components
      • A lot of development effort has already gone into different technology stacks, for example the SMACK stack
      • Integrates well with the battery of Spring Framework components we use
      • Cassandra is scalable, available, and highly tunable
      • “Plays well” with Lucene/Solr based technologies (via Solandra and other “glue-ware”)
      • It’s easy to integrate Cassandra into use cases and software prototypes, allowing a “rapid prototyping/development” style
      • It’s also easy to use Cassandra in a plugin/module based architecture style
      • Cassandra ‘plays well’ with a variety of machine-learning libraries
      • Cassandra is flexible enough for research, solid enough for products
  • 11. POJOs to Cassandra via Spring Data
      • Plain old Java objects (POJOs) may be mapped to Cassandra using Spring Data Cassandra
      • We would also like to map ontology objects to Cassandra
      • We begin with the fundamental feature objects of images: regions, contours, lines, junctions, and points, the kind of object instances generated by, for example, connected component algorithms, edge extraction operators, and the like
      • For our domain examples, we also map the extracted image metadata and associated “fused” data from other sensors (for example, microscope slide data sets might have other sensor results associated with them, such as spectrometry and chromatographic data)
  • 12. Images ARE Big Data
      • Images have volume, variety, velocity, veracity
      • While the image volume may be relatively low, the individual image objects may be extremely complex
      • There may be associated signals (such as LIDAR and geolocation data)
      • Images may be multispectral and multiresolution in nature
      • Images may have significant associated metadata (as with medical imagery)
      • Images may play only one part in a larger “big data enterprise”
  • 14. Case Study 1: An Automated Microscopy System
      • We want to be able to analyze large numbers of microscope slides automatically.
      • The image analysis component may be complex and time consuming: we require a distributed solution.
      • Images may require cleaning, rescaling, metadata extraction, and image format conversion.
      • Analysis results are stored in a Cassandra database with associated search mechanisms via Solandra and custom code.
      • The physical apparatus is a standard medical optical microscope with automated stage and lighting controls and an automated microscope slide cartridge system holding 25 microscope slides per deck.
  • 15. A View of the Physical Apparatus

  • 16. And How We See the Problem...
  • 17. Samples of Input Data from a typical data set
  • 18. Cassandra-Centric Infrastructure
      • Cassandra’s flexibility allows us to adapt data models appropriately for new experiments and ideas.
      • We are also able to maintain consistent ontology development by progressively evolving our table definitions and POJOs.
      • We can develop our data models in separate Cassandra instances for purposes of experimentation.
      • Metadata can be easily expressed: this is particularly important for experimental medical data, as citations, references to other data sets, parameters, and associated sensor data may be present, as well as timestamps and audit trail / handler data if information comes from a doctor’s office, insurance company, or hospital.
  • 19. Architecture Diagram, Case Study I (note: Canova is now DataVec)
  • 21. Technology Stack, Case Study I
      • Mechanical controls are implemented in Java and C++: these drive the microscope stage, lighting, positioning, cartridge control, and others
      • Timestamps are generated by the microscope controls and are associated with analysis records
      • Slide images may be analyzed one or more times using different data analysis pipelines
      • Slide images may be retained in the file system and subjected to feature extraction and analysis
      • We avoid saving images as blobs in Cassandra at this time
      • We can “ramp up” our complexity gradually as we evolve our software. For example, the next slide shows contour extraction from a microscope slide: the contour data may be stored directly into Cassandra.
  • 23. Conclusions, Case Study I
      • We learned in this case study that Cassandra’s flexible data modeling and straightforward integration with other ‘image as big data’ components, including machine learning libraries such as Mahout and deeplearning4j, enabled us to build high-performance slide analysis prototypes. These prototypes were able to store complex microscope slide information (derived from feature extraction) and metadata (associated with the image itself, such as citations, other sensor data, authorship, and auditing/handler data).
  • 24. Case Study II: A Stereo Vision Robotics Platform
      • Using a standard dual-image synchronized stereo camera on a mobile robotics platform, we are able to obtain timestamped, geolocated/microgrid-located image pairs for analysis. Image pairs may be generated periodically to produce a set of sequential image pairs. These image pairs are subjected to feature extraction, model building, and other analytic techniques to produce scene models, which are associated with scene ontologies and scene model templates for use within a navigation program. The navigation program and its data, along with internal sensors within the robot itself, allow the mobile robot to successfully navigate scenes and perform simple tasks. The navigation program is also able to perform qualitative navigation based on landmarks.
  • 25. Goals of the Stereo Vision Platform
      • Use a standard dual camera that generates stereo image pairs for experimentation with a mobile robot
      • We want to investigate two software problems: qualitative navigation and integrated sensor fusion with robotic control
      • We want to use inspiration from Poggio’s MIT Vision Machine in particular as a starting point for our software
      • However, we wish to use an integrated data pipeline design with “best of breed” third-party components, including Cassandra, Hadoop/Spark, and Solr.
      • We model much of our stereo three-dimensional modeling on Hartley and Zisserman [5] and Faugeras [6]
  • 26. Hardware of the Robotics Vision System
  • 28. Images from Our Stereo Camera
  • 31. Feature Extraction
      • We can use many of the standard feature extraction techniques in a distributed computing environment, including finding corners, edges, junctions, T-junctions, and the like.
      • Connected-component analysis, Canny edge extraction, deep learning, and genetic algorithms are all appropriate feature analysis technologies.
      • Importing and exporting feature extraction data to and from Cassandra is easy and makes sustainable experimentation possible.
      • We can migrate from ontology to POJO to schema-based design with ease, as our software can handle each of these representations.
  • 32. Technology Stack, Stereo Vision Platform
  • 33. Data Flow in the Robotics System
      • Data flow in the robotics system is primarily mediated by Apache Kafka
      • It turns out much of the extraneous ‘image information’ may be ‘thrown away’, at least temporarily, for purposes of specific data flows
      • Data flows may be mediated by Hadoop, Spark, or Flink based components
      • Some machine learning components used in the data pipeline processing are now technology agnostic: for example, Mahout supports Spark
      • All of the image processing components have been found to work extremely well and seamlessly with Cassandra
  • 34. Conclusions, Case Study II
      • Cassandra was a key component in our robotics software system
      • We were able to run a Cassandra “image database”, “control database”, and “environmental database” concurrently on three different servers, and coordinate the data effectively for sensor analysis, fusion, and control purposes.
      • We found we were able to do “rapid prototyping” of robotics software using Cassandra as our primary data store technology
  • 35. Some Observations about the Technologies
      • Although the two applications discussed here are so different, we evolved both projects in parallel in order to leverage their common elements, with a mind towards developing an ‘image as big data’ toolkit. With such a toolkit it would be possible to develop a wide range of domain applications, using the idea of ‘images as big data’, and the secondary technologies of sensor fusion, as unifying themes.
      • Appropriate database technologies are key to the success of the POCs shown here, and Cassandra proved to be the superior choice for a number of compelling reasons.
      • The idea of distributed image processing is by no means new: a variety of implementations on different hardware configurations, including Symbolics LISP machines, the Connection Machine, and other parallel/multiprocessor hardware, were built throughout the 1990s.
      • These older distributed concepts have been replaced by Hadoop, Spark, Flink, and multicore/GPU based applications at the low level, and, to a certain extent, by semantic web/ontology/reasoning engines at the high-level vision processing phase.
  • 36. Conclusions and Future Work
      • We intend to pursue our concept of innovation + software archaeology: best-of-breed new implementations, with inspiration from quality software systems of past and present
      • Applying it to new domain areas and POCs
      • Particularly sensor fusion, “smart sensors”, drone and mobile robotics applications
      • We intend to pursue the research aspects of “image as big data”, and to implement POCs, products, and patents accordingly
  • 38. References and Citations
      • [1] Forbus, Kenneth D., and de Kleer, Johan. Building Problem Solvers. Cambridge, MA: MIT Press, 1993.
      • [2] Balakumar, P. et al. VANTAGE: A Frame-Based Geometric Modeling System, Programmer/Users Manual V1.0. Robotics Institute, Carnegie Mellon University, 1988.
      • [3] Barry, Michele, Cyrluk, David, Kapur, Deepak, Mundy, Joseph, and Nguyen, Van-Duc. A Multi-Level Geometric Reasoning System for Vision. In Geometric Reasoning, Deepak Kapur and Joseph Mundy, eds. Cambridge, MA: MIT Press, 1989.
      • [4] Patent: OCR Enabled Management of Accounts Payable and/or Accounts Receivable Auditing Data, Patent Number 20110213685.
      • [5] Hartley, Richard, and Zisserman, Andrew. Multiple View Geometry in Computer Vision. Cambridge University Press, 2000.
      • [6] Faugeras, Olivier. Three Dimensional Computer Vision: A Geometric Viewpoint. Cambridge, MA: MIT Press, 1993.
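
The idea from slides 11 and 21, that a feature POJO such as an extracted contour can be stored directly in Cassandra, might look roughly like the sketch below. It uses only the JDK and is not the presenters' code: the class `ContourFeature`, its fields, the column names, the keyspace `imaging`, and the table `contour_features` are all hypothetical. In a Spring Data Cassandra project the mapping would normally come from the library's mapping annotations rather than hand-built CQL strings; the strings here only illustrate one plausible table shape, with the slide identifier as the partition key so that all features of one slide are stored together.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical feature POJO; field names are illustrative, not taken from the slides.
class ContourFeature {
    final String slideId;     // partition key: groups all features of one slide
    final int featureId;      // clustering key: orders features within a slide
    final double posX;        // contour centroid, in pixels
    final double posY;
    final double orientation; // radians
    final double scaleFactor;
    final double majorAxis;
    final double minorAxis;

    ContourFeature(String slideId, int featureId, double posX, double posY,
                   double orientation, double scaleFactor,
                   double majorAxis, double minorAxis) {
        this.slideId = slideId;
        this.featureId = featureId;
        this.posX = posX;
        this.posY = posY;
        this.orientation = orientation;
        this.scaleFactor = scaleFactor;
        this.majorAxis = majorAxis;
        this.minorAxis = minorAxis;
    }

    // Column name -> CQL type, in the same order as bindValues().
    static Map<String, String> columns() {
        Map<String, String> cols = new LinkedHashMap<>();
        cols.put("slide_id", "text");
        cols.put("feature_id", "int");
        cols.put("pos_x", "double");
        cols.put("pos_y", "double");
        cols.put("orientation", "double");
        cols.put("scale_factor", "double");
        cols.put("major_axis", "double");
        cols.put("minor_axis", "double");
        return cols;
    }

    // Builds the CREATE TABLE statement for the assumed feature table.
    static String createTableCql(String keyspace, String table) {
        StringJoiner body = new StringJoiner(", ");
        columns().forEach((name, type) -> body.add(name + " " + type));
        body.add("PRIMARY KEY ((slide_id), feature_id)");
        return "CREATE TABLE IF NOT EXISTS " + keyspace + "." + table + " (" + body + ")";
    }

    // Builds a parameterised INSERT; values are bound separately by a driver.
    static String insertCql(String keyspace, String table) {
        StringJoiner names = new StringJoiner(", ");
        StringJoiner marks = new StringJoiner(", ");
        for (String name : columns().keySet()) {
            names.add(name);
            marks.add("?");
        }
        return "INSERT INTO " + keyspace + "." + table
                + " (" + names + ") VALUES (" + marks + ")";
    }

    // Values in column order, ready to bind to the INSERT above.
    Object[] bindValues() {
        return new Object[] {slideId, featureId, posX, posY,
                orientation, scaleFactor, majorAxis, minorAxis};
    }

    public static void main(String[] args) {
        System.out.println(createTableCql("imaging", "contour_features"));
        System.out.println(insertCql("imaging", "contour_features"));
    }
}
```

Keeping the image itself on the file system and storing only such feature rows is in line with the deck's remark that images are not saved as blobs in Cassandra.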

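Slides 11 and 31 name connected component algorithms as one source of region features. As a rough illustration of the general technique only, here is a single-machine sketch of 4-connected labelling over a binary mask using a breadth-first flood fill. It is not the distributed implementation used in the prototypes; the class names `Region` and `ConnectedComponents` are made up, and the mask is assumed to be rectangular.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical summary of one labelled region.
class Region {
    final int label;
    final int pixelCount;
    final double centroidRow;
    final double centroidCol;

    Region(int label, int pixelCount, double centroidRow, double centroidCol) {
        this.label = label;
        this.pixelCount = pixelCount;
        this.centroidRow = centroidRow;
        this.centroidCol = centroidCol;
    }
}

class ConnectedComponents {
    // Labels 4-connected groups of foreground (true) pixels, scanning row by row.
    static List<Region> label(boolean[][] mask) {
        int rows = mask.length;
        int cols = rows == 0 ? 0 : mask[0].length;
        boolean[][] seen = new boolean[rows][cols];
        int[][] steps = {{1, 0}, {-1, 0}, {0, 1}, {0, -1}};
        List<Region> regions = new ArrayList<>();

        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                if (!mask[r][c] || seen[r][c]) {
                    continue;
                }
                // Flood-fill one new region starting from this seed pixel.
                Deque<int[]> queue = new ArrayDeque<>();
                queue.add(new int[] {r, c});
                seen[r][c] = true;
                int count = 0;
                long sumRow = 0;
                long sumCol = 0;
                while (!queue.isEmpty()) {
                    int[] p = queue.poll();
                    count++;
                    sumRow += p[0];
                    sumCol += p[1];
                    for (int[] s : steps) {
                        int nr = p[0] + s[0];
                        int nc = p[1] + s[1];
                        if (nr >= 0 && nr < rows && nc >= 0 && nc < cols
                                && mask[nr][nc] && !seen[nr][nc]) {
                            seen[nr][nc] = true;
                            queue.add(new int[] {nr, nc});
                        }
                    }
                }
                regions.add(new Region(regions.size() + 1, count,
                        (double) sumRow / count, (double) sumCol / count));
            }
        }
        return regions;
    }

    public static void main(String[] args) {
        boolean[][] mask = {
            {true, true, false},
            {false, false, false},
            {false, true, true}
        };
        for (Region region : label(mask)) {
            System.out.println(region.label + ": " + region.pixelCount + " px, centroid ("
                    + region.centroidRow + ", " + region.centroidCol + ")");
        }
    }
}
```

Each `Region` summary (label, size, centroid) is the kind of small record that could be written to a feature table instead of the image pixels.
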
Editor's Notes

  1. There have already been several “wins” with software archaeology, including migrating the first-wave neural network technologies to the so-called “deep learning” paradigm. Some of this re-thinking included reimplementation using multicore, GPUs, and distributed technologies. Another win is our own successful patents with OCR technologies.
  2. Visual Data Ontology Objects mapped to Cassandra
  3. We use the SMACK stack pretty consistently, using Spring components, especially Spring Data, Spring Integration, and Spring Batch, as “glue-ware”
  4. TOPN type queries become very important if you have a large number of (similar) recognitions
  5. Processing data and mapping it into a format that neural nets can understand: Canova/DataVec is for this purpose.
  6. Position, orientation, scale, and major and minor axes can all be part of a Cassandra feature table, and of course various shape features can be expressed as Java POJOs or within an ontology.
  7. Collage of images taken by the stereo camera: trinocular image processing is also possible and has some advantages
  8. Linear feature extraction “old school”
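
Note 4 points out that top-N queries matter when there are many similar recognitions. A minimal in-memory sketch of the selection step is given below, assuming each recognition carries a label and a confidence score; the `Recognition` and `TopN` classes and the example labels are hypothetical. On the database side, one common option (an assumption here, not something the deck states) is to cluster rows by score in descending order and read them with a `LIMIT`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical recognition result: a label plus a confidence score.
class Recognition {
    final String label;
    final double score;

    Recognition(String label, double score) {
        this.label = label;
        this.score = score;
    }
}

class TopN {
    // Returns the n highest-scoring recognitions, best first,
    // using a bounded min-heap so memory stays proportional to n.
    static List<Recognition> topN(Iterable<Recognition> candidates, int n) {
        if (n <= 0) {
            return new ArrayList<>();
        }
        Comparator<Recognition> byScore =
                Comparator.comparingDouble((Recognition r) -> r.score);
        PriorityQueue<Recognition> heap = new PriorityQueue<>(n, byScore);
        for (Recognition r : candidates) {
            if (heap.size() < n) {
                heap.add(r);
            } else if (r.score > heap.peek().score) {
                heap.poll(); // drop the current weakest of the kept set
                heap.add(r);
            }
        }
        List<Recognition> best = new ArrayList<>(heap);
        best.sort(byScore.reversed());
        return best;
    }

    public static void main(String[] args) {
        List<Recognition> hits = Arrays.asList(
                new Recognition("cell", 0.4),
                new Recognition("platelet", 0.9),
                new Recognition("artifact", 0.1),
                new Recognition("lymphocyte", 0.7));
        for (Recognition r : topN(hits, 2)) {
            System.out.println(r.label + " " + r.score);
        }
    }
}
```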