In this presentation we describe two image processing applications that rely on a Cassandra-centric architecture to achieve distributed, high-accuracy analysis of a variety of image formats, types, and quality levels, and that require different kinds of metadata processing as well as feature extraction from the images themselves. We outline the architecture choices made for the two case studies and explain why we found Cassandra to be the ideal choice of persistence layer implementation technology. In conclusion we discuss extensions to the two use cases and some of the lessons learned from the two implementation projects.
About the Speaker
Kerry Koitzsch, Project Lead, Kildane Software Technologies, Inc.
Kerry Koitzsch is a software engineer and architect specializing in big data applications, NoSQL databases, and image processing. He currently works for Correlli Software Systems, a big data analytics company in Sunnyvale, CA.
2. Who Am I?
• Image processing enthusiast
• Big data technology enthusiast
• Robotics enthusiast
• NoSQL database enthusiast
• Microscopy enthusiast
• Software engineer
• Author
3. Overview of this Presentation
• This quick overview of some of our ongoing projects describes why Cassandra is a key part of our ongoing research, development, and client support activities.
• The presentation highlights two areas of research which involve Cassandra technologies in the "images as big data" arena: an automated microscope slide application prototype, and a stereo "smart sensor" mobile robot prototype.
• Both of the use cases described rely heavily on Cassandra and related "helper libraries" to provide data storage capabilities for the two software prototypes.
• Throughout the presentation we discuss how Cassandra's flexibility, high performance, and ability to "play well with" other components make it an essential part of the applications described here.
4. Some Context About Our Use Cases
• The automated microscopy case study is based upon our current POCs for a distributed microscope slide analytics application. We use standard optical microscopes fitted with color digital cameras and an automatic slide-loading cartridge system to process "decks" of glass or plastic microscope slides, usually of human cell tissue of various kinds. The staining process depends on the nature of the tissue; in the case of hematology slides (human blood), no staining is used at all. Different magnifications, slide qualities, and kinds of metadata must be processed.
• The stereo vision robotics platform case study is based upon our current POCs for a stereo vision mobile robotics application. The application uses Hadoop, Spark, Flink, Solr, and a number of additional machine learning libraries to analyze stereo image pairs, build two- and three-dimensional models of the environment, and apply Bayesian sensor fusion to enhance the 3D models with additional environmental information.
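To make the Bayesian sensor fusion step concrete, here is a minimal Python sketch, assuming each sensor reading is modeled as an independent Gaussian estimate of the same quantity (the numeric values are illustrative only, not from the actual system):

```python
def fuse_gaussian(mu_a, var_a, mu_b, var_b):
    """Fuse two independent Gaussian estimates of the same quantity.

    The precision-weighted mean favors the lower-variance sensor, and the
    fused variance is always smaller than either input variance.
    """
    precision = 1.0 / var_a + 1.0 / var_b
    fused_var = 1.0 / precision
    fused_mu = fused_var * (mu_a / var_a + mu_b / var_b)
    return fused_mu, fused_var

# Illustrative values: a noisy stereo depth estimate (4.2 m) fused with a
# hypothetical, more precise second range sensor (4.0 m).
mu, var = fuse_gaussian(4.2, 0.25, 4.0, 0.04)
```

The fused estimate lands between the two readings, closer to the more precise sensor, and has lower variance than either input; fusing further sensors simply repeats the same step.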
5. Our Objectives and How Cassandra Helps us Achieve Them
• In our use cases, we have to integrate mechanical control systems, complex "image as big data" analytics, and a variety of data formats and metadata in a common representation and with an efficient implementation.
• Cassandra and associated "helper libraries" such as Spring Data Cassandra, Solandra, and Katta allow us to maintain a flexible technology stack while using proven off-the-shelf technologies which "play well" together.
• Cassandra has well-defined parameters for optimization, scalability, and availability.
• Cassandra enables us to meet our objective of maintaining flexibility while maintaining high performance capabilities with a changeable server topology.
• Cassandra-based technologies are flexible enough to accommodate a wide range of application domains, including microscopy (medical imagery of all kinds) and mobile robotics (stereo vision "image as big data" technology).
6. Our little software secret! :D
• We are not always software engineers…
• Sometimes, we are software ARCHAEOLOGISTS!
• But what is software archaeology? And how does it help us build modern high-performance distributed software systems?
8. Software Archaeology is Based Upon:
• The realization that the "first wave of AI technologies" (up to the mid-1990s) introduced a host of valid ideas, concepts, strategies, and approaches which were not completely assimilated into succeeding waves of AI technology.
• The realization that "looking back at our software roots" may be a good thing.
• Identifying "best of breed" strategies of the past and re-thinking them using up-to-date software stacks. Examples of these technologies include pre-Hadoop distributed processing systems and algorithms, de Kleer's implementations of "problem solvers" [1], and geometric toolkits such as VANTAGE and Geometer [2], [3].
• This re-thinking and re-implementing process has already been successfully applied to POCs, products, research extensions, and particularly software patents [4]…
9. Migrating âFirst Wave AIâ Concepts
• Quality image processing systems have been lost in the course of time (for example, General Electric's Geometer [3], the VANTAGE modeling system [2], and many others). While the implementations may need updating, many of the concepts may be --- and have been --- reimplemented using modern technologies (and in particular Cassandra-centric technologies).
• Migrating our inspirations to modern implementations
• *Inspiration* does not *equal* implementation: we may choose to re-implement in Clojure rather than Lisp, for example, and we use Cassandra-centric technology stacks for almost *all* of our implementations…
10. Why Our Case Studies are Cassandra-Centric
• Cassandra is a flexible, full-featured NoSQL database
• Integrates well with Hadoop, Spark, and Flink-based software components
• A lot of development focus has already been placed on different technology stacks, for example the SMACK stack
• Integrates well with the battery of Spring Framework components we use
• Cassandra is scalable, available, and highly tunable
• "Plays well" with Lucene/Solr-based technologies (via Solandra and other "glue-ware")
• It's easy to integrate Cassandra into use cases and software prototypes, allowing a "rapid prototyping/development" style
• It's also easy to use Cassandra in a plugin/module-based architecture style
• Cassandra "plays well" with a variety of machine learning libraries
• Cassandra is flexible enough for research, solid enough for products
11. POJOs to Cassandra via Spring Data
• Plain old Java objects (POJOs) may be mapped to Cassandra using Spring Data Cassandra
• We would also like to map ontology objects to Cassandra
• We begin with the fundamental feature objects of images: regions, contours, lines, junctions, and points, the kinds of object instances generated by, for example, connected component algorithms, edge extraction operators, and the like
• For our domain examples, we also store the extracted image metadata and associated "fused" data from other sensors [for example, microscope slide data sets might have other sensor results associated with them, such as spectrometry and chromatographic data]
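As an illustration of the mapping idea, here is a minimal Python sketch standing in for the Java POJO and its Spring Data Cassandra mapping. The `RegionFeature` fields, the `image_features` table, and all column names are hypothetical; the sketch simply shows how a feature object's fields line up with a parameterized CQL INSERT:

```python
from dataclasses import dataclass, fields
from uuid import uuid4

@dataclass
class RegionFeature:
    """Python analogue of a feature POJO (field names are illustrative)."""
    feature_id: str
    image_id: str
    centroid_x: float
    centroid_y: float
    area: int

def to_cql_insert(feature, table="image_features"):
    """Build a parameterized CQL INSERT whose columns mirror the fields."""
    cols = [f.name for f in fields(feature)]
    placeholders = ", ".join("?" for _ in cols)
    return f"INSERT INTO {table} ({', '.join(cols)}) VALUES ({placeholders})"

stmt = to_cql_insert(RegionFeature(str(uuid4()), "slide-001", 12.5, 8.0, 340))
```

In the real system, Spring Data Cassandra generates the equivalent statement from annotated POJO classes, so application code never builds CQL strings by hand.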
12. Images ARE Big Data
• Images have volume, variety, velocity, and veracity
• While the image volume may be relatively low, the individual image objects may be extremely complex
• There may be associated signals (such as LIDAR and geolocation data)
• Images may be multispectral and multiresolution in nature
• Images may have significant associated metadata (as with medical imagery)
• Images may play only one part in a larger "big data enterprise"
14. Case Study 1: An Automated Microscopy System
• We want to be able to analyze large numbers of microscope slides automatically.
• The image analysis component may be complex and time-consuming: we require a distributed solution.
• Images may require cleaning, rescaling, metadata extraction, and image format conversion.
• Analysis results are stored in a Cassandra database, with associated search mechanisms provided via Solandra and custom code.
• The physical apparatus is a standard medical optical microscope with automated stage and lighting controls and an automated microscope slide cartridge system holding 25 microscope slides per deck.
18. Cassandra-Centric Infrastructure
• Cassandra's flexibility allows us to adapt data models appropriately for new experiments and ideas.
• We are also able to maintain consistent ontology development by progressively evolving our table definitions and POJOs.
• We can develop our data models in separate Cassandra instances for purposes of experimentation.
• Metadata can be easily expressed: this is particularly important for experimental medical data, as citations, references to other data sets, parameters, and associated sensor data may be present, as well as timestamps and audit trail / handler data if information comes from a doctor's office, insurance company, or hospital.
21. Technology Stack, Case Study I
• Mechanical controls are implemented in Java and C++: these operate the microscope stage, lighting, positioning, cartridge loading, and more
• Timestamps are generated by the microscope controls and are associated with analysis records
• Slide images may be analyzed one or more times using different data analysis pipelines
• Slide images may be retained in the file system and subjected to feature extraction and analysis
• We avoid saving images as blobs in Cassandra at this time
• We can "ramp up" our complexity gradually as we evolve our software. For example, the next slide shows contour extraction from a microscope slide: the contour data may be stored directly in Cassandra.
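The contour extraction step can be sketched in miniature. Assuming the slide image has already been reduced to a binary foreground mask (real slides would first be segmented and thresholded, which this sketch omits), a minimal pure-Python version marks every foreground pixel that touches the background; the resulting (y, x) pairs are the kind of feature records that can be stored as rows in Cassandra:

```python
def contour_pixels(mask):
    """Return foreground pixels that have at least one background
    4-neighbour (or lie on the image border), i.e. the contour of each
    region in a binary mask."""
    h, w = len(mask), len(mask[0])
    contour = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            neighbours = [(y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)]
            if any(ny < 0 or ny >= h or nx < 0 or nx >= w or not mask[ny][nx]
                   for ny, nx in neighbours):
                contour.append((y, x))
    return contour

# A 4x4 solid square: only its 1-pixel border lies on the contour.
mask = [[1] * 4 for _ in range(4)]
```

A production pipeline would use a library edge/contour operator, but the stored result has the same shape: a list of boundary coordinates per region, keyed by slide and image identifiers.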
23. Conclusions, Case Study I
We learned in this case study that Cassandra's flexible data modeling and straightforward, seamless integration with other "image as big data" components, including machine learning libraries such as Mahout and deeplearning4j, enabled us to build high-performance slide analysis prototypes able to store complex microscope slide information (derived from feature extraction) and metadata (associated with the image itself, such as citations, other sensor data, authorship, and auditing/handler data).
24. Case Study II: A Stereo Vision Robotics Platform
Using a standard dual-image synchronized stereo camera on a mobile robotics platform, we are able to obtain timestamped, geolocated/microgrid-located image pairs for analysis. Image pairs may be generated periodically to produce a set of sequential image pairs.
These image pairs are subjected to feature extraction, model building, and other analytic techniques to produce scene models, which are associated with scene ontologies and scene model templates for incorporation within a navigation program.
The navigation program and its data, along with internal sensors within the robot itself, allow the mobile robot to successfully navigate scenes and perform simple tasks.
The navigation program is also able to perform qualitative navigation based on landmarks.
25. Goals of the Stereo Vision Platform
• Use a standard dual camera generating stereo image pairs for experimentation with a mobile robot
• We want to investigate two software problems: qualitative navigation and integrated sensor fusion with robotic control
• We want to use inspiration from Poggio's MIT Vision Machine in particular as a starting point for our software
• However, we wish to use an integrated data pipeline design with "best of breed" third-party components, including Cassandra, Hadoop/Spark, and Solr
• We base much of our stereo three-dimensional modeling on Hartley and Zisserman [5] and Faugeras [6]
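At the core of stereo 3D modeling is disparity estimation between the two views. This minimal sketch shows the idea with simple sum-of-absolute-differences block matching along one rectified scanline (a toy stand-in, not the full projective machinery of [5] or [6]; scanline values and the search range are illustrative):

```python
def disparity_sad(left, right, x, window=1, max_d=5):
    """Estimate disparity at column x of a rectified scanline pair by
    minimizing the sum of absolute differences over a small window.
    A feature at left[x] is expected at right[x - d] for disparity d."""
    best_d, best_cost = 0, float("inf")
    for d in range(max_d + 1):
        cost = 0
        for k in range(-window, window + 1):
            xl, xr = x + k, x + k - d
            if xl < 0 or xl >= len(left) or xr < 0 or xr >= len(right):
                cost = float("inf")   # window falls outside the image
                break
            cost += abs(left[xl] - right[xr])
        if cost < best_cost:
            best_cost, best_d = cost, d
    return best_d

# The right scanline is the left one shifted by 2 pixels, so the bright
# feature (value 80) should be matched at disparity 2.
left = [0, 0, 10, 80, 10, 0, 0, 0]
right = [10, 80, 10, 0, 0, 0, 0, 0]
```

Disparity maps computed this way (per pixel, per scanline) feed the depth and scene-model building stages; larger disparity means a closer scene point.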
31. Feature Extraction
• We can use many of the standard feature extraction techniques in a distributed computing environment, including corner finding, edges, junctions, T-junctions, and the like.
• Connected component analysis, Canny edge extraction, deep learning, and genetic algorithms are all appropriate feature analysis technologies.
• Importing and exporting feature extraction data to and from Cassandra is easy and makes sustainable experimentation possible.
• We can migrate from ontology- to POJO- to schema-based design with ease, as our software can handle each of these representations.
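Of the techniques listed above, connected component labeling is the simplest to show end to end. A minimal pure-Python version (BFS flood fill over a binary mask; real pipelines would use a library implementation over full-resolution images) produces the region labels whose summary features are then exported to Cassandra:

```python
from collections import deque

def label_components(mask):
    """4-connected component labelling of a binary mask via BFS flood
    fill. Returns the component count and a per-pixel label grid
    (0 = background, 1..n = component id)."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    current = 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not labels[y][x]:
                current += 1
                labels[y][x] = current
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                                   (cy, cx - 1), (cy, cx + 1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = current
                            queue.append((ny, nx))
    return current, labels

# Two separate blobs in a small mask.
mask = [
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 1],
]
n, labels = label_components(mask)
```

Per-component properties (area, centroid, bounding box) computed from the label grid are exactly the kind of feature rows the text describes storing in Cassandra.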
33. Data Flow in the Robotics System
• Data flow in the robotics system is primarily mediated by Apache Kafka
• It turns out much of the extraneous "image information" may be "thrown away" --- at least temporarily --- for purposes of specific data flows
• Data flows may be mediated by Hadoop, Spark, or Flink-based components
• Some machine learning components used in the data pipeline processing are now technology-agnostic: for example, Mahout supports Spark
• All of the image processing components have been found to work extremely well and seamlessly with Cassandra
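The "throw away what you don't need" idea can be illustrated without Kafka itself. In this small stdlib Python sketch, an in-process queue stage stands in for a Kafka consumer/producer pair (the record fields and the `pixels` key are hypothetical): the stage forwards feature records downstream while dropping the raw pixel payload.

```python
import queue
import threading

def strip_stage(inbox, outbox):
    """Forward records downstream, dropping the bulky raw-pixel payload
    (the 'extraneous image information' discarded for specific flows)."""
    while True:
        record = inbox.get()
        if record is None:          # sentinel: end of stream
            outbox.put(None)
            return
        slim = {k: v for k, v in record.items() if k != "pixels"}
        outbox.put(slim)

inbox, outbox = queue.Queue(), queue.Queue()
threading.Thread(target=strip_stage, args=(inbox, outbox), daemon=True).start()

inbox.put({"image_id": "frame-42", "pixels": [0] * 64, "features": [(3, 1)]})
inbox.put(None)

results = []
while (item := outbox.get()) is not None:
    results.append(item)
```

In the actual system the same slimming happens between Kafka topics, so downstream analytics and Cassandra writes handle compact feature records rather than full images.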
34. Conclusions, Case Study II
• Cassandra was a key component in our robotics software system
• We were able to run a Cassandra "image database", "control database", and "environmental database" concurrently on three different servers, and coordinate the data effectively for sensor analysis, fusion, and control purposes
• We found we were able to do "rapid prototyping" of robotics software using Cassandra as our primary data store technology
35. Some Observations about the Technologies
• Although the two applications discussed here were very different, we evolved both projects in parallel in order to leverage their common elements, with a mind towards developing an "image as big data" toolkit with which it would be possible to build a wide range of domain applications, using the idea of "images as big data" --- and the secondary technologies of sensor fusion --- as unifying themes.
• Appropriate database technologies are key to the success of the POCs shown here, and Cassandra proved to be the superior choice for a number of compelling reasons.
• The idea of distributed image processing is by no means new: a variety of implementations on different hardware configurations, including Symbolics Lisp machines, the Connection Machine, and other parallel/multiprocessor hardware, appeared throughout the 1990s.
• These older distributed concepts have been replaced by Hadoop, Spark, Flink, and multicore/GPU-based applications at the low level and, to a certain extent, by semantic web/ontology/reasoning engines at the high-level vision processing phase.
36. Conclusions and Future Work
• We intend to pursue our concept of innovation + software archaeology: best-of-breed new implementations, with inspiration from quality software systems of past and present
• Applying it to new domain areas and POCs
• Particularly sensor fusion, "smart sensors", drone, and mobile robotics applications
• We intend to pursue the research aspects of "image as big data", and to implement POCs, products, and patents accordingly
38. References and Citations
• [1] Forbus, Kenneth D., and de Kleer, Johan. Building Problem Solvers. Cambridge, MA: MIT Press, 1993.
• [2] Balakumar, P., et al. VANTAGE: A Frame-Based Geometric Modeling System, Programmer/User's Manual V1.0. Robotics Institute, Carnegie Mellon University, 1988.
• [3] Barry, Michele, Cyrluk, David, Kapur, Deepak, Mundy, Joseph, and Nguyen, Van-Duc. A Multi-Level Geometric Reasoning System for Vision. In Geometric Reasoning, Deepak Kapur and Joseph Mundy, eds. Cambridge, MA: MIT Press, 1989.
• [4] Patent, OCR Enabled Management of Accounts Payable and/or Accounts Receivable Auditing Data, Patent Number 20110213685.
• [5] Hartley, Richard, and Zisserman, Andrew. Multiple View Geometry in Computer Vision. Cambridge University Press, 2000.
• [6] Faugeras, Olivier. Three-Dimensional Computer Vision: A Geometric Viewpoint. Cambridge, MA: MIT Press, 1993.
Editor's Notes
There have already been several "wins" with software archaeology, including migrating the first-wave neural network technologies to the so-called "deep learning" paradigm.
Some of this re-thinking included reimplementation using multicore, GPU, and distributed technologies, and our own successful patents with OCR technologies.
Visual data ontology objects mapped to Cassandra.
We use the SMACK stack pretty consistently, with Spring components, especially Spring Data, Spring Integration, and Spring Batch, as "glue-ware".
TOP-N-type queries become very important if you have a large number of (similar) recognitions.
Processing data and mapping it into a format that neural nets can understand: Canova/DataVec serves this purpose.
Position, orientation, scale, and major and minor axes can all be part of a Cassandra feature table, and of course various shape features can be expressed as Java POJOs or within an ontology.
Collage of images taken by the stereo camera: trinocular image processing is also possible and has some advantages.