
Improving computer vision models at scale presentation

Rigorous improvement of an image recognition model often requires multiple iterations of eyeballing outliers, inspecting statistics of the output labels, then modifying and retraining the model. When testing data is present at the petabyte scale, the ability to seamlessly access all the images that have been assigned specific labels poses a technical challenge by itself.

Marton Balassi, Mirko Kämpf, and Jan Kunigk share a solution that automates the process of running the model on the testing data and populating an index of the labels so they become searchable. Images and labels are stored in HBase. The model is encapsulated in a PySpark program, while the images are indexed with Solr and can be accessed from a Hue dashboard.

Published in: Data & Analytics


  1. © Cloudera, Inc. All rights reserved. Improving computer vision models at scale. Jan Kunigk | Principal Solutions Architect; Dr. Mirko Kämpf | Senior Solutions Architect; Marton Balassi | Solutions Architect
  2. This slide deck is an updated version of our talk @Strata2018 in London
  3. Motivation
  4. Imagine the possibilities... • detect dangerous situations in traffic • detect fires in forests or landfills early via infrared drones • detect extremely hard-to-find tumors • detect combatants in satellite data • detect violence in subway stations • detect broken parts in a manufacturing line ... all that at scale!
  5. Requirements • Fast random access to images • Free-text search for labels • Visual user interface • Execute existing Python and Scala deep learning pipelines at scale • Automatic indexing of labels • Easy model comparison • Search for complex scenarios
  6. Building blocks of our solution • Fast random access to images: HBase stores both the images and the corresponding labels • Free-text search of labels: Solr indexes are used to query the data • Enrichment and augmentation with secondary data sources (e.g. GPS, CAN bus): Hive/Impala tables store the enrichment data • Visual interface: a Hue dashboard provides the UI • Execute existing Python and Scala deep learning pipelines at scale: (Py)Spark scales out the computation • Automatic indexing of labels: the Lily indexer automatically populates the Solr collection
  7. Solution overview. Main users: data scientists and domain experts
  8. Data Engineering and Model Lifecycle
  9. Classifying an image • Object detection: InceptionV3 [1] • Semantic segmentation: SegNet [2] • Bounding boxes: RetinaNet [3] • Object masking: Detectron [4]. References: [1] Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens, Zbigniew Wojna: "Rethinking the Inception Architecture for Computer Vision", https://arxiv.org/abs/1512.00567 [2] Simon Jégou, Michal Drozdzal, David Vazquez, Adriana Romero, Yoshua Bengio: "The One Hundred Layers Tiramisu: Fully Convolutional DenseNets for Semantic Segmentation", https://arxiv.org/abs/1611.09326, demo: http://mi.eng.cam.ac.uk/projects/segnet/demo.php#demo [3] Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, Piotr Dollár: "Focal Loss for Dense Object Detection", https://arxiv.org/abs/1708.02002, https://github.com/fizyr/keras-retinanet [4] https://github.com/facebookresearch/Detectron
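Regardless of which architecture produces the scores, each classified image ends up as a set of ranked labels. As a hedged illustration (the function and class names here are hypothetical, not from the deck), extracting the top-k labels from a model's score vector might look like this:

```python
# Hypothetical sketch: turning a model's raw score vector into top-k
# (label, score) pairs, roughly the shape of output a per-image inference
# helper could return. Class names and scores are illustrative only.

def top_k_labels(scores, class_names, k=3):
    """Return the k highest-scoring (label, score) pairs."""
    ranked = sorted(zip(class_names, scores), key=lambda p: p[1], reverse=True)
    return ranked[:k]

classes = ["person", "car", "truck", "stop-sign"]
scores = [0.70, 0.20, 0.05, 0.05]
print(top_k_labels(scores, classes, k=2))
# [('person', 0.7), ('car', 0.2)]
```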
  10. Typical use case for model improvement. Consider using a tool that visualizes layer activations, such as https://github.com/raghakot/keras-vis.
  11. Data Science workflow with CDSW • Models are trained on GPUs • Since version 1.1, Cloudera Data Science Workbench natively supports GPUs • Your Ops people will appreciate it. https://blog.cloudera.com/blog/2017/07/prophecy-fulfilled-keras-and-cloudera-data-science-workbench/ https://blog.cloudera.com/blog/2017/09/customizing-docker-images-in-cloudera-data-science-workbench/
  12. Access Patterns & Model Lifecycle with CDSW. [Architecture diagram: Cloudera Data Science Workbench (algorithm prototyping & model training on GPUs; GPU support depends on YARN resource types) and HUE query editors/dashboards (ad-hoc analysis using SQL and Search; data engineering & curation) on top of CDH, where images and tags are stored in HBase, augmented, and indexed in Solr. Users can search for images by properties and context, access compliance and governance data, and search for time series patterns.]
  13. Full Data Pipeline. [Pipeline diagram: video is split into image frames with ffmpeg; GPS NMEA data is converted to Avro (gps2avro, pynmea2); geodata such as area, tunnel, and bridge attributes is fetched from OpenStreetMap via the overpass API (overpy). Images and predicted labels from multiple models (imagenet, retinanet, tiny-yolo, ...) are stored in HBase column families CF:img_all, CF:tags, and CF:geo, keyed by timestamp (e.g. 20180428152330); enrichment tables are exposed via the HBaseStorageHandler, and the Solr index is populated by the Lily indexer (hbase-indexer-mr-job.jar).]
  14. Time domains / resolution. Joining the GPS, image, and speed time domains, which are sampled at different rates.

SQL today, using a window function to pick the speed record closest in time to each image:

SELECT rnk, system_id, speed, time_gap
FROM (
  SELECT row_number() OVER (PARTITION BY system_id ORDER BY time_gap ASC) AS rnk,
         system_id, speed, time_gap
  FROM (
    SELECT img_domain.system_id AS system_id,
           speed_domain.speed   AS speed,
           abs(img_domain.time_s - speed_domain.time_s) AS time_gap
    FROM img_domain
    JOIN speed_domain ON img_domain.system_id = speed_domain.system_id
  ) t
) t
WHERE rnk = 1

A cool SQL dialect that supports range queries directly, not yet existing:

SELECT system_id, speed
FROM img_domain, speed_domain
WHERE WITHIN(img_domain.time_s, speed_domain.time_s, 1.5s)
  15. PySpark implementation (Keras)

def predict(iterator):
    model = InceptionV3(weights=None)
    model.load_weights(FLAGS.weights_file)
    return [(x[0], run_inference_on_image(model, x[1])) for x in iterator]

def main():
    sc = SparkContext(conf=conf)
    hbase_io = common.HbaseIO(FLAGS)
    out_format = common.OutputFormatter(FLAGS, MODEL_NAME)
    hbase_images = hbase_io.load_from_hbase(sc)
    classified_images = (hbase_images.mapPartitions(predict)
                         .map(out_format.imagenet_format))
    classified_images.foreachPartition(hbase_io.put_to_hbase)
  16. PySpark implementation (Keras) • The Python environment with TensorFlow is distributed to the executors at runtime; it is not preinstalled on the nodes • The individual models only need to implement the following functions: prepare, predict, output_format • Conceptually this is very close to the scikit-learn or Spark ML Pipelines approach • Deep Learning Pipelines can be a way to streamline the implementation. https://databricks.com/blog/2017/06/06/databricks-vision-simplify-large-scale-deep-learning.html
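The three-function contract listed above can be sketched as a minimal class. The names prepare, predict, and output_format come from the deck; the class shape and dummy logic are assumptions for illustration, not the actual implementation:

```python
# Hedged sketch of the per-model interface the slide describes. A real
# model would load Keras weights in prepare() and run inference in
# predict(); here both are stubbed out with dummy logic.

class DummyModel:
    def prepare(self, weights_file):
        # A real model would e.g. call model.load_weights(weights_file).
        self.loaded = weights_file

    def predict(self, iterator):
        # Called via mapPartitions: one model instance per partition,
        # consuming (row_key, image_bytes) pairs.
        return [(key, "label-for-" + key) for key, _img in iterator]

    def output_format(self, record):
        # Shape each prediction for the HBase put.
        key, label = record
        return {"row_key": key, "label": label}

m = DummyModel()
m.prepare("weights.h5")
preds = m.predict(iter([("img-001", b"...")]))
print([m.output_format(p) for p in preds])
# [{'row_key': 'img-001', 'label': 'label-for-img-001'}]
```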
  17. Spark implementation (dl4j / Scala)

def predict(pairs: Iterator[(String, (INDArray, Int, Int))]) = {
  val model = ModelSerializer.restoreComputationGraph(modelLoc)
  pairs.map { case (name, image) =>
    (name, run_inference_on_image(model, image))
  }
}

def main(args: Array[String]) = {
  val sc = new SparkContext(conf)
  val hbase_io = common.HbaseIO(args)
  val out_format = common.OutputFormatter(args)
  val hbase_images = hbase_io.load_from_hbase(sc)
  val classified_images = hbase_images.mapPartitions(predict)
    .map(out_format.imagenet_format)
  classified_images.foreachPartition(hbase_io.put_to_hbase)
}
  18. Demo
  19. Moving further: Label Quality Inspection
  20. Visual label inspection via HUE: label quality & relations between objects • The index contains object relations, predicted labels, and object statistics • Rendered bounding boxes are key to visual inspection • This enables easy comparison of multiple model classes (A, B) or model versions (C1, C2). [Screenshots: Model A vs. Model B]
  21. From labels to meaning: "person in front of car". When bounding boxes overlap, this property of the object pair becomes a fact. Facts are added to a multivalued field of a document in a Solr index. Query: q=overlap_category:car-person OR overlap_category:person-car
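Deriving such an overlap fact from two bounding boxes can be sketched in a few lines. The (x1, y1, x2, y2) box format, the function names, and the sample coordinates are assumptions for illustration; only the label-pair facts (car-person, person-car) come from the slide:

```python
# Sketch: when two detected objects' bounding boxes overlap, emit the
# label pair as a fact for the multivalued Solr field, in both orders
# as in the example query. Boxes are (x1, y1, x2, y2) tuples (assumed).

def boxes_overlap(a, b):
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    return ax1 < bx2 and bx1 < ax2 and ay1 < by2 and by1 < ay2

def overlap_facts(objects):
    """objects: list of (label, box) pairs; returns overlap facts."""
    facts = []
    for i, (la, ba) in enumerate(objects):
        for lb, bb in objects[i + 1:]:
            if boxes_overlap(ba, bb):
                facts += [f"{la}-{lb}", f"{lb}-{la}"]
    return facts

objs = [("car", (0, 0, 100, 50)), ("person", (80, 10, 120, 60))]
print(overlap_facts(objs))
# ['car-person', 'person-car']
```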
  22. Moving further: Semantic Search
  23. How to identify relations? 1. Build an ontology for traffic scenes (or whichever domain you work on) 2. Map statistical object properties to an RDF graph using heuristics 3. Combine scene graphs in a triple store 4. Enable search with SPARQL
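Steps 3 and 4 can be illustrated without a real triple store or SPARQL engine: scene facts become (subject, predicate, object) triples, and a query is a pattern match over them. The triples and the tiny matcher below are illustrative stand-ins, not the deck's implementation:

```python
# Minimal sketch of a triple store and a basic-graph-pattern match.
# A wildcard (None) position plays the role of a SPARQL variable,
# e.g. SELECT ?s ?o WHERE { ?s :inFrontOf ?o }.

triples = {
    ("person-1", "inFrontOf", "car-3"),
    ("person-1", "isA", "Pedestrian"),
    ("car-3", "isA", "Car"),
}

def match(pattern, store):
    """Return all triples in the store matching (s, p, o), where
    None matches anything."""
    s, p, o = pattern
    return sorted(t for t in store
                  if (s is None or t[0] == s)
                  and (p is None or t[1] == p)
                  and (o is None or t[2] == o))

print(match((None, "inFrontOf", None), triples))
# [('person-1', 'inFrontOf', 'car-3')]
```

In a real deployment this role would be filled by an RDF triple store queried with SPARQL, as the slide proposes.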
  24. How to identify semantic relations? Approach: • Build an ontology for traffic scenes • Map statistical object properties to an RDF graph using heuristics • Combine scene graphs (triple store) • Search with SPARQL. Building blocks: • Object detection: deep neural networks • Bounding box analysis: rendering of bounding boxes with labels • Geometry-based heuristics: overlap ratios, orientation analysis • Solr search by label and relation
  25. Why search on a knowledge base? • This approach allows searching easily for complex scenarios: THINGS (pedestrian, stop sign, hot spot, gun, …), RELATIONSHIPS (close by, in front of, above, underneath, ...), ACTIVITIES (danger, theft, evasion, escape), SITUATIONS (combinations of THINGS, RELATIONSHIPS, and ACTIVITIES) • ... very fast, even in huge image collections • Knowledge graphs remove the need to know Solr schema details
  26. Implementation of complementary search channels: triplification using local graphs
  27. Summary. What we can do with images today: • Search for combinations and amounts of objects at scale: "at least 5 cars and 2 trucks" • Search for basic relationships among those things: "in front of", "in a line" • Enrich the search experience with other domains: geospatial, sensor data, etc. This helps to: • Gain a better understanding of the quality of our CV models/apps • Discover corner cases, improve the model lifecycle, and build new (data) products faster. In the future: • Focus on semantic search, advanced visualization, and improved model lifecycles
  28. Thank you. jk@cloudera.com, mirko@cloudera.com, mbalassi@cloudera.com
  29. Appendix: Getting data. There are many great datasets out there for research purposes: • Cityscapes, https://www.cityscapes-dataset.com/ • COCO, http://cocodataset.org/#home • YouTube-8M, https://research.google.com/youtube8m/
