Diese Präsentation wurde erfolgreich gemeldet.
Die SlideShare-Präsentation wird heruntergeladen. ×

Siddha Ganju. Deep learning on mobile

Anzeige
Anzeige
Anzeige
Anzeige
Anzeige
Anzeige
Anzeige
Anzeige
Anzeige
Anzeige
Anzeige
Anzeige

Hier ansehen

1 von 71 Anzeige

Weitere Verwandte Inhalte

Diashows für Sie (16)

Ähnlich wie Siddha Ganju. Deep learning on mobile (20)

Anzeige

Weitere von Lviv Startup Club (20)

Aktuellste (20)

Anzeige

Siddha Ganju. Deep learning on mobile

  1. 1. 1 Deep Learning On Mobile A PRACTITIONER’S GUIDE
  2. 2. 2
  3. 3. 3 Deep Learning On Mobile A PRACTITIONER’S GUIDE
  4. 4. @MeherKasam @AnirudhKoul @SiddhaGanju 4
  5. 5. Why Deep Learning on Mobile? 5 ReliabilityPrivacy Cost Latency
  6. 6. Latency Is Expensive! 6 100 ms 1% loss [Amazon 2008]
  7. 7. Latency Is Expensive! 7 >3 sec load time 53% bounce Mobile Site Visits [Google Research, Webpagetest.org]
  8. 8. 8
  9. 9. Power of 10 9 0.1s Seamless Uninterrupted flow of thought 1s 10s Limit of attention [Miller 1968; Card et al. 1991; Nielsen 1993]
  10. 10. 10 Efficient Mobile Inference Engine Efficient Model = DL App High Quality Dataset + Hardware + +
  11. 11. 11 How do I train my model?
  12. 12. 12 Learn to Play Melodica 3 Months
  13. 13. Already Play Piano? 13
  14. 14. 14 FINE TUNE your skills 3 months 1 week
  15. 15. Fine tuning 15 Assemble a dataset Find a pre- trained model Fine-tune a pre-trained model Run using existing frameworks “Don’t Be A Hero” — Andrej Karpathy
  16. 16. CustomVision.ai 16 Use Fatkun Browser Extension to download images from Search Engine, or use Bing Image Search API to programmatically download photos with proper rights
  17. 17. 17
  18. 18. 18
  19. 19. Demo 19
  20. 20. 20 How do I run my models?
  21. 21. 21 Core ML ML KitTF Lite
  22. 22. Apple Ecosystem 22 Metal 2014 BNNS + MPS 2016 Core ML 2017 Core ML 2 2018 Core ML 3 2019 § Tiny models (~ KB)! § 1-bit model quantization support § Batch API for improved performance § Conversion support for MXNet, ONNX § tf-coreml
  23. 23. Apple Ecosystem 23 Metal 2014 BNNS + MPS 2016 Core ML 2017 Core ML 2 2018 Core ML 3 2019 § On-device training § Personalization § Create ML UI
  24. 24. Core ML Benchmark 538 129 75 557 109 7877 44 3674 35 3071 33 2926 18 15 0 100 200 300 400 500 600 ResNet-50 MobileNet SqueezeNet EXECUTION TIME (MS) ON APPLE DEVICES iPhone 5s (2013) iPhone 6 (2014) iPhone 6s (2015) iPhone 7 (2016) iPhone X (2017) iPhone XS (2018) 24 https://heartbeat.fritz.ai/ios-12-core-ml-benchmarks- b7a79811aac1 GPUs became a thing here!
  25. 25. TensorFlow Ecosystem 25 TensorFlow 2015 TensorFlow Mobile 2016 TensorFlow Lite 2018 Smaller Faster Minimal dependencies Allows running custom operators
  26. 26. TensorFlow Lite is small 26 1.5MB TensorFlow Mobile 300KB Core Interpreter + Supported Operations
  27. 27. TensorFlow Lite is Fast 27 Takes advantage of on- device hardware acceleration FlatBuffers Reduces code footprint, memory usage Reduces CPU cycles on serialization and deserialization Improves startup time Pre-fused activations Combining batch normalization layer with previous convolution Static memory and static execution plan Decreases load time
  28. 28. TensorFlow Ecosystem 28 TensorFlow 2015 TensorFlow Mobile 2016 TensorFlow Lite 2018 Smaller Faster Minimal dependencies Allows running custom operators
  29. 29. TensorFlow Ecosystem 29 TensorFlow 2015 TensorFlow Mobile 2016 TensorFlow Lite 2018 $ tflite_convert --keras_model_file = keras_model.h5 --output_file=foo.tflite
  30. 30. TensorFlow Ecosystem 30 TensorFlow 2015 TensorFlow Mobile 2016 TensorFlow Lite 2018 Trained TensorFlow Model TF Lite Converter .tflite model Android App iOS App
  31. 31. ML Kit 31 Easy to use Abstraction over TensorFlow Lite Built-in APIs for image labeling, OCR, face detection, barcode scanning, landmark detection, smart reply Model management with Firebase A/B testing var vision = Vision.vision() let faceDetector = vision.faceDetector(options: options) let image = VisionImage(image: uiImage) faceDetector.process(visionImage) { // callback }
  32. 32. 32 How do I keep my IP safe?
  33. 33. Fritz Full fledged mobile lifecycle support Deployment, instrumentation, etc. from Python 33
  34. 34. 35 Does my model make me look fat? § Apple does not allow apps over 200 MB to be downloaded over cellular network. § Download on demand, and interpret on device instead.
  35. 35. 36 What effect does hardware have on performance?
  36. 36. Big Things Come In Small Packages 37
  37. 37. Effect of Hardware L-R: iPhone XS, iPhone X, iPhone 5 38 https://twitter.com/matthieurouif/status/1126575118812110854?s=11
  38. 38. TensorFlow Lite Benchmarks Alpha Lab releases Numericcal: http://alpha.lab.numericcal.com/
  39. 39. 40 TensorFlow Lite Benchmarks Crowdsourcing AI Benchmark App by Andrey Ignatov from ETH Zurich. http://ai-benchmark.com/
  40. 40. 41 Alchemy by Fritz https://alchemy.fritz.ai/ Python library to analyze and estimate mobile performance No need to deploy on mobile
  41. 41. 42 Which devices should I support? § To get 95% device coverage, support phones released in the last 4 years. § For unsupported phones, offer graceful degradation (lower frame rate, cloud inference, etc.)
  42. 42. 43 Could all of this result in heavy energy use?
  43. 43. Energy Considerations 44 § You don’t usually run AI models constantly; you run it for a few seconds. § With a modern flagship phone, running MobileNet at 30 FPS should burn battery in 2–3 hours. § Bigger question — do you really need to run it at 30 FPS? Could it be run at 1 FPS?
  44. 44. Energy reduction from 30 FPS to 1 FPS 45 iPad Pro 2017
  45. 45. 46 What exciting applications can I build?
  46. 46. Seeing AI Audible Barcode recognition Aim: Help blind users identify products using barcode Issue: Blind users don’t know where the barcode is Solution: Guide user in finding a barcode with audio cues 47
  47. 47. AR Hand Puppets Hart Woolery from 2020CV Object Detection (Hand) + Key Point Estimation 48 [https://twitter.com/2020cv_inc/status/1093219359676280832] AR Hand Puppets, Hart Woolery from 2020CV, Object Detection (Hand) + Key Point Estimation
  48. 48. Remove objects Brian Schulman, Adventurous Co. Object Segmentation + Image In Painting 50 https://twitter.com/smashfactory/status/1139461813710442496
  49. 49. Magic Sudoku App Edge Detection + Classification + AR Kit 51 https://twitter.com/braddwyer/status/910030265006923776
  50. 50. Snapchat 52 Face Swap GANs
  51. 51. 53 Can I make my model even more efficient?
  52. 52. How To Find Efficient Pre-Trained Models 54 Papers with Code https://paperswithcode.com/sota Model Zoo https://modelzoo.co
  53. 53. 55 What you can affordWhat you want
  54. 54. Model Pruning Aim: Remove all connections with absolute weights below a threshold 56 Song Han, Jeff Pool, John Tran, William J. Dally, "Learning both Weights and Connections for Efficient Neural Networks", 2015
  55. 55. Pruning in Keras model = tf.keras.models.Sequential([ tf.keras.layers.Flatten(), tf.keras.layers.Dense(512, activation=tf.nn.relu), tf.keras.layers.Dropout(0.2), tf.keras.layers.Dense(10, activation=tf.nn.softmax) ]) 57 model = tf.keras.models.Sequential([ tf.keras.layers.Flatten(), prune.Prune(tf.keras.layers.Dense(512, activation=tf.nn.relu)), tf.keras.layers.Dropout(0.2), prune.Prune(tf.keras.layers.Dense(10, activation=tf.nn.softmax)) ])
  56. 56. Model Quantization 58 Quantized to Percent size reduction (approx) Percent match with 32-bit results linear linear_lut kmeans_lut 16-bit 50% 100% 100% 100% 8-bit 75% 88.37% 80.62% 98.45% 4-bit 88% 0% 0% 81.4% 2-bit 94% 0% 0% 10.08% 1-bit 97% 0% 0% 7.75% 1 0.0039 0.00780 0.9921 0.9961 Quantized Value Intervals 0 … … 1 254 255
  57. 57. So many techniques — so little time! Channel pruning Model quantization ThiNet (Filter pruning) Weight sharing Automatic Mixed Precision Network distillation 59
  58. 58. Pocket Flow – 1 Line to Make a Model Efficient Tencent AI Labs created an Automatic Model Compression (AutoMC) framework 60
  59. 59. 61 Can I design a better architecture myself? Maybe? But AI can do it much better!
  60. 60. AutoML – Let AI Design an Efficient Arch 62 §Neural Architecture Search (NAS) — An automated approach for designing models using reinforcement learning while maximizing accuracy. §Hardware Aware NAS = Maximizes accuracy while minimizing run-time on device §Incorporates latency information into the reward objective function §Measure real-world inference latency by executing on a particular platform §1.5x faster than MobileNetV2 (MnasNet) §ResNet-50 accuracy with 19x less parameters §SSD300 mAP with 35x fewer FLOPs
  61. 61. 63 Evolution of Mobile NAS Methods Method Top-1 Acc (%) Pixel-1 Runtime Search Cost (GPU Hours) MobileNetV1 70.6 113 Manual MobileNetV2 72.0 75 Manual MnasNet 74.0 76 40,000 (4 years+) ProxylessNas-R 74.6 78 200 Single-Path NAS 74.9 79.5 3.75 hours
  62. 62. ProxylessNAS – Per Hardware Tuned CNNs 64 Han Cai and Ligeng Zhu and Song Han, "ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware", ICLR 2019
  63. 63. 65
  64. 64. On-Device Training in Core ML 67 let updateTask = try MLUpdateTask( forModelAt: modelUrl, trainingData: trainingData, configuration: configuration, completionHandler: { [weak self] self.model = context.model context.model.write(to: newModelUrl) }) § Core ML 3 introduced on device learning. § Never have to send training data to the server with the help of MLUpdateTask. § Schedule training when device is charging to save power.
  65. 65. 68 FEDERATED LEARNING!!! https://federated.withgoogle.com/
  66. 66. 69 TensorFlow Federated Train a global model using 1000s of devices without access to data Encryption + Secure Aggregation Protocol Can take a few days to wait for aggregations to build up https://github.com/tensorflow/federated
  67. 67. 70 1. Collect Data 2. Label Data 3. Train Model 4. Convert Model 5. Optimize Performance 6. Deploy 7. Monitor Mobile AI Development Lifecycle
  68. 68. What we learned today 71 § Why deep learning on mobile? § Building a model § Running a model § Hardware factors § Benchmarking § State-of-the-art applications § Making a model more efficient § Federated learning
  69. 69. How do I access the slides instantly? http://PracticalDeepLearning.ai @PracticalDLBook Icons credit - Orion Icon Library - https://orioniconlibrary.com
  70. 70. 73 @MeherKasam @AnirudhKoul @SiddhaGanju Releases October 2019

×