This document discusses developing robust training systems for machine learning models. It covers topics like data handling, validation, visualization, training, evaluation, debugging, and libraries to help with experiments. The goal is to create standardized, reproducible systems for developing and testing models. Sections provide guidance on storing data in a unified format, validating data quality, visualizing results, configuring training processes, evaluating performance, and using open source tools to aid experimentation.
2. About me
Illarion Khlestov, Senior Research Engineer
GitHub: https://github.com/ikhlestov
Blog: https://medium.com/@illarionkhlestov
Facebook: https://www.facebook.com/i.khlestov
Welcome, everyone. My name is Larry. A few words about me: I work as a Senior Research Engineer. I develop various computer vision algorithms for video and image processing, mainly as cloud-based solutions. Sometimes I post my ideas or notes on my Medium blog. There are also a few interesting projects on my GitHub. So, subscribe.
Moreover, it's always cool to work with clever and talented people. So if you want to join the team, feel free to talk to me after the lecture.
But let’s get back to the presentation. As the title says, we are going to talk about developing training systems. You may ask: why? From my point of view, the main motivation is that there are a lot of tutorials on how to build and train simple networks. Many sources try to give you a first impression of what machine learning is. Mostly they go like this: just take some existing code, find pretrained weights, tune them a little bit, and you are ready for production. Nothing complicated there.
Unfortunately, there is not much information about how to build large systems that can be updated, reused, and trained on incoming data. At the same time, such systems should be easy for many people to use if your team has more than one person.
Today I’ll try to show that such projects don’t have to be very complicated, and that with some simple architecture decisions you can speed up and improve your development process.
And of course, you may not build such systems on your own, but at least you will know what you should be looking for.
During the lecture I’ll cover three main chapters:
* How to store, handle and prepare data for training
* How to manage training itself
* How we should analyze results after training
Additionally, at the end I’ll discuss some libraries and packages that are ready to use as components.
The lecture won't take very long. I’ll be happy to answer all questions at the end. I'm also going to provide a link to the slides at the end, so there is no need to photograph all the links or library names. So let’s start.