Machine Learning Use Case - Agriculture
Proposal for a Machine Learning Application
Use Case: Classify Agricultural fields based on crops from images
Infrastructure of choice: Amazon Web Services
Training Data Acquisition:
For an effective machine learning model, we need a large number of images of different fields
growing various varieties of crops. The most cost-effective approach would be to build a web crawler
application in Python that searches Google or image websites and downloads images for each of the
crop classes we have identified. The problem is that web-crawled images are often inaccurately
labelled, and from the samples of agricultural fields I have seen, it is difficult to tell from most
images which crops are being grown. There are therefore two possible work-arounds: buy a labelled
dataset from a 3rd party (image brokers or universities), or crawl and collect the images ourselves
and use a service like AWS SageMaker Ground Truth to label them. Ground Truth provides a human
image-labelling service at $0.012 per image per labeler. So, if we gather roughly 50,000 images, the
cost to label them would be about $600.
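The arithmetic above can be checked with a small helper; the rate is the one quoted above, though actual Ground Truth pricing varies by task type, so treat it as an assumption:

```python
# Labelling cost estimate for SageMaker Ground Truth.
# The per-image rate is taken from the estimate above; real pricing
# depends on task type and the number of labelers per image.
def labeling_cost(num_images: int, price_per_image: float = 0.012,
                  labelers_per_image: int = 1) -> float:
    """Total human-labelling cost in USD."""
    return round(num_images * price_per_image * labelers_per_image, 2)
```

For 50,000 images with one labeler each, this gives the $600 figure used above; using multiple labelers per image (common for quality control) scales the cost linearly.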
The labelled images are stored in an AWS S3 bucket, which is an object store provided by AWS.
The web crawler can be hosted on an AWS EC2 instance.
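A minimal sketch of the crawl-and-store step might look like the following. The bucket name and key layout are hypothetical, and the boto3 upload assumes credentials (e.g. an IAM role) are available on the EC2 instance:

```python
# Sketch: store crawled images in S3 under one prefix per crop class,
# so a labelling job can list a single class at a time.
import os

BUCKET = "crop-images-raw"  # hypothetical bucket name


def s3_key_for(crop_class: str, filename: str) -> str:
    """Build a per-class S3 key for a downloaded image."""
    return f"training-data/{crop_class.lower().strip()}/{os.path.basename(filename)}"


def upload_image(local_path: str, crop_class: str) -> str:
    """Upload one crawled image to S3 under its class prefix."""
    import boto3  # AWS SDK; requires credentials/role on the EC2 instance
    s3 = boto3.client("s3")
    key = s3_key_for(crop_class, local_path)
    s3.upload_file(local_path, BUCKET, key)
    return key
```

Keeping one prefix per class also makes the later lifecycle (archival) rules easy to scope to the training data only.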
Model Development:
Now that we have the labelled image dataset in an S3 bucket, we can use an AWS SageMaker notebook to
train our models. For image classification we should use a deep learning model with a convolutional
neural network. SageMaker's built-in image classification algorithm (ResNet) can be used in our
case. If the pre-built algorithm is used, SageMaker will fetch the pre-built image classifier Docker
container from the Elastic Container Registry (ECR) for the particular region. We can set the
training instance configuration and the number of instances (single or distributed), and tune the
model as required by setting its hyperparameters. Once a satisfactory model is produced, we can
validate it with a small set of validation images kept aside separately for this purpose.
Instead of using the standard AWS image classification algorithms, we can use our own models by
building our own Docker images, uploading them to ECR, and using them in SageMaker.
If we use a GPU instance, for example p2.xlarge at around $1.26/hour, and assume an approximate
speed of 20 images/second, each training run over the 50,000 images takes roughly 45 minutes and
costs about $1.
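As a rough sketch, a training job with the built-in image classification algorithm could be launched from the notebook as below. The role ARN, bucket layout, class count and hyperparameter values are placeholder assumptions, not settings from the proposal:

```python
# Sketch: launch SageMaker's built-in image classification algorithm.
# Hyperparameter names follow the built-in algorithm's documented set;
# the concrete values here are illustrative assumptions.
def hyperparameters(num_classes: int, num_samples: int, epochs: int = 30) -> dict:
    """Hyperparameters for the built-in image-classification algorithm."""
    return {
        "num_layers": 18,                  # ResNet-18 variant
        "num_classes": num_classes,
        "num_training_samples": num_samples,
        "epochs": epochs,
        "image_shape": "3,224,224",        # channels, height, width
    }


def launch_training(role_arn: str, bucket: str):
    """Run a training job (SageMaker Python SDK; needs AWS credentials)."""
    import sagemaker
    from sagemaker import image_uris
    from sagemaker.estimator import Estimator

    session = sagemaker.Session()
    # Pre-built image-classification container pulled from ECR for this region
    container = image_uris.retrieve("image-classification", session.boto_region_name)
    est = Estimator(
        container,
        role=role_arn,
        instance_count=1,                # single or distributed
        instance_type="ml.p2.xlarge",    # GPU instance, ~$1.26/hr
        output_path=f"s3://{bucket}/model-output",
        sagemaker_session=session,
    )
    est.set_hyperparameters(**hyperparameters(num_classes=10, num_samples=45000))
    est.fit({"train": f"s3://{bucket}/train",
             "validation": f"s3://{bucket}/validation"})
    return est
```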
Model Deployment in Production:
Once the model is ready, the SageMaker deployment API can be called from the development notebook,
specifying the number and type of instances; this also creates a SageMaker HTTPS endpoint, which can
be invoked to use the model for predictions. In our case the deployment will be for on-demand
real-time inference. Two moderately powerful ml.m5.2xlarge instances will cost
2 x $0.538/hr x 24 hrs x 30 days ~ $775/month
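The monthly figure can be reproduced with a small helper; the deploy call is a sketch of the SageMaker Python SDK usage with the instance type from the estimate above:

```python
# Cost of an always-on real-time endpoint, plus a sketch of the deploy call.
def monthly_endpoint_cost(hourly_rate: float, instances: int,
                          hours_per_day: int = 24, days: int = 30) -> float:
    """On-demand inference cost for instances running around the clock."""
    return round(hourly_rate * instances * hours_per_day * days, 2)


def deploy(estimator, instances: int = 2):
    """Create the model, endpoint configuration and HTTPS endpoint in one
    call (SageMaker Python SDK; needs AWS credentials)."""
    return estimator.deploy(initial_instance_count=instances,
                            instance_type="ml.m5.2xlarge")
```

Two ml.m5.2xlarge instances at $0.538/hr give $774.72 a month, the ~$775 figure above.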
Frontend – Serving Predictions to the Client
Once our HTTPS endpoint from SageMaker is ready, we can invoke it from a web application so that
users can interact with the model. To do this we will develop our data transformations and
invocation code in a local virtual environment. Once complete, this can be deployed to AWS Lambda
using the command-line tool Zappa, which creates an API Gateway endpoint attached to the Lambda
function. This API Gateway endpoint can be used to POST data from the user's web or mobile
application. Once a request is posted with input data, the Lambda function is triggered and the
SageMaker endpoint invoked; the SageMaker model returns the classification based on the pixels
passed.
Web Browser / Mobile Application → API Gateway → Lambda → SageMaker Endpoint
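A minimal Lambda handler for this flow might look as follows. The endpoint name is hypothetical, and the payload handling assumes API Gateway delivers the image base64-encoded (the usual setup for binary POST bodies):

```python
# Sketch: Lambda handler that forwards a POSTed image to a SageMaker
# endpoint and returns the classification scores.
import base64
import json

ENDPOINT_NAME = "crop-classifier"  # hypothetical endpoint name


def parse_body(event: dict) -> bytes:
    """Decode the image payload POSTed through API Gateway."""
    body = event["body"]
    if event.get("isBase64Encoded"):
        return base64.b64decode(body)
    return body.encode() if isinstance(body, str) else body


def handler(event, context):
    """Entry point wired to API Gateway (needs AWS credentials at runtime)."""
    import boto3
    runtime = boto3.client("sagemaker-runtime")
    resp = runtime.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="application/x-image",
        Body=parse_body(event),
    )
    scores = json.loads(resp["Body"].read())  # one probability per crop class
    return {"statusCode": 200, "body": json.dumps({"scores": scores})}
```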
Since we are dealing with images here, there are multiple options for where to convert the image
data into vectorized format. This can be done locally in the client application, or the images can
be uploaded to an S3 bucket and another Lambda function invoked at a later stage to do the
transformation. Another option is to build an inference Docker image to which we only need to pass
the image URL, and it will return the classification.
Cost-wise, AWS Lambda is very economical: 1 million free requests a month, and $0.20 per 1 million
requests thereafter, which is negligible. The same is true of API Gateway: 1 million API calls free,
with a negligible rate thereafter.
Data Curation:
Now that our model is running in an interactive mode, we have to make sure we are not incurring
extra cost from the 50,000 images in the training dataset. S3 is in any case quite inexpensive
storage: assuming an average image size of 1 MB, 50 GB of data shouldn't incur more than $2 a month.
However, if there are more images and the storage cost increases, images can be archived from S3
standard (high-availability) storage to AWS Glacier using lifecycle settings. For example, we can
configure all images to move into the archive after 7 days.
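The 7-day archive rule could be expressed as an S3 lifecycle configuration along these lines; the bucket and key prefix are hypothetical:

```python
# Sketch: S3 lifecycle rule moving training images to Glacier after 7 days.
def glacier_rule(prefix: str = "training-data/", days: int = 7) -> dict:
    """Lifecycle rule transitioning objects under `prefix` to Glacier."""
    return {
        "ID": "archive-training-images",
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
    }


def apply_lifecycle(bucket: str):
    """Attach the rule to the bucket (boto3; needs AWS credentials)."""
    import boto3
    boto3.client("s3").put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration={"Rules": [glacier_rule()]},
    )
```

Scoping the rule to the training-data prefix keeps any other objects in the bucket (model artifacts, temporary uploads) on standard storage.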
Thus, to summarize, let's look at the model as a whole:
[Architecture diagram: Web Crawler → Temporary S3 Bucket; Web Browser / Mobile Application → API Gateway → Lambda Invocation → SageMaker Endpoint]
A Few Additional Points:
1. Multiple SageMaker instances can be configured to feed a single HTTPS endpoint. This acts as a
load balancer, distributing network traffic uniformly across the instances.
2. What has been described here is a simple deployment. Machine learning models often get updated,
and there can be requirements to do a trial run or A/B testing, or to decommission the old model
and replace it with a new one. The best way to do this is to configure the SageMaker endpoint to
shift part of the traffic to the new model deployed on a new instance. Once the trials are
successful, full traffic can be diverted, or distributed accordingly.
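A sketch of such a weighted traffic split, using two production variants behind one endpoint configuration; the model names and the 10% canary weight are illustrative assumptions:

```python
# Sketch: SageMaker A/B split via weighted production variants.
def production_variants(old_model: str, new_model: str,
                        new_weight: float = 0.1) -> list:
    """Two variants behind one endpoint; weights control the traffic split."""
    return [
        {"VariantName": "current", "ModelName": old_model,
         "InitialInstanceCount": 1, "InstanceType": "ml.m5.2xlarge",
         "InitialVariantWeight": 1.0 - new_weight},
        {"VariantName": "candidate", "ModelName": new_model,
         "InitialInstanceCount": 1, "InstanceType": "ml.m5.2xlarge",
         "InitialVariantWeight": new_weight},
    ]


def create_ab_endpoint_config(name: str, old_model: str, new_model: str):
    """Register the two-variant config (boto3; needs AWS credentials)."""
    import boto3
    boto3.client("sagemaker").create_endpoint_config(
        EndpointConfigName=name,
        ProductionVariants=production_variants(old_model, new_model),
    )
```

Once the candidate proves itself, its weight can be raised (or the old variant removed) by updating the endpoint with a new configuration, with no downtime for callers.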
Conclusion: The proposed architecture uses AWS services for convenience, cost-effectiveness and high
availability, minimizing manual intervention and infrastructure maintenance. There are several
alternatives to this approach; the choice would largely depend on the complexity of requirements,
compliance and budget.