Precision Agriculture Data Ingestion Using Kafka

3. Dec 2016
PRECISION AGRICULTURE SUPPORT USING SCALA/SPARK

Project Report
Sriram RV
Spring Semester
Advisor: Professor Brad Rubin
Table of Contents

1.0 Purpose of Project
2.0 Project Description
2.1 Why Agriculture Data
3.0 Dataset
    3.1 Data Source
    3.2 Details about Dataset
    3.3 Sample Data
    3.4 Schema
    3.5 Data Description
4.0 Project Implementation
    4.1 Data Ingestion Using Kafka
    4.2 Kafka Producer
    4.3 Kafka Broker
    4.4 Kafka Consumer
5.0 Additional Tools
    5.1 Maven
    5.2 Scala Build Tool
    5.3 Git
6.0 Output Interpretation
7.0 Improving the Kafka Architecture
    7.1 Making the Kafka Architecture More Robust
    7.2 Dedicated Kafka Broker to Improve Performance
8.0 Future Research
9.0 Conclusion
Bibliography
1.0 PURPOSE OF PROJECT

Big data tooling over the last few years has focused on both structured and unstructured data. Image processing, however, is an area that needs more attention, and it has been my area of interest as well. Through this project I will get the opportunity to experiment with streaming images and weather data captured in the UST greenhouse, and to get a feel for image processing with Scala/Spark on Hadoop more generally. I will gain experience with technologies such as Scala, Spark, Spark Streaming, and image processing in the domain of food technology, giving me skills I could not otherwise obtain in the GPS curriculum.

2.0 PROJECT DESCRIPTION

The purpose of the project is to stream real-time weather data captured by direct sensors, along with RGB images captured by drones, in order to perform image processing and weather data analytics using the Scala/Spark ecosystem on a Hadoop computing cluster. Since image processing and streaming with Spark are new technologies to GPS, part of the project focuses on experimenting with different tools to find a reliable way of storing images and streamed data in HDFS.

The UST greenhouse will be growing plants for the Precision Agriculture project run by the UST School of Engineering. The greenhouse has a local weather station that broadcasts weather data such as temperature, humidity, light intensity, barometric pressure, position (latitude/longitude), wind speed and direction, and rainfall. The broadcast is continuous, at 10-second intervals, in CSV format. The equipment in the greenhouse is a prototype for field use, useful both for analysis of plant health and for creating a model for each of the six plant species that will be grown. In addition, high-resolution images will be taken of the plants in the visible and near-IR regions of the light spectrum, roughly every couple of days.
2.1 Why Agriculture Data

Agricultural data gives me the opportunity to experiment with streaming images and weather data captured in the UST greenhouse. The data captured in the greenhouse is highly detailed and provides experience working with data from the food technology domain.

3.0 DATASET

3.1 Data Source

The data source for this project is the live stream of weather and moisture data captured by sensors attached to an Arduino board and streamed using a Kafka producer.

3.2 Details about Dataset

- The sensor data were captured every second.
- Weather data stored in HDFS: 90 days.
- Moisture data stored in HDFS: 85 days.
- Image data stored: 90 days.

3.3 Sample Data

Weather data

Fig 1: Sensor weather data from Arduino

Moisture Data
Fig 2: Sensor moisture data from Arduino

Image Data

Image data was captured every other day over a period of 90 days.

Fig 3: Images from the greenhouse

3.4 Schema

Weather data columns: Date, Time, Wind Direction, Wind Speed, Humidity, Temperature, Rain, Pressure, Battery, Light Level

Table 1: Weather Data Schema

Moisture data columns: Date, Time, Moist 2, Moist 6, Moist 8, Moist 11, Moist 10, Moist 1, Moist 9, Moist 7, Moist 5, Temp, PAR

Table 2: Moisture Data Schema
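The weather schema in Table 1 can be modeled in the project's language, Scala. The sketch below is illustrative rather than the report's actual code, and the CSV line is a hypothetical sample matching the ten-field schema:

```scala
// Case class mirroring Table 1 (Weather Data Schema)
case class WeatherRecord(date: String, time: String, windDirection: String,
                         windSpeed: Double, humidity: Double, temperature: Double,
                         rain: Double, pressure: Double, battery: Double, lightLevel: Double)

object WeatherRecord {
  // Parse one CSV line as broadcast by the greenhouse weather station
  def fromCsv(line: String): WeatherRecord = {
    val f = line.split(",").map(_.trim)
    require(f.length == 10, s"expected 10 fields, got ${f.length}")
    WeatherRecord(f(0), f(1), f(2), f(3).toDouble, f(4).toDouble,
      f(5).toDouble, f(6).toDouble, f(7).toDouble, f(8).toDouble, f(9).toDouble)
  }
}

// Hypothetical sample: date, time, wind dir, wind speed, humidity,
// temperature, rain, pressure, battery, light level
val rec = WeatherRecord.fromCsv("2016-12-03,10:15:00,NW,3.2,55,21.5,0,1013,3.9,420")
println(rec.temperature)  // prints 21.5
```

Parsing each CSV line into a typed record like this makes downstream Spark analytics on the streamed data straightforward.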
3.5 Data Description

Weather Data:

Date & Time: timestamp of the recording
Wind Direction: direction of the wind
Wind Speed: speed of the wind
Wind Gust: gust speed of the wind
Humidity: percentage of water in the air
Temperature: air temperature
Rain: rain percentage
Pressure: air pressure
Battery: battery level of the Arduino
Light: light exposure

Moisture Data:

Moist 2: moisture of plot 2
Moist 6: moisture of plot 6
Moist 8: moisture of plot 8
Moist 11: moisture of plot 11
Moist 10: moisture of plot 10
Moist 1: moisture of plot 1
Moist 9: moisture of plot 9
Moist 7: moisture of plot 7
Moist 5: moisture of plot 5
Temp: soil temperature
PAR: photosynthetically active radiation
4.0 PROJECT IMPLEMENTATION

4.1 Data Ingestion Using Kafka

Kafka is a distributed messaging system that allows the moisture and weather data to be transmitted from the Arduino board to HDFS. The Kafka architecture depends on three main components: producer, broker, and consumer. ZooKeeper is used to monitor the flow of data into and out of the Kafka broker. The diagram below shows the architecture of the precision agriculture project: the Kafka producer streams the data produced in the greenhouse and sends it to the Kafka broker, obtaining the broker addresses through ZooKeeper.

Fig 4: Kafka Architectural Diagram

4.2 Kafka Producer

The Kafka producer is the sender side of the Kafka distributed messaging system. The producer splits messages by topic and sends them to the brokers accordingly; it also obtains the addresses of the Kafka brokers, which are attached to the packet headers when sending data. The weather, moisture, and image data are differentiated using the topics "weather-data", "moisture-data", and "image-data". Below is the snippet that sets up the Kafka producer, with both key and value serialized as strings. The bootstrap server setting is the broker list of the Kafka cluster.
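The original snippets (Figs 5 and 6) are not reproduced in this scrape. A minimal sketch of the producer configuration and send call, assuming the standard kafka-clients API and a placeholder broker address, looks like this:

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object WeatherProducer {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    // Broker list; "localhost:9092" is a placeholder for the PA cluster address
    props.put("bootstrap.servers", "localhost:9092")
    // Key and value serialized as strings, as described in the report
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

    val producer = new KafkaProducer[String, String](props)
    val csvLine = "2016-12-03,10:15:00,NW,3.2,55,21.5,0,1013,3.9,420" // hypothetical sample
    // Topic name from the report; send() binds the record to the configured brokers
    producer.send(new ProducerRecord[String, String]("weather-data", csvLine))
    producer.close()
  }
}
```

This sketch requires the kafka-clients library on the classpath and a running broker, so it is illustrative rather than directly runnable here.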
Fig 5: Configuring the Kafka producer

Below is the snippet used to create the message object, which contains the topic and the messages to be sent to the Kafka broker. The send function of the Kafka producer binds the Kafka configuration instance with the messages and sends them to the broker.

Fig 6: Sending the message to the Kafka broker

4.3 Kafka Broker

The Kafka broker is the server side of the Kafka distributed messaging system. It is capable of handling hundreds of read and write operations per second and can expand elastically without downtime. Data streams are partitioned and spread over a cluster of machines, allowing streams larger than the capacity of a single machine. The Kafka broker can be monitored through ZooKeeper on port 2181. By default, the Kafka broker has a retention period of 168 hours.
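The retention and ZooKeeper settings mentioned above live in the broker configuration. A fragment of a typical server.properties might look like this; the host names are placeholders, and 168 hours is the Kafka default noted in the report:

```properties
# Kafka broker configuration fragment (illustrative)
broker.id=0
listeners=PLAINTEXT://localhost:9092
# ZooKeeper, used here for broker monitoring, on its default port 2181
zookeeper.connect=localhost:2181
# Default retention period: 168 hours (7 days)
log.retention.hours=168
```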
Fig 7: Monitoring the messages using ZooKeeper

4.4 Kafka Consumer

The Kafka consumer is the receiver side of the Kafka distributed messaging system; it fetches data topic by topic from the brokers. The consumer runs in the cluster and stores the data in HDFS for further processing. Below is the sample consumer code that connects to the PA cluster; the topic set contains the list of topics we are interested in fetching from the broker.

Fig 8: Configuring the Kafka consumer

5.0 ADDITIONAL TOOLS

5.1 Maven

Maven was used for dependency management, bringing in all the required jars from a remote server to the local repository. This made it possible to develop the code from a Windows environment. Maven also pinned the versions of Spark and Kafka that were used, and all the jar files for those versions were stored in the local repository.
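A dependency declaration like the one described in Fig 9 might look like the following pom.xml fragment; the version numbers are illustrative, era-appropriate choices rather than values taken from the report:

```xml
<!-- Maven dependencies for Kafka and Spark Streaming (versions hypothetical) -->
<dependencies>
  <dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-clients</artifactId>
    <version>0.10.1.0</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_2.11</artifactId>
    <version>2.0.2</version>
  </dependency>
</dependencies>
```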
Fig 9: Dependency management with Maven

5.2 Scala Build Tool

The Scala Build Tool (SBT) was used to create the package and jar files, which were transferred to the cluster and the VM using WinSCP.

Fig 10: SBT build

5.3 Git

Git provides distributed revision control and source code management (SCM). An online Git repository was used to store all the code related to the precision agriculture project and to share it with the team. Below is the Git link for the precision agriculture project:

https://github.com/sri303030/Data-Ingestion-using-Kafka
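A minimal build.sbt for producing the jar described above might look like this; the project name and versions are illustrative assumptions, not taken from the report:

```scala
// build.sbt sketch (names and versions hypothetical)
name := "precision-agriculture-ingestion"
version := "1.0"
scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  "org.apache.kafka" %  "kafka-clients"    % "0.10.1.0",
  "org.apache.spark" %% "spark-streaming"  % "2.0.2"
)
```

Running `sbt package` then produces a jar under `target/` that can be copied to the cluster with WinSCP.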
6.0 OUTPUT INTERPRETATION

The streamed data is sent to HDFS by the consumer and stored in two separate folders to distinguish weather data from moisture data. Below is the output from the weather data folder:

Fig 11: Weather data folder

Below is the output from the moisture data folder:

Fig 12: Moisture data folder

7.0 IMPROVING THE KAFKA ARCHITECTURE

The Kafka architecture can be improved in two ways:
1. Making the Kafka architecture more robust.
2. Using a dedicated Kafka broker to improve performance.

7.1 Making the Kafka Architecture More Robust

In the precision agriculture project, both the broker and the consumer ran on the same system, since the requirement of data ingestion was to store data in HDFS. To make the architecture more robust, the consumer should be a remote system or cluster with access to the Kafka broker; that way, in case of a failure of the Kafka broker, the data can still be retrieved from the consumer.
7.2 Dedicated Kafka Broker to Improve Performance

In the precision agriculture project, the Kafka broker runs as part of the cluster. To avoid noise in the cluster, the broker should be a dedicated system or set of systems. This also eliminates the overhead the Kafka broker imposes on the Hadoop environment and speeds up all processes.

8.0 FUTURE RESEARCH

1. Implement the bridge between HDFS and Spark SQL and store tables as persistent data in Hive.
2. Implement real-time machine learning using Spark MLlib.
3. Connect the live data to a reporting tool to analyze it and create useful reports.

9.0 CONCLUSION

Kafka is a rapidly growing distributed messaging system with a variety of applications in engineering. In the precision agriculture project, agricultural data from the greenhouse was captured and streamed to the Hadoop environment using Kafka and Spark. The project also gave me exposure to handling different big data problems in real-time situations and helped me understand the Kafka architecture.
BIBLIOGRAPHY

Apache Kafka documentation. http://kafka.apache.org/

Rahul Jain (2014). Real-time Analytics with Apache Kafka and Apache Spark.

Wang, H., Can, D., Kazemzadeh, A., Bar, F., & Narayanan, S. (2012). A System for Real-time Twitter Sentiment Analysis of 2012 U.S. Presidential Election Cycle. Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, Jeju, Republic of Korea. http://www.aclweb.org/anthology/P12-3020