Precision Agriculture Data Ingestion Using Kafka

3. Dec 2016
PRECISION AGRICULTURE SUPPORT USING SCALA/SPARK

Project Report
Sriram RV
Spring Semester
Advisor: Professor Brad Rubin
Table of Contents

1.0 Purpose of Project
2.0 Project Description
2.1 Why Agriculture Data
3.0 Dataset
    3.1 Data Source
    3.2 Details about Dataset
    3.3 Sample Data
    3.4 Schema
    3.5 Data Description
4.0 Project Implementation
    4.1 Data Ingestion Using Kafka
    4.2 Kafka Producer
    4.3 Kafka Broker
    4.4 Kafka Consumer
5.0 Additional Tools
    5.1 Maven
    5.2 Scala Build Tool
    5.3 Git
6.0 Output Interpretation
7.0 Improving the Kafka Architecture
    7.1 Making the Kafka Architecture More Robust
    7.2 Dedicated Kafka Broker to Improve Performance
8.0 Future Research
9.0 Conclusion
Bibliography
1.0 PURPOSE OF PROJECT

Big data tooling over the last few years has focused on both structured and unstructured data. Image processing, however, is an area that needs more attention, and it has been my area of interest as well. Through this project I will get the opportunity to experiment with streaming images and weather data captured in the UST greenhouse, and to get a feel for image processing with Scala/Spark on Hadoop more generally. I will gain experience with technologies such as Scala, Spark, Spark Streaming, and image processing in the domain of food technology, giving me skills I could not otherwise obtain in the GPS curriculum.

2.0 PROJECT DESCRIPTION

The purpose of the project is to stream real-time weather data captured by direct sensors, along with RGB images captured by drones, in order to perform image processing and weather data analytics using the Scala/Spark ecosystem on a Hadoop computing cluster. Since image processing and streaming with Spark are new technologies to GPS, part of the project focuses on experimenting with different tools to find a reliable way of storing images and streamed data in HDFS.

The UST greenhouse will be growing plants for the Precision Agriculture project run by the UST School of Engineering. The greenhouse has a local weather station that broadcasts weather data such as temperature, humidity, light intensity, barometric pressure, position (latitude/longitude), wind speed and direction, and rainfall. The broadcast is continuous, at 10-second intervals, in CSV format. The equipment in the greenhouse is a prototype for field use, useful both for analysis of plant health and for creating a model for each of the six plant species that will be grown. In addition, high-resolution images will be taken of the plants in the visible and near-IR regions of the light spectrum, roughly every couple of days.
2.1 Why Agriculture Data

Agricultural data gives me the opportunity to experiment with streaming images and weather data captured in the UST greenhouse. The data captured in the greenhouse is highly detailed and provides experience working with data from the food technology domain.

3.0 DATASET

3.1 Data Source

The data source for this project is the live stream of weather and moisture data captured by sensors attached to an Arduino board and streamed using a Kafka producer.

3.2 Details about Dataset

- The sensor data were captured every second.
- Weather data stored in HDFS: 90 days.
- Moisture data stored in HDFS: 85 days.
- Image data stored: 90 days.

3.3 Sample Data

Weather data

Fig 1: Sensor weather data from Arduino

Moisture Data
Fig 2: Sensor moisture data from Arduino

Image Data

Image data was captured every other day over a period of 90 days.

Fig 3: Images from the greenhouse

3.4 Schema

Weather data columns: Date, Time, Wind Direction, Wind Speed, Humidity, Temperature, Rain, Pressure, Battery, Light Level

Table 1: Weather Data Schema

Moisture data columns: Date, Time, Moist 2, Moist 6, Moist 8, Moist 11, Moist 10, Moist 1, Moist 9, Moist 7, Moist 5, Temp, PAR

Table 2: Moisture Data Schema
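The weather schema in Table 1 can be modeled in the project's language, Scala. The sketch below is illustrative rather than the report's actual code, and the CSV line is a hypothetical sample matching the ten-field schema:

```scala
// Case class mirroring Table 1 (Weather Data Schema)
case class WeatherRecord(date: String, time: String, windDirection: String,
                         windSpeed: Double, humidity: Double, temperature: Double,
                         rain: Double, pressure: Double, battery: Double, lightLevel: Double)

object WeatherRecord {
  // Parse one CSV line as broadcast by the greenhouse weather station
  def fromCsv(line: String): WeatherRecord = {
    val f = line.split(",").map(_.trim)
    require(f.length == 10, s"expected 10 fields, got ${f.length}")
    WeatherRecord(f(0), f(1), f(2), f(3).toDouble, f(4).toDouble,
      f(5).toDouble, f(6).toDouble, f(7).toDouble, f(8).toDouble, f(9).toDouble)
  }
}

// Hypothetical sample: date, time, wind dir, wind speed, humidity,
// temperature, rain, pressure, battery, light level
val rec = WeatherRecord.fromCsv("2016-12-03,10:15:00,NW,3.2,55,21.5,0,1013,3.9,420")
println(rec.temperature)  // prints 21.5
```

Parsing each CSV line into a typed record like this makes downstream Spark analytics on the streamed data straightforward.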
3.5 Data Description

Weather Data:

Date & Time: timestamp of the recording
Wind Direction: direction of the wind
Wind Speed: speed of the wind
Wind Gust: gust speed of the wind
Humidity: percentage of water in the air
Temperature: air temperature
Rain: rain percentage
Pressure: air pressure
Battery: battery level of the Arduino
Light: light exposure

Moisture Data:

Moist 2: moisture of plot 2
Moist 6: moisture of plot 6
Moist 8: moisture of plot 8
Moist 11: moisture of plot 11
Moist 10: moisture of plot 10
Moist 1: moisture of plot 1
Moist 9: moisture of plot 9
Moist 7: moisture of plot 7
Moist 5: moisture of plot 5
Temp: soil temperature
PAR: photosynthetically active radiation
4.0 PROJECT IMPLEMENTATION

4.1 Data Ingestion Using Kafka

Kafka is a distributed messaging system that allows the moisture and weather data to be transmitted from the Arduino board to HDFS. The Kafka architecture depends on three main components: producer, broker, and consumer. ZooKeeper is used to monitor the flow of data into and out of the Kafka broker. The diagram below shows the architecture of the precision agriculture project: the Kafka producer streams the data produced in the greenhouse and sends it to the Kafka broker, obtaining the broker addresses through ZooKeeper.

Fig 4: Kafka Architectural Diagram

4.2 Kafka Producer

The Kafka producer is the sender side of the Kafka distributed messaging system. The producer splits messages by topic and sends them to the brokers accordingly; it also obtains the addresses of the Kafka brokers, which are attached to the packet headers when sending data. The weather, moisture, and image data are differentiated using the topics "weather-data", "moisture-data", and "image-data". Below is the snippet that sets up the Kafka producer, with both key and value serialized as strings. The bootstrap server setting is the broker list of the Kafka cluster.
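The original snippets (Figs 5 and 6) are not reproduced in this scrape. A minimal sketch of the producer configuration and send call, assuming the standard kafka-clients API and a placeholder broker address, looks like this:

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object WeatherProducer {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    // Broker list; "localhost:9092" is a placeholder for the PA cluster address
    props.put("bootstrap.servers", "localhost:9092")
    // Key and value serialized as strings, as described in the report
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

    val producer = new KafkaProducer[String, String](props)
    val csvLine = "2016-12-03,10:15:00,NW,3.2,55,21.5,0,1013,3.9,420" // hypothetical sample
    // Topic name from the report; send() binds the record to the configured brokers
    producer.send(new ProducerRecord[String, String]("weather-data", csvLine))
    producer.close()
  }
}
```

This sketch requires the kafka-clients library on the classpath and a running broker, so it is illustrative rather than directly runnable here.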
Fig 5: Configuring the Kafka producer

Below is the snippet used to create the message object, which contains the topic and the messages to be sent to the Kafka broker. The send function of the Kafka producer binds the Kafka configuration instance with the messages and sends them to the broker.

Fig 6: Sending the message to the Kafka broker

4.3 Kafka Broker

The Kafka broker is the server side of the Kafka distributed messaging system. It is capable of handling hundreds of read and write operations per second and can expand elastically without downtime. Data streams are partitioned and spread over a cluster of machines, allowing streams larger than the capacity of a single machine. The Kafka broker can be monitored through ZooKeeper on port 2181. By default, the Kafka broker has a retention period of 168 hours.
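The retention and ZooKeeper settings mentioned above live in the broker configuration. A fragment of a typical server.properties might look like this; the host names are placeholders, and 168 hours is the Kafka default noted in the report:

```properties
# Kafka broker configuration fragment (illustrative)
broker.id=0
listeners=PLAINTEXT://localhost:9092
# ZooKeeper, used here for broker monitoring, on its default port 2181
zookeeper.connect=localhost:2181
# Default retention period: 168 hours (7 days)
log.retention.hours=168
```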
Fig 7: Monitoring the messages using ZooKeeper

4.4 Kafka Consumer

The Kafka consumer is the receiver side of the Kafka distributed messaging system; it fetches data topic by topic from the brokers. The consumer runs in the cluster and stores the data in HDFS for further processing. Below is the sample consumer code that connects to the PA cluster; the topic set contains the list of topics we are interested in fetching from the broker.

Fig 8: Configuring the Kafka consumer

5.0 ADDITIONAL TOOLS

5.1 Maven

Maven was used for dependency management, bringing in all the required jars from a remote server to the local repository. This made it possible to develop the code from a Windows environment. Maven also pinned the versions of Spark and Kafka that were used, and all the jar files for those versions were stored in the local repository.
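A dependency declaration like the one described in Fig 9 might look like the following pom.xml fragment; the version numbers are illustrative, era-appropriate choices rather than values taken from the report:

```xml
<!-- Maven dependencies for Kafka and Spark Streaming (versions hypothetical) -->
<dependencies>
  <dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-clients</artifactId>
    <version>0.10.1.0</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_2.11</artifactId>
    <version>2.0.2</version>
  </dependency>
</dependencies>
```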
Fig 9: Dependency management with Maven

5.2 Scala Build Tool

The Scala Build Tool (SBT) was used to create the package and jar files, which were transferred to the cluster and the VM using WinSCP.

Fig 10: SBT build

5.3 Git

Git provides distributed revision control and source code management (SCM). An online Git repository was used to store all the code related to the precision agriculture project and to share it with the team. Below is the Git link for the precision agriculture project:

https://github.com/sri303030/Data-Ingestion-using-Kafka
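A minimal build.sbt for producing the jar described above might look like this; the project name and versions are illustrative assumptions, not taken from the report:

```scala
// build.sbt sketch (names and versions hypothetical)
name := "precision-agriculture-ingestion"
version := "1.0"
scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  "org.apache.kafka" %  "kafka-clients"    % "0.10.1.0",
  "org.apache.spark" %% "spark-streaming"  % "2.0.2"
)
```

Running `sbt package` then produces a jar under `target/` that can be copied to the cluster with WinSCP.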
6.0 OUTPUT INTERPRETATION

The streamed data is sent to HDFS by the consumer and stored in two separate folders to distinguish weather data from moisture data. Below is the output from the weather data folder:

Fig 11: Weather data folder

Below is the output from the moisture data folder:

Fig 12: Moisture data folder

7.0 IMPROVING THE KAFKA ARCHITECTURE

The Kafka architecture can be improved in two ways:
1. Making the Kafka architecture more robust.
2. Using a dedicated Kafka broker to improve performance.

7.1 Making the Kafka Architecture More Robust

In the precision agriculture project, both the broker and the consumer ran on the same system, since the requirement of data ingestion was to store data in HDFS. To make the architecture more robust, the consumer should be a remote system or cluster with access to the Kafka broker; that way, in case of a failure of the Kafka broker, the data can still be retrieved from the consumer.
7.2 Dedicated Kafka Broker to Improve Performance

In the precision agriculture project, the Kafka broker runs as part of the cluster. To avoid noise in the cluster, the broker should be a dedicated system or set of systems. This also eliminates the overhead the Kafka broker imposes on the Hadoop environment and speeds up all processes.

8.0 FUTURE RESEARCH

1. Implement the bridge between HDFS and Spark SQL and store tables as persistent data in Hive.
2. Implement real-time machine learning using Spark MLlib.
3. Connect the live data to a reporting tool to analyze it and create useful reports.

9.0 CONCLUSION

Kafka is a rapidly growing distributed messaging system with a variety of applications in engineering. In the precision agriculture project, agricultural data from the greenhouse was captured and streamed to the Hadoop environment using Kafka and Spark. The project also gave me exposure to handling different big data problems in real-time situations and helped me understand the Kafka architecture.
BIBLIOGRAPHY

Apache Kafka documentation. http://kafka.apache.org/

Rahul Jain (2014). Real-time Analytics with Apache Kafka and Apache Spark.

Wang, H., Can, D., Kazemzadeh, A., Bar, F., & Narayanan, S. (2012). A System for Real-time Twitter Sentiment Analysis of 2012 U.S. Presidential Election Cycle. Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, Jeju, Republic of Korea. http://www.aclweb.org/anthology/P12-3020