PRASHANT AGRAWAL
Email: prashanttct07@gmail.com
Mobile: +91-8097606642
Professional Summary:
● Innovative software professional with 5+ years of progressive experience and continued success as a Big Data Analyst.
● Day-to-day experience working with the Agile/Scrum methodology.
● Broad experience with search-engine solutions such as e-commerce search, enterprise search, log analytics, and monitoring.
● Hands-on exposure to log analytics using ELK for various logs such as syslog, auth log, Postfix logs, router logs, Apache logs, NetFlow logs, and application logs.
● Hands-on experience with ETL using Spark, Spark Streaming, and Spark SQL.
● Full-text search solutions with analytics and visualization using Elasticsearch, Logstash, and Kibana.
● Good knowledge of distributed computing systems such as the Hadoop ecosystem, Spark, and Flume, used to analyze network logs and perform ETL on various data sets.
● Hands-on experience working with HDP (Hortonworks) clusters and components such as Spark, HDFS, Flume, Kafka, YARN, Oozie, Phoenix, and Presto.
● Good hands-on experience with SVN, VSS, and Git, and with build tools like Maven.
● Hands-on experience developing a product for capturing, intercepting, and monitoring internet traffic for LEAs (Law Enforcement Agencies).
● Good exposure to handling big data (in TBs) with Elasticsearch, using a 12-node cluster at the deployment level.
● Good exposure to writing Spark applications in Scala.
Domain and Skill Set:
Domain: Engineering and network forensics, big data analytics, digital marketing and advertisement
Programming Languages: Core Java, Scala, C#, PHP
Operating Systems: Mac, Linux (Ubuntu, Red Hat, CentOS), Windows 7
Tools/DB/Packages: Elasticsearch, Logstash, Kibana, X-Pack, Spark, Flume, Oozie, IntelliJ IDEA, Visual Studio 2010, MySQL, version control (Perforce, SVN, VSS, Git), defect trackers (Jira, FogBugz)
Professional Project Details:
Project - 1
Project Name: Log Analytics and Visualization using ELK
Team Size: 1
Start Date: Jan 2016
End Date: Present
Description: This project involves log analytics using ELK and also covers writing Elasticsearch queries for e-commerce and enterprise search.
Role & Contribution:
● ELK 5.x setup and configuration on various operating systems such as Mac, Linux, and Windows
● Migration of data from ELK 2.x to 5.x
● Well versed in setting up the various node types in a production cluster: master, data, and client nodes
● Implementation of shards and replicas (to avoid single-node failure) for better index management
● Preparation of schemas and analyzers (through templates) to store data in Elasticsearch
● Wrote Elasticsearch queries to support search features such as autocomplete, synonyms, grammar-based search, exact and non-exact search, misspelled-term search, Boolean search, and aggregations
● Built and developed Logstash plugins to support specific requirements or features
● Data extraction using various Logstash input plugins such as JDBC, File, TCP, UDP, and S3
● Data filtering using Logstash filter plugins (CSV, Grok, Mutate, Date, Geo, etc.)
● Data indexing to Elasticsearch using the Logstash output plugin
● Used various Beats (Filebeat, Metricbeat, etc.) as data shippers to Logstash
● Data visualization and dashboard reporting using Kibana
● Set up backup and restore using snapshot and restore
● Fine-tuning and optimization of queries for faster responses
● Preparation of a multi-index (time-series indexing) architecture to make searches faster and responses as quick as possible
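The time-series indexing architecture above can be sketched in a few lines of Scala; the `logs-` prefix and the daily date pattern are assumptions for illustration, not the project's actual index names.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

// Hypothetical naming scheme: one Elasticsearch index per day.
val indexPattern = DateTimeFormatter.ofPattern("yyyy.MM.dd")

// Index an event from the given day is written to, e.g. "logs-2016.01.05".
def indexFor(day: LocalDate): String = s"logs-${day.format(indexPattern)}"

// Wildcard covering a whole month, so a query touches only the indexes
// that can actually contain matching documents.
def monthWildcard(year: Int, month: Int): String = f"logs-$year%04d.$month%02d.*"
```

Because each day's data lives in its own index, old indexes can be dropped or snapshotted wholesale, and searches over a known time range fan out to far fewer shards.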
Technologies: JSON
Tools/Toolchain: Elasticsearch 5.x/2.x, Logstash 5.x/2.x, Kibana 5.x/4.x, Beats, X-Pack, Head plug-in, Kopf plug-in, Carrot2, Lingo3g, PuTTY, WinSCP
Project - 2
Project Name: Data Lake Modules in Spark
Team Size: 1
Start Date: June 2016
End Date: Present
Description: This project involves creating a generic data lake solution to migrate data from one SQL/NoSQL database to another.
Role & Contribution:
● Created a Spark module that reads data from SQL or HBase and dumps it to HDFS as Avro or ORC
● The module lets the user specify the input type (HBase or SQL) and the output type (ORC or Avro) in HDFS
● Implemented the data lake modules to run periodically using the Oozie scheduler, with a job that runs every hour and dumps the data
● Added functionality to automatically clean up older dumps that are beyond X versions or more than Y days old
● Created an offline index module that acts as a secondary index to the HBase table, containing only the required fields, to speed up select queries
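The "X versions / Y days" cleanup rule above can be sketched as plain Scala; the `Dump` descriptor and its fields are hypothetical, standing in for whatever metadata the real module reads from HDFS.

```scala
import java.time.{Duration, Instant}

// Hypothetical dump descriptor; field names are assumptions for illustration.
case class Dump(version: Int, createdAt: Instant)

// Dumps to delete: everything beyond the newest `keepVersions` versions,
// plus anything older than `maxAgeDays`.
def expired(dumps: Seq[Dump], keepVersions: Int, maxAgeDays: Long, now: Instant): Seq[Dump] = {
  val byNewest = dumps.sortBy(-_.version)
  val beyondVersions = byNewest.drop(keepVersions)
  val tooOld = dumps.filter(d => Duration.between(d.createdAt, now).toDays > maxAgeDays)
  (beyondVersions ++ tooOld).distinct
}
```

The hourly Oozie job would compute this set and delete the corresponding HDFS paths.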
Technologies: Scala
Tools/Toolchain: Kafka, Spark, Spark Streaming and Spark SQL, Phoenix, HBase, Hive, Maven, Git
Project - 3
Project Name: Spark ETL for fact and dim types of data
Team Size: 2
Start Date: Jan 2016
End Date: May 2016
Description: This project involves ETL processing with Spark: pulling information from a variety of sources, transforming the data, and pushing it to Presto for OLTP/OLAP analytics on dim and fact data.
Role & Contribution:
● Consumed the dim/fact Kafka messages in Spark using a Kafka consumer (messages produced by a Kafka producer)
● Extracted the messages consumed by the Kafka consumer
● Performed business logic and transformations on those messages
● Pushed the transformed data to either Hive or Presto for further OLTP and OLAP analytics
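The per-message transform step above can be sketched in plain Scala; the message format, field names, and conversion rates are all assumptions, and in the real job this logic would run inside the Spark Streaming pipeline's map stage.

```scala
// Hypothetical fact message: "orderId,amount,currency" (format is an assumption).
case class Fact(orderId: String, amountUsd: Double)

// Example business transform: normalize amounts to USD before pushing
// to Hive/Presto; the conversion rates here are illustrative only.
val toUsd = Map("USD" -> 1.0, "INR" -> 0.015)

def transform(message: String): Option[Fact] =
  message.split(",") match {
    case Array(id, amount, currency) =>
      toUsd.get(currency.trim).map(rate => Fact(id.trim, amount.trim.toDouble * rate))
    case _ => None // malformed record: skipped, as a streaming job typically would
  }
```

Returning `Option` lets the pipeline drop malformed records with a `flatMap` instead of failing the batch.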
Technologies: Scala
Tools/Toolchain: Maven, Git, Kafka, Spark, Spark Streaming and Spark SQL, Phoenix, Presto
Project - 4
Project Name: Deployment of Elasticsearch to handle big data in IIMS
Team Size: 2
Start Date: Feb 2014
End Date: Dec 2015
Description: This project involves deploying Elasticsearch to handle data in the GBs per day.
Role & Contribution:
● Prepared the deployment architecture to handle this data volume using a 12-node cluster
● Data visualization and dashboard loading using Kibana
● Well versed in setting up the various node types in the deployment: master, data, and client nodes
● Hands-on experience with various plug-ins such as Mapper Attachments, Head, BigDesk, and Carrot2 (with the Lingo3g categorization algorithm)
● Preparation of schemas to store data in Elasticsearch
● Preparation of search queries to retrieve data from Elasticsearch using Query String, Match, Boolean, Aggregation, etc.
● Set up a snapshot-and-restore process to take daily backups, since backup and restore are still required in case of failure even at big-data scale
● Implementation of shards and replicas (to avoid single-node failure)
● Fine-tuning and optimization of queries for faster responses
● Preparation of a multi-index architecture to make searches faster and responses as quick as possible
● Implementation of features such as synonyms, stemming, grammar extension, and wildcard search
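The kind of retrieval query prepared above, mixing a full-text Match clause with an exact Term clause inside a Boolean query, can be sketched as a query body; the field names are illustrative, not the real IIMS schema (the deployment itself was driven from C#, but the JSON shape is what Elasticsearch receives).

```scala
// Illustrative query-body builder: a bool query whose must clauses combine
// full-text relevance ("match") with an exact-value constraint ("term").
// Field names "content" and "sender" are assumptions for illustration.
def boolQuery(text: String, sender: String): String =
  s"""{"query":{"bool":{"must":[{"match":{"content":"$text"}},{"term":{"sender":"$sender"}}]}}}"""
```

A real builder would also escape the user input; this sketch only shows the clause structure.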
Technologies: C#, Elasticsearch (NoSQL database)
Tools/Toolchain: Elasticsearch 1.5.x, Mapper Attachments, Head plug-in, BigDesk plug-in, Carrot2, Lingo3g, PuTTY, WinSCP
Project – 5
Project Name: Big Data Platform Development
Team Size: 2
Start Date: June 2015
End Date: Dec 2015
Description: This project involves processing logs generated by various systems and devices and performing predictive analysis to identify patterns and trace the attacking device or user.
Role & Contribution:
● Collected the high-speed logs coming from various devices and ingested them using Flume
● Passed the log data from Flume to Spark Streaming and Spark SQL to hold it in memory (Spark being known for in-memory data processing)
● Performed predictive analysis on the logs per the defined algorithm, as well as computations with a self-derived algorithm
● Persisted the logs in memory for a specific time duration using Spark Streaming, then persisted them permanently to Elasticsearch
● Performed all of the above operations and computations using distributed computing, including setting up a 5-node HDFS cluster using the Hortonworks Data Platform
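A toy reduction of the pattern-tracing step above: count failed-login events per source IP so an unusually noisy device stands out. The auth-log line format, the regex, and the threshold are assumptions; the real pipeline applied this kind of logic over the stream in Spark.

```scala
// Assumed auth-log style line, e.g.
// "Jun 1 sshd[1]: Failed password for root from 10.0.0.9 port 22"
val failedLogin = raw"Failed password .* from (\d+\.\d+\.\d+\.\d+)".r.unanchored

// Source IPs with at least `threshold` failed logins: candidate attackers.
def suspects(lines: Seq[String], threshold: Int): Set[String] =
  lines
    .collect { case failedLogin(ip) => ip }
    .groupBy(identity)
    .collect { case (ip, hits) if hits.size >= threshold => ip }
    .toSet
```

In the streaming job the same grouping would be expressed over a windowed DStream rather than an in-memory `Seq`.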
Technologies: Core Java, Flume, HDFS, HDP clustering, Spark, Spark Streaming and Spark SQL, Elasticsearch
Tools/Toolchain: Maven, Git, MySQL
Educational Qualifications:
Course                     Board/University    Year of Passing    Percentage
10th                       CBSE                2005               79.20
12th                       CBSE                2007               76.00
B.E. (Computer Science)    RGPV Bhopal         2011               76.44

GATE 2011: 91 percentile
CAT 2010: 85 percentile
Personal Profile:
Date of Birth: 23rd Nov, 1989
Passport No.: J3031277
Willing to relocate: Depends upon location
Willingness for onsite: Yes
PAN: ATXPA9120F
Declaration:
I hereby declare that the information provided above is correct and true to the best of my knowledge and belief.
Date: 09 Jan 2017
Place: Pune (Prashant Agrawal)