Summary
I have 2.8 years of overall IT experience in software design and development, with a major focus on
Big Data, Hadoop, Core Java/J2EE and analytics. I am a Big Data enthusiast, gradually delving into
Data Science and Machine Learning on top of various technologies across the Hadoop stack. I have worked
on multiple projects, including web application development, ingesting data into analytics pipelines,
and providing both real-time and batch analysis of data across domains such as Insurance, Media and Retail.
I have good programming knowledge of object-oriented languages like Java. I am self-motivated
and a constant learner, with excellent communication and interpersonal skills. I am a Certified
Cloudera Hadoop Developer (CCDH) and also hold an OCJP certification. I hold a master's degree in
computer applications.
Objective: To seek a challenging position in the Big Data and Data Science field, with opportunities
for learning, growth and career advancement.
Profile
Worked on Big Data and Data Science related streams.
Experienced in handling large-scale data applications and performing analytics on them.
Experienced in working with CDH4 and its configuration.
Experienced in monitoring and administration using Cloudera Manager.
Extensively used Hue for exploration, parsing, analysis and interacting with the cluster.
Applied knowledge of Hadoop and its ecosystem (Flume, Sqoop, Hive, HBase, Pig, MapReduce, Oozie) to
design and develop Big Data solutions.
Specialized in Big Data analysis using Python, Spark and their dependent libraries like
NumPy, pandas, scikit-learn, Matplotlib, Spark SQL, Spark Streaming and MLlib.
Dealt with text mining, NLP, clickstream analysis and recommendation engines using PySpark, based on
various machine learning techniques like regression, classification, KNN, SVM, Naive Bayes etc.
Worked with Hadoop file formats like Avro and Parquet.
Worked on Unix and Linux flavours.
Worked across multiple domains like Insurance, Retail and Media.
Involved in developing web applications using Java and related frameworks like Struts, Spring and
JUnit.
Sudipto Saha
Hadoop Solutions Engineer
Email: sudipto1989@hotmail.com
Mob: 91 7278625693
Worked on web and application servers like Apache Tomcat and WebLogic.
Interacted with clients for requirement gathering; prepared functional specifications and developed
applications based on the specs using Java and frameworks like Struts and Spring; also
responsible for providing support and enhancements to the applications.
Experienced in both the Waterfall model and Agile (Scrum) for software development.
Possess good communication and interpersonal skills.
Confident and enthusiastic with a zeal to learn new technologies and skills.
Professional Affiliation
Tata Consultancy Services, a.k.a. TCS (2013 - Present)
Work Experience
Client: The Nielsen Company
Client Description: The Nielsen Company, formerly known as AC Nielsen, is a global marketing
research firm with worldwide headquarters in New York City, United States. One of Nielsen's best known
creations is the Nielsen ratings, an audience measurement system that measures television, radio and
newspaper audiences in their respective media markets. Another market research tool is the Homescan
program, where sample members track and report all grocery and retail purchases, allowing purchasing
patterns to be related to household demographics. Homescan covers several countries including Australia,
Canada, the United Kingdom, and the United States.
Project: DMLE (Digital Machine Learning Enterprise)
Description: A common platform where various machine learning solutions are contributed from across
all Nielsen towers, where they are preserved and maintained so that they can be used across all Nielsen
applications without building them from scratch.
Assignment: To build a Product Similarity Engine Service.
Duration: Mar 2016 - Present
Team Size: 3
Objective: To sell the service to a XXX supply chain store. (* Name of the supply chain withheld for confidentiality
reasons)
Tools Used: Hadoop 2.3, Spark 1.5.6, Python 2.7, Cloudera v4.3, Hue, YARN, Anaconda, IPython, Zeppelin.
Spark Libraries: SQL, Streaming, MLlib. Visualisation: Matplotlib, Bokeh. REST WS. Persistence: HBase.
Responsibilities:
Worked as a developer with the lead and the architect to implement the solution.
Worked in a cluster of 18 heterogeneous worker nodes and 1 master node across different geographies.
Used a combination of Python and Spark to implement the POC.
Used a REST service to retrieve the feed.
Preprocessed and transformed data using PySpark.
Built the similarity model using Spark MLlib, implementing various machine learning techniques like KNN,
K-means, SVM and decision trees.
Used HBase as the NoSQL store.
Used Spark SQL for exploration.
Validated the model against the live feed using Spark Streaming.
Created visualizations using Matplotlib.
Success Story: The Similarity Engine was consolidated with BFM (Best Fit Mapping), a product from the Nielsen Watch
tower, and was deployed as a service which was not only embraced by the stakeholder but is also known to be one of the most
heavily accessed applications among all Nielsen towers.
Project: Media Events
Description: A module under Nielsen Ratings Wing.
Assignment: Analysis on Media Events data.
Duration: Dec '15 - Feb '16
Team Size: 6
Objective: To remove the time slot conflicts between satellite data and stations data.
Tools Used: Hadoop v2, Flume, Sqoop, Avro, HBase, Pig, Hive, Spark, Python, Oozie.
Responsibilities:
Collected data from station servers via Flume.
Used Avro for the source.
Used interceptors for processing on the fly.
Used Pig for parsing and metadata inference.
Used Spark and Python for processing the logs.
Used Sqoop to import media market records from MySQL to HDFS, which are referred to when validating
station and satellite data for conflicts.
Used Hive for running batch reports.
Used HBase for storing record keys for local and national data, which are looked up while
determining the conflicts.
Used Oozie for automation.
Client: United India Insurance Company Limited
Role: Java Developer
Domain: Insurance
Duration: Oct 2013 - Nov 2015
Team Size: 9
Client Description: United India Insurance Company Limited is a public sector general insurance
company of India and one of the top general insurers in Asia. It offers a wide variety of non-life insurance
products such as Automobile, Fire, Burglary etc.
Tools Used: Java 1.6, Struts 1.3, Spring 2.5, Eclipse Kepler, Oracle 12c, Toad, Maven, Tortoise SVN, JSTL,
JavaScript, Apache Tomcat, JUnit.
Responsibilities:
Understood the scope of the project and gathered requirements.
Followed J2EE specifications in the project.
Developed the web tier using JSP, JSTL and Struts MVC.
Developed web components using Struts and Spring, with some use of REST WS.
Created and maintained the configurations of the Spring Application Framework (IoC).
Implemented various design patterns such as Singleton, Business Delegate, Value Object and Spring
DAO.
Created bean classes for communicating with the database.
Used Spring JDBC for database connectivity.
Involved in writing Spring configuration XML files that contain object declarations and the
declarations of their dependent objects.
Used the Tomcat web server for development purposes.
Used Oracle as the database and Toad for query execution.
Wrote SQL scripts and PL/SQL code for procedures.
Involved in the creation of test cases and unit testing using JUnit.
Technologies Used:
Operating Systems: Windows, Unix
Languages: JDK 1.4/1.5/1.6/1.7, JavaScript, SQL, Unix Shell (basics), Python
Hadoop Distribution: Apache, CDH
Big Data Technologies: Apache Hadoop (MRv1, MRv2), Spark, Hive, Pig, Sqoop, HBase, Flume, Zookeeper, Oozie
Web Technologies: HTML, JSP, CSS, JavaScript, JSON & AJAX
Server-side Frameworks: Struts 1.3, Hadoop, Spring
IDEs: Eclipse, NetBeans, IntelliJ IDEA
Build Tools: Maven
Web Servers/App Servers: Apache Tomcat 6.0/7.0, WebLogic
Databases: Oracle 8i/9i/10g/11g, HDFS, HBase, Hive
Reporting Tools: Tableau (preliminaries)
Certifications
Cloudera Hadoop Developer Certification (CCDH) [Nov-2015]
Oracle Certified Java Developer (OCJP) [Aug-2014]
Qualifications
Degree   Institute                             University   Year
MCA      Heritage Institute of Technology      WBUT         2010-2013
BCA      Syamaprasad Institute of Technology   WBUT         2007-2010
Personal Details
Date of Birth: 21st April 1989
Nationality: Indian
Address: 62, R.N. Tagore Road, Thakurpukur, Kolkata - 700063
Languages Known: English, Hindi and Bengali
Passport Number: L9646873
PAN No: BQLPS4414A