 zingy.rajeev@gmail.com  +31645503407
Rajeev Kumar
Apache Spark and Scala Developer
Amsterdam, NL
8+ years experienced, result-oriented ETL Developer (Apache Spark, Scala, Java) with a proven track record in
software development using Cloudera Hadoop, Apache Spark, Scala, Java and ETL tools. Proficient in processing
structured/unstructured data & deploying Apache Spark to analyse huge datasets, identify patterns & gain valuable
insights. Demonstrated capability in project life-cycle management of client-server & web applications.
Adept at exporting & importing data using Hadoop clusters & applying in-depth knowledge of web servers using
Java/J2EE. Highly skilled at data management & capacity planning for end-to-end data management & performance
optimization. Expertise in data analysis, data integration, data modelling and data warehousing, with hands-on
experience in various ETL and reporting tools. - Expertise in Teradata, DB2, SQL Server, SQL, PL/SQL and the Big Data domain. -
Automated various processes through UNIX - Domain expertise: Banking, Financial Services, Insurance and Health
Care
PROFESSIONAL EXPERIENCE
Data Engineer  Oct '13 - Present
Infosys Ltd. Amsterdam, NL
Infosys Limited is an Indian multinational corporation that provides business consulting, information technology and outsourcing
services.
Expertise with the tools in Hadoop Ecosystem including Hive, HDFS, MapReduce, Sqoop, Spark, Kafka, Yarn, Oozie,
and Zookeeper.
Implemented Spark applications using Scala and Spark SQL for faster testing and processing of data.
Strong experience in the Analysis, design, development, testing and Implementation of Business Intelligence
solutions using Data Warehouse/Data Mart Design, ETL, OLAP, BI, Client/Server applications.
Experience in designing and developing applications in Spark using Scala to compare the performance of Spark with
Hive and SQL/Oracle.
Strong experience in writing applications using Scala Swing and core libraries.
Knowledge of the MongoDB NoSQL database: data modelling, tuning and backup.
Strong experience in Dimensional Modeling using Star and Snow Flake Schema, Identifying Facts and Dimensions,
Physical and logical data modeling using ERwin and ER-Studio.
Experience in manipulating/analyzing large datasets and finding patterns and insights within structured and
unstructured data.
Used Scala libraries to process XML data stored in HDFS.
Loaded data into Spark RDDs, performed in-memory computation and generated the output response.
Involved in converting MapReduce programs to Spark transformations using Spark RDDs and Scala.
Developed Spark scripts using the Spark REPL and the Scala Eclipse IDE.
Used Sqoop to extract data from RDBMS databases.
Involved in moving log files generated from different data sources to HDFS for further processing through
Flume.
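The MapReduce-to-Spark conversions above follow a common shape: the map, shuffle and reduce steps become chained collection transformations. A minimal sketch of that shape, written with plain Scala collections (no Spark cluster assumed) so it runs standalone; the RDD equivalents would be `flatMap` and `reduceByKey`:

```scala
// MapReduce-style word count expressed as collection transformations.
// The same pipeline maps one-to-one onto Spark's RDD API:
// flatMap -> flatMap, groupBy + count -> reduceByKey.
def wordCount(lines: Seq[String]): Map[String, Int] =
  lines
    .flatMap(_.split("\\s+"))         // mapper: emit one token per word
    .filter(_.nonEmpty)
    .groupBy(identity)                // shuffle: group identical keys
    .view.mapValues(_.size).toMap     // reducer: count each group

val counts = wordCount(Seq("spark and scala", "spark etl"))
// counts("spark") == 2
```

The same chain, applied to an RDD of log lines instead of a `Seq`, is the usual first step when porting a MapReduce job to Spark.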
Data sanitization using the Informatica Test Data Management tool and the CA Fast Data Masker tool.
Expertise in working with relational databases such as Oracle 11g/10g/9i/8x, SQL Server 2008/2005, DB2 8.0/7.0,
UDB, MS Access, Teradata, Hadoop and PostgreSQL.
Strong experience in Extraction, Transformation and Loading (ETL) of data from various sources into Data Warehouses
and Data Marts using Informatica PowerCenter (Repository Manager, Designer, Workflow Manager, Workflow
Monitor, Metadata Manager), Power Exchange and Power Connect as ETL tools on Oracle, DB2 and SQL Server
databases.
Analytics & Performance Enhancement
Leadership & Team Management
Data Management & Reporting
Key Achievements
Sr. Software Engineer  Feb '10 - Oct '13
Syntel Ltd. Pune, IN
Syntel, Inc. is a U.S.-based multinational provider of integrated technology and business services.
Key Achievements
Created a tool for on-demand test data generation and test data mining for testing.
Set up test data by copying data from the production environment to the test environment after data sanitization.
Continuously improved the test automation code using Jenkins.
Wrote automated load/unload job creation for data sanitization on mainframe z/OS.
Experience in all phases of data warehouse development, from requirements gathering through code development,
unit testing and documentation.
Extensive experience in writing UNIX shell scripts and automation of the ETL processes using UNIX shell scripting.
Proficient in the integration of various data sources with multiple relational databases like Oracle 11g/10g/9i,
MS SQL Server, DB2, Teradata, VSAM files and flat files into the staging area, ODS, Data Warehouse and Data Mart.
Experience in using Automation Scheduling tools like Autosys and Control-M.
Worked extensively with slowly changing dimensions.
Hands-on experience across all stages of Software Development Life Cycle (SDLC) including business requirement
analysis, data mapping, build, unit testing, systems integration and user acceptance testing.
Excellent interpersonal and communication skills; experienced in working with senior-level managers,
business people and developers across multiple disciplines.
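The slowly-changing-dimension work mentioned above is typically Type 2: instead of overwriting a changed attribute, the current row is end-dated and a new current row is appended, preserving history. A hedged sketch of that merge rule in plain Scala; the field names and date handling are illustrative, not the actual warehouse schema:

```scala
// Minimal Type 2 slowly-changing-dimension merge. A row with
// validTo == None is the "current" version of a business key.
case class DimRow(customerId: Int, city: String,
                  validFrom: String, validTo: Option[String])

def applyChange(dim: List[DimRow], customerId: Int,
                newCity: String, asOf: String): List[DimRow] = {
  val (current, others) =
    dim.partition(r => r.customerId == customerId && r.validTo.isEmpty)
  current match {
    case cur :: _ if cur.city == newCity => dim        // nothing changed
    case cur :: _ =>                                   // end-date + insert
      others ++ List(cur.copy(validTo = Some(asOf)),
                     DimRow(customerId, newCity, asOf, None))
    case Nil =>                                        // brand-new key
      dim :+ DimRow(customerId, newCity, asOf, None)
  }
}
```

In the warehouse this runs as an ETL merge step; the point of the sketch is the close/insert pair that keeps the full attribute history queryable.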
Deployed Scala shell commands to develop Spark scripts in accordance with client requirements
Enhanced performance of Spark applications by fixing accurate batch interval times and tuning memory
Utilised Spark's in-memory computing via Scala to perform advanced procedures like text analytics & processing
Worked with a team of 7 Data Scientists & Software Engineers to develop scalable distributed data solutions via
Hadoop
Coordinated with multiple internal departments to evaluate existing Predictive Models & enhance efficiency
Liaised with stakeholders to perform data manipulation, transformation, hypothesis testing & predictive modelling
Created Scala/SQL code to extract data from multiple databases & conceptualised ideas for Advanced Data
Analytics
Utilised Spark MLlib libraries to design recommendation engines & implement data processing & retrieval
initiatives
Deployed Scala to generate complex client-specific reports, render data analysis & create statistical models
Initiated automation & directed performance optimisation to boost traffic by 38% and advertising revenue by 16%
Selected out of 3000+ employees to receive the Star Performer of the Year Award '16 for extraordinary
performance
Hands-on experience with the Informatica PowerCenter tool for data integration.
Supervised data file objects & sizing, monitored database usage/growth, and strategised & executed standby
databases
Maintained logins, DB yield, tablespace development, user profiles/indexes, storage parameters, etc.
Analyzed Log files & conducted Root Cause Analysis to diagnose & resolve 150+ issues
Achieved a reduction in data processing time by 78% by designing automated solutions
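The recommendation-engine work above rests on similarity scoring between item vectors. As an illustrative sketch of that core idea in plain Scala (the actual work used Spark MLlib; the item names and vectors below are made up):

```scala
// Item-to-item recommendation scoring via cosine similarity of
// rating vectors -- the idea similarity-based recommenders build on.
def cosine(a: Seq[Double], b: Seq[Double]): Double = {
  val dot  = a.zip(b).map { case (x, y) => x * y }.sum
  val norm = math.sqrt(a.map(x => x * x).sum) *
             math.sqrt(b.map(x => x * x).sum)
  if (norm == 0) 0.0 else dot / norm
}

// Recommend the item whose vector is most similar to `target`,
// excluding the target itself.
def mostSimilar(target: String, items: Map[String, Seq[Double]]): String =
  (items - target).maxBy { case (_, v) => cosine(items(target), v) }._1
```

At scale the vectors live in a distributed matrix and MLlib computes the pairwise similarities, but the scoring rule is the same.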
INTERNSHIPS
Summer Intern  Jan '09 - Jan '10
S Tel Mumbai, IN
S Tel Private Limited was a GSM based cellular operator in India.
EDUCATION
B.Tech - Electronics and Telecommunication  Apr '04 - Jul '08
G H Raisoni College of Engineering Nagpur, IN
PROJECTS
PROJECT 1: ABN AMRO Bank Risk Data Aggregation Project | Amsterdam, '18 till today
Client: ABN AMRO Bank
Brief: The ABN AMRO Bank Risk Data Aggregation project successfully demonstrated the use of big data technologies to
implement a complete, end-to-end data warehouse and business intelligence operation, with an EDH (Enterprise
Data Hub) as the centralized, unified data source. It is a re-implementation of an existing data warehouse application
based on Oracle/DB2 on Hadoop as the enterprise data hub. Apart from EDH development, this project also demonstrates
the replacement of the existing ETL and reporting operations with Hadoop ETL and reporting tools.
Environment: HDFS (for storage), Spark SQL (for transformation), Spark MLlib (for ML), Zeppelin (for visualization)
PROJECT 2: On-Demand Test Environments and Data for Development and QA | Amsterdam, '17-'18
Client: ABN AMRO Bank
Brief: To reduce the bank's spending on test environment infrastructure, ABN AMRO started a project for on-demand
creation of test environments using Docker Hub images. With a single click, database instances of DB2, Oracle or
MySQL are created in Unix containers. The containers can be started to bring up the services and destroyed
on a need basis.
Environment: Windows, Docker Hub, Nexus repository, Docker images
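The "single click" flow described above boils down to mapping a requested database flavour to the `docker run` command that starts its container. A hedged sketch in Scala; the image names and container naming scheme are illustrative, not the project's actual registry entries:

```scala
// Build the `docker run` argument list for a requested database
// flavour. Image names below are illustrative placeholders.
def runCommand(db: String): Seq[String] = {
  val image = db match {
    case "db2"    => "ibmcom/db2"
    case "oracle" => "oracle/database"
    case "mysql"  => "mysql"
    case other    => sys.error(s"unsupported database: $other")
  }
  // -d: detached, --name: predictable handle for later stop/destroy
  Seq("docker", "run", "-d", "--name", s"test-$db", image)
}
```

In practice the command would be handed to a process runner (or a Docker API client); building the argument list separately keeps it testable without a Docker daemon.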
Worked as a Web Developer & Mobile Application Intern, developing web pages using scripting languages
CGPA: 7 / 10
Overcame challenges of storing & processing structured/semi-structured data via the Hadoop Framework & Apache
Spark
Transferred data into HDFS & deployed Scala to analyze voting patterns across multiple sources and channels
Converted semi-structured XML data into a structured format to enable further processing using Apache Spark
Delivered the output into an RDBMS via Sqoop & achieved real-time processing of the website on a Python-based server
Designed Oozie workflows to automate the entire process of pulling data from core tables to build aggregate
tables that help business users in decision making.
Validated the data between source and destination.
Involved in the development of an ingestion framework (ELF) to extract data from SQL Server and ingest the source
data into the Hadoop data lake using Sqoop.
Data mining from JSON, XML and CSV files using Spark.
Pulled Docker images from Docker Hub.
Started/stopped Docker containers.
Built a Java Swing application to automate Docker commands.
Exported the right data to containers for development and testing.
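Mining delimited files (as in the CSV ingestion above) amounts to turning raw text lines into typed rows while rejecting malformed records. A standalone Scala sketch of that step; the `Txn` schema is illustrative, not the actual source layout:

```scala
// Parse comma-delimited text records into typed rows, dropping any
// line with the wrong arity or non-numeric fields -- the same
// semi-structured-to-structured step Spark performs at scale.
case class Txn(id: Int, account: String, amount: Double)

def parseLine(line: String): Option[Txn] =
  line.split(",").map(_.trim) match {
    case Array(id, acct, amt) =>
      try Some(Txn(id.toInt, acct, amt.toDouble))
      catch { case _: NumberFormatException => None }  // reject bad values
    case _ => None                                     // wrong column count
  }

val rows = Seq("1,NL01,250.0", "bad row", "2,NL02,75.5").flatMap(parseLine)
// rows.size == 2 -- the malformed line is dropped
```

In Spark the same `parseLine` would sit inside a `flatMap` over an RDD or Dataset of lines, so bad records are filtered out as part of ingestion rather than failing the job.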
PROJECT 3: Teradata Migration | Amsterdam, '17-'18
Client: ABN AMRO Bank
Brief: As part of this project, reports built in Teradata were migrated from Teradata to Hadoop, and a few heavy tables
were moved into Hive. This included migration of Teradata procedures to Hive queries, along with implementation
and validation of the same on the Hadoop platform, and warranty support for the deployed reports.
Environment: HDFS (for storage), Spark SQL (for transformation), Sqoop, Mainframe, Oracle, DB2, Teradata
PROJECT 4: Data Mart Creation | RBS UK, '15-'16
Client: Royal Bank of Scotland
Brief: Designed a system to replay the real-time data mart creation of transactions in various upstream and downstream
systems.
Environment: Mainframe, Teradata, Informatica PowerCenter, Talend Data Integration tool
PROJECT 5: Integrated Data Warehousing (IDW) | Bengaluru, '15
Client: Gander Mountain
Brief:
Built a Java Swing application to automate data mining from various data sources over JDBC URLs based on
various user test conditions.
Created a data pipeline for CI/CD pipeline integration.
Overcame challenges of storing & processing structured data via the Cloudera Hadoop Framework & Apache Spark.
Understood the existing stored procedures in Teradata and re-engineered them in Hive and Oozie.
Transferred data into HDFS & deployed Scala for consistency checks across dataframes.
Involved in the development of an ingestion framework (ELF) to extract data from Teradata and ingest the source data
into the Hadoop MiniBank using Sqoop.
Designed Oozie workflows to automate the entire process of pulling data from core tables.
Validated the data between source and destination.
Understood the business requirements from the functional specification to design the ETL methodology in the
technical specifications.
Developed data conversion/quality/cleansing rules and executed data cleansing activities such as data
consolidation, standardization and matching using Trillium for the unstructured flat-file data.
Responsible for the development, support and maintenance of the ETL (Extract, Transform and Load) processes using
Informatica PowerCenter 8.5.
Experience in integration of heterogeneous data sources like Oracle, DB2, SQL Server and Flat Files (Fixed &
delimited) into Staging Area.
Designed and developed mappings using Source Qualifier, Expression, Lookup, Router, Aggregator, Filter,
Sequence Generator, Stored Procedure, Update Strategy, joiner and Rank transformations.
Managed the Metadata associated with the ETL processes used to populate the Data Warehouse
Implemented complex business rules in Informatica Power Center by creating re-usable transformations, and
robust Mapplets.
Worked with the functional team to make sure the required data was extracted and loaded, and performed unit
testing and fixed errors to meet the requirements.
Copied/exported/imported mappings/sessions/worklets/workflows from the development repository to the test repository and
promoted them to production.
Used session parameters and mapping variables/parameters, and created parameter files to enable flexible runs of
workflows based on changing variable values.
Used the PMCMD command to automate PowerCenter sessions and workflows through UNIX.
Consolidated all data marts into an Integrated Data Warehouse (IDW) to reduce costs & increase efficiency
Deployed Informatica PowerCenter 10.X as the ETL tool for implementation of ETL processes in the project
Overcame challenges w.r.t. multiple brand acquisitions, such as silo functioning, data redundancy, multiple data marts,
etc.
Replaced multiple reporting tools, resolved integration & delivery issues & created a common 'enterprise' data
model
Resolved issues w.r.t data quality & departmental coordination to reduce cost of maintaining multiple data marts
Reduced cost of training multiple teams & deployed a single tool for data analysis & systems integration
Minimised data replication across multiple data marts & maintained a single true source of extracting data
Environment: Informatica PowerCenter, Oracle 11g, MySQL, DB2 v10
PROJECT 6: Android Application Development | Bangalore, '13
Client: R N Technologies
Brief: Built an app on the life and teachings of the great saint Swami Vivekananda
Environment: Eclipse Android IDE, Android operating system
PROJECT 7: TDM AND UAT BATCH EXECUTION | Bangalore, '10-'13
Client: HUMANA, USA
Environment: JCL, COBOL, DB2, CICS, FILE-AID, INFOPAC, VSAM, REXX,
Easytrieve, Informatica, ILM TDM tool, Java, TOAD
Brief: This project involved development of new processes, enhancement
of existing processes, and operations support using the ILM TDM tool.
The project involved data subsetting, data masking and data mining
for various platforms of Humana Health Care (e.g. Claims,
Authorization, CI-Medicare) based on the release cycle.
Analyzed Business Requirements.
Integration Testing.
KEY SKILLS
• Data Processing • Big Data Analytics • Apache Spark Framework • Scala Programming • AWS
• Installation, Configuration & Testing • Product Design & Development • Systems Architecture Support
• Client Relationship Management • Dutch Language (A1 Level) • Project Management • Quality Assurance • Research,
Reporting & Documentation
• Leadership & Team Management • Strategy • Software Development • Service-oriented Architecture
Resolved issues pertaining to multiple vendor database, ETL & reporting platforms
Integrated the acquired business data into heterogeneous platforms to consolidate all the data marts into an IDW
Created an Android application based on the life and teachings of Swami Vivekananda
Designed and developed functional specifications and detailed designs for new processes/programs.
Performed coding, unit and integration testing, and supported user acceptance testing.
Designed and developed test cases, test scripts and test plans for baseline applications and documented the test
results.
Wrote and executed Oracle and DB2 queries to extract data from tables.
Developed various audit reports in COBOL and COBOL-DB2 used by clients to make decisions critical to the smooth
migration of legacy data to APS.
Wrote complex SQL to implement various business logic.
Executed batch jobs and tested the applications.
Developed a batch monitoring tool for batch cycle monitoring.
Developed new programs with COBOL, DB2, CICS and MQ Series, and maintained existing programs for volume
testing.
Involved in abend resolution and bug fixing, and provided technical support for the team.
Designed workflows in Informatica ETL and masked test data using the ILM tool.
TECHNICAL SKILLS
Big Data Ecosystem: Hadoop, Hive, Flume, MapReduce, Apache Spark, Oozie, Kafka, AWS
Spark Framework: Spark RDDs, Spark SQL, Spark Streaming, Spark MLlib, Apache Kafka & Architecture
ETL Tools: Informatica PowerCenter, Talend Data Integration, CA Test Data Manager, Informatica Test Data
Masking, Oracle Developer
Languages: Java, Scala, HTML, C, JavaScript, COBOL, JCL, REXX
Software: Apache, Nginx
Database/Server: Tomcat, MySQL, MongoDB, DB2, Oracle 12c
TRAINING & CERTIFICATIONS
Apache Spark and Scala Certification Training | Edureka | '18
Dutch Language (A1 Level)
Certified DB2 Admin | Java Developer - Professional | '16

Weitere ähnliche Inhalte

Was ist angesagt?

DataLogicIT - Steve Renalds 2016
DataLogicIT - Steve Renalds 2016DataLogicIT - Steve Renalds 2016
DataLogicIT - Steve Renalds 2016
Steve Renalds
 
Jayaram_Parida- Big Data Architect and Technical Scrum Master
Jayaram_Parida- Big Data Architect and Technical Scrum MasterJayaram_Parida- Big Data Architect and Technical Scrum Master
Jayaram_Parida- Big Data Architect and Technical Scrum Master
Jayaram Parida
 
Maharshi_Amin_416
Maharshi_Amin_416Maharshi_Amin_416
Maharshi_Amin_416
mamin1411
 

Was ist angesagt? (20)

SANTOSH_V
SANTOSH_VSANTOSH_V
SANTOSH_V
 
#dbhouseparty - Spatial Technologies - @Home and Everywhere Else on the Map
#dbhouseparty - Spatial Technologies - @Home and Everywhere Else on the Map#dbhouseparty - Spatial Technologies - @Home and Everywhere Else on the Map
#dbhouseparty - Spatial Technologies - @Home and Everywhere Else on the Map
 
Talend for big_data_intorduction
Talend for big_data_intorductionTalend for big_data_intorduction
Talend for big_data_intorduction
 
Tarun poladi resume
Tarun poladi resumeTarun poladi resume
Tarun poladi resume
 
Delivering Insights from 20M+ Smart Homes with 500M+ Devices
Delivering Insights from 20M+ Smart Homes with 500M+ DevicesDelivering Insights from 20M+ Smart Homes with 500M+ Devices
Delivering Insights from 20M+ Smart Homes with 500M+ Devices
 
Considerations for Data Access in the Lakehouse
Considerations for Data Access in the LakehouseConsiderations for Data Access in the Lakehouse
Considerations for Data Access in the Lakehouse
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 
Sandeep Grandhi (1)
Sandeep Grandhi (1)Sandeep Grandhi (1)
Sandeep Grandhi (1)
 
DataLogicIT - Steve Renalds 2016
DataLogicIT - Steve Renalds 2016DataLogicIT - Steve Renalds 2016
DataLogicIT - Steve Renalds 2016
 
Richard Clapp Mar 2015 short resume
Richard Clapp Mar 2015 short resumeRichard Clapp Mar 2015 short resume
Richard Clapp Mar 2015 short resume
 
#dbhouseparty - Using Oracle’s Converged “AI” Database to Pick a Good but Ine...
#dbhouseparty - Using Oracle’s Converged “AI” Database to Pick a Good but Ine...#dbhouseparty - Using Oracle’s Converged “AI” Database to Pick a Good but Ine...
#dbhouseparty - Using Oracle’s Converged “AI” Database to Pick a Good but Ine...
 
Jayaram_Parida- Big Data Architect and Technical Scrum Master
Jayaram_Parida- Big Data Architect and Technical Scrum MasterJayaram_Parida- Big Data Architect and Technical Scrum Master
Jayaram_Parida- Big Data Architect and Technical Scrum Master
 
Pallavi_Resume
Pallavi_ResumePallavi_Resume
Pallavi_Resume
 
Maharshi_Amin_416
Maharshi_Amin_416Maharshi_Amin_416
Maharshi_Amin_416
 
Mohamed sakr Senior ETL Developer
Mohamed sakr   Senior ETL Developer Mohamed sakr   Senior ETL Developer
Mohamed sakr Senior ETL Developer
 
Cascading: Enterprise Data Workflows based on Functional Programming
Cascading: Enterprise Data Workflows based on Functional ProgrammingCascading: Enterprise Data Workflows based on Functional Programming
Cascading: Enterprise Data Workflows based on Functional Programming
 
James Henry Robinson
James Henry RobinsonJames Henry Robinson
James Henry Robinson
 
Database@Home - Maps and Spatial Analyses: How to use them
Database@Home - Maps and Spatial Analyses: How to use themDatabase@Home - Maps and Spatial Analyses: How to use them
Database@Home - Maps and Spatial Analyses: How to use them
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Hadoop World 2011: Unlocking the Value of Big Data with Oracle - Jean-Pierre ...
Hadoop World 2011: Unlocking the Value of Big Data with Oracle - Jean-Pierre ...Hadoop World 2011: Unlocking the Value of Big Data with Oracle - Jean-Pierre ...
Hadoop World 2011: Unlocking the Value of Big Data with Oracle - Jean-Pierre ...
 

Ähnlich wie Rajeev kumar apache_spark & scala developer (20)

Chandan's_Resume
Chandan's_ResumeChandan's_Resume
Chandan's_Resume
 
Shrikanth
ShrikanthShrikanth
Shrikanth
 
Resume_Informatica&IDQ_4+years_of_exp
Resume_Informatica&IDQ_4+years_of_expResume_Informatica&IDQ_4+years_of_exp
Resume_Informatica&IDQ_4+years_of_exp
 
Munir_Database_Developer
Munir_Database_DeveloperMunir_Database_Developer
Munir_Database_Developer
 
ananth_resume
ananth_resumeananth_resume
ananth_resume
 
Prashanth Kumar_Hadoop_NEW
Prashanth Kumar_Hadoop_NEWPrashanth Kumar_Hadoop_NEW
Prashanth Kumar_Hadoop_NEW
 
Sparkflows.io
Sparkflows.ioSparkflows.io
Sparkflows.io
 
PLSQL - Raymond Wu
PLSQL - Raymond WuPLSQL - Raymond Wu
PLSQL - Raymond Wu
 
BigData_Krishna Kumar Sharma
BigData_Krishna Kumar SharmaBigData_Krishna Kumar Sharma
BigData_Krishna Kumar Sharma
 
YUVAM17_BIGDATA
YUVAM17_BIGDATAYUVAM17_BIGDATA
YUVAM17_BIGDATA
 
Poorna Hadoop
Poorna HadoopPoorna Hadoop
Poorna Hadoop
 
davidson resume
davidson resumedavidson resume
davidson resume
 
Resume_kallesh_latest
Resume_kallesh_latestResume_kallesh_latest
Resume_kallesh_latest
 
SandhyaRani
SandhyaRaniSandhyaRani
SandhyaRani
 
Sanath pabba hadoop resume 1.0
Sanath pabba hadoop resume 1.0Sanath pabba hadoop resume 1.0
Sanath pabba hadoop resume 1.0
 
Veera Narayanaswamy_PLSQL_Profile
Veera Narayanaswamy_PLSQL_ProfileVeera Narayanaswamy_PLSQL_Profile
Veera Narayanaswamy_PLSQL_Profile
 
Informatica,Teradata,Oracle,SQL
Informatica,Teradata,Oracle,SQLInformatica,Teradata,Oracle,SQL
Informatica,Teradata,Oracle,SQL
 
SivakumarS
SivakumarSSivakumarS
SivakumarS
 
Jyothi_Ganta_Oracle_BI_Developer
Jyothi_Ganta_Oracle_BI_DeveloperJyothi_Ganta_Oracle_BI_Developer
Jyothi_Ganta_Oracle_BI_Developer
 
Renu_Resume
Renu_ResumeRenu_Resume
Renu_Resume
 

Kürzlich hochgeladen

Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
amitlee9823
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
AroojKhan71
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
amitlee9823
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
amitlee9823
 

Kürzlich hochgeladen (20)

CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
ALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptx
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
ELKO dropshipping via API with DroFx.pptx
ELKO dropshipping via API with DroFx.pptxELKO dropshipping via API with DroFx.pptx
ELKO dropshipping via API with DroFx.pptx
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 

Rajeev kumar apache_spark & scala developer

  • 1.  zingy.rajeev@gmail.com  +31645503407 Rajeev Kumar Apache Spark and Scala Developer Amsterdam, NL 8+ years experienced & result-oriented ETL Developer , Apache Spark & Scala,Java, possessing a proven track record in software development using Cloudera Hadoop, Apache Spark, Scala,Java and ETL tools, Proficient in processing structured/unstructured data & deploying Apache Spark to analyse huge datasets, identify patterns & gain valuable insights. Demonstrated capability in accomplishing project life cycle management of client server & web applications. Adept at exporting & importing data using Hadoop clusters & implementing in-depth knowledge of web server using Java/J2EE. Highly skilled at data management & capacity planning for end-to-end data management & performance optimization. Expertise in Data analysis, Data Integration, Data Modelling, Data Warehousing, Holds expertise in various ETL and Reporting tools. - Expertise in Teradata, DB2, SQL Server, SQL, PL/SQLa and Big Data domain. - Automated various processes through UNIX - Domain Expertise : Banking, Financial Services, Insurance and Health Care PROFESSIONAL EXPERIENCE Data Engineer  Oct '13 - Present Infosys Ltd. Amsterdam, NL Infosys Limited is an Indian multinational corporation that provides business consulting, information technology and outsourcing services. photo_camera Expertise with the tools in Hadoop Ecosystem including Hive, HDFS, MapReduce, Sqoop, Spark, Kafka, Yarn, Oozie, and Zookeeper. Implemented spark using scala and spark sql for faster testing and processing of data. Strong experience in the Analysis, design, development, testing and Implementation of Business Intelligence solutions using Data Warehouse/Data Mart Design, ETL, OLAP, BI, Client/Server applications. Experience in designing and developing applications in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle. Strong experience in writing applications using scala swing core libraries. 
• Knowledge of MongoDB NoSQL database data modelling, tuning and backup.
• Strong experience in dimensional modelling using star and snowflake schemas, identifying facts and dimensions, and physical and logical data modelling using ERwin and ER/Studio.
• Experience in manipulating/analysing large datasets and finding patterns and insights within structured and unstructured data.
• Used Scala libraries to process XML data stored in HDFS; loaded the data into Spark RDDs, performed in-memory computation and generated the output response.
• Involved in converting MapReduce programs to Spark transformations using Spark RDDs and Scala.
• Developed Spark scripts using the REPL and the Scala Eclipse IDE.
• Used Sqoop to extract data from RDBMS databases.
• Involved in moving log files generated from different data sources to HDFS for further processing through Flume.
• Performed data sanitization using the Informatica Test Data Management tool and the CA Fast Data Masking tool.
• Expertise in working with relational databases such as Oracle 11g/10g/9i/8x, SQL Server 2008/2005, DB2 8.0/7.0, UDB, MS Access, Teradata, Hadoop and PostgreSQL.
• Strong experience in extracting, transforming and loading (ETL) data from various sources into data warehouses and data marts using Informatica PowerCenter (Repository Manager, Designer, Workflow Manager, Workflow Monitor, Metadata Manager), PowerExchange and PowerConnect as ETL tools on Oracle, DB2 and SQL Server databases.
• Created a tool for on-demand test data, mining test data for testing.
• Set up test data by copying data from the production environment to the test environment after data sanitization.
• Continuously improved the test-automation code using Jenkins.
• Wrote automated load/unload job creation for data sanitization on mainframe z/OS.
• Experience in all phases of data warehouse development, from requirements gathering through coding, unit testing and documentation.
• Extensive experience in writing UNIX shell scripts and automating ETL processes using UNIX shell scripting.
• Proficient in integrating various data sources with multiple relational databases, such as Oracle 11g/10g/9i, MS SQL Server, DB2, Teradata, VSAM files and flat files, into the staging area, ODS, data warehouse and data marts.
• Experience in using automation scheduling tools such as Autosys and Control-M.
• Worked extensively with slowly changing dimensions.
• Hands-on experience across all stages of the Software Development Life Cycle (SDLC), including business requirement analysis, data mapping, build, unit testing, systems integration and user acceptance testing.
• Excellent interpersonal and communication skills; experienced in working with senior-level managers, business people and developers across multiple disciplines.

Analytics & Performance Enhancement | Leadership & Team Management | Data Management & Reporting

Key Achievements

Sr. Software Engineer | Feb '10 - Oct '13
Syntel Ltd., Pune, IN
Syntel, Inc. is a U.S.-based multinational provider of integrated technology and business services.

Key Achievements
• Deployed Scala shell commands to develop Spark scripts in accordance with client requirements.
• Enhanced the performance of Spark applications by fixing accurate batch-interval times and tuning memory.
• Utilised Spark's in-memory computing via Scala to perform advanced procedures such as text analytics and processing.
• Worked with a team of 7 data scientists and software engineers to develop scalable distributed data solutions via Hadoop.
• Coordinated with multiple internal departments to evaluate existing predictive models and enhance efficiency.
• Liaised with stakeholders to perform data manipulation, transformation, hypothesis testing and predictive modelling.
• Created Scala/SQL code to extract data from multiple databases and conceptualised ideas for advanced data analytics.
• Utilised Spark MLlib libraries to design recommendation engines and implement data processing and retrieval initiatives.
• Deployed Scala to generate complex client-specific reports, render data analysis and create statistical models.
• Initiated automation and directed performance optimisation to boost traffic by 38% and advertising revenue by 16%.
• Selected out of 3,000+ employees to receive the Star Performer of the Year Award '16 for extraordinary performance.
• Hands-on experience with Informatica PowerCenter for data integration.
• Supervised data file objects and sizing, monitored database usage/growth, and strategised and executed standby databases.
• Maintained logins, DB yield, tablespace development, user profiles/indexes, storage parameters, etc.
• Analysed log files and conducted root-cause analysis to diagnose and resolve 150+ issues.
• Achieved a 78% reduction in data processing time by designing automated solutions.
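A recommendation engine of the kind mentioned above, built on Spark MLlib's ALS (alternating least squares) implementation, might be sketched as follows; the ratings file and the userId/itemId/rating column names are hypothetical:

```scala
// Hedged sketch of a collaborative-filtering recommender with spark.ml ALS.
// Dataset path and column names are assumptions, not details from the CV.
import org.apache.spark.ml.recommendation.ALS
import org.apache.spark.sql.SparkSession

object RecommenderSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("als-recommender")
      .master("local[*]")
      .getOrCreate()

    // Hypothetical ratings data with columns: userId, itemId, rating
    val ratings = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("/data/ratings.csv")

    val Array(train, test) = ratings.randomSplit(Array(0.8, 0.2), seed = 42)

    val als = new ALS()
      .setUserCol("userId")
      .setItemCol("itemId")
      .setRatingCol("rating")
      .setRank(10)
      .setMaxIter(10)
      .setColdStartStrategy("drop")  // drop NaN predictions for unseen users

    val model = als.fit(train)

    // Top-5 item recommendations per user
    model.recommendForAllUsers(5).show(truncate = false)
    spark.stop()
  }
}
```

`coldStartStrategy = "drop"` matters when evaluating on the held-out split: users or items absent from the training set would otherwise produce NaN predictions.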
INTERNSHIPS

Summer Intern | Jan '09 - Jan '10
S Tel, Mumbai, IN
S Tel Private Limited was a GSM-based cellular operator in India.

EDUCATION

B.Tech - Electronics and Telecommunication | Apr '04 - Jul '08
G H Raisoni College of Engineering, Nagpur, IN

PROJECTS

PROJECT 1: ABN AMRO Bank Risk Data Aggregation Project | Amsterdam, '18 - present
Client: ABN AMRO Bank
Brief: The ABN AMRO Bank Risk Data Aggregation project successfully demonstrated the use of big data technologies to implement a complete, end-to-end data warehouse and business intelligence operation, using an Enterprise Data Hub (EDH) as the centralised, unified data source. It is a re-implementation on Hadoop of an existing Oracle/DB2-based data warehouse application as an enterprise data hub. Apart from EDH development, the project also demonstrated the replacement of the existing ETL and reporting operations with Hadoop ETL and reporting tools.
Environment: HDFS (storage), Spark SQL (transformation), Spark MLlib (ML), Zeppelin (visualisation)

PROJECT 2: On-Demand Test Environments and Data for Development and QA | Amsterdam, '17-'18
Client: ABN AMRO Bank
Brief: To reduce the bank's spend on test-environment infrastructure, ABN AMRO started a project for on-demand creation of test environments using Docker Hub images. With a single click, a database instance of DB2, Oracle or MySQL is created in a Unix container.
The containers can be started to bring the services up and destroyed on an as-needed basis.
Environment: Windows, Docker Hub, Nexus repository, Docker images

Internship: worked as a Web Developer & Mobile Application Intern, developing web pages using scripting languages.
CGPA: 7/10

Project 1 highlights:
• Overcame the challenges of storing and processing structured/semi-structured data via the Hadoop framework and Apache Spark.
• Transferred data into HDFS and deployed Scala to analyse voting patterns across multiple sources and channels.
• Converted semi-structured XML data into a structured format to enable further processing using Apache Spark.
• Delivered the output into an RDBMS via Sqoop and achieved real-time processing of the website on a Python-based server.
• Designed an Oozie workflow to automate the entire process of pulling data from core tables to build the aggregate tables that help business users make decisions.
• Validated the data between source and destination.
• Involved in the development of an ingestion framework (ELF) to extract data from SQL Server and ingest the source data into the Hadoop data lake using Sqoop.
• Performed data mining from JSON, XML and CSV files using Spark.

Project 2 highlights:
• Pulled Docker images from Docker Hub; started and stopped Docker containers.
• Built a Java Swing application to automate the Docker commands.
• Exported the right data to containers for development and testing.
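The XML-to-structured conversion described above could be sketched along these lines; the element names, paths and the `Vote` schema are illustrative, and the parsing relies on the scala-xml module:

```scala
// Sketch: convert semi-structured XML records stored in HDFS into a
// structured DataFrame, then persist it as Parquet for further processing.
// Paths, element names and the Vote schema are hypothetical.
import org.apache.spark.sql.SparkSession

case class Vote(region: String, candidate: String, count: Long)

object XmlToStructured {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("xml-to-structured")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Assume one <vote>...</vote> document per line in HDFS
    val raw = spark.sparkContext.textFile("hdfs:///data/votes/*.xml")

    // Parse each record with scala.xml and project it onto the case class
    val structured = raw.map { line =>
      val x = scala.xml.XML.loadString(line)
      Vote((x \ "region").text, (x \ "candidate").text, (x \ "count").text.toLong)
    }.toDF()

    structured.write.mode("overwrite").parquet("hdfs:///data/votes_parquet")
    spark.stop()
  }
}
```

For large multi-line XML files, the spark-xml connector (`com.databricks:spark-xml`) with a `rowTag` option is a common alternative to per-line parsing.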
PROJECT 3: Teradata Migration | Amsterdam, '17-'18
Client: ABN AMRO Bank
Brief: As part of this project, reports built in Teradata were migrated from Teradata to Hadoop, with a few heavy tables moved into Hive. This included the migration of Teradata procedures to Hive queries, along with the implementation and validation of the same on the Hadoop platform, and warranty support for the deployed reports.
Environment: HDFS (storage), Spark SQL (transformation), Sqoop, Mainframe, Oracle, DB2, Teradata

PROJECT 4: Data Mart Creation | RBS UK, '15-'16
Client: Royal Bank of Scotland
Brief: Designed a system to replay the real-time data mart creation of transactions in various upstream and downstream systems.
Environment: Mainframe, Teradata, Informatica PowerCenter, Talend Data Integration tool

PROJECT 5: Integrated Data Warehousing (IDW) | Bengaluru, '15
Client: Gander Mountain
Brief: Built a Java Swing application to automate data mining from various data sources via JDBC URLs based on various user test conditions.
• Created a data pipeline for CI/CD pipeline integration.
• Overcame the challenges of storing and processing structured data via the Cloudera Hadoop framework and Apache Spark.
• Understood the existing stored procedures in Teradata and re-engineered them in Hive and Oozie.
• Transferred data into HDFS and deployed Scala for consistency checks across DataFrames.
• Involved in the development of an ingestion framework (ELF) to extract data from Teradata and ingest the source data into the Hadoop MiniBank using Sqoop.
• Designed an Oozie workflow to automate the entire process of pulling data from core tables.
• Validated the data between source and destination.
• Understood the business requirements based on the functional specification to design the ETL methodology in the technical specifications.
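A source-vs-destination consistency check across DataFrames, as mentioned above, might look like this minimal sketch; the paths, database and table names are placeholders:

```scala
// Sketch: validate that a migrated Hive table matches the landed source data.
// Compares the two DataFrames in both directions using exceptAll, which
// respects duplicate rows (both inputs must share the same schema).
import org.apache.spark.sql.{DataFrame, SparkSession}

object ConsistencyCheck {
  def validate(source: DataFrame, target: DataFrame): Boolean = {
    val missingInTarget = source.exceptAll(target).count()
    val extraInTarget   = target.exceptAll(source).count()
    missingInTarget == 0 && extraInTarget == 0
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("consistency-check")
      .enableHiveSupport()   // needed to read managed Hive tables
      .getOrCreate()

    // Hypothetical: extract landed in HDFS vs. the migrated Hive table
    val source = spark.read.parquet("hdfs:///landing/accounts")
    val target = spark.table("minibank.accounts")

    println(s"Consistent: ${validate(source, target)}")
    spark.stop()
  }
}
```

For very large tables, comparing row counts plus per-column checksums is cheaper than a full `exceptAll` in both directions; the sketch favours the exact check.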
• Developed data conversion/quality/cleansing rules and executed data-cleansing activities such as data consolidation, standardisation and matching with Trillium for unstructured flat-file data.
• Responsible for developing, supporting and maintaining the ETL (Extract, Transform and Load) processes using Informatica PowerCenter 8.5.
• Experience in integrating heterogeneous data sources such as Oracle, DB2, SQL Server and flat files (fixed-width and delimited) into the staging area.
• Designed and developed mappings using Source Qualifier, Expression, Lookup, Router, Aggregator, Filter, Sequence Generator, Stored Procedure, Update Strategy, Joiner and Rank transformations.
• Managed the metadata associated with the ETL processes used to populate the data warehouse.
• Implemented complex business rules in Informatica PowerCenter by creating reusable transformations and robust mapplets.
• Worked with the functional team to make sure the required data had been extracted and loaded; performed unit testing and fixed errors to meet the requirements.
• Copied/exported/imported mappings, sessions, worklets and workflows from the development to the test repository and promoted them to production.
• Used session parameters, mapping variables/parameters and parameter files to allow flexible runs of workflows based on changing variable values.
• Used the PMCMD command to automate PowerCenter sessions and workflows through UNIX.
• Consolidated all data marts into an Integrated Data Warehouse (IDW) to reduce costs and increase efficiency.
• Deployed Informatica PowerCenter 10.x as the ETL tool for the implementation of ETL processes in the project.
• Overcame challenges related to multiple brand acquisitions, such as silo functioning, data redundancy and multiple data marts.
• Replaced multiple reporting tools, resolved integration and delivery issues, and created a common 'enterprise' data model.
• Resolved issues with data quality and departmental coordination to reduce the cost of maintaining multiple data marts.
• Reduced the cost of training multiple teams by deploying a single tool for data analysis and systems integration.
• Minimised data replication across multiple data marts and maintained a single source of truth for extracting data.
Environment: Informatica PowerCenter, Oracle 11g, MySQL, DB2 v10

PROJECT 6: Android Application Development | Bangalore, '13
Client: R N Technologies
Brief: Built an app on the life and teachings of the great saint Swami Vivekananda.
Environment: Eclipse Android IDE, Android operating system

PROJECT 7: TDM and UAT Batch Execution | Bangalore, '10-'13
Client: Humana, USA
Environment: JCL, COBOL, DB2, CICS, File-AID, INFOPAC, VSAM, REXX, Easytrieve, Informatica, ILM TDM tool, Java, TOAD
Brief: This project involved the development of new processes, enhancements to existing processes and operations support using the ILM TDM tool. It involved data subsetting, data masking and data mining for various platforms of Humana health care (e.g. Claims, Authorization, CI-Medicare) based on the release cycle.
• Analysed business requirements.
• Performed integration testing.

KEY SKILLS
• Data Processing • Big Data Analytics • Apache Spark Framework • Scala Programming • AWS • Installation, Configuration & Testing • Product Design & Development • Systems Architecture Support • Client Relationship Management • Dutch Language (A1 level) • Project Management • Quality Assurance • Research, Reporting & Documentation • Leadership & Team Management • Strategy • Software Development • Service-Oriented Architecture

TECHNICAL SKILLS
• Resolved issues pertaining to multiple vendor database, ETL and reporting platforms.
• Integrated the acquired business data into heterogeneous platforms to consolidate all the data marts into an IDW.
• Created an Android application based on the life and teachings of Swami Vivekananda.
• Designed and developed functional specifications and detailed designs for new processes/programs.
• Performed coding, unit and integration testing, and supported user acceptance testing.
• Designed and developed test cases, test scripts and test plans for baseline applications, and documented the test results.
• Wrote and executed Oracle and DB2 queries to extract data from tables.
• Developed various audit reports in COBOL and COBOL-DB2, used by clients to make decisions critical to the smooth migration of legacy data to APS.
• Wrote complex SQL to accomplish various business logic.
• Executed batch jobs and tested the applications; developed a batch-monitoring tool for batch-cycle monitoring.
• Developed new programs with COBOL, DB2, CICS and MQ Series, and maintained the existing programs for volume testing.
• Involved in abend solving and bug fixing, and provided technical support for the team.
• Designed workflows in Informatica ETL and masked test data using the ILM tool.

Big Data Ecosystem: Hadoop, Hive, Flume, MapReduce, Apache Spark, Oozie, Kafka, AWS
Spark Framework: Spark RDDs, Spark SQL, Spark Streaming, Spark MLlib, Apache Kafka & architecture
ETL Tools: Informatica PowerCenter, Talend Data Integration, CA Test Data Manager, Informatica Test Data Masking, Oracle Developer
Languages: Java, Scala, HTML, C, JavaScript, COBOL, JCL, REXX
Software: Apache, Nginx
Database/Server: Tomcat, MySQL, MongoDB, DB2, Oracle 12c

TRAINING & CERTIFICATIONS
• Apache Spark and Scala Certification Training | Edureka | '18
• Dutch Language A1 Level
• Certified DB2 Admin | Java Developer - Professional | '16