DATA EXPERTS 
We accelerate research and transform data to help you create actionable insights 
WE 
MINE 
WE 
ANALYZE 
WE 
VISUALIZE
Data Scientists 
Software Engineers 
Domain Analysts
Data Mining 
Mining longitudinal and linked datasets from web and other archives
Web Data Mining 
Indian Patent Data 
The task was to write code to harvest Indian patent data from the Indian Patent Office website on an ongoing basis.
Code was written to extract information on all patents filed between two user-defined dates, A and B.
A data quality tool was then written to clean the harvested data.
A scheduler was set up to run the harvesting code followed by the data quality code every week, because the Indian Patent Office publishes patents weekly.
This master scheduler runs every week and updates all fields of new patents in the master database.
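The weekly harvest-then-clean cycle described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the original code: `harvest_patents` and `clean_records` are hypothetical stand-ins for the harvesting and data quality steps, and the record fields and sample value are made up.

```python
from datetime import date, timedelta


def weekly_windows(start, end):
    """Split [start, end] into consecutive windows of at most 7 days."""
    windows = []
    cursor = start
    while cursor <= end:
        window_end = min(cursor + timedelta(days=6), end)
        windows.append((cursor, window_end))
        cursor = window_end + timedelta(days=1)
    return windows


def harvest_patents(date_a, date_b):
    """Hypothetical placeholder: a real harvester would query the patent office site."""
    return [{"application_no": " SAMPLE-001 ", "filed": date_a}]


def clean_records(records):
    """Hypothetical data quality step: trim whitespace in string fields."""
    return [
        {k: v.strip() if isinstance(v, str) else v for k, v in rec.items()}
        for rec in records
    ]


def run_weekly_job(master, date_a, date_b):
    """Harvest, clean, then upsert into the master pool keyed by application number."""
    for rec in clean_records(harvest_patents(date_a, date_b)):
        master[rec["application_no"]] = rec
    return master


master_pool = {}
for window_start, window_end in weekly_windows(date(2014, 1, 1), date(2014, 1, 20)):
    run_weekly_job(master_pool, window_start, window_end)
```

Keying the master pool by application number means a patent that is harvested again simply overwrites its earlier record, which is one simple way to keep all fields up to date.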
The task was to write code to harvest the latest earnings conference calls of all NASDAQ-listed firms and parse each call at the individual speaker level.
Earnings conference calls were harvested daily from NASDAQ, Thomson StreetEvents and other sources.
Automation code was written to parse those calls at the individual speaker level, with metadata such as speaker name, designation, company and speech text.
A scheduler was set up to run the harvesting code (from NASDAQ) followed by the parsing code every day.
This master scheduler runs every day and updates all fields of new conference calls in the master database.
Web Data Mining 
Earnings Conference Calls
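Speaker-level parsing of a call transcript might look something like the sketch below. The line format "Name - Designation, Company: speech text" is an assumption made for illustration; real transcripts from the sources named above will differ, and the sample names are invented.

```python
import re

# Assumed (hypothetical) line format: "Name - Designation, Company: speech text"
SPEAKER_LINE = re.compile(
    r"^(?P<name>[^:\-]+?)\s+-\s+(?P<designation>[^,:]+),\s*(?P<company>[^:]+):\s*(?P<text>.*)$"
)


def parse_call(transcript):
    """Split a transcript into one record per speaker turn."""
    turns = []
    for line in transcript.splitlines():
        line = line.strip()
        if not line:
            continue
        match = SPEAKER_LINE.match(line)
        if match:
            turns.append({k: v.strip() for k, v in match.groupdict().items()})
        elif turns:
            # Continuation line: append to the current speaker's text.
            turns[-1]["text"] += " " + line
    return turns


sample = """Jane Doe - CEO, Example Corp: Welcome to the call.
Revenue grew this quarter.
John Roe - Analyst, Sample Bank: Can you comment on margins?"""

turns = parse_call(sample)
```

Each element of `turns` is a dictionary with the four metadata fields mentioned above, ready to be written to a database row.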
The task was to create a longitudinal data-set containing details of CEO compensation.
DEF 14A proxy filings on SEC EDGAR contain details of how, and how much, each company's CEO was compensated.
Automation code was written to parse those proxy filings and pull out total compensation as well as its breakdown, including basic salary, bonus, EPS, etc.
A team of financial domain analysts manually studied the proxy filings and created dummy variables for the various compensation strategies used in practice to compensate CEOs and key executives.
Finally, the data-set was cleaned, standardized and sent to the quality team for vetting.
Archive Data Mining 
CEO Compensation
Statistical Modeling 
Curating statistical models or running statistical analysis on real or simulated data
The task was to prepare and implement statistical models to optimize school transportation and logistics costs for real cities, based on real data.
A professor was appointed by an NPO and a government to prepare statistical models for city planning aimed at optimizing school costs.
Our statisticians used clustering, optimization techniques and multi-processing to minimize the total transportation and operational cost of schools for a city.
Various constraints were handled, such as bus capacity, total riding time, mixed loading, land and sea routes, different types of buses and classroom sizes, and many more.
A flexible graphical user interface was delivered to the client to view output on maps and spreadsheets, manually change routes, upload input data, etc.
Statistical Modeling - Optimization 
Logistics Optimization
Using Hidden Markov Models to predict various hidden layers of consumer behavior 
Starting from the hypothesis that consumers pass through sequential hidden phases of behavior towards a product ecosystem over time, our task was to validate that these hidden layers exist and to understand their transitions over time.
Using panel data on the adoption and rejection of various kinds of products in a product ecosystem, covering more than 10,000 products and 100,000 consumers, we prepared a hidden Markov model with a maximum of 8 states and 20 time periods.
Computation time for this analysis was of exponential order, and each iteration of the optimization had to store gigabytes of data. A Hadoop layer on R was therefore used, with HDFS on five machines, to reduce the time and memory load compared with a single machine.
Statistical Modeling 
Hidden Markov Model
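As a rough illustration of the building block behind such a model, the sketch below computes the likelihood of one consumer's adopt/reject sequence with the forward algorithm. The two hidden states and every probability here are made-up toy values, not figures from the project, and the sketch covers neither model fitting nor the Hadoop layer.

```python
from itertools import product

# Toy parameters (assumed): 2 hidden states, observations 0 = reject, 1 = adopt.
START = [0.6, 0.4]
TRANS = [[0.7, 0.3], [0.2, 0.8]]   # TRANS[r][s] = P(next state s | current state r)
EMIT = [[0.9, 0.1], [0.3, 0.7]]    # EMIT[s][o] = P(observation o | state s)


def forward_likelihood(obs, start, trans, emit):
    """Return P(obs) under an HMM using the forward algorithm."""
    n_states = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    for symbol in obs[1:]:
        alpha = [
            emit[s][symbol] * sum(alpha[r] * trans[r][s] for r in range(n_states))
            for s in range(n_states)
        ]
    return sum(alpha)


# Sanity check: likelihoods of all length-3 sequences should sum to 1.
total = sum(
    forward_likelihood(list(seq), START, TRANS, EMIT)
    for seq in product([0, 1], repeat=3)
)
```

The recursion touches every pair of states once per time period, so a single likelihood evaluation stays cheap even at 8 states and 20 periods; the heavy cost in practice comes from repeating it across many consumers and optimization iterations.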
Data Analysis – Predictive Analysis
Predicting Hospital Readmission Rates
Predicting and benchmarking readmission rates using readmission data together with performance and cause factors.
With the implementation of the Affordable Care Act (Obamacare), hospitals now have a pressing need to significantly reduce their readmission rates or pay large penalties. A comprehensive solution requires collating readmission data from various sources and then identifying the causal factors affecting readmission.
AIC and BIC analysis was carried out for each disease to remove variables with an insignificant dependence on readmission rates, leaving the key variables for each disease. Multiple linear regression was then run with readmission rate as the response variable and the remaining variables as explanatory variables; the results were consistent with what common sense would suggest.
We further developed a BI tool using a comprehensive model to predict readmission rates and their dependence on various KPIs.
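The idea of screening variables with AIC and BIC can be sketched as below. This is a simplified one-variable-at-a-time screen against an intercept-only model, which may well differ from the procedure actually used; the variable names and numbers are invented for illustration, and the criteria are the usual Gaussian least-squares forms up to an additive constant.

```python
import math


def fit_simple_ols(x, y):
    """Least-squares fit of y = a + b*x; returns (a, b, rss)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return a, b, rss


def aic_bic(rss, n, k):
    """AIC and BIC for a Gaussian linear model, up to a constant; k = fitted coefficients."""
    aic = n * math.log(rss / n) + 2 * k
    bic = n * math.log(rss / n) + k * math.log(n)
    return aic, bic


# Made-up toy data: hypothetical candidate variables and readmission rates.
candidates = {
    "prior_admissions": [1, 2, 3, 4, 5, 6, 7, 8],
    "bed_count": [5, 1, 4, 2, 5, 1, 4, 2],
}
readmission_rate = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

n = len(readmission_rate)
mean_rate = sum(readmission_rate) / n
null_rss = sum((y - mean_rate) ** 2 for y in readmission_rate)
null_aic, null_bic = aic_bic(null_rss, n, k=1)

# Keep a variable only if it beats the intercept-only model on both criteria.
selected = []
for name, values in candidates.items():
    _, _, rss = fit_simple_ols(values, readmission_rate)
    aic, bic = aic_bic(rss, n, k=2)
    if aic < null_aic and bic < null_bic:
        selected.append(name)
```

Variables that survive the screen would then go into the multiple linear regression as explanatory variables.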
Big Data Analysis 
Implementing distributed computing methods and cutting-edge statistical models to analyze large volumes of data
Data Analysis – Contextual Analysis 
Sentiment Analysis 
Navigating through millions of tweets to understand people's sentiment
Millions of tweets from various people were extracted using data harvesting techniques.
Using Naïve Bayes and a training set of 100,000 tweets, a supervised classifier was built to determine the sentiment of any tweet. The analysis involved first removing stop words and replacing elongated chat versions of words with their standard English counterparts.
Sentiments were analyzed across various control groups.
Without big data technologies, it would not have been possible to complete this project within the record two-month timeframe.
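The preprocessing and classification steps above can be illustrated with a small multinomial Naïve Bayes sketch. The stop-word list, the chat-word lookup and the four training tweets are all invented for the example; the original classifier was trained on 100,000 tweets and its exact design is not described here.

```python
import math
import re
from collections import Counter, defaultdict

STOP_WORDS = {"the", "a", "is", "this", "i", "it", "to"}  # tiny illustrative list
CHAT_MAP = {"luv": "love", "gr8": "great"}  # hypothetical chat-word lookup


def tokenize(tweet):
    """Lowercase, shorten elongated words, map chat words, drop stop words."""
    tokens = []
    for word in re.findall(r"[a-z0-9']+", tweet.lower()):
        word = re.sub(r"(.)\1{2,}", r"\1\1", word)  # "goooood" -> "good"
        word = CHAT_MAP.get(word, word)
        if word not in STOP_WORDS:
            tokens.append(word)
    return tokens


def train(labelled_tweets):
    """Count words per class for a multinomial Naive Bayes model."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    for text, label in labelled_tweets:
        class_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, class_counts, vocab


def classify(tweet, model):
    """Return the class with the highest log posterior (add-one smoothing)."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        score = math.log(class_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in tokenize(tweet):
            score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up training examples.
model = train([
    ("I luv this phone", "pos"),
    ("gr8 service, really happy", "pos"),
    ("this is terrible", "neg"),
    ("worst service, so sad", "neg"),
])
prediction = classify("sooo happy, luv it", model)
```

Collapsing runs of three or more repeated letters to two is only a crude stand-in for mapping elongated words back to English; a dictionary check after collapsing would be a natural refinement.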
Data Analysis – Big Data Analysis 
Analyzing High-Frequency Trading Data
Querying high-frequency data from the limit order book for all NYSE-traded securities
The project involved analyzing high-frequency data-sets obtained from NYSE Open Book and Open Book Ultra.
One of the biggest challenges in analyzing this data was its scale: the total size was 30 TB and growing. Even simple analyses involved database queries that took several hours at a time.
We distributed the computation using algorithms based on MapReduce and implemented them in Python. This enabled near-real-time querying.
This setup was later used to analyze price variation and its impact on market liquidity. The distributed computing approach saved a lot of analysis time.
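The MapReduce pattern mentioned above can be shown in miniature with plain Python. This single-process sketch only illustrates the map, shuffle and reduce stages; the record layout `(symbol, price, size)`, the ticker names and the choice of a volume-weighted average price as the query are assumptions, and the framework actually used on the cluster is not specified here.

```python
from collections import defaultdict
from functools import reduce


def map_phase(record):
    """Emit (symbol, (price * size, size)) for one order-book message."""
    symbol, price, size = record
    return symbol, (price * size, size)


def shuffle(pairs):
    """Group mapped values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(values):
    """Combine partial sums into a volume-weighted average price."""
    notional, volume = reduce(lambda a, b: (a[0] + b[0], a[1] + b[1]), values)
    return notional / volume


# Made-up sample messages.
records = [("AAA", 10.0, 100), ("BBB", 20.0, 50), ("AAA", 12.0, 300)]
vwap = {
    symbol: reduce_phase(values)
    for symbol, values in shuffle(map(map_phase, records)).items()
}
```

Because the reducer only adds partial sums, the same logic can run independently on each chunk of the data and be combined afterwards, which is what makes this kind of query easy to distribute.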
Data Analysis – Prescriptive Analysis 
Personality Analysis 
The personality of key executives of various companies was examined using text records of their public speeches
With access to a parsed database of speeches by key executives of NASDAQ-traded firms, we performed personality analysis of over 30,000 people (including chief executives and analysts) and 10 million words, using psycholinguistic word databases and an SVM technique.
Each person was given a score from 1 to 8 on each of the Big Five personality traits.
These traits were correlated with the following quarter's conference calls, and patterns were observed in the personality of chief executives and its effect on financial (especially stock) performance in the following quarter.
Data Standardization 
Converting raw data into cleaned and standardized formats
Merging company names in financial and patent data-sets 
The task was to merge a US financial data-set with 15,000 unique standardized company names and a US patent data-set with 800,000+ unstandardized unique company names.
After thorough fine-tuning of our hybrid model for company name matching, we reduced those 800,000+ unstandardized unique company names to 200,000+ standardized company names and mapped all 15,000 company names in the finance data-set to the patent data-set.
The results were striking: for example, we found 105 different misspellings of IBM, which we grouped under one standardized name.
Data Standardization 
Merging Patent and Finance Data
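One common first step in this kind of name matching is rule-based normalization before any fuzzy comparison. The sketch below shows that step only; it is not the hybrid model described above, whose details are not given, and the suffix list and sample names are illustrative assumptions.

```python
import re
from collections import defaultdict

# Illustrative list of legal suffixes to strip; a real list would be longer.
SUFFIXES = {"inc", "corp", "corporation", "co", "company", "ltd", "llc"}


def normalize(name):
    """Lowercase, drop punctuation and trailing legal suffixes."""
    text = name.lower().replace(".", "")          # "I.B.M." -> "ibm"
    tokens = re.sub(r"[^a-z0-9 ]", " ", text).split()
    while tokens and tokens[-1] in SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def group_names(raw_names):
    """Group raw name variants under their normalized form."""
    groups = defaultdict(list)
    for name in raw_names:
        groups[normalize(name)].append(name)
    return dict(groups)


groups = group_names(["IBM Corp.", "I.B.M. Corporation", "ibm, inc", "Acme Ltd"])
```

Normalization alone catches punctuation and suffix variants; genuine misspellings still need a similarity measure on top of it.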
Standardizing names of plaintiffs and defendants across a litigation data-set containing approximately 4.7 million unique names
Used benchmark algorithms based on Levenshtein distance and n-grams to standardize names, and implemented them in Python
Prepared and optimized hybrid parameter models for the above algorithms
Implemented the above on a five-node cluster using the Hadoop Distributed File System
Data Standardization 
Standardizing Names of Individuals
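The two similarity measures named above, and one possible way of blending them, can be sketched as follows. The Levenshtein and n-gram functions are standard; the `hybrid_score` blend and its `weight` parameter are assumptions made for illustration and are not the tuned hybrid model from the project.

```python
def levenshtein(a, b):
    """Edit distance via the standard two-row dynamic programme."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(
                min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost)
            )
        previous = current
    return previous[-1]


def ngrams(text, n=3):
    """Set of character n-grams, with one space of padding on each side."""
    padded = f" {text} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def ngram_similarity(a, b, n=3):
    """Jaccard overlap of character n-gram sets."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    return len(ga & gb) / len(ga | gb) if ga | gb else 1.0


def hybrid_score(a, b, weight=0.5):
    """Weighted blend of normalized edit similarity and n-gram similarity.

    The weight is an assumed tunable parameter.
    """
    longest = max(len(a), len(b)) or 1
    edit_sim = 1 - levenshtein(a, b) / longest
    return weight * edit_sim + (1 - weight) * ngram_similarity(a, b)


close = hybrid_score("john smith", "jon smith")
far = hybrid_score("john smith", "mary jones")
```

Names whose score exceeds a chosen threshold would be merged under one standardized form; the threshold and weight are the kind of parameters that need tuning against hand-checked examples.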
Technology Implementation 
Implementing customized mobile and software tools to make research faster, more scalable and more economical
Developing a mobile and cloud-based survey application for researchers to conduct, analyze and report surveys
Technology Implementation 
Developing Mobile Applications 
Capturing survey data, digitizing the results and performing analyses was a long-drawn, error-prone manual process.
We worked with researchers at the university to develop an Android application for collecting data on the smartphones of on-ground surveyors.
The result: no skip logics, no digitization, audio and video responses, an analysis dashboard, GPS location reliability, all at the click of a button.
Building a platform to assist in the collection, tracking and analysis of large data-sets
Technology Implementation 
Developing Desktop Applications 
Developed a login-based offline application on the .NET Framework which collects survey data, syncs local data with the cloud and allows users to analyze data, all on a continual basis, using the most reliable and scalable cloud-based database technology, Google Cloud SQL
Considering physical constraints such as electricity and internet shortages, our team developed a technology that can be installed at any remote terminal where data collection takes place
All data is kept on the local machine until an internet connection becomes available; it is then synced to a cloud database that the client's core team members can access from anywhere at any time
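The store-locally-then-sync behaviour described above follows a common offline-first pattern, sketched here in Python rather than .NET purely for illustration. The table layout, the `is_online` check and the `upload` callback are hypothetical; the list named `cloud` merely stands in for the cloud database.

```python
import sqlite3


def open_local_store(path=":memory:"):
    """Create the local table that holds responses and their sync status."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS responses ("
        "id INTEGER PRIMARY KEY, payload TEXT NOT NULL, "
        "synced INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def save_response(conn, payload):
    """Always write locally first, whether or not a connection exists."""
    conn.execute("INSERT INTO responses (payload) VALUES (?)", (payload,))
    conn.commit()


def sync_pending(conn, is_online, upload):
    """Push unsynced rows when online; rows stay local if the upload fails."""
    if not is_online():
        return 0
    rows = conn.execute(
        "SELECT id, payload FROM responses WHERE synced = 0"
    ).fetchall()
    pushed = 0
    for row_id, payload in rows:
        if upload(payload):
            conn.execute("UPDATE responses SET synced = 1 WHERE id = ?", (row_id,))
            pushed += 1
    conn.commit()
    return pushed


cloud = []  # stand-in for the cloud database


def fake_upload(payload):
    """Hypothetical uploader that always succeeds."""
    cloud.append(payload)
    return True


store = open_local_store()
save_response(store, '{"q1": "yes"}')
save_response(store, '{"q1": "no"}')
offline_result = sync_pending(store, lambda: False, fake_upload)
online_result = sync_pending(store, lambda: True, fake_upload)
```

Marking rows as synced instead of deleting them keeps a full local copy, so a failed or partial upload never loses survey data.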
Programmed the mail-engine to send and monitor distributed email campaigns 
Technology Implementation 
Developing Email Delivery and Analytics Engine 
One of the major challenges faced by the researcher was the lack of a comprehensive solution for disseminating information to a defined target segment. Conventional programs were cluttered, repeatedly hit by delivery failures and could not provide key insights.
The application was hosted on App Engine, with a scalable infrastructure to enable real-time monitoring and delivery, while the database was centrally managed on Google Cloud SQL
The application was programmed using distributed loading techniques to send more than 1 million emails per day.
Data Visualization 
Creating customized visualizations for your data-sets to tell a compelling story
Data Visualization 
Industry Turbulence
Data Visualization 
Countries in Motion
Data Visualization 
Web Interactive Customizable World Maps
Clients 
Researchers from
Contact 
We like challenges 
Keep the questions coming and hear from our engagement managers about how we can help.
Write to us at info@innovaccer.com
Or have a look at us at www.innovaccer.com
Or call us at +91 120 431 1139

Weitere ähnliche Inhalte

Was ist angesagt?

An Improved Differential Evolution Algorithm for Data Stream Clustering
An Improved Differential Evolution Algorithm for Data Stream ClusteringAn Improved Differential Evolution Algorithm for Data Stream Clustering
An Improved Differential Evolution Algorithm for Data Stream ClusteringIJECEIAES
 
Data mining for_java_and_dot_net 2016-17
Data mining for_java_and_dot_net 2016-17Data mining for_java_and_dot_net 2016-17
Data mining for_java_and_dot_net 2016-17redpel dot com
 
Qo s aware scientific application scheduling algorithm in cloud environment
Qo s aware scientific application scheduling algorithm in cloud environmentQo s aware scientific application scheduling algorithm in cloud environment
Qo s aware scientific application scheduling algorithm in cloud environmentAlexander Decker
 
data Fusion and log correlation
data Fusion and log correlationdata Fusion and log correlation
data Fusion and log correlationMahdi Sayyad
 
11.challenging issues of spatio temporal data mining
11.challenging issues of spatio temporal data mining11.challenging issues of spatio temporal data mining
11.challenging issues of spatio temporal data miningAlexander Decker
 
High performance intrusion detection using modified k mean & naïve bayes
High performance intrusion detection using modified k mean & naïve bayesHigh performance intrusion detection using modified k mean & naïve bayes
High performance intrusion detection using modified k mean & naïve bayeseSAT Journals
 
Sustainability Investment Research Using Cognitive Analytics
Sustainability Investment Research Using Cognitive AnalyticsSustainability Investment Research Using Cognitive Analytics
Sustainability Investment Research Using Cognitive AnalyticsCambridge Semantics
 
ESSnet Big Data WP8 Methodology (+ Quality, +IT)
ESSnet Big Data WP8 Methodology (+ Quality, +IT)ESSnet Big Data WP8 Methodology (+ Quality, +IT)
ESSnet Big Data WP8 Methodology (+ Quality, +IT)Piet J.H. Daas
 
Big Data Analysis in Hydrogen Station using Spark and Azure ML
Big Data Analysis in Hydrogen Station using Spark and Azure MLBig Data Analysis in Hydrogen Station using Spark and Azure ML
Big Data Analysis in Hydrogen Station using Spark and Azure MLJongwook Woo
 
An effective classification approach for big data with parallel generalized H...
An effective classification approach for big data with parallel generalized H...An effective classification approach for big data with parallel generalized H...
An effective classification approach for big data with parallel generalized H...riyaniaes
 
Predictive Analysis for Airbnb Listing Rating using Scalable Big Data Platform
Predictive Analysis for Airbnb Listing Rating using Scalable Big Data PlatformPredictive Analysis for Airbnb Listing Rating using Scalable Big Data Platform
Predictive Analysis for Airbnb Listing Rating using Scalable Big Data PlatformSavita Yadav
 
Lecture3 business intelligence
Lecture3 business intelligenceLecture3 business intelligence
Lecture3 business intelligencehktripathy
 
A Case Study of Innovation of an Information Communication System and Upgrade...
A Case Study of Innovation of an Information Communication System and Upgrade...A Case Study of Innovation of an Information Communication System and Upgrade...
A Case Study of Innovation of an Information Communication System and Upgrade...gerogepatton
 
Lect 1 introduction
Lect 1 introductionLect 1 introduction
Lect 1 introductionhktripathy
 
Frequent Item set Mining of Big Data for Social Media
Frequent Item set Mining of Big Data for Social MediaFrequent Item set Mining of Big Data for Social Media
Frequent Item set Mining of Big Data for Social MediaIJERA Editor
 
Modern Data Discovery and Integration in Insurance
Modern Data Discovery and Integration in InsuranceModern Data Discovery and Integration in Insurance
Modern Data Discovery and Integration in InsuranceCambridge Semantics
 
Scalable Predictive Analysis and The Trend with Big Data & AI
Scalable Predictive Analysis and The Trend with Big Data & AIScalable Predictive Analysis and The Trend with Big Data & AI
Scalable Predictive Analysis and The Trend with Big Data & AIJongwook Woo
 
Berlin Hadoop Get Together Apache Drill
Berlin Hadoop Get Together Apache Drill Berlin Hadoop Get Together Apache Drill
Berlin Hadoop Get Together Apache Drill MapR Technologies
 

Was ist angesagt? (19)

An Improved Differential Evolution Algorithm for Data Stream Clustering
An Improved Differential Evolution Algorithm for Data Stream ClusteringAn Improved Differential Evolution Algorithm for Data Stream Clustering
An Improved Differential Evolution Algorithm for Data Stream Clustering
 
Data mining for_java_and_dot_net 2016-17
Data mining for_java_and_dot_net 2016-17Data mining for_java_and_dot_net 2016-17
Data mining for_java_and_dot_net 2016-17
 
Qo s aware scientific application scheduling algorithm in cloud environment
Qo s aware scientific application scheduling algorithm in cloud environmentQo s aware scientific application scheduling algorithm in cloud environment
Qo s aware scientific application scheduling algorithm in cloud environment
 
data Fusion and log correlation
data Fusion and log correlationdata Fusion and log correlation
data Fusion and log correlation
 
11.challenging issues of spatio temporal data mining
11.challenging issues of spatio temporal data mining11.challenging issues of spatio temporal data mining
11.challenging issues of spatio temporal data mining
 
High performance intrusion detection using modified k mean & naïve bayes
High performance intrusion detection using modified k mean & naïve bayesHigh performance intrusion detection using modified k mean & naïve bayes
High performance intrusion detection using modified k mean & naïve bayes
 
Sustainability Investment Research Using Cognitive Analytics
Sustainability Investment Research Using Cognitive AnalyticsSustainability Investment Research Using Cognitive Analytics
Sustainability Investment Research Using Cognitive Analytics
 
ESSnet Big Data WP8 Methodology (+ Quality, +IT)
ESSnet Big Data WP8 Methodology (+ Quality, +IT)ESSnet Big Data WP8 Methodology (+ Quality, +IT)
ESSnet Big Data WP8 Methodology (+ Quality, +IT)
 
Big Data Analysis in Hydrogen Station using Spark and Azure ML
Big Data Analysis in Hydrogen Station using Spark and Azure MLBig Data Analysis in Hydrogen Station using Spark and Azure ML
Big Data Analysis in Hydrogen Station using Spark and Azure ML
 
An effective classification approach for big data with parallel generalized H...
An effective classification approach for big data with parallel generalized H...An effective classification approach for big data with parallel generalized H...
An effective classification approach for big data with parallel generalized H...
 
Predictive Analysis for Airbnb Listing Rating using Scalable Big Data Platform
Predictive Analysis for Airbnb Listing Rating using Scalable Big Data PlatformPredictive Analysis for Airbnb Listing Rating using Scalable Big Data Platform
Predictive Analysis for Airbnb Listing Rating using Scalable Big Data Platform
 
Lecture3 business intelligence
Lecture3 business intelligenceLecture3 business intelligence
Lecture3 business intelligence
 
A Case Study of Innovation of an Information Communication System and Upgrade...
A Case Study of Innovation of an Information Communication System and Upgrade...A Case Study of Innovation of an Information Communication System and Upgrade...
A Case Study of Innovation of an Information Communication System and Upgrade...
 
Lect 1 introduction
Lect 1 introductionLect 1 introduction
Lect 1 introduction
 
FC Brochure & Insert
FC Brochure & InsertFC Brochure & Insert
FC Brochure & Insert
 
Frequent Item set Mining of Big Data for Social Media
Frequent Item set Mining of Big Data for Social MediaFrequent Item set Mining of Big Data for Social Media
Frequent Item set Mining of Big Data for Social Media
 
Modern Data Discovery and Integration in Insurance
Modern Data Discovery and Integration in InsuranceModern Data Discovery and Integration in Insurance
Modern Data Discovery and Integration in Insurance
 
Scalable Predictive Analysis and The Trend with Big Data & AI
Scalable Predictive Analysis and The Trend with Big Data & AIScalable Predictive Analysis and The Trend with Big Data & AI
Scalable Predictive Analysis and The Trend with Big Data & AI
 
Berlin Hadoop Get Together Apache Drill
Berlin Hadoop Get Together Apache Drill Berlin Hadoop Get Together Apache Drill
Berlin Hadoop Get Together Apache Drill
 

Andere mochten auch

Population Health Management 101
Population Health Management 101Population Health Management 101
Population Health Management 101Innovaccer
 
The analytics journey to population health management
The analytics journey to population health managementThe analytics journey to population health management
The analytics journey to population health managementIBM Analytics
 
Population Health Management: Where are YOU?
Population Health Management: Where are YOU?Population Health Management: Where are YOU?
Population Health Management: Where are YOU?Phytel
 
Population Health Management Case Studies
Population Health Management Case StudiesPopulation Health Management Case Studies
Population Health Management Case StudiesPhytel
 
Using Patient Registries and Evidence-Based Guidelines to Overcome Declining ...
Using Patient Registries and Evidence-Based Guidelines to Overcome Declining ...Using Patient Registries and Evidence-Based Guidelines to Overcome Declining ...
Using Patient Registries and Evidence-Based Guidelines to Overcome Declining ...Phytel
 
HIMSS 2016 "Predictive Analytics & Genomics in Population Health Management
HIMSS 2016 "Predictive Analytics & Genomics in Population Health ManagementHIMSS 2016 "Predictive Analytics & Genomics in Population Health Management
HIMSS 2016 "Predictive Analytics & Genomics in Population Health ManagementDaniel F. Hoemke
 
Integrating Behavioral Health into Primary Care – Thought Leaders in Populati...
Integrating Behavioral Health into Primary Care – Thought Leaders in Populati...Integrating Behavioral Health into Primary Care – Thought Leaders in Populati...
Integrating Behavioral Health into Primary Care – Thought Leaders in Populati...Epstein Becker Green
 
Population Health Management
Population Health ManagementPopulation Health Management
Population Health ManagementVitreosHealth
 
Population Health Management - Angus McCann
Population Health Management - Angus McCannPopulation Health Management - Angus McCann
Population Health Management - Angus McCannNapier University
 
Harnessing Population Health Management to Promote Quality Improvement in Hea...
Harnessing Population Health Management to Promote Quality Improvement in Hea...Harnessing Population Health Management to Promote Quality Improvement in Hea...
Harnessing Population Health Management to Promote Quality Improvement in Hea...Queena Deschene, RCFE
 
Best Practices in Implementing Population Health
Best Practices in Implementing Population Health Best Practices in Implementing Population Health
Best Practices in Implementing Population Health Health Catalyst
 
The 6 Critical Components of Population Health
The 6 Critical Components of Population HealthThe 6 Critical Components of Population Health
The 6 Critical Components of Population HealthHealth Catalyst
 
What Is Population Health And How Does It Compare to Public Health
What Is Population Health And How Does It Compare to Public HealthWhat Is Population Health And How Does It Compare to Public Health
What Is Population Health And How Does It Compare to Public HealthHealth Catalyst
 
The Formula for Optimizing the Value-Based Healthcare Equation
The Formula for Optimizing the Value-Based Healthcare EquationThe Formula for Optimizing the Value-Based Healthcare Equation
The Formula for Optimizing the Value-Based Healthcare EquationHealth Catalyst
 

Andere mochten auch (15)

Population Health Management 101
Population Health Management 101Population Health Management 101
Population Health Management 101
 
The analytics journey to population health management
The analytics journey to population health managementThe analytics journey to population health management
The analytics journey to population health management
 
Population Health Management: Where are YOU?
Population Health Management: Where are YOU?Population Health Management: Where are YOU?
Population Health Management: Where are YOU?
 
Population Health Management Case Studies
Population Health Management Case StudiesPopulation Health Management Case Studies
Population Health Management Case Studies
 
Using Patient Registries and Evidence-Based Guidelines to Overcome Declining ...
Using Patient Registries and Evidence-Based Guidelines to Overcome Declining ...Using Patient Registries and Evidence-Based Guidelines to Overcome Declining ...
Using Patient Registries and Evidence-Based Guidelines to Overcome Declining ...
 
Population Health Management
Population Health ManagementPopulation Health Management
Population Health Management
 
HIMSS 2016 "Predictive Analytics & Genomics in Population Health Management
HIMSS 2016 "Predictive Analytics & Genomics in Population Health ManagementHIMSS 2016 "Predictive Analytics & Genomics in Population Health Management
HIMSS 2016 "Predictive Analytics & Genomics in Population Health Management
 
Integrating Behavioral Health into Primary Care – Thought Leaders in Populati...
Integrating Behavioral Health into Primary Care – Thought Leaders in Populati...Integrating Behavioral Health into Primary Care – Thought Leaders in Populati...
Integrating Behavioral Health into Primary Care – Thought Leaders in Populati...
 
Population Health Management
Population Health ManagementPopulation Health Management
Population Health Management
 
Population Health Management - Angus McCann
Population Health Management - Angus McCannPopulation Health Management - Angus McCann
Population Health Management - Angus McCann
 
Harnessing Population Health Management to Promote Quality Improvement in Hea...
Harnessing Population Health Management to Promote Quality Improvement in Hea...Harnessing Population Health Management to Promote Quality Improvement in Hea...
Harnessing Population Health Management to Promote Quality Improvement in Hea...
 
Best Practices in Implementing Population Health
Best Practices in Implementing Population Health Best Practices in Implementing Population Health
Best Practices in Implementing Population Health
 
The 6 Critical Components of Population Health
The 6 Critical Components of Population HealthThe 6 Critical Components of Population Health
The 6 Critical Components of Population Health
 
What Is Population Health And How Does It Compare to Public Health
What Is Population Health And How Does It Compare to Public HealthWhat Is Population Health And How Does It Compare to Public Health
What Is Population Health And How Does It Compare to Public Health
 
The Formula for Optimizing the Value-Based Healthcare Equation
The Formula for Optimizing the Value-Based Healthcare EquationThe Formula for Optimizing the Value-Based Healthcare Equation
The Formula for Optimizing the Value-Based Healthcare Equation
 

Ähnlich wie Innovaccer service capabilities with case studies

GERSIS INDUSTRY CASES
GERSIS INDUSTRY CASESGERSIS INDUSTRY CASES
GERSIS INDUSTRY CASESSergej Markov
 
Commercializing Alternative Data
Commercializing Alternative DataCommercializing Alternative Data
Commercializing Alternative DataDatabricks
 
Syed babar resume
Syed babar resumeSyed babar resume
Syed babar resumesyedbabar9
 
Data Analytics Introduction.pptx
Data Analytics Introduction.pptxData Analytics Introduction.pptx
Data Analytics Introduction.pptxamitparashar42
 
Data Analytics Introduction.pptx
Data Analytics Introduction.pptxData Analytics Introduction.pptx
Data Analytics Introduction.pptxamitparashar42
 
Hadoop,Big Data Analytics and More
Hadoop,Big Data Analytics and MoreHadoop,Big Data Analytics and More
Hadoop,Big Data Analytics and MoreTrendwise Analytics
 
Resume_Partha_Data Consultant_23_July_2016
Resume_Partha_Data Consultant_23_July_2016Resume_Partha_Data Consultant_23_July_2016
Resume_Partha_Data Consultant_23_July_2016Partha Sarathi Pattnaik
 
A Query Model for Ad Hoc Queries using a Scanning Architecture
A Query Model for Ad Hoc Queries using a Scanning ArchitectureA Query Model for Ad Hoc Queries using a Scanning Architecture
A Query Model for Ad Hoc Queries using a Scanning ArchitectureFlurry, Inc.
 
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...Shirshanka Das
 
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...Yael Garten
 
Martin Gramlich Resume - Latest Feb 2016
Martin Gramlich Resume - Latest Feb 2016Martin Gramlich Resume - Latest Feb 2016
Martin Gramlich Resume - Latest Feb 2016Martin Gramlich
 
Big Data: Its Characteristics And Architecture Capabilities
Big Data: Its Characteristics And Architecture CapabilitiesBig Data: Its Characteristics And Architecture Capabilities
Big Data: Its Characteristics And Architecture CapabilitiesAshraf Uddin
 
Big Data Analytics
Big Data AnalyticsBig Data Analytics
Big Data AnalyticsOsman Ali
 
Growth hacking in the age of Data
Growth hacking in the age of DataGrowth hacking in the age of Data
Growth hacking in the age of DataDaniel Saito
 
Venkata Sateesh_BigData_Latest-Resume
Venkata Sateesh_BigData_Latest-ResumeVenkata Sateesh_BigData_Latest-Resume
Venkata Sateesh_BigData_Latest-Resumevenkata sateeshs
 
Kumar Godasi - Resume
Kumar Godasi - ResumeKumar Godasi - Resume
Kumar Godasi - ResumeKumar Godasi
 
PeterBarnumGenResume
PeterBarnumGenResumePeterBarnumGenResume
PeterBarnumGenResumePeter Barnum
 

Ähnlich wie Innovaccer service capabilities with case studies (20)

GERSIS INDUSTRY CASES
GERSIS INDUSTRY CASESGERSIS INDUSTRY CASES
GERSIS INDUSTRY CASES
 
Commercializing Alternative Data
Innovaccer service capabilities with case studies

  • 1. DATA EXPERTS: We accelerate research and transform data to help you create actionable insights. WE MINE. WE ANALYZE. WE VISUALIZE.
  • 2. Data Scientists, Software Engineers, Domain Analysts
  • 3. Data Mining: Mining longitudinal and linked datasets from web and other archives
  • 4. Web Data Mining: Indian Patent Data. The task was to write code to harvest Indian patent data from the Indian Patent Office website on an ongoing basis. The code extracts information on all patents filed between user-defined dates A and B. A data quality tool was then written to clean the harvested data. A scheduler was prepared to run the harvesting code followed by the data quality code every week (the Indian Patent Office publishes patents weekly). This master scheduler runs every week and updates all fields of new patents in the master database.
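The date-range harvest and weekly schedule described on this slide can be sketched roughly as follows. This is a minimal illustration, not the code that was delivered: `weekly_windows` and `run_pipeline` are hypothetical names, and the `harvest` and `clean` callables stand in for the site-specific harvesting and data quality code, which the slide does not show.

```python
from datetime import date, timedelta


def weekly_windows(start, end):
    """Split the inclusive range [start, end] into consecutive windows of at most 7 days."""
    if start > end:
        raise ValueError("start must not be after end")
    windows = []
    cursor = start
    while cursor <= end:
        window_end = min(cursor + timedelta(days=6), end)
        windows.append((cursor, window_end))
        cursor = window_end + timedelta(days=1)
    return windows


def run_pipeline(start, end, harvest, clean):
    """Harvest every weekly window, then clean the pooled records.

    `harvest(window_start, window_end)` and `clean(records)` are
    hypothetical callables supplied by the caller.
    """
    records = []
    for window_start, window_end in weekly_windows(start, end):
        records.extend(harvest(window_start, window_end))
    return clean(records)
```

A weekly job (cron or any other scheduler) could then call `run_pipeline` with the previous week's dates and write the result to the master database.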
  • 5. Web Data Mining: Earnings Conference Calls. The task was to write code to harvest the latest earnings conference calls of all NASDAQ-listed firms and parse each call at the individual speaker level. Earnings conference calls were harvested from NASDAQ, Thomson StreetEvents and other sources on a daily basis. Automation code was written to parse those calls at the individual speaker level, with metadata such as speaker name, designation, company and speech text. A scheduler was prepared to run the harvesting code followed by the parsing code every day (from NASDAQ). This master scheduler runs every day and updates all fields of new conference calls in the master database.
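Speaker-level parsing of a call transcript might look like the sketch below. The transcript layout is an assumption made for illustration: each speaker turn is taken to start with a header line of the form `Name - Designation, Company`, followed by one or more lines of speech. Real transcripts from NASDAQ or Thomson StreetEvents use their own layouts, and `parse_call` is a hypothetical name.

```python
import re

# Assumed header layout: "Name - Designation, Company" (hypothetical format).
HEADER = re.compile(
    r"^(?P<name>[^-]+?)\s+-\s+(?P<designation>[^,]+),\s*(?P<company>.+)$"
)


def parse_call(text):
    """Split a transcript into speaker turns with name, designation, company and speech."""
    turns = []
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        match = HEADER.match(line)
        if match:
            # A header line opens a new speaker turn.
            current = dict(match.groupdict(), speech="")
            turns.append(current)
        elif current is not None:
            # Any other line is appended to the current speaker's speech.
            current["speech"] = (current["speech"] + " " + line).strip()
    return turns
```

A speech line that happens to contain both " - " and a comma would be misread as a header here, so a production parser would need stricter header detection.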
  • 6. Archive Data Mining: CEO Compensation. The task was to create a longitudinal data-set containing details of CEO compensation. Each company's DEF 14A proxy filing on SEC EDGAR details how, and how much, the CEO was compensated. Automation code was written to parse those proxy filings and pull out total compensation as well as its break-up, including basic salary, bonus, EPS, etc. A team of financial domain analysts manually studied the proxy filings and created dummy variables for the various compensation strategies used in practice to compensate CEOs and key executives. Finally, the data-set was cleaned, standardized and sent to the quality team for vetting.
  • 7. Statistical Modeling: Curating statistical models or running statistical analysis on real or simulated data
  • 8. Statistical Modeling - Optimization: Logistics Optimization. The task was to prepare, implement, and realize statistical models to optimize school transportation and logistics costs for real cities based on real data. A professor was appointed by an NPO and a government to prepare statistical models for city planning, aimed at optimizing school costs. Our statisticians used clustering techniques, optimization techniques, and multi-processing to minimize the total transportation and operational cost of schools for a city. Various constraints and difficulties were handled, including bus capacity, total riding time, mixed loading, land and sea routes, different types of buses, classroom sizes, and many more. A flexible graphical user interface was delivered to the client to view output on maps and spreadsheets, manually change routes, upload input data, etc.
  • 9. Statistical Modeling: Hidden Markov Model. Using Hidden Markov Models to predict hidden layers of consumer behavior. Starting from the hypothesis that consumers pass through sequential hidden phases of behavior towards a product ecosystem over time, our task was to validate that these hidden layers exist and to understand their transitions over time. Using panel data on consumers' adoption and rejection of products in a product ecosystem, covering more than 10,000 products and 100,000 consumers, we prepared a hidden Markov model with a maximum of 8 states and 20 time periods. Computation time for this analysis was of exponential order, and each optimization iteration had to store gigabytes of data. A Hadoop layer on R, using HDFS on five machines, was therefore used to reduce the time and memory load that a single machine would face.
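As background for the model on this slide, the likelihood of an observed sequence under a discrete hidden Markov model can be computed with the standard forward recursion. The sketch below is illustrative only: the slide does not say which estimation procedure was used (the work was done in R on Hadoop), and the function name and the toy parameters in the usage note are assumptions.

```python
def sequence_likelihood(initial, transition, emission, observations):
    """Forward algorithm: probability of a non-empty observation sequence under a discrete HMM.

    initial[s]        probability of starting in state s
    transition[i][j]  probability of moving from state i to state j
    emission[s][o]    probability of observing symbol o in state s
    """
    n_states = len(initial)
    # alpha[s] = P(observations so far, current state = s)
    alpha = [initial[s] * emission[s][observations[0]] for s in range(n_states)]
    for obs in observations[1:]:
        alpha = [
            emission[j][obs] * sum(alpha[i] * transition[i][j] for i in range(n_states))
            for j in range(n_states)
        ]
    return sum(alpha)
```

For example, with two hidden states and two symbols (say, 0 = reject, 1 = adopt), `sequence_likelihood([0.6, 0.4], [[0.7, 0.3], [0.4, 0.6]], [[0.9, 0.1], [0.2, 0.8]], [0, 1, 1])` returns the probability of seeing reject, adopt, adopt. The recursion costs on the order of T·N² operations for T periods and N states, whereas enumerating every hidden path would cost N^T.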
  • 10. Data Analysis – Predictive Analysis: Predicting Hospital Readmission Rates. Predicting and benchmarking readmission rates using readmission data and performance and cause factors. With the implementation of the Obamacare bill, hospitals now have a pressing need to significantly reduce their readmission rates or end up paying huge penalties. A comprehensive solution requires collating all readmission data available from various sources and then identifying the causal factors affecting readmission. AIC and BIC analysis was carried out for each disease to remove variables with insignificant dependence on readmission rates, leaving the key variables for each disease. Multiple linear regression was then run with readmission rate as the response variable and the other variables as explanatory variables, and the results were consistent with common-sense expectations. We further developed a BI tool using a comprehensive model to predict readmission rates and their dependence on various KPIs.
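To illustrate how AIC and BIC can be used to decide whether a variable earns its place in a regression, the sketch below compares an intercept-only model with a one-predictor model. It uses the common least-squares forms AIC = n·ln(RSS/n) + 2k and BIC = n·ln(RSS/n) + k·ln(n), which drop an additive constant. The function names are hypothetical, and the slide's actual analysis used multiple regression per disease, which is not reproduced here.

```python
import math


def rss_intercept_only(y):
    """Residual sum of squares when y is predicted by its mean alone."""
    mean = sum(y) / len(y)
    return sum((value - mean) ** 2 for value in y)


def rss_simple_regression(x, y):
    """Residual sum of squares of the least-squares line y = intercept + slope * x."""
    n = len(y)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))


def aic(rss, n, k):
    """AIC for a least-squares fit with k parameters (additive constant dropped)."""
    return n * math.log(rss / n) + 2 * k


def bic(rss, n, k):
    """BIC for a least-squares fit with k parameters (additive constant dropped)."""
    return n * math.log(rss / n) + k * math.log(n)
```

A candidate variable would be kept when adding it lowers the criterion, that is, when `aic(rss_simple_regression(x, y), n, 2)` is smaller than `aic(rss_intercept_only(y), n, 1)`, and dropped otherwise. BIC applies the same test with a heavier penalty per parameter once n exceeds about 7.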
  • 11. Big Data Analysis: Implementing distributed computing methods and cutting-edge statistical models to analyze huge chunks of data
  • 12. Data Analysis – Contextual Analysis: Sentiment Analysis. Navigating through millions of tweets to understand people's sentiment. Millions of tweets from various people were extracted using data harvesting techniques. Using Naïve Bayes analysis and a training set of 100,000 tweets, a supervised classifier was built to decide the sentiment of any tweet. The analysis involved first removing stop words and replacing prolonged chat versions of words with their English counterparts. People's sentiments were analyzed in various control groups. Completing this project in a record two-month timeframe would not have been possible without big data technologies.
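A toy version of the classifier described on this slide is sketched below: multinomial Naive Bayes with add-one smoothing, preceded by stop-word removal and a crude normalization of prolonged words. Everything here is illustrative. The stop-word list is a tiny placeholder, collapsing repeated letters is only a stand-in for the word substitution the slide mentions, and the class and function names are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

# Tiny illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {"a", "an", "and", "i", "is", "it", "the", "this", "to"}


def tokenize(tweet):
    """Lowercase, collapse runs of 3+ identical letters to one, drop stop words."""
    words = re.findall(r"[a-z']+", tweet.lower())
    words = [re.sub(r"(.)\1{2,}", r"\1", word) for word in words]
    return [word for word in words if word not in STOP_WORDS]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, tweets, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tweet, label in zip(tweets, labels):
            self.word_counts[label].update(tokenize(tweet))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, tweet):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            score = math.log(count / total)  # log prior
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in tokenize(tweet):
                if word in self.vocab:  # words never seen in training are ignored
                    score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Collapsing to a single letter turns "loooove" into "love" but would also turn "goooood" into "god", so a real pipeline would check candidates against a dictionary.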
  • 13. Data Analysis – Big Data Analysis: Analyzing High-Frequency Trading Data. Querying high-frequency data from the limit order book for all NYSE-traded securities. The project involved analyzing high-frequency datasets obtained from NYSE OpenBook and OpenBook Ultra. One of the biggest challenges was the scale of the data: the total size was 30 TB and increasing. Even simple analyses involved database queries that took several hours at a time. We distributed the computation using MapReduce-based algorithms implemented in Python, which enabled near real-time querying. This setup was later used to analyze price variation and its impact on market liquidity. The distributed computing approach saved a lot of analysis time.
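The MapReduce pattern mentioned on this slide can be shown in miniature with plain Python. The sketch below runs in a single process and computes a volume-weighted average price per symbol. The record layout `(symbol, price, size)` and the choice of statistic are assumptions for illustration, since the slide does not describe the actual queries, and a real deployment would run the map and reduce phases on a cluster.

```python
from collections import defaultdict


def map_record(record):
    """Map phase: emit (symbol, (notional, size)) for one order book record."""
    symbol, price, size = record
    yield symbol, (price * size, size)


def shuffle(pairs):
    """Shuffle phase: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_symbol(symbol, values):
    """Reduce phase: volume-weighted average price for one symbol."""
    notional = sum(value[0] for value in values)
    volume = sum(value[1] for value in values)
    return symbol, notional / volume


def run_job(records):
    """Run map, shuffle and reduce in sequence and return {symbol: vwap}."""
    pairs = (pair for record in records for pair in map_record(record))
    return dict(reduce_symbol(key, values) for key, values in shuffle(pairs).items())
```

Because each record is mapped independently and each symbol is reduced independently, both phases can be spread across machines, which is what makes the approach suit a dataset of this size.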
  • 14. Data Analysis – Prescriptive Analysis: Personality Analysis. The personality of key executives of various companies was examined using text records of their public speeches. With access to a parsed database of speeches by key executives of NASDAQ-traded firms, we performed personality analysis of over 30,000 people (including chief executives and analysts) and 10 million words using psycholinguistic word databases and an SVM technique. Each person was given a score from 1 to 8 on each of the Big Five personality traits. These traits were correlated with the following quarter's conference calls, and patterns were observed in the personality of chief executives and its effect on financial (especially stock) performance in the following quarter.
  • 15. Data Standardization: Converting raw data into cleaned and standardized formats
  • 16. Data Standardization: Merging Patent and Finance Data. Merging company names in financial and patent data-sets. The task was to merge a US financial data-set with 15,000 unique standardized company names and a US patent data-set with 800,000+ unstandardized unique company names. After thorough fine-tuning of our hybrid model for company name matching, we reduced those 800,000+ unstandardized names to 200,000+ standardized company names and mapped all 15,000 company names in the finance data-set to the patent data-set. The results were spectacular: notably, we found 105 spelling variants of IBM, which we grouped under one standardized name.
  • 17. Data Standardization: Standardizing Names of Individuals. Standardizing the names of plaintiffs and defendants across a litigation data-set containing approximately 4.7 million unique names. Used benchmark algorithms based on Levenshtein distance and N-grams to standardize names, and implemented them in Python. Prepared and optimized hybrid parameter models for these algorithms. Implemented the above on a five-node cluster using Hadoop Distributed File System technology.
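A hybrid of Levenshtein distance and N-gram overlap, as named on this slide, could be combined along the lines below. This is a sketch under assumptions: the slides do not give the actual hybrid model or its parameters, so the equal weighting, the use of character bigrams with Jaccard overlap, and the function names are all illustrative.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + (char_a != char_b)  # substitute or match
            ))
        previous = current
    return previous[-1]


def ngrams(text, n=2):
    """Set of character n-grams in text."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def ngram_similarity(a, b, n=2):
    """Jaccard overlap of the two strings' character n-gram sets."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    if not grams_a and not grams_b:
        return 1.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def hybrid_score(a, b, weight=0.5):
    """Weighted mix of normalized edit similarity and n-gram similarity, in [0, 1]."""
    a, b = a.lower().strip(), b.lower().strip()
    longest = max(len(a), len(b)) or 1
    edit_similarity = 1 - levenshtein(a, b) / longest
    return weight * edit_similarity + (1 - weight) * ngram_similarity(a, b)
```

Names whose `hybrid_score` against a standardized name exceeds a tuned threshold would be grouped under it. With these illustrative settings, "I.B.M. Corp" scores higher against "IBM Corp" than "Intel Corp" does.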
  • 18. Technology Implementation: Implementing customized mobile and software tools to make research faster, scalable and economical
  • 19. Technology Implementation: Developing Mobile Applications. Developing a mobile and cloud-based survey application for researchers to conduct, analyze and report surveys. Capturing survey data, digitizing results and performing analyses was a long-drawn, error-prone manual process. We worked with researchers at the university to develop an Android application for collecting data on the smartphones of on-ground surveyors. The result: no skip logics, no digitization, audio and video responses, an analysis dashboard, GPS location reliability, all at the click of a button.
  • 20. Technology Implementation: Developing Desktop Applications. Building a platform to assist in collecting, tracking, and analyzing large data-sets. Developed a login-based offline application on the .NET Framework which collects survey data, syncs local data with the cloud, and allows users to analyze data, all on a continual basis, using a reliable and scalable cloud-based database technology, Google Cloud SQL. Considering physical constraints such as electricity and internet shortages, our team developed a technology that can be installed at any remote terminal where data collection takes place. All local copies of the data are kept on the local machines until an internet connection becomes available; the data is then synced to a cloud database that the client's core team members can access from anywhere at any time.
  • 21. Technology Implementation: Developing Email Delivery and Analytics Engine. Programmed a mail engine to send and monitor distributed email campaigns. One of the major challenges faced by the researcher was the lack of a comprehensive solution for disseminating information to a defined target segment. Conventional programs were cluttered, were repeatedly hit by delivery failures, and could not provide the key insights. The application was hosted on App Engine with a scalable infrastructure to enable real-time monitoring and delivery, while the database was centrally managed on Google Cloud SQL. The application was programmed using distributed loading techniques to send more than 1 million emails per day.
  • 22. Data Visualization: Creating customized visualizations for your data-sets to tell a compelling story
  • 25. Data Visualization: Web Interactive Customizable World Maps
  • 27. Contact: We like challenges. Keep throwing questions at us and hear from our engagement managers on how we can help. Write to us at info@innovaccer.com, have a look at us at www.innovaccer.com, or call us at +91 120 431 1139.