Experience in big data analytics:
1. Data analysis for students’ performance and behaviors
To collect data on students’ performance and behavior, I designed and developed a
service-oriented system in PHP and Flex. Teaching staff can manage the question bank and start a
test for the students. Participating students access the system via Web browsers and mobile
devices (such as iOS and Android devices). The system can be used during and after classes to
test students’ performance. I was involved in the design and development of the whole
service-oriented system.
Currently, the system uses MySQL for data storage. After a test, or after some period of time,
the students’ new scores and behavior data (such as the answering time and modification time
for each question) are extracted from MySQL and saved to CSV files. I use MATLAB and R to
visualize the data (for example, with histograms and density plots) and to analyze the
relationship between these behavioral features and the students’ scores. Additionally, I use
Python to implement similarity measures (such as Euclidean distance and the Pearson correlation
coefficient) to compute the similarity between students. K-Means clustering is also employed to
group the students. Based on the results, students can be divided into several clusters with
similar scores and behaviors. Since the questions in the system are designed for different
knowledge categories and each category has more than one question, teachers can create teaching
strategies to guide students with similar performance. To check whether students have since
learnt the material behind the questions they previously failed to answer, the system can
automatically recommend similar questions answered by similar students in the same cluster. I
have implemented this service in Python.
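The similarity measures and the clustering step described above can be sketched as follows. This is only a minimal, dependency-free illustration and not the actual service: the feature layout (score, mean answering time in seconds), the sample values and the function names are assumptions made for the example.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        # Undefined for a constant vector; returning 0.0 is a convention
        # chosen for this sketch.
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def kmeans(points, k, iterations=20, seed=0):
    """Plain K-Means: returns (labels, centroids) for a list of tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids drawn from the data
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: euclidean(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
    return labels, centroids


# Hypothetical students: (score, mean answering time in seconds).
students = [(90, 20), (85, 25), (88, 22), (40, 60), (45, 55), (42, 58)]
labels, centroids = kmeans(students, k=2)
print(labels)
print(pearson([s[0] for s in students], [s[1] for s in students]))
```

In practice the features would be read from the exported CSV files and scaled before clustering, since scores and times are measured in different units.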
2. Data analysis for home automation
A prototype home automation system is implemented with 51 single-chip microcomputers.
Temperature, humidity, ultrasonic and light sensors connected to the microcomputers
automatically collect environmental data, which is sent in real time over HTTP to a service
implemented in PHP. The service sends the data via Thrift to HBase, which is deployed on
OpenStack. I am mainly responsible for the implementation of the service, the data storage and
the setup of Hadoop, HBase and OpenStack.
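As a rough illustration of the storage step, the sketch below shows one way a single sensor reading could be mapped to an HBase-style row, that is, a byte row key plus byte-valued "family:qualifier" columns. The row-key layout, the column family name "env", the device identifier and the helper name are all assumptions made for this example, not the prototype's actual schema. The sketch is in Python rather than PHP, and the Thrift call that would write the row is deliberately omitted.

```python
def to_hbase_row(device_id, timestamp, reading, family="env"):
    """Map one sensor reading to (row_key, columns) as bytes.

    Hypothetical layout: the row key is "<device>#<zero-padded unix time>",
    so that rows from one device sort chronologically.
    """
    row_key = f"{device_id}#{timestamp:010d}".encode("utf-8")
    columns = {
        f"{family}:{name}".encode("utf-8"): str(value).encode("utf-8")
        for name, value in sorted(reading.items())
    }
    return row_key, columns


# Hypothetical reading from one microcomputer.
row_key, columns = to_hbase_row(
    "node-01",
    1700000000,
    {"temperature": 21.5, "humidity": 40, "light": 300, "distance_cm": 120},
)
print(row_key)
print(columns)
```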
I have implemented a naïve Bayes classifier in Python. I divide the data in HBase into two
sets: one for training the classifier and the other for testing it. I have defined several
categories (classes), each of which can invoke a different service in response to the
occupants. Once trained, the classifier can predict the services that individuals may need.
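The train/test workflow described above can be sketched with a small Gaussian naïve Bayes classifier. The Gaussian variant is an assumption (it suits continuous sensor values; the text does not say which variant was used), and the features (temperature in °C, light level), the service labels and all sample values are invented for illustration.

```python
import math
import random
from collections import defaultdict


def train_test_split(rows, test_ratio=0.3, seed=0):
    """Shuffle a copy of the rows and split it into (train, test)."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - test_ratio))
    return rows[:cut], rows[cut:]


def fit(rows):
    """Estimate a prior and per-feature (mean, variance) for every label."""
    by_label = defaultdict(list)
    for features, label in rows:
        by_label[label].append(features)
    model = {}
    for label, samples in by_label.items():
        stats = []
        for column in zip(*samples):
            mean = sum(column) / len(column)
            # A small constant keeps the variance strictly positive.
            var = sum((x - mean) ** 2 for x in column) / len(column) + 1e-6
            stats.append((mean, var))
        model[label] = (len(samples) / len(rows), stats)
    return model


def predict(model, features):
    """Return the label with the highest log posterior."""
    def log_posterior(label):
        prior, stats = model[label]
        score = math.log(prior)
        for x, (mean, var) in zip(features, stats):
            score += -0.5 * math.log(2 * math.pi * var)
            score -= (x - mean) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)


# Hypothetical labelled readings: ((temperature, light), service to invoke).
data = [
    ((30.0, 520.0), "cooling"), ((32.0, 480.0), "cooling"),
    ((31.0, 510.0), "cooling"), ((29.5, 495.0), "cooling"),
    ((21.0, 50.0), "lighting"), ((20.0, 35.0), "lighting"),
    ((22.0, 45.0), "lighting"), ((19.5, 60.0), "lighting"),
]
train, test = train_test_split(data)
model = fit(train)
correct = sum(predict(model, x) == y for x, y in test)
print(f"accuracy on held-out readings: {correct}/{len(test)}")
```

Working with log probabilities avoids numerical underflow when many features are multiplied together.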
3. Data analysis for train derailment
Data sets about train derailments can be collected from some companies. However, they are
sometimes in HTML or XML format. Therefore, I use Universal Feed Parser and Beautiful Soup, two
Python packages, to extract the valuable data and save it to a CSV or TXT file. Additionally,
train derailments may be related to weather conditions. I have downloaded open data sets
describing past UK climate from several websites (such as the Met Office).
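The extraction step can be illustrated with a short sketch that pulls the rows of an HTML table and writes them out as CSV. To keep the example free of third-party dependencies it uses Python's standard `html.parser` module as a stand-in for Beautiful Soup, so it is not the code used in the project. The sample table, its column names and its single row are placeholders, not real derailment data.

```python
import csv
import io
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being parsed, if any
        self._cell = None  # text fragments of the cell being parsed, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_table_to_csv(html):
    """Return the table rows found in `html` as CSV text."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


# Placeholder input, invented for this example.
SAMPLE_HTML = (
    "<table>"
    "<tr><th>date</th><th>location</th><th>cause</th></tr>"
    "<tr><td>2015-01-01</td><td>Example Junction</td><td>track defect</td></tr>"
    "</table>"
)
print(html_table_to_csv(SAMPLE_HTML))
```

The resulting CSV text can then be written to a file and joined with the weather data on the date column.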

PhD experience and skills
